A few months ago, as part of looking through the changes in Windows 10 Anniversary Update for the Windows Internals 7th Edition book, I noticed that the kernel began enforcing usage of the CR4[FSGSBASE] feature (introduced in Intel Ivy Bridge processors, see Section 4.5.3 in the AMD Manuals) in order to allow usage of User Mode Scheduling (UMS).
This led me to further analyze how UMS worked before this processor feature was added – something which I knew a little bit about, but not enough to write on.
Teb field set to indicate it is a special allocation. Well, it turns out that the AMD64 manuals are pretty clear about the fact that the mov gs, XXX and pop gs instructions: Therefore, x86-style segmentation is still fully supported when it comes to FS and GS, even when operating in long mode, and overrides the 64-bit base address stored in MSR_GS_BASE.
This prevents any changes from being made to this address through calls such as VirtualProtect. However, because there is no 64-bit data segment descriptor table entry, only a 32-bit base address can be used, which requires this complex remapping by the kernel.
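To make the 32-bit limitation concrete, here is a minimal sketch of how an 8-byte data segment descriptor stores its base address. The function name and the default `access` and `flags` values are illustrative assumptions, not the values Windows actually writes; only the byte layout comes from the architecture.

```python
import struct

def pack_data_descriptor(base, limit=0xFFFFF, access=0xF3, flags=0xC):
    """Pack an 8-byte code/data segment descriptor (hypothetical helper).

    access=0xF3 (present, DPL 3, read/write data) and flags=0xC are
    assumed defaults chosen for illustration only.
    """
    if base < 0 or base >> 32:
        # The descriptor only has room for base bits 0..31.
        raise ValueError("base address does not fit in 32 bits")
    if limit < 0 or limit >> 20:
        raise ValueError("limit does not fit in 20 bits")
    return struct.pack(
        "<HHBBBB",
        limit & 0xFFFF,                        # limit[15:0]
        base & 0xFFFF,                         # base[15:0]
        (base >> 16) & 0xFF,                   # base[23:16]
        access,                                # type, S, DPL, P
        ((flags & 0xF) << 4) | (limit >> 16),  # flags and limit[19:16]
        (base >> 24) & 0xFF,                   # base[31:24]
    )

low_teb = pack_data_descriptor(0xFFFFE000)  # fits: below 4GB
```

Any address at or above 4GB raises `ValueError` in this sketch, which is the situation that forces a low copy of the TEB to exist.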
Note that if the new KPROCESS does not have an LDT, the LDT entry in the GDT is not deleted. Therefore, the GDT will always have an LDT entry now that at least one UMS thread in a process has been created, as can be seen in this debugger output:
    dx ((nt!_KGDTENTRY64 *)0xffffe0002143e2f0)
        [+0x000] LimitLow : 0xffff [Type: unsigned short]
        [+0x002] BaseLow : 0xd000 [Type: unsigned short]
        [+0x004] Bytes [Type: ]
        [+0x004] Bits [Type: ]
        [+0x008] BaseUpper : 0xffffe000 [Type: unsigned long]
        [+0x00c] MustBeZero : 0x0 [Type: unsigned long]
Call gates are a mechanism that allows 16-bit and 32-bit legacy applications to go from a lower privilege level to a higher privilege level.
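As a rough illustration of how the fields in the debugger output above combine into a 64-bit base, the sketch below parses a raw 16-byte long-mode system descriptor. The function name is hypothetical, and because the middle base bytes are not visible in the output, the sample uses made-up placeholder values (0x12 and 0x34) for them.

```python
import struct

def parse_system_descriptor(raw: bytes) -> dict:
    """Decode a 16-byte long-mode system descriptor (hypothetical helper)."""
    if len(raw) != 16:
        raise ValueError("system descriptors are 16 bytes in long mode")
    (limit_low, base_low, base_mid, access,
     flags_limit, base_high, base_upper, reserved) = struct.unpack("<HHBBBBII", raw)
    return {
        "base": base_low | (base_mid << 16) | (base_high << 24) | (base_upper << 32),
        "limit": limit_low | ((flags_limit & 0xF) << 16),
        "type": access & 0xF,          # 0x2 is the LDT type
        "present": bool(access & 0x80),
        "must_be_zero": reserved,
    }

# LimitLow, BaseLow and BaseUpper match the output above; base_mid (0x12),
# base_high (0x34) and the access byte (0x82) are placeholders, not real data.
sample = struct.pack("<HHBBBBII", 0xFFFF, 0xD000, 0x12, 0x82, 0x00, 0x34, 0xFFFFE000, 0)
entry = parse_system_descriptor(sample)
print(hex(entry["base"]))
```

With real middle bytes substituted, the same arithmetic would yield the actual LDT base address.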
But they still are, and so setting the TABLE_INDICATOR (TI) bit (0x4) in a segment selector will result in the processor reading the LDTR to recover the LDT base address and dereferencing the segment indicated by the other bits.
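The selector layout this relies on can be sketched in a few lines. The helper names are hypothetical; the bit positions (RPL in bits 0-1, TI in bit 2, index in bits 3-15) are architectural.

```python
TABLE_INDICATOR = 0x4  # TI bit: set means "look in the LDT"

def decode_selector(selector: int) -> dict:
    """Split a 16-bit segment selector into its fields."""
    if not 0 <= selector <= 0xFFFF:
        raise ValueError("selector must fit in 16 bits")
    return {
        "rpl": selector & 0x3,
        "table": "LDT" if selector & TABLE_INDICATOR else "GDT",
        "index": selector >> 3,
    }

def make_selector(index: int, use_ldt: bool, rpl: int) -> int:
    """Build a selector from a table index, table choice, and RPL."""
    if not 0 <= index < 8192 or not 0 <= rpl <= 3:
        raise ValueError("index or RPL out of range")
    return (index << 3) | (TABLE_INDICATOR if use_ldt else 0) | rpl

# LDT entry 0, requested from ring 3
ldt_selector = make_selector(0, use_ldt=True, rpl=3)
print(hex(ldt_selector), decode_selector(ldt_selector))
```

Clearing the TI bit in the same selector would make the processor index the GDT instead.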
This was my second surprise, as I had no idea LDTs were still something supported when executing native 64-bit code (i.e.: ‘long mode’).
That being said, by calling the user-mode API EnterUmsSchedulingMode, which basically calls NtSetInformationThread with the ThreadUmsInformation class, the kernel will go through the creation of an LDT (KeInitializeProcessLdt). If the TEB happens to fall in the 32-bit portion of the address space (i.e.: lower than 0xFFFFFF000), it is set as the base address of a new segment in the LDT (using LdtFreeSelectorHint to choose which selector – in this case, 0x00), and the TebMappedLowVa field in KTHREAD replicates the real TEB address.
This, in turn, will populate the following fields in KPROCESS: LdtSystemDescriptor and writing into the GDT at offset 0x60 on Windows 10, or offset 0x70 on Windows 8.1 (bonus round: we’ll see why there’s a difference a bit later). On the other hand, if the TEB address is above 4GB, Windows 8.1 and earlier will transform the private allocation holding the TEB into a shared mapping (using a prototype PTE) and re-allocate a second copy at the first available top-down address (which would usually be 0xFFFFE000).
This literally brought back memories of Unreal Mode.
Clearly, though, Microsoft was paying attention (did they request this? As you can probably now guess, UMS leverages this particular feature (which is why it is only available on x64 versions of Windows).
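For readers who want to relate the GDT offsets mentioned above (0x60 and 0x70) to table indices, the small sketch below does the conversion. The function name is hypothetical; the arithmetic follows from GDT slots being 8 bytes wide, with a 16-byte system descriptor such as the LDT entry occupying two consecutive slots.

```python
GDT_SLOT_SIZE = 8  # bytes per GDT slot

def gdt_offset_to_slots(offset: int, descriptor_size: int = 16) -> list:
    """Return the GDT slot indices covered by a descriptor at a byte offset."""
    if offset % GDT_SLOT_SIZE:
        raise ValueError("GDT offsets are multiples of 8")
    first = offset // GDT_SLOT_SIZE
    return list(range(first, first + descriptor_size // GDT_SLOT_SIZE))

print(gdt_offset_to_slots(0x60))  # offset quoted for Windows 10
print(gdt_offset_to_slots(0x70))  # offset quoted for Windows 8.1
```

Because a GDT selector with RPL 0 is simply the index shifted left by three, the byte offset and the selector value are numerically the same.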