Archive for the ‘NT Internals’ Category

Handy debugger tricks: Setting osloader options on a per-boot basis

Friday, July 30th, 2010

Sometimes, you’ll find yourself wishing that you could edit the boot options for a particular Windows instance just for a single boot (perhaps to enable debugging with non-default parameters, above and beyond what the F8 menus allows).

While the F8 boot menu has a lot of options, it’s sometimes the case that you need greater flexibility than what it provides (say, to try and enable USB2 debugging on the fly for a single boot — perhaps you need to use the debugger to rescue a system that won’t boot after a change you’ve made, for instance).

It turns out that there’s actually a (perhaps little-known) way to do this starting with Windows Vista and later; at boot time (whenever you could enter the F8 menu), you can strike F10 and find yourself at a prompt that allows you to directly edit the osloader options for the current boot. (Remember that the settings you enter here aren’t persisted across reboots.)

From here, can enable debugging or change any other osloader setting which you could permanently configure through bcdedit (but only for this boot). This capability is especially helpful if you need to debug setup with non-standard debugger connection setting, which otherwise presents a painful problem.

Remember that osloader options are the old-style options that you used to set in boot.ini (the debugger documentation and MSDN outline these). Don’t use the new-style bcdedit names, as those are only recognized by bcdedit; internally, the options passed on the osloader command line continue to be their old forms (i.e. /DEBUG /DEBUGPORT=1394 /CHANNEL=0).

Examining kernel stacks on Vista/Srv08 using kdbgctrl -td, even when you haven’t booted /DEBUG

Friday, March 13th, 2009

Starting with Vista/Srv08, local kernel debugging support has been locked down to require that the system was booted /DEBUG before-hand.

For me, this has been the source of great annoyance, as if something strange happens to a Vista/Srv08 box that requires peering into kernel mode (usually to get a stack trace), then one tends to hit a brick wall if the box wasn’t booted with /DEBUG. (The only supported option usually available would be manually bugcheck the box and hope that the crash dump, if the system was configured to write one, has useful data. This is, of course, not a particularly great option; especially as in the default configuration for dumps on Vista/Srv08, it’s likely that user mode memory won’t be captured.)

However, it turns out that there’s limited support in the operating system to capture some kernel mode data for debugging purposes, even if you haven’t booted /DEBUG. Specifically, kdbgctrl.exe (a tool shipping with the WinDbg distribution) supports the notion of capturing a “kernel triage dump”, which basically takes a miniature snapshot of a given process and its active user mode threads. To use this feature, you need to have SeDebugPrivilege (i.e. you need to be an administrative-equivalently-privileged user).

Unlike conventional local KD support, triage dump writing doesn’t give unrestricted access to kernel mode memory on a live system. Instead, it instructs the kernel to create a small dump file that contains a limited, pre-set amount of information (mostly, just the process and associated threads, and if possible, the stacks of said threads). As a result, you can’t use it for general kernel memory examination. However, it’s sometimes enough to do the trick if you just need to capture what the kernel mode side stack trace of a particular thread is.

To use this feature, you can invoke “kdbgctrl.exe -td pid dump-filename“, which takes a snapshot of the process identified by a pid, and writes it out to a dump file named by dump-filename. This support is very tersely documented if you invoke kdbgctrl.exe on the command line with no arguments:

kdbgctrl
Usage: kdbgctrl <options>
Options:
[...]
-td <pid> <file> - Get a kernel triage dump

Now, the kernel’s support for writing out a triage dump isn’t by any means directly comparable to the power afforded by kd.exe -kl. As previously mentioned, the dump is extremely minimalistic, only containing information about a process object and its associated threads and the bare minimums allowing the debugger to function enough to pull this information from the dump. Just about all that you can reasonably expect to do with it is examine the stacks of the process that was captured, via the “!process -1 1f” command. (In addition, some extra information, such as the set of pending APCs attached to each thread, is also saved – enough to allow “!apc thread” to function.) Sometimes, however, it’s just enough to be able to figure out what something is blocking on kernel mode side (especially when troubleshooting deadlocks), and the triage dump functionality can aid with that.

However, there are some limitations with triage dumps, even above the relatively terse amount of data they convey. The triage dump support needs to be careful not to crash the system when writing out the dump, so all of this information is captured within the normal confines of safe kernel mode operation. In particular, this means that the triage dump writing logic doesn’t just blithely walk the thread or process lists like kd -kl would, or perform other potentially unsafe operations.

Instead, the triage dump creation process operates with the cooperation of all threads involved. This is immediately apparent when taking a look at the thread stacks captured in a triage dump:

BugCheck 69696969, {0, 0, 0, 0}

Probably caused by : ntkrnlmp.exe ( nt!IopKernSnapAPCMiniDump+55 )

Followup: MachineOwner
---------

2: kd> k
Call Site
nt!IopKernSnapAPCMiniDump+0x55
nt!IopKernSnapSpecialApc+0x50
nt!KiDeliverApc+0x1e2
nt!KiSwapThread+0x491
nt!KeWaitForSingleObject+0x2da
nt!KiSuspendThread+0x29
nt!KiDeliverApc+0x420
nt!KiSwapThread+0x491
nt!KeWaitForMultipleObjects+0x2d6
nt!ObpWaitForMultipleObjects+0x26e
nt!NtWaitForMultipleObjects+0xe2
nt!KiSystemServiceCopyEnd+0x13
0x7741602a

(Note that the system doesn’t actually bugcheck as a result of creating a triage dump. Kernel dumps implicitly need a bug check parameters record, however, so one is synthesized for the dump file.)

As is immediately obvious from the call stack, thread stack collection in triage dumps operates (in Vista/Srv08) by tripping a kernel APC on the target thread, which then (as it’s in-thread-context) safely collects data about that thread’s stack.

This, of course, presents a limitation: If a thread is sufficiently wedged such that it can’t run kernel APCs, then a triage dump won’t be able to capture full information about that thread. However, for some classes of scenarios where simply capturing a kernel mode stack is sufficient, triage dumps can sometimes fill in the gap in conjuction with user mode debugging, even if the system wasn’t booted with /DEBUG.

Understanding the kernel address space on 32-bit Windows Vista

Monday, February 23rd, 2009

[Warning: The dynamic kernel address space is subject to future changes in future releases. In fact, the set of possible address space regions types on public 32-bit Win7 drops is different from the original release of the dynamic kernel address space logic in Windows Vista. You should not make hardcoded assumptions about this logic in production code, as the basic premises outlined in this post are subject to change without warning on future operating system revisions.]

(If you like, you may skip the history lesson and background information and skip to the gory details if you’re already familiar with the subject.)

With 32-bit Windows Vista, significant changes were made to the way the kernel mode address space was laid out. In previous operating system releases, the kernel address space was divvied up into large, relatively fixed-size regions ahead of time. For example, the paged- and non-paged pools resided within fixed virtual address ranges that were, for the most part, calculated at boot time. This meant that the memory manager needed to make a decision up front about how much address space to dedicate to the paged pool versus the non-paged pool, or how much address space to devote to the system cache, and soforth. (The address space is not strictly completely fixed on a 32-bit legacy system. There are several registry options that can be used to manually trade-off address space from one resource to another. However, these settings are only taken into account at memory manager initialization time during system boot.)

As a consequence of these mostly static (after boot time) calculations, the memory manager had limited flexibility to respond to differing memory usage workloads at runtime. In particular, because the memory manager needed to dedicate large address space regions up front to one of several different resource sets (i.e. paged pool, non-paged pool, system cache), situations may arise wherein one of these resources may be heavily utilized to the point of exhaustion of the address range carved out for it (such as the paged pool), but another “peer” resource (such as the non-paged pool) may be experiencing only comparatively light utilization. Because the memory manager has no way to “take back” address space from the non-paged pool, for example, and hand it off to the paged pool if it so turns out allocations from the paged pool (for example) may fail while there’s plenty of comparatively unused address space that has all been blocked off for the exclusive usage of the non-paged pool.

On 64-bit platforms, this is usually not a significant issue, as the amount of kernel address space available is orders of magnitude beyond the total amount of physical memory available in even the largest systems available (for today, anyway). However, consider that on 32-bit systems, the kernel only has 2GB (and sometimes only 1GB, if the system is booted with /3GB) of address space available to play around with. In this case, the fact that the memory manager needs to carve off hundreds of megabytes of address space (out of only 2 or even 1GB total) up-front for pools and the system cache becomes a much more eye-catching issue. In fact, the scenario I described previously is not even that difficult to achieve in practice, as the size of the non-paged or paged pools are typically quite small compared to the amount of physical memory available to even baseline consumer desktop systems nowadays. Throw in a driver or workload that heavily utilizes paged- or non-paged pool for some reason, and this sort of “premature” pool exhaustion may occur much sooner than one might naively expect.

Now, while it’s possible to manually address this issue to a degree on 32-bit system using the aforementioned registry knobs, determining the optimum values for these knobs on a particular workload is not necessarily an easy task. (Most sysadmins I know of tend to have their eyes glaze over when step one includes “Attach a kernel debugger to the computer.”)

To address this growing problem, the memory manager was restructured by the Windows Vista timeframe to no longer treat the kernel address space as a large (or, not so large, depending upon how one wishes to view it) slab to carve up into very large, fixed-size regions at boot time. Hence, the concept of the dynamic kernel address space was born. The basic idea is that instead of carving the relatively meagre kernel address space available on 32-bit systems up into large chunks at boot time, the memory manager “reserves its judgement” until there is need for additional address space for a partial resource (such as the paged- or non-paged pool), and only then hands out another chunk of address space to the component in question (such as the kernel pool allocator).

While this has been mentioned publicly for some time, to the best of my knowledge, nobody had really sat down and talked about how this feature really worked under the hood. This is relevant for a couple of reasons:

  1. The !address debugger extension doesn’t understand the dynamic kernel address space right now. This means that you can’t easily figure out where an address came from in the debugger on 32-bit Windows Vista systems (or later systems that might use a similar form of dynamic kernel address space).
  2. Understanding the basic concepts of how the feature works provides some insight into what it can (and cannot do) to help alleviate the kernel address space crunch.
  3. While the kernel address space layout has more or less been a relatively well understood concept for 32-bit systems to date, much of that knowledge doesn’t really directly translate to Windows Vista (and potentially later) systems.


Please note that future implementations may not necessarily function the same under the hood with respect to how the kernel address space operates.

Internally, the way the memory manager provides address space to resources such as the non-paged or paged- pools has been restructured such that each of the large address space resources act as clients to a new kernel virtual address space allocator. Wherein these address space resources previously received their address ranges in the form of large, pre-calculated regions at boot time, instead, each now calls upon the memory manager’s internal kernel address space allocator (MiObtainSystemVa) to request additional address space.

The kernel address space allocator can be thought of as maintaining a logical pool of unused virtual address space regions. It’s important to note that the address space allocator doesn’t actually allocate any backing store for the returned address space, nor any other management structures (such as PTEs); it simply reserves a chunk of address space exclusively for the use of the caller. This architecture is required due to the fact that everything from driver image mapping to the paged- and non-paged pool backends have been converted to use the kernel address space allocator routines. Each of these components has very different views of what they’ll actually want to use the address space for, but they all commonly need an address space to work in.

Similarly, if a component has obtained an address space region that it no longer requires, then it may return it to the memory manager with a call to the internal kernel address space allocator routine MiReturnSystemVa.

To place things into perspective, this system is conceptually analogous to reserving a large address space region using VirtualAlloc with MEM_RESERVE in user mode. A MEM_RESERVE reservation doesn’t commit any backing store that allows data to reside at a particular place in the address space, but it does grant exclusive usage of an address range of a particular size, which can then be used in conjunction with whatever backing store the caller requires. Likewise, it is similarly up to the caller to decide how they wish to back the address space returned by MiObtainSystemVa.

The address space chunks dealt with by the kernel address space allocator, similarly to the user mode address space reservation system, don’t necessarily need to be highly granular. Instead, a granularity that is convenient for the memory management is chosen. This is because the clients of the kernel address space allocator will then subdivide the address ranges they receive for their own usage. (The exact granularity of a kernel address space allocation is, of course, subject to change in future releases based upon the whims of what is preferable for the memory manager)

For example, if a driver requests an allocation from the non-paged pool, and there isn’t enough address space assigned to the non-paged pool to handle the request, then the non-paged pool allocator backend will call MiObtainSystemVa to retrieve additional address space. If successful, it will conceptually add this address space to its free address space list, and then return a subdivided chunk of this address space (mated with physical memory backing store, as in this example, we are speaking of the non-paged pool) to the driver. The next request for memory from the non-paged pool might then come from the same address range previously obtained by the non-paged pool allocator backend. This behavior is, again, conceptually similar to how the user mode heap obtains large amounts of address space from the memory manager and then subdivides these address regions into smaller allocations for a caller of, say, operator new.

All of this happens transparently to the existing public clients of the memory manager. For example, drivers don’t observe any particularly different behavior from ExAllocatePoolWithTag. However, because the address space isn’t pre-carved into large regions ahead of time, the memory manager no longer has its hands proverbially tied with respect to allowing one particular consumer of address space to obtain comparatively much more address space than usual (of course, at a cost to the available address space to other components). In effect, the memory manager is now much more able to self-tune for a wide variety of workloads without the user of the system needing to manually discover how much address space would be better allocated to the paged pool versus the system cache, for example.

In addition, the dynamic kernel address space infrastructure has had other benefits as well. As the address spans for the various major consumers of kernel address space are now demand-allocated, so too are PTE and other paging-related structures related to those address spans. This translates to reduced boot-time memory usage. Prior to the dynamic kernel address space’s introduction, the kernel would reduce the size of the various mostly-static address regions based on the amount of physical memory on the system for small systems. However, for large systems, and especially large 64-bit systems, paging-related structures potentially describing large address space regions were pre-allocated at boot time.

On small 64-bit systems, the demand-allocation of paging related structures also removes address space limits on the sizes of individual large address space regions that were previously present to avoid having to pre-allocate vast ranges of paging-related structures.

Presently, on 64-bit systems, many, though not all of the major kernel address space consumers still have their own dedicated address regions assigned internally by the kernel address space allocator, although this could easily change as need be in a future release. However, demand-creation of paging-describing structures is still realized (with the reduction in boot-time memory requirements as described above,

I mentioned earlier that the !address extension doesn’t understand the dynamic kernel address space as of the time of this writing. If you need to determine where a particular address came from while debugging on a 32-bit system that features a dynamic kernel address space, however, you can manually do this by looking into the memory manager’s internal tracking structures to discern for which reason a particular chunk of address space was checked out. (This isn’t quite as granular as !address as, for example, kernel stacks don’t have their own dedicated address range (in Windows Vista). However, it will at least allow you to quickly tell if a particular piece of address space is part of the paged- or non-paged pool, whether it’s part of a driver image section, and soforth.)

In the debugger, you can issue the following (admittedly long and ugly) command to ask the memory manager what type of address a particular virtual address region is. Replace <<ADDRESS>> with the kernel address that you wish to inquire about:

?? (nt!_MI_SYSTEM_VA_TYPE) ( ((unsigned char *)(@@masm(nt!MiSystemVaType)))[ @@masm( ( <<ADDRESS>> - poi(nt!MmSystemRangeStart)) / (@$pagesize *

@$pagesize / @@c++(sizeof(nt!_MMPTE))) ) ] )

Here’s a couple of examples:


kd> ?? (nt!_MI_SYSTEM_VA_TYPE) ( ((unsigned char *)(@@masm(nt!MiSystemVaType)))
[ @@masm( ( 89445008 - poi(nt!MmSystemRangeStart)) / (@$pagesize * @$pagesize / @@c++(sizeof(nt!_MMPTE))) ) ] )
_MI_SYSTEM_VA_TYPE MiVaNonPagedPool (5)

kd> ?? (nt!_MI_SYSTEM_VA_TYPE) ( ((unsigned char *)(@@masm(nt!MiSystemVaType)))
[ @@masm( ( ndis - poi(nt!MmSystemRangeStart)) / (@$pagesize * @$pagesize / @@c++(sizeof(nt!_MMPTE))) ) ] )
_MI_SYSTEM_VA_TYPE MiVaDriverImages (12)

kd> ?? (nt!_MI_SYSTEM_VA_TYPE) ( ((unsigned char *)(@@masm(nt!MiSystemVaType)))
[ @@masm( ( win32k - poi(nt!MmSystemRangeStart)) / (@$pagesize * @$pagesize / @@c++(sizeof(nt!_MMPTE))) ) ] )
_MI_SYSTEM_VA_TYPE MiVaSessionGlobalSpace (11)

The above technique will not work on 64-bit systems that utilize a dynamic (demand-allocated) kernel address space, as the address space tracking is performed differently internally.

(Many thanks to Andrew Rogers and Landy Wang, who were gracious enough to spend some time divulging insights on the subject.)

Why hooking system services is more difficult (and dangerous) than it looks

Saturday, July 26th, 2008

System service hooking (and kernel mode hooking in general) is one of those near-and-dear things to me which I’ve got mixed feelings about. (System service hooking refers to intercepting system calls, like NtCreateFile, in kernel mode using a custom driver.)

On one hand, hooking things like system services can be extremely tricky at best, and outright dangerous at worst. There are a number of ways to write code that is almost right, but will fail in rare edge cases. Furthermore, there are even more ways to write code that will behave correctly as long as system service callers “play nicely” and do the right thing, but sneaks in security holes that only become visible once somebody in user mode tries to “bend the rules” a bit.

On the other hand, though, there are certainly some interesting things that one can do in kernel mode, for which there really aren’t any supported or well-defined extensibility interfaces without “getting one’s hands dirty” with a little hooking, here or there.

Microsoft’s policy on hooking in kernel mode (whether patching system services or otherwise) is pretty clear: They would very much rather that nobody even thought about doing anything remotely like code patching or hooking. Having seen virtually every anti-virus product under the sun employ hooking in one way or another at some point during their lifetime (and subsequently fail to do it correctly, usually with catastrophic consequences), I can certainly understand where they are coming from. Then again, I have also been on the other side of the fence once or twice, having been blocked from implementing a particular useful new capability due to a certain anti-patching system on recent Windows versions that shall remain nameless, at least for this posting.

Additionally, in further defense of Microsoft’s position, the vast majority of code I have seen in the field that has performed some sort of kernel mode hooking has done some out of a lack of understanding of the available and supported kernel mode extensibility interfaces. There are, furthermore, a number of things where hooking system services won’t even get the desired result (even if it weren’t otherwise fraught with problems); for instance, attempting to catch file I/O by hooking NtReadFile/NtReadFileScatter/NtWriteFile/NtWriteFileGather will cause one to completely miss any I/O that is performed using a mapped section view.

Nonetheless, system service hooking seems to be an ever-increasing trend, one that doesn’t seem to be likely to go away any time soon (at least for 32-bit Windows). There are also, of course, bits of malicious code out there which attempt to do system service hooking as well, though in my experience, once again, the worst (and most common) offenders have been off-the-shelf software that was likely written by someone who didn’t really understand the consequences of what they were doing.

Rather than weigh in on the side of “Microsoft should allow all kernel mode hooking”, or “All code hooking is bad”, however, I thought that a different viewpoint might be worth considering on this subject. At the risk of incurring Don Burn’s wrath by even mentioning the subject (an occurrence that is quite common on the NTDEV list (if one happens to follow that), I figured that it might be instructive to provide an example of just how easy it is to slip up in a seemingly innocent way while writing a system service hook. Of course, “innocent slip-ups” in kernel mode often equate to bugchecks at their best, or security holes at their worst.

Thus, the following, an example of how a perfectly honest attempt at hooking a system service can go horribly wrong. This is not intended to be a tutorial in how to hook system services, but rather an illustration of how one can easily create all sorts of nasty bugs even while trying to be careful.

The following code assumes that the system service in question (NtCreateSection) has had a hook “safely” installed (BuggyDriverHookNtCreateSection). This posting doesn’t even touch on the many and varied problems of safely trying to hook (or even worse, unhook) a system service, which are also typically done wrong in my experience. Even discounting the diciness of those problems, there’s plenty that can go wrong here.

I’ll post a discussion of some of the worst bugs in this code later on, after people have had a chance to mull it over for a little while. This routine is a fairly standard attempt to post-process a system service by doing some additional work after it completes. Feel free to post a comment if you think you have found one of the problems (there are several). Bonus points if you can identify why something is broken and not just that it is broken (e.g. provide a scenario where the existing code breaks). Even more bonus points if you can explain how to fix the problem you have found without introducing yet another problem or otherwise making things worse – by asking that, I am more wanting to show that there are a great deal of subleties at play here, instead of simply showing how to operate an NtCreateSection hook correctly. Oh, and there’s certainly more than one bug here to be found, as well.

N.B. I haven’t run any of this through a compiler, so excuse any syntax errors that I missed.

Without further adeu, here’s the code:

//
// Note: This code has bugs. Please don't actually try this at home!
//

NTSTATUS
NTAPI
BuggyDriverNtCreateSection(
 OUT PHANDLE SectionHandle,
 IN ACCESS_MASK DesiredAccess,
 IN POBJECT_ATTRIBUTES ObjectAttributes,
 IN PLARGE_INTEGER SectionSize OPTIONAL,
 IN ULONG Protect,
 IN ULONG Attributes,
 IN HANDLE FileHandle
 )
{
 NTSTATUS Status;
 HANDLE Section;
 SECTION_IMAGE_INFORMATION ImageInfo;
 ULONG ReturnLength;
 PVOID SectionObject;
 PVOID FileObject;
 BOOLEAN SavedSectionObject;
 HANDLE SectionKernelHandle;
 LARGE_INTEGER LocalSectionSize;

 SavedSectionObject = FALSE;

 //
 // Let's call the original NtCreateSection, as we only care about successful
 // calls with valid parameters.
 //

 Status = RealNtCreateSection(
  SectionHandle,
  DesiredAccess,
  ObjectAttributes,
  SectionSize,
  Protect,
  Attributes,
  FileHandle
  );

 //
 // Failed? We'll bail out now.
 //

 if (!NT_SUCCESS( Status ))
  return Status;

 //
 // Okay, we've got a successful call made, let's do our work.
 // First, capture the returned section handle. Note that we do not need to
 // do a probe as that was already done by NtCreateSection, but we still do
 // need to use SEH.
 //

 __try
 {
  Section = *SectionHandle;
 }
 __except( EXCEPTION_EXECUTE_HANDLER )
 {
  Status = (NTSTATUS)GetExceptionCode();
 }

 //
 // The user unmapped our buffer, let's bail out.
 //

 if (!NT_SUCCESS( Status ))
  return Status;

 //
 // We need a pointer to the section object for our work. Let's grab it now.
 //

 Status = ObReferenceObjectByHandle(
  Section,
  0,
  NULL,
  KernelMode,
  &SectionObject,
  NULL
  );

 if (!NT_SUCCESS( Status ))
  return Status;

 //
 // Just for fun, let's check if the section was an image section, and if so,
 // we'll do special work there.
 //

 Status = ZwQuerySection(
  Section,
  SectionImageInformation,
  &ImageInfo,
  sizeof( SECTION_IMAGE_INFORMATION ),
  &ReturnLength
  );

 //
 // If we are an image section, then let's save away a pointer to to the
 // section object for our own use later.
 //

 
 if (NT_SUCCESS(Status))
 {
  //
  // Save pointer away for something that we might do with it later. We might
  // want to care about the section image information for some unspecified
  // reason, so we will copy that and save it in our tracking list. For
  // example, maybe we want to map a view of the section into the initial
  // system process from a worker thread.
  //

  Status = SaveImageSectionObjectInList(
   SectionObject,
   &ImageInfo
   );

  if (!NT_SUCCESS( Status ))
  {
   ObDereferenceObject( SectionObject );
   return Status;
  }

  SavedSectionObject = TRUE;
 }

 //
 // Let's also grab a kernel handle for the file object so that we can do some
 // sort of work with it later on.
 //

 Status = ObReferenceObjectByHandle(
  Section,
  0,
  NULL,
  KernelMode,
  &FileObject,
  NULL
  );

 if (!NT_SUCCESS( Status ))
 {
  if (SavedSectionObject)
   DeleteImageSectionObjectInList( SectionObject );

  ObDereferenceObject( SectionObject );

  return Status;
 }

 //
 // Save the file object away, as well as maximum size of the section object.
 // We need the size of the section object for a length check when accessing
 // the section later.
 //

 if (SectionSize)
 {
  __try
  {
   LocalSectionSize = *SectionSize;
  }
  __except( EXCEPTION_EXECUTE_HANDLER )
  {
   Status = (NTSTATUS)GetExceptionCode();

   ObDereferenceObject( FileObject );

   if (SavedSectionObject)
    DeleteImageSectionObjectInList( SectionObject );

   ObDereferenceObject( SectionObject );

   return Status;
  }
 }
 else
 {
  //
  // Ask the file object for it's length, this could be done by any of the
  // usual means to do that.
  //

  Status = QueryAllocationLengthFromFileObject(
   FileObject,
   &LocalSectionSize
   );

  if (!NT_SUCCESS( Status ))
  {
   ObDereferenceObject( FileObject );

   if (SavedSectionObject)
    DeleteImageSectionObjectInList( SectionObject );

   ObDereferenceObject( SectionObject );

   return Status;
  }
 }

 //
 // Save the file object + section object + section length away for future
 // reference.
 //

 Status = SaveSectionFileInfoInList(
  FileObject,
  SectionObject,
  &LocalSectionSize
  );

 if (!NT_SUCCESS( Status ))
 {
  ObDereferenceObject( FileObject );

  if (SavedSectionObject)
   DeleteImageSectionObjectInList( SectionObject );

  ObDereferenceObject( SectionObject );

  return Status;
 }

 //
 // All done. Lose our references now. Assume that the Save*InList routines
 // took their own references on the objects in question. Return to the caller
 // successfully.
 //

 ObDereferenceObject( FileObject );
 ObDereferenceObject( SectionObject );

 return STATUS_SUCCESS;
}

A catalog of NTDLL kernel mode to user mode callbacks, part 6: LdrInitializeThunk

Wednesday, November 28th, 2007

Previously, I described the mechanism by which the kernel mode to user mode callback dispatcher (KiUserCallbackDispatcher) operates, and how it is utilized by win32k.sys for various window manager related operations.

The next special NTDLL kernel mode to user mode “callback” up on the list is LdrInitializeThunk, which is not so much a callback as the entry point at which all user mode threads begin their execution system-wide. Although the Win32 CreateThread API (and even the NtCreateThread system service that is used to implement the Win32 CreateThread) provide the illusion that a thread begins its execution at a specified start routine (or instruction pointer, in the case of NtCreateThread), this is not truly the case.

CreateThread internally layers in a special kernel32 stub routine (BaseThreadStart or BaseProcessStart) in between the specified thread routine and the actual initial instruction of the thread. The kernel32 stub routine wrappers the call to the user-supplied thread start routine to provide services such a “top-level” SEH frame for the support of UnhandledExceptionFilter.

However, there exists yet another layer of indirection before a thread begins its execution at the specified thread start routine, even beyond the kernel32 stub routine at the start of all Win32 threads (the way the kernel32 stub routine works changes slightly with Windows Vista, though that is outside the scope of this discussion). The presence of this extra layer of indirection can be inferred by examining the documentation on MSDN for DllMain, which states that all threads call out to the DllMain routine at some point during thread start-up. The kernel32 stub routine is not involved in this process, and obviously the user-supplied thread entry point does not have to explicitly attempt to call DllMain for every loaded DLL with DLL_THREAD_ATTACH. This leaves us with the question of who actually arranges for these DllMain calls to happen when a thread begins.

The answer to this question is, of course, the feature routine of this article, LdrInitializeThunk. When a user mode thread is readied to begin initial execution after being created, the initial context that is realized is not the context value supplied to NtCreateThread (which would eventually end up in the user-supplied thread entry point). Instead, execution really begins at LdrInitializeThunk within NTDLL, which is supplied a CONTEXT record that describes the initially requested state of the thread (as supplied to NtCreateThread, or as created by NtCreateThreadEx on Windows Vista). This context record is provided as an argument to LdrInitializeThunk, in order to allow for control to (eventually) be transferred to the user-supplied thread entry point.

When invoked by a new thread, LdrInitializeThunk invokes LdrpInitialize to perform the remainder of the initialization tasks, and then calls upon the NtContinue system service to restore the supplied context record. I have made available a C-like representation of this process for illustration purposes.

LdrpInitialize makes a determination as to whether the process has already been initialized (for the purposes of NTDLL). This step is necessary as LdrInitializeThunk (and by extension, LdrpInitialize) is not only the actual entry point for a new thread in an already initialized process, but it is also the entry point for the initial thread in a completely new process (making it the first piece of code that is run in user mode in a new process). If the process has not already been initialized, then LdrpInitialize performs process initialization tasks by invoking LdrpInitializeProcess (some process initialization is performed inline by LdrpInitialize in the process initialization case as well). If the process is a Wow64 process, then the Wow64 NTDLL is loaded and invoked to initialize the 32-bit NTDLL’s per-process state.

Otherwise, if the process is already initialized, then LdrpInitialize invokes LdrpInitializeThread to perform per-thread initialization, which primarily involves invoking DllMain and TLS callbacks for loaded modules while the loader lock is held. (This is the reason why it is not supported to wait on a new thread to run its thread initialization routine from DllMain, because the new thread will immediately become blocked upon the loader lock, waiting for the thread already in DllMain to complete its processing.) If the process is a Wow64 process, then there is again support for making a call to the 32-bit NTDLL for purposes of running 32-bit per-thread initialization code.

When the required process or thread initialization tasks have been completed, LdrpInitialize returns to LdrInitializeThunk, which then realizes the user-supplied thread start context with a call to the NtContinue system service.

One consequence of this architecture for process and thread initialization is that it becomes (slightly) more difficult to step through process initialization, because the thread that initializes the process is not the first thread created in the process, but rather the first thread that executes LdrInitializeThunk. This means that one cannot simply create a process as suspended and attach a debugger to the process in order to step through process initialization, as the debugger break-in thread will run the very process initialization code that one wishes to step through before executing the break-in thread breakpoint instruction!

There is, fortunately, support for debugging this scenario built in to the Windows debugger package. By setting the debugger to break on the ‘create process’ event, it is possible to manually set a breakpoint on a new process (created by the debugger) or a child process (if child process debugging is enabled) before LdrInitializeThunk is run. This support is activated by breaking on the cpr (Create Process) event. For example, using the following ntsd command line, it is possible to examine the target and set breakpoints before any user mode code runs:

ntsd.exe -xe cpr C:\windows\system32\cmd.exe

Note that as the loaded module list will not have been initialized at this point, symbols will not be functional, so it is necessary to resolve the offset of LdrInitializeThunk manually in order to set a breakpoint. The process can be continued with the typical “g” command.

Update: Pavel Labedinsky points out that you can use “-xe ld:ntdll.dll” to achieve the same effect, but with the benefit that symbols for NTDLL are available. This is a better option as it alleviates the necessity to manually resolve the addresses of locations that you wish to set a breakpoint on..

Next up: Examining RtlUserThreadStart (present on Windows Vista and beyond).

A catalog of NTDLL kernel mode to user mode callbacks, part 5: KiUserCallbackDispatcher

Wednesday, November 21st, 2007

Last time, I briefly outlined the operation of KiRaiseUserExceptionDispatcher, and how it is used by the NtClose system service to report certain classes of handle misuse under the debugger.

All of the NTDLL kernel mode to user mode “callbacks” that I have covered thus far have been, for the most part fairly “passive” in nature. By this, I mean that the kernel does not explicitly call any of these callbacks, at least in the usual nature of making a function call. Instead, all of the routines that we have discussed thus far are only invoked instead of the normal return procedure for a system call or interrupt, under certain conditions. (Conceptually, this is similar in some respects to returning to a different location using longjmp.)

In contrast to the other routines that we have discussed thus far, KiUserCallbackDispatcher breaks completely out of the passive callback model. The user mode callback dispatcher is, as the name implies, a trampoline that is used to make full-fledged calls to user mode, from kernel mode. (It is complemented by the NtCallbackReturn system service, which resumes execution in kernel mode following a user mode callback’s completion. Note that this means that a user mode callback can make auxiliary calls into the kernel without “returning” back to the original kernel mode caller.)

Calling user mode to kernel mode is a very non-traditional approach in the Windows world, and for good reason. Such calls are typically dangerous and need to be implemented very carefully in order to avoid creating any number of system reliability or integrity issues. Beyond simply validating any data returned to kernel mode from user mode, there are a far greater number of concerns with a direct kernel mode to user mode call model as supported by KiUserCallbackDispatcher. For example, a thread running in user mode can be freely suspended, delayed for a very long period of time due to a high priority user thread, or even terminated. These actions mean that any code spanning a call out to user mode must not hold locks, have acquired memory or other resources that might need to be released, or soforth.

From a kernel mode perspective, the way a user mode callback using KiUserCallbackDispatcher works is that the kernel saves the current processor state on the current kernel stack, alters the view of the top of the current kernel stack to point after the saved register state, sets a field in the current thread (CallbackStack) to point to the stack frame containing the saved register state (the previous CallbackStack value is saved to allow for recursive callbacks), and then executes a return to user mode using the standard return mechanism.

The user mode return address is, of course, set to the feature NTDLL routine of this article, KiUserCallbackDispatcher. The way the user mode callback dispatcher operates is fairly simple. First, it indexes into an array stored in the PEB with an argument to the callback dispatcher that is used to select the function to be invoked. Then, the callback routine located in the array is invoked, and provided with a single pointer-sized argument from kernel mode (this argument is typically a structure pointer containing several parameters packaged up into one contiguous block of memory). The actual implementation of KiUserCallbackDispatcher is fairly simple, and I have posted a C representation of it.

In Win32, kernel mode to user mode callbacks are used exclusively by User32 for windowing related aspects, such as calling a window procedure to send a WM_NCCREATE message during the creation of a new window on behalf of a user mode caller that has invoked NtUserCreateWindowEx. For example, during window creation processing, if we set a breakpoint on KiUserCallbackDispatcher, we might see the following:

Breakpoint 1 hit
ntdll!KiUserCallbackDispatch:
00000000`77691ff7 488b4c2420  mov rcx,qword ptr [rsp+20h]
0:000> k
RetAddr           Call Site
00000000`775851ca ntdll!KiUserCallbackDispatch
00000000`7758514a USER32!ZwUserCreateWindowEx+0xa
00000000`775853f4 USER32!VerNtUserCreateWindowEx+0x27c
00000000`77585550 USER32!CreateWindowEx+0x3fe
000007fe`fddfa5b5 USER32!CreateWindowExW+0x70
000007fe`fde221d3 ole32!InitMainThreadWnd+0x65
000007fe`fde2150c ole32!wCoInitializeEx+0xfa
00000000`ff7e6db0 ole32!CoInitializeEx+0x18c
00000000`ff7ecf8b notepad!WinMain+0x5c
00000000`7746cdcd notepad!IsTextUTF8+0x24f
00000000`7768c6e1 kernel32!BaseThreadInitThunk+0xd
00000000`00000000 ntdll!RtlUserThreadStart+0x1d

If we step through this call a bit more, we’ll see that it eventually ends up in a function by the name of user32!_fnINLPCREATESTRUCT, which eventually calls user32!DispatchClientMessage with the WM_NCCREATE window message, allowing the window procedure of the new window to participate in the window creation process, despite the fact that win32k.sys handles the creation of a window in kernel mode.

Callbacks are, as previously mentioned, permitted to be nested (or even recursively made) as well. For example, after watching calls to KiUserCallbackDispatcher for a time, we’ll probably see something akin to the following:

Breakpoint 1 hit
ntdll!KiUserCallbackDispatch:
00000000`77691ff7 488b4c2420  mov rcx,qword ptr [rsp+20h]
0:000> k
RetAddr           Call Site
00000000`7758b45a ntdll!KiUserCallbackDispatch
00000000`7758b4a4 USER32!NtUserMessageCall+0xa
00000000`7758e55a USER32!RealDefWindowProcWorker+0xb1
000007fe`fca62118 USER32!RealDefWindowProcW+0x5a
000007fe`fca61fa1 uxtheme!_ThemeDefWindowProc+0x298
00000000`7758b992 uxtheme!ThemeDefWindowProcW+0x11
00000000`ff7e69ef USER32!DefWindowProcW+0xe6
00000000`7758e25a notepad!NPWndProc+0x217
00000000`7758cbaf USER32!UserCallWinProcCheckWow+0x1ad
00000000`77584e1c USER32!DispatchClientMessage+0xc3
00000000`77692016 USER32!_fnINOUTNCCALCSIZE+0x3c
00000000`775851ca ntdll!KiUserCallbackDispatcherContinue
00000000`7758514a USER32!ZwUserCreateWindowEx+0xa
00000000`775853f4 USER32!VerNtUserCreateWindowEx+0x27c
00000000`77585550 USER32!CreateWindowEx+0x3fe
00000000`ff7e9525 USER32!CreateWindowExW+0x70
00000000`ff7e6e12 notepad!NPInit+0x1f9
00000000`ff7ecf8b notepad!WinMain+0xbe
00000000`7746cdcd notepad!IsTextUTF8+0x24f
00000000`7768c6e1 kernel32!BaseThreadInitThunk+0xd

This support for recursive callbacks is a large factor in why threads that talk to win32k.sys often have so-called “large kernel stacks”. The kernel mode dispatcher for user mode calls will attempt to convert the thread to a large kernel stack when a call is made, as the typical sized kernel stack is not large enough to support the number of recursive kernel mode to user mode calls present in a many complicated window messaging calls.

If the process is a Wow64 process, then the callback array in the PEB is prepointed to an array of conversion functions inside the Wow64 layer, which map the callback argument to a version compatible with the 32-bit user32.dll, as appropriate.

Next up: Taking a look at LdrInitializeThunk, where all user mode threads really begin their execution.

A catalog of NTDLL kernel mode to user mode callbacks, part 4: KiRaiseUserExceptionDispatcher

Tuesday, November 20th, 2007

The previous post in this series outlined how KiUserApcDispatcher operates for the purposes of enabling user mode APCs. Unlike KiUserExceptionDispatcher, which is expected to modify the return information from the context of an interrupt (or exception) in kernel mode, KiUserApcDispatcher is intended to operate on the return context of an active system call.

In this vein, the third kernel mode to user mode NTDLL “callback” that we shall be investigating, KiRaiseUserExceptionDispatcher, is fairly similar. Although in some respects akin to KiUserExceptionDispatcher, at least relating to raising an exception in user mode on the behalf of a kernel mode caller, the KiRaiseUserExceptionDispatcher “callback” really has more in common with KiUserApcDispatcher from a usage and implementation standpoint. It also so happens that the implementation of this callback routine, which is quite simple, is completely representable in C, as a standard calling convention is used.

KiRaiseUserExceptionDispatcher is used if a system call wishes to raise an exception in user mode instead of simply return an NTSTATUS, as is the standard convention. It simply constructs a standard exception record using a supplied status code (which must be written to a well-known location in the current thread’s TEB beforehand), and passes the exception to RtlRaiseException (the same routine that is used internally by the Win32 RaiseException API).

This is a fairly atypical scenario, as most errors (including bad pointer parameters) will simply result in an appropriate status code being returned by the system call, such as STATUS_ACCESS_VIOLATION.

The one place the does currently use the services of KiRaiseUserExceptionDispatcher is NtClose, the system service responsible for closing handles (which implements the Win32 CloseHandle API). When a debugger is attached to a process and a protected handle (as set by a call to SetHandleInformation with HANDLE_FLAG_PROTECT_FROM_CLOSE) is passed to NtClose, then a STATUS_HANDLE_NOT_CLOSABLE exception is raised to user mode via KiRaiseUserExceptionDispatcher. For example, one might see the following when trying to close a protected handle with a debugger attached:

(2600.1994): Unknown exception - code c0000235 (first chance)
(2600.1994): Unknown exception - code c0000235 (second chance)
ntdll!KiRaiseUserExceptionDispatcher+0x3a:
00000000`776920e7 8b8424c0000000  mov eax,dword ptr [rsp+0C0h]
0:000> !error c0000235
Error code: (NTSTATUS) 0xc0000235 (3221226037) - NtClose was
called on a handle that was protected from close via
NtSetInformationObject.
0:000> k
RetAddr           Call Site
00000000`7746dadd ntdll!KiRaiseUserExceptionDispatcher+0x3a
00000000`01001955 kernel32!CloseHandle+0x29
00000000`01001e60 TestApp!wmain+0x35
00000000`7746cdcd TestApp!__tmainCRTStartup+0x120
00000000`7768c6e1 kernel32!BaseThreadInitThunk+0xd
00000000`00000000 ntdll!RtlUserThreadStart+0x1d

The exception can be manually continued in a debugger to allow the program to operate after this occurs, though such a bad handle reference is typically indicative of a serious bug.

A similar behavior is activated if a program is being debugged under Application Verifier and an invalid handle is closed. This behavior of raising an exception on a bad handle closure attempt is intended as a debugging aid, since most programs do not bother to check the return value of CloseHandle.

Incidentally, bad kernel handle closure references like this in kernel mode will result in a bugcheck instead of simply raising an exception that can be caught. Drivers do not have the “luxury” of continuing when they touch a bad handle, except when probing a user handle, of course. (Old versions of Process Explorer used to bring down the box with a bugcheck due to this, for instance, if you tried to close a protected handle. This bug has fortunately since been fixed.)

Next time, we’ll take a look at KiUserCallbackDispatcher, which is used to a great degree by win32k.sys.

A catalog of NTDLL kernel mode to user mode callbacks, part 3: KiUserApcDispatcher

Monday, November 19th, 2007

I previously described the behavior of the kernel mode to user mode exception dispatcher (KiUserExceptionDispatcher). While exceptions are arguably the most commonly seen of the kernel mode to user mode “callbacks” in NTDLL, they are not the only such event.

Much like exceptions that traverse into user mode, user mode APCs are all (with the exception of initial thread startup) funneled through a single dispatcher routine inside NTDLL, KiUserApcDispatcher. KiUserApcDispatcher is responsible for invoking the actual APC routine, re-testing for APC delivery (to allow for the entire APC queue to be drained at once), and returning to the return point of the alertable system call that was interrupted to deliver the user mode APC. From the perspective of the user mode APC dispatcher, the stack is logically arranged as if the APC dispatcher would “return” to the instruction immediately following the syscall instruction in the system call that began the alertable wait.

For example, if we breakpoint on KiUserApcDispatcher and examine the stack (in this case, after an alertable WaitForSingleObjectEx call that has been interrupted to deliver a user mode APC), we might see the following:

Breakpoint 0 hit
ntdll!KiUserApcDispatcher:
00000000`77691f40 488b0c24  mov     rcx,qword ptr [rsp]
0:000> k
RetAddr           Call Site
00000000`776902ba ntdll!KiUserApcDispatcher
00000000`7746d820 ntdll!NtWaitForSingleObject+0xa
00000000`01001994 kernel32!WaitForSingleObjectEx+0x9c
00000000`01001eb0 TestApp!wmain+0x64
00000000`7746cdcd TestApp!__tmainCRTStartup+0x120
00000000`7768c6e1 kernel32!BaseThreadInitThunk+0xd
00000000`00000000 ntdll!RtlUserThreadStart+0x1d

From an implementation standpoint, KiUserApcDispatcher is conceptually fairly simple. I’ve posted a translation of the assembler for those interested. Keep in mind, however, that as with the other kernel mode to user mode callbacks, the routine is actually written in assembler and utilizes constructs not expressible through C, such as a custom calling convention.

Note that the APC routine invoked by KiUserApcDispatcher corresponds roughly to a standard native APC routine, specifically a PKNORMAL_ROUTINE (except on x64, but more on that later). This is not compatible with the Win32 view of an APC routine, which takes a single parameter, as opposed to be three that a KNORMAL_ROUTINE takes. As a result, there is a kernel32 function, BaseDispatchAPC that wrappers all Win32 APCs, providing an SEH frame around the call (and activating the appropriate activation context, if necessary). BaseDispatchAPC also converts from the native APC calling convention into the Win32 APC calling convention.

For Win32 I/O completion routines, a similar wrapper routine (BasepIoCompletion) serves the purpose of converting from the standard NT APC calling convention to the Win32 I/O completion callback calling convention (which primarily includes unpackaging any I/O completion parameters from the IO_STATUS_BLOCK).

With Windows x64, the behavior of KiUserApcDispatcher changes slightly. Specifically, the APC routine invoked has four parameters instead of the standard set of three parameters for a PKNORMAL_ROUTINE. This is still compatible with standard NT APC routines due to a quirk of the x64 calling convention, whereby the first four arguments are passed by register. (This means that internally, any routines with zero through four arguments that fit within the confines of a standard pointer-sized argument slot are “compatible”, at least from a calling convention perspective.)

The fourth parameter added by KiUserApcDispatcher in the x64 case is a pointer to the context record that is to be resumed when the APC dispatching process is finished. This additional argument is used by Wow64.dll if the process is a Wow64 process, as Wow64.dll wrappers all 32-bit APC routines around a thunk routine (Wow64ApcRoutine). Wow64ApcRoutine internally uses this extra PCONTEXT argument to take control of resuming execution after the real APC routine is invoked. Thus in the 64-bit NTDLL Wow64 case, the NtContinue call following the call to the user APC routine never occurs.

Next time, we’ll take a look at the other kernel mode to user mode exception “callback”, KiRaiseUserExceptionDispatcher.

A catalog of NTDLL kernel mode to user mode callbacks, part 2: KiUserExceptionDispatcher

Friday, November 16th, 2007

Yesterday, I listed the set of kernel mode to user mode callback entrypoints (as of Windows Server 2008). Although some of the callbacks share certain similarities in their modes of operation, there remain significant differences between each of them, in terms of both calling convention and what functionality they perform.

KiUserExceptionDispatcher is the routine responsible for calling the user mode portion of the SEH dispatcher. When an exception occurs, and it is an exception that would generate an SEH event, the kernel checks to see whether the exception occurred while running user mode code. If so, then the kernel alters the trap frame on the stack, such that when the kernel returns from the interrupt or exception, execution resumes at KiUserExceptionDispatcher instead of the instruction that raised the fault. The kernel also arranges for several parameters (a PCONTEXT and a PEXCEPTION_RECORD) that describe the state of the machine when the exception occurred to be passed to KiUserExceptionDispatcher upon the return to user mode. (This model of changing the return address for a return from kernel mode to user mode is a common idiom in the Windows kernel for several user mode event notification mechanisms.)

Once the kernel mode stack unwinds and control is transferred to KiUserExceptionDispatcher in user mode, the exception is processed locally via a call to RtlDispatchException, which is the core of the user mode exception dispatcher logic. If the exception was successfully dispatched (that is, an exception handler handled it), the final user mode context is realized with a call to RtlRestoreContext, which simply loads the registers in the given context into the processor’s architectural execution state.

Otherwise, the exception is “re-thrown” to kernel mode for last chance processing via a call to NtRaiseException. This gives the user mode debugger (if any) a final shot at handling the exception, before the kernel terminates the process. (The kernel internally provides the user mode debugger and kernel debugger a first chance shot at such exceptions before arranging for KiUserExceptionDispatcher to be run.)

I have posted a pseudo-C representation of KiUserExceptionDispatcher for the curious. Note that it uses a specialized custom calling convention, and by virtue of this, is actually an assembler function. However, a C representation is often easier to understand.

Not all user mode exceptions originate from kernel mode in this fashion; in many cases (such as with the RaiseException API), the exception dispatching process is originated entirely from user mode and KiUserExceptionDispatcher is not involved.

Incidentally, if one has been following along with some of the recent postings, the reason why the invalid parameter reporting mechanism of the Visual Studio 2005 CRT doesn’t properly break into into an already attached debugger should start to become clear now, given some additional knowledge of how the exception dispatching process works.

Because the VS2005 CRT simulates an exception by building a context and exception record, and passing these to UnhandledExceptionFilter, the normal exception dispatcher logic is not involved. This means that nobody makes a call to the NtRaiseException system service (as would normally be the case if UnhandledExceptionFilter were called as a part of normal exception dispatching), and thus there is no notification sent to the user mode debugger asking it to pre-process the simulated STATUS_INVALID_PARAMETER exception.

Update: Posted representation of KiUserExceptionDispatcher.

Next up: Taking a look at the user mode APC dispatcher (KiUserApcDispatcher).

A catalog of NTDLL kernel mode to user mode callbacks, part 1: Overview

Thursday, November 15th, 2007

As I previously mentioned, NTDLL maintains a set of special entrypoints that are used by the kernel to invoke certain functionality on the behalf of user mode.

In general, the functionality offered by these entrypoints is fairly simple, although having an understanding of how each are used provides useful insight into how certain features (such as user mode APCs) really work “under the hood”.

For the purposes of this discussion, the following are the NTDLL exported entrypoints that the kernel uses to communicate to user mode:

  1. KiUserExceptionDispatcher
  2. KiUserApcDispatcher
  3. KiRaiseUserExceptionDispatcher
  4. KiUserCallbackDispatcher
  5. LdrInitializeThunk
  6. RtlUserThreadStart
  7. EtwpNotificationThread
  8. LdrHotPatchRoutine

(There are other NTDLL exports used by the kernel, but not for direct user mode communication.)

These routines are generally used to inform user mode of a particular event occurring, though the specifics of how each routine is called vary somewhat.

KiUserExceptionDispatcher, KiUserApcDispatcher, and KiRaiseUserExceptionDispatcher are exclusively used when user mode has entered kernel mode, either implicitly, due to a processor interrupt (say, a page fault that will eventually trigger an access violation), or explicitly, due to a system call (such as NtWaitForSingleObject). The mechanism that the kernel uses to invoke these entrypoints is to alter the context that will be realized upon return from kernel mode to user mode. The return to user mode context information (a KTRAP_FRAME) is modified such that when kernel mode returns, instead of returning to the point upon which user mode invoked a kernel mode transition, control is transferred to one of the three dispatcher routines. Additional arguments are supplied to these dispatcher routines as necessary.

KiUserCallbackDispatcher is used to explicitly call out to user mode from kernel mode. This inverted mode of operation is typically discouraged in favor of models such as the pending IRP “inverted call model”. For historical design reasons, however, the Win32 subsystem (win32k.sys) uses this for a number of tasks (such as calling a user mode window procedure in response to a kernel mode window message operation). The user mode callout mechanism is not extensible to support arbitrary user mode callback destinations.

LdrInitializeThunk is the first instruction that any user mode thread runs, before the “actual” thread entrypoint. As such, it is the address at which every user mode thread system-wide begins execution.

RtlUserThreadStart is used on Windows Vista and Windows Server 2008 (and later OS’s) to form the initial entrypoint context for a thread started with NtCreateThreadEx (this is a marked departure from the approach taken by NtCreateThread, wherein the user mode caller supplies the initial thread context).

EtwpNotificationThread and LdrHotPatchRoutine both correspond to a standard thread entrypoint routine. These entrypoints are referenced by user mode threads that are created in the context of a particular process to carry out certain tasks on behalf of the kernel. As the latter two routines are generally only rarely encountered, this series does not describe them in detail.

Despite (or perhaps in spite of) the more or less completely unrealized promise of less reboots for hotfixes with Windows Server 2003, I think I’ve seen a grand total of one or two hotfixes in the entire lifetime of the OS that supported hotpatching. Knowing the effort that must have gone into developing hotpatching support, it is depressing to see security bulletin after security bulletin state “No, this hotfix does not support hotpatching. A reboot will be required.”. That is, however, a topic for another day.

Next up: Examining the operation of KiUserExceptionDispatcher in more detail.