A catalog of NTDLL kernel mode to user mode callbacks, part 5: KiUserCallbackDispatcher

Last time, I briefly outlined the operation of KiRaiseUserExceptionDispatcher, and how it is used by the NtClose system service to report certain classes of handle misuse under the debugger.

All of the NTDLL kernel mode to user mode “callbacks” that I have covered thus far have been, for the most part, fairly “passive” in nature. By this, I mean that the kernel does not explicitly call any of these callbacks, at least not in the usual sense of making a function call. Instead, all of the routines that we have discussed thus far are invoked only in place of the normal return procedure for a system call or interrupt, under certain conditions. (Conceptually, this is similar in some respects to returning to a different location using longjmp.)

In contrast to the other routines that we have discussed thus far, KiUserCallbackDispatcher breaks completely out of the passive callback model. The user mode callback dispatcher is, as the name implies, a trampoline that is used to make full-fledged calls to user mode, from kernel mode. (It is complemented by the NtCallbackReturn system service, which resumes execution in kernel mode following a user mode callback’s completion. Note that this means that a user mode callback can make auxiliary calls into the kernel without “returning” back to the original kernel mode caller.)

Calling user mode from kernel mode is a very non-traditional approach in the Windows world, and for good reason. Such calls are typically dangerous and need to be implemented very carefully in order to avoid creating any number of system reliability or integrity issues. Beyond simply validating any data returned to kernel mode from user mode, there are a far greater number of concerns with a direct kernel mode to user mode call model as supported by KiUserCallbackDispatcher. For example, a thread running in user mode can be freely suspended, delayed for a very long period of time due to a high priority user thread, or even terminated. These actions mean that any code spanning a call out to user mode must not hold locks, must not have acquired memory or other resources that might need to be released, and so forth.

From a kernel mode perspective, the way a user mode callback using KiUserCallbackDispatcher works is that the kernel saves the current processor state on the current kernel stack, alters the view of the top of the current kernel stack to point after the saved register state, sets a field in the current thread (CallbackStack) to point to the stack frame containing the saved register state (the previous CallbackStack value is saved to allow for recursive callbacks), and then executes a return to user mode using the standard return mechanism.
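To make the save-and-chain behavior a bit more concrete, here is a minimal user mode simulation in C. It is only a sketch of the idea described above, not kernel code: the types mock_thread and saved_frame, the depth counter, and the function call_user_mode are all hypothetical names invented for illustration, and an ordinary C function call stands in for the return to user mode and the later NtCallbackReturn.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of one saved frame on the kernel stack. */
typedef struct saved_frame {
    struct saved_frame *previous_callback_stack; /* prior CallbackStack value */
    int saved_state;                             /* stand-in for saved register state */
} saved_frame;

/* Hypothetical model of the per-thread bookkeeping. */
typedef struct mock_thread {
    saved_frame *callback_stack; /* models the thread's CallbackStack field */
    int depth;                   /* illustration only: number of active callouts */
} mock_thread;

typedef int (*user_routine)(mock_thread *thread, int argument);

/* Models a kernel mode callout: save state, chain the old CallbackStack,
 * "return" to user mode, then unwind when the callback completes. */
static int call_user_mode(mock_thread *thread, user_routine routine,
                          int argument, int state)
{
    saved_frame frame;
    frame.saved_state = state;
    frame.previous_callback_stack = thread->callback_stack;
    thread->callback_stack = &frame;
    thread->depth++;

    /* Stands in for the user mode callback plus NtCallbackReturn. */
    int result = routine(thread, argument);

    /* The innermost callout must be the one that is completing. */
    assert(thread->callback_stack == &frame);
    thread->callback_stack = frame.previous_callback_stack;
    thread->depth--;
    return result;
}

/* A "user mode" routine that re-enters the kernel, which calls out again. */
static int nested_routine(mock_thread *thread, int remaining)
{
    if (remaining == 0) {
        return thread->depth; /* report how deeply nested we are */
    }
    return call_user_mode(thread, nested_routine, remaining - 1, remaining);
}
```

Calling call_user_mode with nested_routine and an argument of 2 on a fresh mock_thread nests three callouts deep, and the CallbackStack pointer is back to NULL once the outermost one unwinds. Saving the previous value in each frame is what makes the recursion safe in this model.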

The user mode return address is, of course, set to the featured NTDLL routine of this article, KiUserCallbackDispatcher. The way the user mode callback dispatcher operates is fairly simple. First, it uses one of its arguments as an index into an array stored in the PEB, which selects the function to be invoked. Then, the callback routine located in the array is invoked, and provided with a single pointer-sized argument from kernel mode (this argument is typically a pointer to a structure that packages several parameters into one contiguous block of memory). The actual implementation of KiUserCallbackDispatcher is short, and I have posted a C representation of it.
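For readers who want the shape of that dispatch step without the real structures, the following is a small self-contained simulation in C. It is not the C representation referred to above, and it does not use the real PEB layout or the real NtCallbackReturn: mock_peb, callback_fn, create_args, mock_callback_return, and the two sample callbacks are hypothetical names made up for this sketch, and the bounds check is an addition for the mock only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback signature: one pointer-sized argument in, a result out. */
typedef intptr_t (*callback_fn)(void *argument);

/* Hypothetical stand-in for the part of the PEB that holds the callback array. */
typedef struct mock_peb {
    callback_fn *callback_table; /* array of callback routines, indexed by number */
    size_t callback_count;       /* mock only: lets the sketch check the index */
} mock_peb;

/* Several parameters packaged into one contiguous block of memory. */
typedef struct create_args {
    int width;
    int height;
} create_args;

static intptr_t on_create(void *argument)
{
    const create_args *args = argument;
    return (intptr_t)args->width * args->height;
}

static intptr_t on_destroy(void *argument)
{
    (void)argument;
    return 0;
}

/* Stand-in for NtCallbackReturn: records the result instead of
 * resuming execution in kernel mode. */
static intptr_t last_result;

static void mock_callback_return(intptr_t result)
{
    last_result = result;
}

/* Models the dispatcher: select the routine by index, call it with the
 * single argument, then hand the result back to "kernel mode". */
static void mock_user_callback_dispatcher(const mock_peb *peb, size_t index,
                                          void *argument)
{
    assert(index < peb->callback_count);
    callback_fn routine = peb->callback_table[index];
    mock_callback_return(routine(argument));
}
```

With a table of { on_create, on_destroy } and a create_args of 640 by 480, dispatching index 0 leaves 307200 in last_result. The point of the sketch is only the control flow: index, call, return to the kernel.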

In Win32, kernel mode to user mode callbacks are used exclusively by User32 for windowing related aspects, such as calling a window procedure to send a WM_NCCREATE message during the creation of a new window on behalf of a user mode caller that has invoked NtUserCreateWindowEx. For example, during window creation processing, if we set a breakpoint on KiUserCallbackDispatcher, we might see the following:

Breakpoint 1 hit
00000000`77691ff7 488b4c2420  mov rcx,qword ptr [rsp+20h]
0:000> k
RetAddr           Call Site
00000000`775851ca ntdll!KiUserCallbackDispatch
00000000`7758514a USER32!ZwUserCreateWindowEx+0xa
00000000`775853f4 USER32!VerNtUserCreateWindowEx+0x27c
00000000`77585550 USER32!CreateWindowEx+0x3fe
000007fe`fddfa5b5 USER32!CreateWindowExW+0x70
000007fe`fde221d3 ole32!InitMainThreadWnd+0x65
000007fe`fde2150c ole32!wCoInitializeEx+0xfa
00000000`ff7e6db0 ole32!CoInitializeEx+0x18c
00000000`ff7ecf8b notepad!WinMain+0x5c
00000000`7746cdcd notepad!IsTextUTF8+0x24f
00000000`7768c6e1 kernel32!BaseThreadInitThunk+0xd
00000000`00000000 ntdll!RtlUserThreadStart+0x1d

If we step through this call a bit more, we’ll see that it eventually ends up in a function by the name of user32!_fnINLPCREATESTRUCT, which eventually calls user32!DispatchClientMessage with the WM_NCCREATE window message, allowing the window procedure of the new window to participate in the window creation process, despite the fact that win32k.sys handles the creation of a window in kernel mode.

Callbacks are, as previously mentioned, permitted to be nested (or even recursively made) as well. For example, after watching calls to KiUserCallbackDispatcher for a time, we’ll probably see something akin to the following:

Breakpoint 1 hit
00000000`77691ff7 488b4c2420  mov rcx,qword ptr [rsp+20h]
0:000> k
RetAddr           Call Site
00000000`7758b45a ntdll!KiUserCallbackDispatch
00000000`7758b4a4 USER32!NtUserMessageCall+0xa
00000000`7758e55a USER32!RealDefWindowProcWorker+0xb1
000007fe`fca62118 USER32!RealDefWindowProcW+0x5a
000007fe`fca61fa1 uxtheme!_ThemeDefWindowProc+0x298
00000000`7758b992 uxtheme!ThemeDefWindowProcW+0x11
00000000`ff7e69ef USER32!DefWindowProcW+0xe6
00000000`7758e25a notepad!NPWndProc+0x217
00000000`7758cbaf USER32!UserCallWinProcCheckWow+0x1ad
00000000`77584e1c USER32!DispatchClientMessage+0xc3
00000000`77692016 USER32!_fnINOUTNCCALCSIZE+0x3c
00000000`775851ca ntdll!KiUserCallbackDispatcherContinue
00000000`7758514a USER32!ZwUserCreateWindowEx+0xa
00000000`775853f4 USER32!VerNtUserCreateWindowEx+0x27c
00000000`77585550 USER32!CreateWindowEx+0x3fe
00000000`ff7e9525 USER32!CreateWindowExW+0x70
00000000`ff7e6e12 notepad!NPInit+0x1f9
00000000`ff7ecf8b notepad!WinMain+0xbe
00000000`7746cdcd notepad!IsTextUTF8+0x24f
00000000`7768c6e1 kernel32!BaseThreadInitThunk+0xd

This support for recursive callbacks is a large factor in why threads that talk to win32k.sys often have so-called “large kernel stacks”. The kernel mode dispatcher for user mode calls will attempt to convert the thread to a large kernel stack when a call is made, as the typical-sized kernel stack is not large enough to support the number of recursive kernel mode to user mode calls present in many complicated window messaging calls.

If the process is a Wow64 process, then the callback array in the PEB is set up to point to an array of conversion functions inside the Wow64 layer, which map the callback argument to a version compatible with the 32-bit user32.dll, as appropriate.
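As a rough illustration of what such a conversion function does conceptually, here is a toy sketch in C. Everything in it is an assumption made for illustration: args64, args32, handler32, and conversion_thunk are hypothetical names, the two-field layout is invented, and the real Wow64 thunks are considerably more involved than a pair of truncating copies.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical argument block as the 64-bit kernel would lay it out. */
typedef struct args64 {
    uint64_t window;
    uint64_t message;
} args64;

/* Hypothetical equivalent with the field widths 32-bit code expects. */
typedef struct args32 {
    uint32_t window;
    uint32_t message;
} args32;

/* Stand-in for a callback routine in the 32-bit user32.dll. */
static uint32_t handler32(const args32 *args)
{
    return args->window + args->message;
}

/* Stand-in for one entry in the array of conversion functions: repackage
 * the argument into the 32-bit layout, then forward to the 32-bit handler. */
static uint64_t conversion_thunk(void *argument)
{
    const args64 *wide = argument;
    args32 narrow;

    /* Assumes the values fit in 32 bits, as they would for a 32-bit process. */
    narrow.window = (uint32_t)wide->window;
    narrow.message = (uint32_t)wide->message;

    return handler32(&narrow);
}
```

The kernel side in this model never needs to know that the process is 32-bit; it dispatches through the same array slot, and the thunk installed in that slot does the translation.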

Next up: Taking a look at LdrInitializeThunk, where all user mode threads really begin their execution.


6 Responses to “A catalog of NTDLL kernel mode to user mode callbacks, part 5: KiUserCallbackDispatcher”

  1. Koby Kahane says:


    Although you’ve outlined the pitfalls of kernel-to-user callbacks in NT, it still seems curious to me that a facility used extensively by the Window Manager is not documented as a user/kernel mode communication facility for use by third party drivers. I suspect that for many situations, this approach may be more natural and less cumbersome than keeping a pending IRP to a user-mode client, etc.

  2. Skywing says:

    As designed now, the interface is not extensible by third parties. There is one callback array per process, and thus there isn’t really viable support for plugging in third party functions on the fly.

    I think the main reason why the interface is not encouraged is that it’s just too easy to create serious problems and too hard to get it right from a kernel mode perspective. There are so many subtle rules with things you need to watch out for so as to ensure that a bad user mode callee can’t break the system, and the benefit of this approach in the end is dubious, I think, once you take the time and care to handle all of these nuances.

    Furthermore, the kernel mode to user mode call interface is really not as convenient as you might imagine for things like asynchronous notifications to user mode clients. A thread has to already be in kernel mode running your kernel mode code in order for the kernel to user call mechanism to work, which means that it’s not really well suited as an asynchronous notification mechanism (user mode callers would need to be blocked in kernel mode for a kernel mode caller to be able to call user mode on that thread).

    At that point, if you are going to have to be blocked on kernel mode, you might as well just use the recommended mechanisms for communication, such as a pended IRP (these mechanisms also integrate much better into a high performance work item based architecture).

    I think the only reason that Microsoft went this route with win32k is that the concept of constant calls back to user mode is just too hardwired into the windowing architecture that arose in 16-bit Windows for it to be practical (or performant) to break many system calls like NtUserCreateWindowEx into discrete pieces that operate entirely in kernel mode. The windowing architecture is really just designed to run in process with the user supplied window procedure, and shifting large portions of the windowing system to kernel mode becomes a sticky thing given this design aspect.

  3. […] Nynaeve Adventures in Windows debugging and reverse engineering. « A catalog of NTDLL kernel mode to user mode callbacks, part 5: KiUserCallbackDispatcher […]

  4. arrowgans says:

    I’m currently working on a project in which I’m implementing an x86 interpreter that is injected into a process, suspends the current thread, and spawns its own thread that uses the suspended thread’s context to somewhat emulate it. It has some problems with these kernel mode to user mode callbacks. My x86 interpreter executes all the instructions of the suspended thread, using its context, and emulates instructions that change the flow of control, like calls, jmp, loops, etc. When a sample program that is being interpreted makes a system call and returns from it, my spawned thread gets control back, but in the case of a kernel to user callback, my thread is stuck waiting and the app is deadlocked.

    Is it feasible to hook the KernelCallBackTable of that example process, and if it is, how can I calculate its size?

    Or do you have some tips to get around the deadlock without hooking the table?

    greetings ArrowGans

  5. […] Discussion on the user-mode callback mechanism by Ken Johnson (SkyWing): http://www.nynaeve.net/?p=204 […]