Archive for March, 2009

Debugger tricks: Find all probable CONTEXT records in a crash dump

Monday, March 30th, 2009

If you’ve debugged crash dumps for awhile, then you’ve probably ran into a situation where the initial dump context provided by the debugger corresponds to a secondary exception that happened while processing an initial exception that’s likely closer to the original underlying problem in the issue you’re investigating.

This can be annoying, as the “.ecxr” command will point you at the site of the secondary failure exception, and not the original exception context itself. However, in most cases the original, primary exception context is still there on the stack; one just needs to know how to find it.

There are a couple of ways to go about this:

  • For hardware generated exceptions (such as access violations), one can look for ntdll!KiUserExceptionDispatcher on the stack, which takes a PCONTEXT and an PEXCEPTION_RECORD as arguments.
  • For software-generated exceptions (such as C++ exceptions), things get a bit dicier. One can look for ntdll!RtlDispatchException being called on the stack, and from there, grab the PCONTEXT parameter.

This can be a bit tedious if stack unwinding fails, or you’re dealing with one of those dumps where exceptions on multiple threads at the same time, typically due to crash dump writing going haywire (I’m looking at you, Outlook…). It would be nice if the debugger could automate this process a little bit.

Fortunately, it’s actually not hard to do this with a little bit of a brute-force approach. Specifically, just a plain old “dumb” memory scan for something common to most all CONTEXT records. It’s not exactly a finesse approach, but it’s usually a lot faster than manually groveling through the stack, especially if multiple threads or multiple nested exceptions are involved. While there may be false-positives, it’s usually immediately obvious as to what makes sense to be involved with a live exception or not. Sometimes, however, quick-and-dirty brute force type solutions really end up doing the trick, though.

In order to find CONTEXT records based on a memory search, though, we need some common data points that are typically the same for all CONTEXT structures, and, preferably, contiguous (for ease of use with the “s” command, the debugger’s memory search support). Fortunately, it turns out that this exists in the form of the segment registers of a CONTEXT structure:

0:000> dt ntdll!_CONTEXT
+0x000 ContextFlags : Uint4B
[…]

+0x08c SegGs : Uint4B
+0x090 SegFs : Uint4B
+0x094 SegEs : Uint4B
+0x098 SegDs : Uint4B

[…]

Now, it turns out that for all threads in a given process will almost always have the same segment selector values, excluding exotic and highly out of the ordinary cases like VDM processes. (The same goes for the segment selector values on x64 as well.) Four non-zero 32-bit values (actually, 16-bit values with zero padding to 32-bits) are enough to be able to reasonably pull a search off without being buried in false positives. Here’s how to do it with the infamous WinDbg debugger script (also applicable to other DbgEng-enabled programs, such as kd):

.foreach ( CxrPtr { s -[w1]d 0 l?ffffffff @gs @fs @es @ds } ) { .cxr CxrPtr – 8c }

This is a bit of a long-winded command, so let’s break it down into the individual components. First, we have a “.foreach” construct, which according to the debugger documentation, follows this convention:

.foreach [Options] ( Variable { InCommands } ) { OutCommands }

The .foreach command (actually one of the more versitle debugger-scripting commands, once one gets used to using it) basically takes a series of input strings generated by an input command (InCommands) and invokes an command to process that output (OutCommands), with the results of the input command being subsituted in as a macro specified by the Variable argument. It’s ugly and operates based on text parsing (and there’s support for skipping every X inputs, among other things; see the debugger documentation), but it gets the job done.

The next part of this operation is the s command, which instructs the debugger to search for a pattern in-memory in the target. The arguments supplied here instruct the debugger to search only writable memory (w), output only the address of each match (1), scan for DWORD (4-byte) sized quanitites (d) in the lower 4GB of the address space (0 l?ffffffff); in this case, we’re assuming that the target is a 32-bit process (which might be hosted on Wow64, hence 4GB instead of 3GB used). The remainder of the command specifies the search pattern to look for; the segment register values of the current thread. The “s” command sports a plethora of other options (with a rather unwieldy and convoluted syntax, unfortunately); the debugger documentation runs through the gamut of the other capabilities.

The final part of this command string is the output command, which simply directs the debugger to set the current context to the input command output replacement macro’s value at an offset of 0x8c. (If one recalls, 0x8c is the offset from the start of a struct _CONTEXT to the SegGs member, which is the first value we search for; as a result, the addresses returned by the “s” command will be the address of the SegGs member.) Remember that we restricted the output of the “s” command to just being the address itself, which lets us easily pass that on to a different command (which might give one the idea that the “s” and “.foreach” commands were made to work together).

Putting the command string together, it directs the debugger to search for a sequence of four 32-bit values (the gs, fs, es, and ds segment selector values for the current thread) in contiguous memory, and display the containing CONTEXT structure for each match.

You may find some other CONTEXT records aside from exception-related ones while executing this comamnd (in particular, initial thread contexts are common), but the ones related to a fault are usually pretty obvious and self-evident. Of course, this method isn’t foolproof, but it lets the debugger do some of the hard work for you (which beats manually groveling in corrupted stacks across multiple threads just to pull out some CONTEXT records).

Naturally, there are a number of other uses for both the “.foreach” and “s” commands; don’t be afraid to experiment with them. There are other helpers for automating certain tasks (!for_each_frame, !for_each_local, !for_each_module, !for_each_process, and !for_each_thread, to name a few) too, aside from the general “.foreach“. The debugger scripting support might not be the prettiest to look at, but it can be quite handy at speeding up common, repetitive tasks.

One parting tip with “.foreach” (well, actually two parting tips): The variable replacement macro only works if you separate it from other symbols with a space. This can be a problem in some cases (where you need to perform some arithmetic on the resulting expanded macro in particular, such as subtracting 0x8c in this case), however, as the spaces remain when the macro symbol is expanded. Some commands, such as “dt“, don’t follow the standard expression parsing rules (much to my eternal irritation), and choke if they’ve given arguments with spaces.

All is not lost for these commands, however; one way to work around this issue is to store the macro replacement into a pseudo-register (say, “r @$t0 = ReplacementVariableMacro – 0x8c“) and use that pseudo-register in the actual output command, as you can issue multiple, semi-colon delimited commands in the output commands section.

Examining kernel stacks on Vista/Srv08 using kdbgctrl -td, even when you haven’t booted /DEBUG

Friday, March 13th, 2009

Starting with Vista/Srv08, local kernel debugging support has been locked down to require that the system was booted /DEBUG before-hand.

For me, this has been the source of great annoyance, as if something strange happens to a Vista/Srv08 box that requires peering into kernel mode (usually to get a stack trace), then one tends to hit a brick wall if the box wasn’t booted with /DEBUG. (The only supported option usually available would be manually bugcheck the box and hope that the crash dump, if the system was configured to write one, has useful data. This is, of course, not a particularly great option; especially as in the default configuration for dumps on Vista/Srv08, it’s likely that user mode memory won’t be captured.)

However, it turns out that there’s limited support in the operating system to capture some kernel mode data for debugging purposes, even if you haven’t booted /DEBUG. Specifically, kdbgctrl.exe (a tool shipping with the WinDbg distribution) supports the notion of capturing a “kernel triage dump”, which basically takes a miniature snapshot of a given process and its active user mode threads. To use this feature, you need to have SeDebugPrivilege (i.e. you need to be an administrative-equivalently-privileged user).

Unlike conventional local KD support, triage dump writing doesn’t give unrestricted access to kernel mode memory on a live system. Instead, it instructs the kernel to create a small dump file that contains a limited, pre-set amount of information (mostly, just the process and associated threads, and if possible, the stacks of said threads). As a result, you can’t use it for general kernel memory examination. However, it’s sometimes enough to do the trick if you just need to capture what the kernel mode side stack trace of a particular thread is.

To use this feature, you can invoke “kdbgctrl.exe -td pid dump-filename“, which takes a snapshot of the process identified by a pid, and writes it out to a dump file named by dump-filename. This support is very tersely documented if you invoke kdbgctrl.exe on the command line with no arguments:

kdbgctrl
Usage: kdbgctrl <options>
Options:
[...]
-td <pid> <file> - Get a kernel triage dump

Now, the kernel’s support for writing out a triage dump isn’t by any means directly comparable to the power afforded by kd.exe -kl. As previously mentioned, the dump is extremely minimalistic, only containing information about a process object and its associated threads and the bare minimums allowing the debugger to function enough to pull this information from the dump. Just about all that you can reasonably expect to do with it is examine the stacks of the process that was captured, via the “!process -1 1f” command. (In addition, some extra information, such as the set of pending APCs attached to each thread, is also saved – enough to allow “!apc thread” to function.) Sometimes, however, it’s just enough to be able to figure out what something is blocking on kernel mode side (especially when troubleshooting deadlocks), and the triage dump functionality can aid with that.

However, there are some limitations with triage dumps, even above the relatively terse amount of data they convey. The triage dump support needs to be careful not to crash the system when writing out the dump, so all of this information is captured within the normal confines of safe kernel mode operation. In particular, this means that the triage dump writing logic doesn’t just blithely walk the thread or process lists like kd -kl would, or perform other potentially unsafe operations.

Instead, the triage dump creation process operates with the cooperation of all threads involved. This is immediately apparent when taking a look at the thread stacks captured in a triage dump:

BugCheck 69696969, {0, 0, 0, 0}

Probably caused by : ntkrnlmp.exe ( nt!IopKernSnapAPCMiniDump+55 )

Followup: MachineOwner
---------

2: kd> k
Call Site
nt!IopKernSnapAPCMiniDump+0x55
nt!IopKernSnapSpecialApc+0x50
nt!KiDeliverApc+0x1e2
nt!KiSwapThread+0x491
nt!KeWaitForSingleObject+0x2da
nt!KiSuspendThread+0x29
nt!KiDeliverApc+0x420
nt!KiSwapThread+0x491
nt!KeWaitForMultipleObjects+0x2d6
nt!ObpWaitForMultipleObjects+0x26e
nt!NtWaitForMultipleObjects+0xe2
nt!KiSystemServiceCopyEnd+0x13
0x7741602a

(Note that the system doesn’t actually bugcheck as a result of creating a triage dump. Kernel dumps implicitly need a bug check parameters record, however, so one is synthesized for the dump file.)

As is immediately obvious from the call stack, thread stack collection in triage dumps operates (in Vista/Srv08) by tripping a kernel APC on the target thread, which then (as it’s in-thread-context) safely collects data about that thread’s stack.

This, of course, presents a limitation: If a thread is sufficiently wedged such that it can’t run kernel APCs, then a triage dump won’t be able to capture full information about that thread. However, for some classes of scenarios where simply capturing a kernel mode stack is sufficient, triage dumps can sometimes fill in the gap in conjuction with user mode debugging, even if the system wasn’t booted with /DEBUG.