Introduction to x64 debugging, part 3

The last installment of this series described some of the basics of the new calling convention in use on x64 Windows, and how it will impact the debugging experience. This post describes how the unwinding and exception handling aspects matter to you when you debug programs.

I touched on some of the benefits of the new unwind mechanism in the last post – specifically, how you can expect to see full stack traces even without symbols – but I didn’t really go into a whole lot of detail as to how the mechanism is implemented. Microsoft has the full set of details available on MSDN. Rather than restate them all here, I’m going to try to put them into perspective with respect to debugging and how they matter to you.

Perhaps the easiest way to do this is to compare them with x86 exception handling (EH)/unwind support. In the x86 Win32 world, EH/unwind are implemented as a linked list of EXCEPTION_REGISTRATION structures whose head is stored at fs:[0] (the start of the current thread’s TEB). When an exception occurs, the exception dispatcher code (either in NTDLL for a user mode exception or NTOSKRNL for a kernel mode exception) searches through this linked list and calls each handler with information about the exception. The exception handler can indicate that control should be resumed immediately at the faulting context, or that the next handler should be called, or that the exception handler has handled the exception and that the stack should be unwound to it. The first two paths are fairly straightforward; either a context record is continued via NtContinue (if you aren’t familiar with the native API layer, this is effectively a longjmp), or the next handler in the chain is called. If the last handler in the list is reached and does not handle the exception, then the thread is terminated (for Win32 programs, this should never happen, as Kernel32 installs an exception handler that will catch all exceptions before it calls process / thread entrypoint functions defined by an application).

The unwind path is a bit more interesting; here, all of the exception handlers between the one that requested an unwind and the top of the list are called with a flag indicating that they should unwind the stack. Each exception handler routine “knows” how to unwind the procedure(s) that it is responsible for. With this mechanism, the stack gets unwound properly back to the point where the exception was handled. While this works well enough for the actual exception handling process itself, there is a flaw in this design; it precludes unwinding call frames without actually calling the unwind handlers in question. In addition, functions in the middle of an unwind path which did not register an exception handler are invisible to the unwind code itself (this does not pose a problem for normal unwinds, because any function that has special unwind requirements, such as a function with C++ objects on the stack that have destructors, will implicitly register an exception handler).

What this means for you as it relates to debugging is that on x86, it isn’t generally possible to cleanly unwind *without calling the unwind/exception handler functions*. This means that the debugger cannot automatically unwind the stack and produce a reliable stack trace without special help, typically in the form of symbols that specify how a function uses the stack. If a function in the middle of the call stack doesn’t have symbols, then there is a good chance that any debugger-initiated stack traces will stop at that function (a common and frustrating occurrence if you are debugging code without symbols on x86).
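
To make the "invisible frames" point concrete, here is a minimal sketch in C of walking a registration chain. The REGISTRATION type, its Owner field, and both function names are hypothetical simplifications, not the real Windows structure, and the real chain ends with a sentinel value rather than NULL. The only point it illustrates is that a walker sees the functions that registered a record, and nothing else.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for an x86 registration record.
 * Each record links to the one registered before it. */
typedef struct REGISTRATION {
    struct REGISTRATION *Next;  /* older record; NULL ends the chain in this sketch */
    const char          *Owner; /* function that registered it (illustration only) */
} REGISTRATION;

/* Walk the chain from the newest record to the oldest, recording each owner.
 * Returns the number of records visited (at most max). */
size_t walk_chain(const REGISTRATION *head, const char **seen, size_t max)
{
    size_t n = 0;
    assert(seen != NULL);
    for (const REGISTRATION *r = head; r != NULL && n < max; r = r->Next)
        seen[n++] = r->Owner;
    return n;
}

/* Models the call stack main -> parse -> read_line -> fault, where only
 * main and read_line registered a handler. parse never appears in the walk. */
size_t demo_visible_frames(const char **seen, size_t max)
{
    REGISTRATION main_reg = { NULL,      "main" };
    REGISTRATION read_reg = { &main_reg, "read_line" };
    return walk_chain(&read_reg, seen, max);
}
```

Calling demo_visible_frames with an eight-entry array reports two owners, read_line and then main, even though four functions are on the modeled call stack.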

As I alluded to in the previous posting, this problem has gone away on x64, thanks to the new unwind semantics. The way this works under the hood is that every function that is a non-leaf function (that is, every function which calls another function) is required to have a set of metadata associated with it that describes how the function is to be unwound. This is similar in principle to the symbol unwind information used in x86 if you have symbols, except that it is built into the binary itself (or dynamically registered at runtime, for dynamically generated code, like .NET). This unwind metadata has everything necessary to unwind a function without actually having to call exception handling code (and, indeed, exception handlers no longer perform “manual” unwinds as is the case on x86 – the NTDLL or NTOSKRNL exception dispatcher can take care of this for you thanks to the new unwind metadata).

For most purposes, you can be oblivious to this fact while debugging something; the debugger will automagically use the unwind metadata to construct accurate stack traces, even with no symbols available. An example of this is:

 

0:000> k
Child-SP          RetAddr           Call Site
00000000`0012fa28 00000000`78ef6301 ntdll!ZwRequestWaitReplyPort+0xa
00000000`0012fa30 00000000`78ddc6ed ntdll!CsrClientCallServer+0x61
00000000`0012fa60 00000000`78ddc92a kernel32!GetConsoleInputWaitHandle+0x39d
00000000`0012fbd0 00000000`4ad1df2c kernel32!ReadConsoleW+0x7a
00000000`0012fca0 00000000`4ad15fa7 cmd+0x1df2c
00000000`0012fd60 00000000`4ad02530 cmd+0x15fa7
00000000`0012fdc0 00000000`4ad035ca cmd+0x2530
00000000`0012fe30 00000000`4ad17027 cmd+0x35ca
00000000`0012fe80 00000000`4ad04eef cmd+0x17027
00000000`0012ff20 00000000`78d5965c cmd+0x4eef
00000000`0012ff80 00000000`00000000 kernel32!BaseProcessStart+0x2c

 

With symbols loaded, we can see that the stack trace contains exactly the same frames:

 

0:000> k
Child-SP          RetAddr           Call Site
00000000`0012fa28 00000000`78ef6301 ntdll!ZwRequestWaitReplyPort+0xa
00000000`0012fa30 00000000`78ddc6ed ntdll!CsrClientCallServer+0x9f
00000000`0012fa60 00000000`78ddc92a kernel32!ReadConsoleInternal+0x23d
00000000`0012fbd0 00000000`4ad1df2c kernel32!ReadConsoleW+0x7a
00000000`0012fca0 00000000`4ad15fa7 cmd!ReadBufFromConsole+0x11c
00000000`0012fd60 00000000`4ad02530 cmd!FillBuf+0x3d6
00000000`0012fdc0 00000000`4ad035ca cmd!Lex+0xd2
00000000`0012fe30 00000000`4ad17027 cmd!Parser+0x132
00000000`0012fe80 00000000`4ad04eef cmd!main+0x458
00000000`0012ff20 00000000`78d5965c cmd!mainCRTStartup+0x171
00000000`0012ff80 00000000`00000000 kernel32!BaseProcessStart+0x29

 

As you can see, even with no symbols, we still get a stack trace that includes all of the functions active in the selected thread context.

Sometimes you will need to manually examine the unwind data, however. One of the major reasons for this is if you need to do some work with an exception handler. On x86, the familiar set of instructions “push fs:[0]; mov fs:[0], esp” (or equivalent) signify an exception handler registration. In x64 debugging, you won’t see anything like this, because there is no runtime registration of exception handlers (except via calls to RtlAddFunctionTable). To determine if a function has an exception handler (and what the address is), you’ll need to use a command that you have probably never touched before – .fnent. The .fnent (function entry) command displays the active EH/unwind metadata associated with a function, among other misc. information about the function in question (such as its extents). For instance:

 

0:000> .fnent kernel32!LocalAlloc
Debugger function entry 00000000`01dc2ab0 for:
(00000000`78d6e690)   kernel32!LocalAlloc   |
(00000000`78d6e730)   kernel32!GetCurrentProcessId
Exact matches:
kernel32!LocalAlloc = 

BeginAddress      = 00000000`0002e690
EndAddress        = 00000000`0002e6c3
UnwindInfoAddress = 00000000`000d9174

 

Unfortunately, this command does not directly translate the exception handler information that we are interested in, so we have to do some manual work. The offsets provided are relative to the base of the module in which the function resides, so working with our existing example, we’ll need to add the base address of kernel32 (which is what the expression “kernel32” evaluates to in the debugger) to each of the offsets to form a complete address.
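
As a rough illustration of how those three fields are used, here is a sketch in C of looking up a function entry by image-relative address and converting an offset into a full address. FUNCTION_ENTRY, lookup_function_entry, and rva_to_va are names made up for this sketch. It assumes the table is sorted by BeginAddress and that EndAddress is the first byte past the function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mirrors the three fields printed by .fnent; every value is an offset
 * from the base of the module. The type name is hypothetical. */
typedef struct FUNCTION_ENTRY {
    uint32_t BeginAddress;
    uint32_t EndAddress;        /* assumed exclusive: first byte past the function */
    uint32_t UnwindInfoAddress;
} FUNCTION_ENTRY;

/* Binary search over a table assumed to be sorted by BeginAddress.
 * Returns the entry covering rva, or NULL if no entry covers it. */
const FUNCTION_ENTRY *lookup_function_entry(const FUNCTION_ENTRY *table,
                                            size_t count, uint32_t rva)
{
    size_t lo = 0, hi = count;
    assert(table != NULL || count == 0);
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (rva < table[mid].BeginAddress)
            hi = mid;
        else if (rva >= table[mid].EndAddress)
            lo = mid + 1;
        else
            return &table[mid];
    }
    return NULL;
}

/* Module-relative offset -> full address. */
uint64_t rva_to_va(uint64_t image_base, uint32_t rva)
{
    return image_base + rva;
}
```

For the LocalAlloc example, the module base works out to 0x78d40000 (the symbol address 0x78d6e690 minus the BeginAddress of 0x2e690), so rva_to_va(0x78d40000, 0xd9174) yields 0x78e19174 as the address of the unwind information.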

The format of the unwind information itself is described on MSDN; the important parts are as follows:

 

typedef struct _UNWIND_INFO {
UBYTE Version       : 3;
UBYTE Flags         : 5;
UBYTE SizeOfProlog;
UBYTE CountOfCodes;
UBYTE FrameRegister : 4;
UBYTE FrameOffset   : 4;
UNWIND_CODE UnwindCode[1];
/*  UNWIND_CODE MoreUnwindCode[((CountOfCodes + 1) & ~1) - 1];
*   union {
*       OPTIONAL ULONG ExceptionHandler;
*       OPTIONAL ULONG FunctionEntry;
*   };
*   OPTIONAL ULONG ExceptionData[]; */
} UNWIND_INFO, *PUNWIND_INFO; 

typedef union _UNWIND_CODE {
struct {
UBYTE CodeOffset;
UBYTE UnwindOp : 4;
UBYTE OpInfo   : 4;
};
USHORT FrameOffset;
} UNWIND_CODE, *PUNWIND_CODE;

 

Given the structure definition above, we can write a simplified debugger expression to parse the unwind information structure and tell us the interesting bits. This expression does not handle all cases – in particular, it doesn’t handle chained unwind information properly, for which you would need to write a more complicated expression or do the work manually.

 

0:000> u kernel32+dwo(kernel32+00000000`000d9174+
@@c++((1+ @@masm(by(2+kernel32+00000000`000d9174))) & ~1) * 2 + 4)
kernel32!_C_specific_handler:
00000000`78d92180 ff25eafafaff     jmp     qword ptr [kernel32!_imp___C_specific_handler (0000000078d41c70)]

 

The expression finds the count of unwind codes from an UNWIND_INFO structure, performs the necessary alignment calculations, multiplies the resulting value by the size of the UNWIND_CODE union, and adds the resultant value to the offset into the UNWIND_INFO structure where unwind codes are stored. Then, this value is added to the pointer to the UNWIND_INFO structure itself, which gives us a pointer to UNWIND_INFO.ExceptionHandler. This value is an offset into the module that the exception handler routine is associated with, so by adding the base address of the module, we (finally!) get the address of the exception handler function itself. In this case, it’s __C_specific_handler, which is the equivalent of _except_handler3 in x86 (the standard VC++ generated exception handler for C/C++ code). __C_specific_handler has its own metadata stored in the “ExceptionData” member that describes where the actual C/C++ exception handlers are (i.e. the exception filter/exception handler defined with __except in CL). The format of these structures is as follows:

 

typedef struct _CL_SCOPE {
ULONG BeginOffset;   // imagebase relative
ULONG EndOffset;     // imagebase relative
ULONG HandlerOffset; // imagebase relative
ULONG TargetOffset;  // imagebase relative
} CL_SCOPE, * PCL_SCOPE; 

typedef struct _CL_EXCEPTION_DATA {
ULONG NumEntries;
CL_SCOPE ScopeEntries[1]; // NumEntries entries follow
} CL_EXCEPTION_DATA, * PCL_EXCEPTION_DATA;
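
The offset arithmetic that the debugger expression performs can also be written out in C, which may be easier to follow. This is only a sketch. The function names are invented for illustration, and it assumes the usual little-endian bitfield packing (Version in the low three bits of the first byte, Flags in the high five) and unchained unwind information.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum {
    UNWIND_HEADER_SIZE = 4, /* Version/Flags, SizeOfProlog, CountOfCodes, FrameRegister/FrameOffset */
    UNWIND_CODE_SIZE   = 2, /* sizeof(UNWIND_CODE) */
    UNW_FLAG_EHANDLER  = 1  /* flag bit: the function has an exception handler */
};

/* Header accessors over the raw bytes of an UNWIND_INFO structure. */
uint8_t unwind_version(const uint8_t *info)    { assert(info != NULL); return info[0] & 0x07; }
uint8_t unwind_flags(const uint8_t *info)      { assert(info != NULL); return info[0] >> 3; }
uint8_t unwind_code_count(const uint8_t *info) { assert(info != NULL); return info[2]; }

/* Offset of the ExceptionHandler field from the start of UNWIND_INFO:
 * the header, then the unwind codes rounded up to an even count. */
size_t handler_field_offset(uint8_t count_of_codes)
{
    size_t aligned = ((size_t)count_of_codes + 1) & ~(size_t)1;
    return UNWIND_HEADER_SIZE + aligned * UNWIND_CODE_SIZE;
}

/* CL_EXCEPTION_DATA.NumEntries directly follows the 4-byte handler offset. */
size_t scope_count_offset(uint8_t count_of_codes)
{
    return handler_field_offset(count_of_codes) + 4;
}

/* The first CL_SCOPE directly follows the 4-byte NumEntries field. */
size_t scope_table_offset(uint8_t count_of_codes)
{
    return scope_count_offset(count_of_codes) + 4;
}
```

With 11 or 12 unwind codes, handler_field_offset returns 28 and scope_table_offset returns 36 (0x24), so for unwind information at 0x78e19174 the first scope entry would sit at 0x78e19198.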

 

If the exception handler is a CL one using __C_specific_handler (as is the case here), we can find the code corresponding to the __except filter/handler by dumping the CL scope table entries as follows:
 

0:000> dd kernel32+00000000`000d9174+
@@c++((1+ @@masm(by(2+kernel32+00000000`000d9174))) & ~1)
* 2 + 4 + 4 + 4 L dwo(kernel32+00000000`000d9174+
@@c++((1+ @@masm(by(2+kernel32+00000000`000d9174))) & ~1) * 2 + 4 + 4) * 4
00000000`78e19198  000164fb 00016524 00000001 000709ef
00000000`78e191a8  00016524 00016565 00000001 000709ef
00000000`78e191b8  00016565 00016583 00000001 000709ef
00000000`78e191c8  00016583 00016585 00000001 000709ef
00000000`78e191d8  00070968 0007098d 00000001 000709ef
00000000`78e191e8  0007098d 000709cc 00000001 000709ef
00000000`78e191f8  000709cc 000709ef 00000001 000709ef

 

This command gave us a list of the address ranges within kernel32!LocalAlloc that are covered by a C/C++ exception handler (BeginOffset and EndOffset), whether there is a filter expression or not (depending on the value of HandlerOffset; 1 signifies that the exception is simply handled by executing the “TargetOffset” routine), and the offset of the handler (TargetOffset). All of the offsets are relative to the base address of kernel32. We can unassemble the handler specified by each of them to see that it is simply setting the last Win32 error based on an exception code:

 

0:000> u kernel32+000709ef
kernel32!LocalAlloc+0x1cb:
00000000`78db09ef 33ff             xor     edi,edi
00000000`78db09f1 48897c2420       mov     [rsp+0x20],rdi
00000000`78db09f6 8bc8             mov     ecx,eax
00000000`78db09f8 e863dcfbff       call    kernel32!BaseSetLastNTError (0000000078d6e660)
00000000`78db09fd 8d7701           lea     esi,[rdi+0x1]
00000000`78db0a00 448b642460       mov     r12d,[rsp+0x60]
00000000`78db0a05 488b5c2428       mov     rbx,[rsp+0x28]
00000000`78db0a0a e9765bfaff       jmp     kernel32!LocalAlloc+0x1e6 (0000000078d56585)
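
To show how the scope table is meant to be read, here is a small C sketch that searches it for the entry covering a given offset. The table contents are the seven entries from the dd output above. SCOPE_ENTRY, find_scope, and handled_unconditionally are hypothetical names, and treating EndOffset as exclusive is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same layout as CL_SCOPE; all fields are offsets from the module base. */
typedef struct SCOPE_ENTRY {
    uint32_t BeginOffset;
    uint32_t EndOffset;     /* assumed exclusive */
    uint32_t HandlerOffset; /* 1 means: just run the code at TargetOffset */
    uint32_t TargetOffset;
} SCOPE_ENTRY;

/* The seven scope entries shown in the dd output above. */
const SCOPE_ENTRY k_scopes[7] = {
    { 0x000164fb, 0x00016524, 0x00000001, 0x000709ef },
    { 0x00016524, 0x00016565, 0x00000001, 0x000709ef },
    { 0x00016565, 0x00016583, 0x00000001, 0x000709ef },
    { 0x00016583, 0x00016585, 0x00000001, 0x000709ef },
    { 0x00070968, 0x0007098d, 0x00000001, 0x000709ef },
    { 0x0007098d, 0x000709cc, 0x00000001, 0x000709ef },
    { 0x000709cc, 0x000709ef, 0x00000001, 0x000709ef }
};

/* Return the first scope whose range covers rva, or NULL if none does. */
const SCOPE_ENTRY *find_scope(const SCOPE_ENTRY *table, size_t count, uint32_t rva)
{
    assert(table != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if (rva >= table[i].BeginOffset && rva < table[i].EndOffset)
            return &table[i];
    }
    return NULL;
}

/* Nonzero when the scope is handled by simply running TargetOffset. */
int handled_unconditionally(const SCOPE_ENTRY *scope)
{
    assert(scope != NULL);
    return scope->HandlerOffset == 1;
}
```

Looking up offset 0x16530, for example, lands in the second entry, whose TargetOffset of 0x709ef is the handler disassembled above.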

 

That’s all for this post. Next time, I’ll talk about some of the common “gotchas” when dealing with Wow64 debugging.

Additional credits for this article: C++ exception handling information from “Improved Automated Analysis of Windows x64 Binaries” by skape.
