Programming against the x64 exception handling support, part 7: Putting it all together, or building a stack walk routine

This is the final post in the x64 exception handling series, comprised of the following articles:

  1. Programming against the x64 exception handling support, part 1: Definitions for x64 versions of exception handling support
  2. Programming against the x64 exception handling support, part 2: A description of the new unwind APIs
  3. Programming against the x64 exception handling support, part 3: Unwind internals (RtlUnwindEx interface)
  4. Programming against the x64 exception handling support, part 4: Unwind internals (RtlUnwindEx implementation)
  5. Programming against the x64 exception handling support, part 5: Collided unwinds
  6. Programming against the x64 exception handling support, part 6: Frame consolidation unwinds
  7. Programming against the x64 exception handling support, part 7: Putting it all together, or building a stack walk routine

Armed with all of the information in the previous articles in this series, we can now do some fairly interesting things. There are a whole lot of possible applications for the information described here (some common, some esoteric); however, for simplicity’s sake, I am choosing to demonstrate a classical stack walk (albeit one that takes advantage of some of the newly available information in the unwind metadata).

Like unwinding on x86 where FPO is disabled, we are able to do simple tasks like determine frame return addresses and stack pointers throughout the call stack. However, we can expand on this a great deal on x64. Not only are our stack traces guaranteed to be accurate (due to the strict calling convention requirements on unwind metadata), but we can retrieve parts of the nonvolatile context of each caller with perfect reliability, without having to manually disassemble every function in the call stack. Furthermore, we can see (at a glance) which functions modify which non-volatile registers.

For the purpose of implementing a stack walk, it is best to use RtlVirtualUnwind instead of RtlUnwindEx, as RtlUnwindEx will make irreversible changes to the current execution state (while RtlVirtualUnwind operates on a virtual copy of the execution state that we can modify to our heart’s content, without disturbing normal program flow). With RtlVirtualUnwind, it’s fairly trivial to implement an unwind (as we’ve previously seen based on the internal workings of RtlUnwindEx). The basic algorithm is simply to retrieve the current unwind metadata for the active function, and execute a virtual unwind. If no unwind metadata is present, then the function is a leaf function that has not moved the stack pointer, so the return address is still at the top of the stack: we can simply set the virtual Rip to the value at the virtual *Rsp, and increment the virtual Rsp by 8 (as opposed to invoking RtlVirtualUnwind). Since RtlVirtualUnwind does most of the hard work for us, the only thing left after that is to interpret and display the output (or save it away, if we are logging stack traces).
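The loop described above can be sketched roughly as follows. This is a minimal user-mode illustration and not the example routine from this article: the names UnwindLeafFrame and WalkStackSketch, the 64-frame cap, and the output format are assumptions made for the sketch. The Windows-specific part is guarded so that the leaf-frame helper still compiles on other platforms.

```c
#include <stdint.h>
#include <stdio.h>

#if defined(_WIN64) && (defined(_M_X64) || defined(__x86_64__))
#include <windows.h>
#define X64_UNWIND_AVAILABLE 1
#else
#define X64_UNWIND_AVAILABLE 0
#endif

/* Leaf-function step (hypothetical helper): with no unwind metadata the
 * function has not moved Rsp, so the return address is at [Rsp].
 * Pop it into Rip. */
static void UnwindLeafFrame(uint64_t *Rip, uint64_t *Rsp)
{
    *Rip = *(const uint64_t *)(uintptr_t)*Rsp;
    *Rsp += 8;
}

#if X64_UNWIND_AVAILABLE
/* Walks the current thread's stack on a virtual copy of the context and
 * returns the number of frames visited. The cap of 64 is arbitrary. */
static unsigned WalkStackSketch(void)
{
    CONTEXT Context;
    unsigned Frame = 0;

    RtlCaptureContext(&Context);

    while (Context.Rip != 0 && Frame < 64) {
        DWORD64 ImageBase = 0;
        DWORD64 EstablisherFrame = 0;
        PVOID HandlerData = NULL;
        PRUNTIME_FUNCTION Function =
            RtlLookupFunctionEntry(Context.Rip, &ImageBase, NULL);

        if (Function != NULL) {
            /* Unwind metadata exists: let RtlVirtualUnwind do the work. */
            RtlVirtualUnwind(UNW_FLAG_NHANDLER, ImageBase, Context.Rip,
                             Function, &Context, &HandlerData,
                             &EstablisherFrame, NULL);
        } else {
            /* No metadata: treat the frame as a leaf function. */
            uint64_t Rip = Context.Rip;
            uint64_t Rsp = Context.Rsp;
            UnwindLeafFrame(&Rip, &Rsp);
            Context.Rip = Rip;
            Context.Rsp = Rsp;
        }

        printf("FRAME %02u: Rip=%016llX Rsp=%016llX\n", Frame,
               (unsigned long long)Context.Rip,
               (unsigned long long)Context.Rsp);
        Frame++;
    }
    return Frame;
}
#endif
```

Because the sketch unwinds once before printing, the first frame it reports is the caller of WalkStackSketch rather than WalkStackSketch itself. The walk ends when the virtual Rip becomes zero, which is what the last frame's return address looks like in the debugger output later in this post.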

With the above information in mind, I have written an example x64 stack walking routine that implements a basic x64 stack walk, and displays the nonvolatile context as viewed by each successive frame in the stack. The example routine also makes use of the little-used ContextPointers argument to RtlVirtualUnwind in order to detect functions that have used a particular non-volatile register (other than the stack pointer, which is immediately obvious). If a function in the call stack writes to a non-volatile register, the stack walk routine takes note of this and displays the modified register, its original value, and the backing store location on the stack where the original value is stored. The example stack walk routine should work “as-is” in both user mode and kernel mode on x64.
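As a rough illustration of the ContextPointers technique (again a sketch under assumptions, not the example routine itself): zero a KNONVOLATILE_CONTEXT_POINTERS structure before each virtual unwind, pass it to RtlVirtualUnwind, and afterwards treat every non-NULL pointer as the stack location where that frame saved the register. The helper names DescribeSavedRegister and UnwindAndReportSavedRegisters are hypothetical, and the sketch assumes the SDK headers expose the named Rbx/Rbp/... members of the structure.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#if defined(_WIN64) && (defined(_M_X64) || defined(__x86_64__))
#include <windows.h>
#define X64_UNWIND_AVAILABLE 1
#else
#define X64_UNWIND_AVAILABLE 0
#endif

/* Formats one "saved register" line. Returns 0 (and writes nothing) when
 * Slot is NULL, meaning the frame did not save that register. */
static int DescribeSavedRegister(char *Buffer, size_t Size,
                                 const char *Name, const uint64_t *Slot)
{
    if (Slot == NULL) {
        return 0;
    }
    return snprintf(Buffer, Size,
                    " -> Saved register '%s' on stack at %016llX (=> %016llX)",
                    Name,
                    (unsigned long long)(uintptr_t)Slot,
                    (unsigned long long)*Slot);
}

#if X64_UNWIND_AVAILABLE
/* Unwinds one frame that has unwind metadata and prints every
 * non-volatile integer register that the frame saved on the stack. */
static void UnwindAndReportSavedRegisters(PCONTEXT Context,
                                          DWORD64 ImageBase,
                                          PRUNTIME_FUNCTION Function)
{
    KNONVOLATILE_CONTEXT_POINTERS Pointers;
    PVOID HandlerData = NULL;
    DWORD64 EstablisherFrame = 0;
    char Line[128];
    size_t i;

    /* Start from all-NULL so only this frame's saves show up. */
    memset(&Pointers, 0, sizeof(Pointers));

    RtlVirtualUnwind(UNW_FLAG_NHANDLER, ImageBase, Context->Rip, Function,
                     Context, &HandlerData, &EstablisherFrame, &Pointers);

    {
        struct { const char *Name; PDWORD64 Slot; } Saved[] = {
            { "Rbx", Pointers.Rbx }, { "Rbp", Pointers.Rbp },
            { "Rsi", Pointers.Rsi }, { "Rdi", Pointers.Rdi },
            { "R12", Pointers.R12 }, { "R13", Pointers.R13 },
            { "R14", Pointers.R14 }, { "R15", Pointers.R15 },
        };

        for (i = 0; i < sizeof(Saved) / sizeof(Saved[0]); i++) {
            if (DescribeSavedRegister(Line, sizeof(Line), Saved[i].Name,
                                      (const uint64_t *)Saved[i].Slot) > 0) {
                puts(Line);
            }
        }
    }
}
#endif
```

Zeroing the structure on every iteration is a deliberate choice here: it makes each report reflect only the frame being unwound, rather than accumulating pointers from earlier frames.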

As an aside, there is a whole lot of information that is being captured and displayed by the stack walk routine. Much of this information could be used to do very interesting things (like augment disassembly and code analysis by definitively identifying saved registers and parameter usage). Other possible uses are more esoteric, such as skipping function calls at run-time in a safe fashion, or altering the non-volatile execution context of called functions via modification of the backing store pointers returned by RtlVirtualUnwind in ContextPointers. The stack walk use case, as such, really only begins to scratch the surface as it relates to some of the many very interesting things that x64 unwind metadata allows you to do.

Comparing the output of the example stack walk routine to the debugger’s built-in stack walk, we can see that it is accurate (and in fact, even includes more information; the debugger does not, presently, have support for displaying non-volatile context for frames using unwind metadata). Update: Pavel Lebedinsky points out that the debugger does, in fact, have this capability with the “.frame /r” command. The output of the example routine is shown first, followed by the debugger’s “k” output:

StackTrace64: Executing stack trace...
FRAME 00: Rip=00000000010022E9 Rsp=000000000012FEA0
 Rbp=000000000012FEA0
r12=0000000000000000 r13=0000000000000000
 r14=0000000000000000
rdi=0000000000000000 rsi=0000000000130000
 rbx=0000000000130000
rbp=0000000000000000 rsp=000000000012FEA0
 -> Saved register 'Rbx' on stack at 000000000012FEB8
   (=> 0000000000130000)
 -> Saved register 'Rbp' on stack at 000000000012FE90
   (=> 0000000000000000)
 -> Saved register 'Rsi' on stack at 000000000012FE88
   (=> 0000000000130000)
 -> Saved register 'Rdi' on stack at 000000000012FE80
   (=> 0000000000000000)

FRAME 01: Rip=0000000001002357 Rsp=000000000012FED0
[...]

FRAME 02: Rip=0000000001002452 Rsp=000000000012FF00
[...]

FRAME 03: Rip=0000000001002990 Rsp=000000000012FF30
[...]

FRAME 04: Rip=00000000777DCDCD Rsp=000000000012FF60
[...]
 -> Saved register 'Rbx' on stack at 000000000012FF60
   (=> 0000000000000000)
 -> Saved register 'Rsi' on stack at 000000000012FF68
   (=> 0000000000000000)
 -> Saved register 'Rdi' on stack at 000000000012FF50
   (=> 0000000000000000)

FRAME 05: Rip=000000007792C6E1 Rsp=000000000012FF90
[...]


0:000> k
Child-SP          RetAddr           Call Site
00000000`0012f778 00000000`010022c6 DbgBreakPoint
00000000`0012f780 00000000`010022e9 StackTrace64+0x1d6
00000000`0012fea0 00000000`01002357 FaultingSubfunction3+0x9
00000000`0012fed0 00000000`01002452 FaultingFunction3+0x17
00000000`0012ff00 00000000`01002990 wmain+0x82
00000000`0012ff30 00000000`777dcdcd __tmainCRTStartup+0x120
00000000`0012ff60 00000000`7792c6e1 BaseThreadInitThunk+0xd
00000000`0012ff90 00000000`00000000 RtlUserThreadStart+0x1d

(Note that the stack walk routine doesn’t include DbgBreakPoint or the StackTrace64 frame itself. Also, for brevity, I have snipped the verbose, unchanging parts of the nonvolatile context from all but the first frame.)

Other interesting uses for the unwind metadata include logging periodic stack traces at choice locations in your program for later analysis when a debugger is active. This is even more powerful on x64, especially when you are dealing with third-party code, as even without special compiler settings, you are guaranteed to get good data with no invasive use of symbols. And, of course, a good knowledge of the fundamentals of how the exception/unwind metadata works is helpful for debugging failure reporting code (such as a custom unhandled exception filter) on x64.
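One way the trace-logging idea might look is sketched below. Note that this swaps in the documented CaptureStackBackTrace API to collect return addresses instead of the hand-written walk from this article, and that the record layout, the ring buffer sizes, and the names StoreTrace and LogStackTraceHere are all assumptions made for illustration. The sketch is not thread-safe.

```c
#include <stdint.h>
#include <string.h>

#if defined(_WIN64)
#include <windows.h>
#endif

#define TRACE_DEPTH 16   /* return addresses kept per trace (arbitrary) */
#define TRACE_SLOTS 8    /* traces kept before the log wraps (arbitrary) */

typedef struct {
    unsigned Count;              /* number of valid entries in Frames */
    void *Frames[TRACE_DEPTH];   /* captured return addresses */
} TRACE_RECORD;

static TRACE_RECORD g_TraceLog[TRACE_SLOTS];
static unsigned g_TraceNext;     /* total traces stored so far */

/* Copies one trace into the next ring buffer slot, overwriting the oldest
 * entry once the log is full. Returns the slot that was written. */
static TRACE_RECORD *StoreTrace(void *const *Frames, unsigned Count)
{
    TRACE_RECORD *Record = &g_TraceLog[g_TraceNext % TRACE_SLOTS];

    if (Count > TRACE_DEPTH) {
        Count = TRACE_DEPTH;
    }
    Record->Count = Count;
    memcpy(Record->Frames, Frames, Count * sizeof(void *));
    g_TraceNext++;
    return Record;
}

#if defined(_WIN64)
/* Call this at the "choice locations"; the log can be inspected later
 * from a debugger by dumping g_TraceLog. */
static TRACE_RECORD *LogStackTraceHere(void)
{
    void *Frames[TRACE_DEPTH];
    /* Skip the first frame so the trace starts above this helper. */
    USHORT Count = CaptureStackBackTrace(1, TRACE_DEPTH, Frames, NULL);

    return StoreTrace(Frames, Count);
}
#endif
```

Storing only raw return addresses keeps the logging path cheap; resolving them to function names can be left to the debugger when the log is examined.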

Hopefully, you’ve found this series both interesting and enlightening. In a future series, I’m planning on running through how exception dispatching works (as opposed to unwind dispatching, which has been relatively thoroughly covered in this series). That’s a topic for another day, though.

5 Responses to “Programming against the x64 exception handling support, part 7: Putting it all together, or building a stack walk routine”

  1. Pavel Lebedinsky says:

    > the debugger does not, presently, have support for displaying non-volatile context for frames using unwind metadata)

    Isn’t this what .frame /r does?

  2. Skywing says:

    You’re right; that does display registers for previous frames. You learn something every day; thanks for the correction.

  3. […] how SEH (Structured Exception Handling) works (for example here or here), so I will not dwell on it in detail. «Р[…]

  4. […] how SEH (Structured Exception Handling) works (for example here or here), so I will not dwell on it in detail. […]

  5. […] reading up on Win64 structured exception tracing (from Programming against the x64 exception handling support, part 7: Putting it all together, or building…), I converted the code […]