Programming against the x64 exception handling support, part 2: A description of the new unwind APIs

Last time, I described many of the structures and prototypes necessary to program against the new x64 exception handling (EH) support. This posting continues that series, and describes how to manually initiate an unwind of procedure frames (and when and why you might want to do this).

Because x64 has built-in support for data-driven unwinding, there are a great many interesting things that you can do with unwinding functions at arbitrary points in execution. Unlike on x86, you don’t have to assume that all functions use a frame pointer (which is not the case in many programs), and you don’t need to call code with a particular register context set up in just the right way (with the right local variables at the right displacements from the stack pointer) in order to initiate an unwind of a function that has registered an unwind or exception handler.

If you’ve been reading some of my recent postings about performing stack traces on x86, then one of the first things that might come to mind is designing an approach that can create a “perfect” call stack in all situations without symbols. There are other benefits to the data-driven unwind approach, however, than simply being able to take accurate call stacks at arbitrary points in execution. For instance, there are particularly interesting benefits as far as instrumentation and code analysis go (such as an improved ability to detect most functions in an image programmatically, with a great deal of certainty, based on unwind data), and there are interesting implications for techniques such as function patching and modification on the fly as well.

First things first, however. The initial step is to get familiar with the new unwinding APIs that Win64 exposes on x64. Although these APIs can be manually duplicated by explicit parsing of all unwind information, I would recommend calling the APIs directly instead of doing all of the work to manually emulate unwinds yourself. The reason that I make that recommendation is that while the unwind metadata is documented, there is still a significant amount of work involved in reimplementing the unwind logic from scratch, and the unwind APIs themselves are (mostly) documented on MSDN and thus unlikely to change.

There are several APIs in particular that you’ll frequently find yourself using for unwind support on x64. These APIs are available in both user mode and kernel mode, and (aside from a lack of support for dynamically generated unwind data) the two operating environments use exactly the same semantics for unwinding. Thus, for the most part, you can interact with unwind metadata in the same fashion for both user mode and kernel mode.

  1. RtlLookupFunctionEntry: The first API that you’ll likely end up having to call for any unwind-related operation is RtlLookupFunctionEntry. This routine is the basis of all unwind operations in that it allows the caller to translate a raw 64-bit RIP value into two important values: an image base for any associated image in the address space of the caller, and a pointer to the RUNTIME_FUNCTION structure associated with the RIP value passed in. For virtually all cases on x64, you’ll be able to retrieve a valid RUNTIME_FUNCTION structure for the current RIP value.

    The exception to this rule relates to what are known as leaf functions: functions that make no direct modifications to the stack pointer (or any nonvolatile registers) and do not call any subfunctions. For these leaf functions only, the compiler may omit unwind metadata, in which case RtlLookupFunctionEntry returns NULL. To handle this case, it is typical to read the ULONG64 at the current RSP value (i.e. the return address of the current leaf function) and pass that address to RtlLookupFunctionEntry instead. Because leaf functions do not touch any nonvolatile registers, alter the stack pointer, or call any subfunctions, they can be safely skipped in the unwind process in this fashion. (Virtually all functions in a given x64 binary are non-leaf functions, otherwise known as frame functions: functions that do not meet all three of the criteria just described. In either case, the restrictions on leaf functions mean that they do not impact the ability to perform complete unwinds, despite the lack of unwind metadata associated with them.)

    The typical use for RtlLookupFunctionEntry is simply to retrieve the function entry for the currently executing function. (For leaf functions, it may be necessary to retrieve the function entry for the caller, if there is no unwind metadata for the current function, as described above.) The PRUNTIME_FUNCTION returned is then typically passed to one of the “high level” unwind support routines, although it can be interpreted manually if necessary (this is typically not required, however).

  2. RtlVirtualUnwind: The RtlVirtualUnwind API is the core of the Win64 x64 unwind support. This API implements the lowest level interface exposed for interacting with unwind metadata through a RUNTIME_FUNCTION. In particular, it implements all of the code necessary to interpret UNWIND_CODEs and adjust the stack and nonvolatile register context according to the unwind information specified via a RUNTIME_FUNCTION. It also has logic to locate the exception or unwind handler registered for a given function, which it hands back to the caller so that the handler can be invoked.

    RtlVirtualUnwind provides the infrastructure upon which higher level exception and unwind handling support is implemented. It exposes the concept of a virtual unwind (as one might guess, given the routine’s name). The virtual unwind concept is one that is entirely new to x64 (and IA64), and does not exist in any form on x86. This is due entirely to the fact that IA64 and x64 have data-driven unwind support, while x86 has code-driven unwind support.

    The distinction is important in that on x64 and IA64, it is possible to simulate an unwind, at an arbitrary point in time, without running code with potentially unknown side effects (or unknown entry conditions, as with x86 exception or unwind handlers that utilize local variables). This is accomplished by interpreting the unwind codes described by a RUNTIME_FUNCTION and associated UNWIND_INFO blocks. This is the essence of what a virtual unwind is: a simulated unwind operation that can operate on an arbitrary, isolated register context without affecting (or otherwise impacting) the actual realized state of the program. In its purest form, a virtual unwind can be accomplished by invoking RtlVirtualUnwind with a register context that you wish to have the unwind applied to, and the UNW_FLAG_NHANDLER flag value for the HandlerType parameter (which suppresses the invocation of any unwind or exception handlers registered by the function).

    This is a very powerful capability indeed, as it allows for a much more complete and thorough traversal of call frames than was ever possible on x86. With the ability to describe and undo the changes to nonvolatile registers given an initial register context and stack, virtual unwinding allows programmatic, completely reliable access to not only the return address, stack frame, and arguments of arbitrary functions at any point in an active call stack, but also to nonvolatile register values at any point in a call stack. If you have ever debugged optimized code where parameter values and intermediate values are frequently only present in registers, then you can immediately see how valuable this particular benefit of virtual unwinding is to debugging. (It is important to note that as volatile registers are not saved anywhere, it is not necessarily possible to reconstruct their values at any point in the call stack.)

    It is also possible to use RtlVirtualUnwind to effect a “realized” unwind, and indeed, RtlVirtualUnwind is the cornerstone on which the rest of the unwinding architecture in Win64 x64 is built. By directing RtlVirtualUnwind to look up unwind (or exception) handlers, calling them as appropriate, and then further altering the returned context (such as by specifying a return value), it is possible to perform a complete “realized” unwind from a procedure at an arbitrary point in execution.

  3. RtlUnwindEx: RtlUnwindEx supplants the RtlUnwind API that exists on x86 for purposes of implementing a “hard unwind” that alters the realized execution state of the program. RtlUnwindEx is a natural extension of RtlUnwind that includes support for features new to 64-bit exception handling support. Unlike RtlUnwind, it can operate on a register context other than the current register context.

    RtlUnwindEx implements an unwind that calls all of the unwind handlers necessary to unwind to a particular point. It also adjusts the register context based on the unwind metadata for each procedure frame being unwound. Internally, RtlUnwindEx is essentially implemented as a wrapper that calls RtlVirtualUnwind and registered unwind handlers as necessary for each frame in between the active frame and the target frame. It also houses all of the logic necessary to deal with some of the other subtleties of unwinding, such as detection of a bogus stack pointer value in the passed-in register context.

    RtlUnwindEx is useful if you need to execute a complete unwind (and only a complete unwind) of a particular procedure frame or set of procedure frames. In most cases where you would be doing this, it is usually sufficient to rely on the language-level exception handling support, so I consider RtlUnwindEx relatively uninteresting (at least when compared to RtlVirtualUnwind). Many of the more interesting use cases for directly calling the x64 exception handling support thus require the use of RtlVirtualUnwind directly (although selectively unwinding past certain procedure frames with complete support for calling unwind handlers is made easier by direct usage of RtlUnwindEx).

  4. RtlCaptureStackBackTrace: The RtlCaptureStackBackTrace routine is essentially a high-level implementation of a stack walking routine that utilizes the lower level unwind support (in particular, RtlVirtualUnwind). Unlike StackWalk64, RtlCaptureStackBackTrace is very lightweight and does not use symbols (it is implemented entirely with the unwind metadata present on x64). As such, it does not exist on x86. It is, however, handy for quickly capturing stack traces (and can be used in both user mode and kernel mode in the same fashion). RtlCaptureStackBackTrace does not return nonvolatile register contexts for each frame being traced, however, so if you require this functionality, then you would need to implement your own stack trace mechanism on top of RtlVirtualUnwind. (It is worth noting that this sort of mechanism is essentially what functionality like handle tracing and page heap tracing is built on top of, to give you an idea of how useful it can be.) If you only need return addresses for each frame, however, then RtlCaptureStackBackTrace is an excellent API to consider if you need to log stack traces at periodic locations in your own programs for later analysis (especially since it doesn’t require anything as invasive as loading symbols).
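
To make the first two APIs a little more concrete, here is a minimal sketch of a stack walk built from RtlLookupFunctionEntry and RtlVirtualUnwind, with the leaf function handling described in item 1. Treat it as an illustration rather than production code: the function name walk_stack and its parameters are made up for this example, it assumes a well-formed stack with valid unwind data for every frame function, and the #else branch is only a placeholder so that the file still compiles on targets where these APIs do not exist.

```c
#include <stddef.h>

#if defined(_WIN64) && (defined(_M_X64) || defined(__x86_64__))
#include <windows.h>

/* Hypothetical helper: walks the calling thread's stack using virtual
   unwinds only, so no exception or unwind handlers are run.
   Stores up to max_frames return addresses (innermost caller first)
   and returns the number of addresses stored. */
size_t walk_stack(void **frames, size_t max_frames)
{
    CONTEXT ctx;
    size_t count = 0;

    /* Take a private copy of the current register state; the walk
       below only ever modifies this copy. */
    RtlCaptureContext(&ctx);

    while (count < max_frames && ctx.Rip != 0) {
        DWORD64 image_base = 0;
        PRUNTIME_FUNCTION fn =
            RtlLookupFunctionEntry(ctx.Rip, &image_base, NULL);

        if (fn == NULL) {
            /* No unwind metadata: assume a leaf function, whose return
               address sits at [RSP]. Pop it and continue in the caller. */
            ctx.Rip = *(DWORD64 *)ctx.Rsp;
            ctx.Rsp += sizeof(DWORD64);
        } else {
            PVOID handler_data = NULL;
            DWORD64 establisher_frame = 0;

            /* UNW_FLAG_NHANDLER requests a pure virtual unwind: the
               context is rewound to the caller and no handler is
               looked up. */
            RtlVirtualUnwind(UNW_FLAG_NHANDLER, image_base, ctx.Rip, fn,
                             &ctx, &handler_data, &establisher_frame, NULL);
        }

        /* ctx now describes the caller; a zero RIP marks the end. */
        if (ctx.Rip != 0)
            frames[count++] = (void *)ctx.Rip;
    }
    return count;
}

#else

/* Placeholder for non-x64-Windows builds: the unwind APIs used above
   are not available here, so no frames are reported. */
size_t walk_stack(void **frames, size_t max_frames)
{
    (void)frames;
    (void)max_frames;
    return 0;
}

#endif
```

A caller would pass an array such as void *frames[32] and get back the return addresses of its callers, innermost first. Because the walk only touches its private CONTEXT copy, the nonvolatile register values for each frame (ctx.Rbx, ctx.Rsi, and so on) could be read at each step as well, which is exactly the information that RtlCaptureStackBackTrace does not return. If return addresses are all you need, calling RtlCaptureStackBackTrace is the simpler choice.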

That’s all there is in this posting. More details on how to use the new unwind support next time…

4 Responses to “Programming against the x64 exception handling support, part 2: A description of the new unwind APIs”

  1. […] Previously, I provided a brief overview of what each of the core APIs relating to x64’s extensive data-driven unwind support were, and when you might find them useful. […]

  2. […] Programming against the x64 exception handling support, part 2: A description of the new unwind APIs […]

  3. Jeffrey Tan says:

    Sorry, but I can find RtlCaptureStackBackTrace in the ntdll on my x86 machine, so it should exist on the x86 machine, yes?

    Also, it is said that UMDH internally leverages RtlCaptureStackBackTrace to capture the call stack.

  4. Yuhong Bao says:

    On the matter of RtlCaptureStackBackTrace, this function can crash if the developer did not provide proper x64 unwind data: