Frame pointer omission (FPO) optimization and consequences when debugging, part 1

During the course of debugging programs, you’ve probably run into the term “FPO” once or twice. FPO refers to a specific class of compiler optimizations that, on x86, deal with how the compiler accesses local variables and stack-based arguments.

With a function that uses local variables (and/or stack-based arguments), the compiler needs a mechanism to reference these values on the stack. Typically, this is done in one of two ways:

  • Access local variables directly from the stack pointer (esp). This is the behavior if FPO optimization is enabled. While this does not require a separate register to track the location of locals and arguments, as is needed if FPO optimization is disabled, it makes the generated code slightly more complicated. In particular, the displacement from esp of locals and arguments actually changes as the function executes, due to things like function calls or other instructions that modify the stack. As a result, the compiler must keep track of the actual displacement from the current esp value at each location in a function where a stack-based value is referenced. This is typically not a big deal for a compiler to do, but in hand-written assembly, this can get a bit tricky.
  • Dedicate a register to point to a fixed location on the stack relative to local variables and stack-based arguments, and use this register to access locals and arguments. This is the behavior if FPO optimization is disabled. The convention is to use the ebp register to access locals and stack arguments. Ebp is typically set up such that the first stack argument can be found at [ebp+08], with local variables typically at a negative displacement from ebp.
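
The difference between the two approaches can be illustrated with a toy model. The Python sketch below is not compiler output: the ToyStack class, the addresses, and the pushed values are all made up for illustration. It measures how far one stack argument is from esp and from ebp, before and after two extra pushes of the kind needed to set up a nested call.

```python
class ToyStack:
    """Toy model of a 32-bit x86 stack (hypothetical addresses and values)."""

    def __init__(self, top=0x1000):
        self.esp = top   # stack pointer; the stack grows toward lower addresses
        self.ebp = 0     # frame pointer; 0 stands in for the caller's ebp
        self.mem = {}    # address -> 32-bit value

    def push(self, value):
        self.esp -= 4
        self.mem[self.esp] = value


def displacement_drift():
    """Return the (from_esp, from_ebp) displacement of the first stack
    argument, measured before and after two additional pushes."""
    s = ToyStack()
    s.push(0xAAAA)       # caller pushes the only stack argument
    s.push(0x401000)     # call pushes the return address (made-up value)
    s.push(s.ebp)        # push ebp
    s.ebp = s.esp        # mov ebp, esp
    arg_addr = s.ebp + 8 # the argument lives at [ebp+08]

    before = (arg_addr - s.esp, arg_addr - s.ebp)
    s.push(1)            # set up two arguments for a nested call
    s.push(2)
    after = (arg_addr - s.esp, arg_addr - s.ebp)
    return before, after


if __name__ == "__main__":
    print(displacement_drift())
```

In this model the displacement from esp grows from 8 to 16 after the two pushes, while the displacement from ebp stays at 8. That moving esp displacement is the bookkeeping the compiler has to do at every stack reference when FPO optimization is enabled.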

A typical prologue for a function with FPO optimization disabled might look like this:

push   ebp               ; save away old ebp (nonvolatile)
mov    ebp, esp          ; load ebp with the stack pointer
sub    esp, sizeoflocals ; reserve space for locals
...                      ; rest of function

The main concept is that when FPO optimization is disabled, a function will immediately save away ebp (as the first operation touching the stack), and then load ebp with the current stack pointer. This sets up a stack layout like so (relative to ebp):

[ebp-01]   Last byte of the last local variable
[ebp+00]   Old ebp value
[ebp+04]   Return address
[ebp+08]   First argument...
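
As a rough check of the layout above, here is a small Python sketch that replays the caller's pushes and the callee's prologue on a simulated stack. The function name build_frame, the starting address, and all values are hypothetical; the right-to-left argument order assumes a cdecl or stdcall style call.

```python
def build_frame(args, return_address, old_ebp, locals_size):
    """Simulate a call plus the standard non-FPO prologue.

    Returns (mem, ebp, esp), where mem maps addresses to 32-bit values.
    All addresses are made up for illustration.
    """
    mem = {}
    esp = 0x2000                 # arbitrary starting stack pointer

    for arg in reversed(args):   # caller pushes arguments right to left
        esp -= 4
        mem[esp] = arg

    esp -= 4
    mem[esp] = return_address    # call pushes the return address

    esp -= 4
    mem[esp] = old_ebp           # push ebp
    ebp = esp                    # mov ebp, esp
    esp -= locals_size           # sub esp, sizeoflocals
    return mem, ebp, esp


if __name__ == "__main__":
    mem, ebp, esp = build_frame([0x11, 0x22], 0x401000, 0x0012FF70, 8)
    print(hex(mem[ebp]), hex(mem[ebp + 4]), hex(mem[ebp + 8]), ebp - esp)
```

With these inputs, the old ebp value is found at [ebp+00], the return address at [ebp+04], the first argument at [ebp+08], and the eight bytes of locals occupy the addresses immediately below ebp.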

Thereafter, the function will always use ebp to access locals and stack-based arguments. (The prologue of the function may vary a bit, especially with functions using a variation of __SEH_prolog to set up an initial SEH frame, but the end result is always the same with respect to the stack layout relative to ebp.)

As previously stated, this means that the ebp register is not available to the register allocator for other uses. However, the resulting performance hit, relative to the same function compiled with FPO optimization turned on, is usually not large enough to be a concern. Furthermore, a number of conditions require a function to use a frame pointer, and you may hit them anyway:

  • Any function using SEH must use a frame pointer, as when an exception occurs, there is no way to know the displacement of local variables from the esp value (stack pointer) at exception dispatching (the exception could have happened anywhere, and operations like making function calls or setting up stack arguments for a function call modify the value of esp).
  • Any function using automatic C++ objects with destructors must use SEH for compiler unwind support. This means that most C++ functions end up with FPO optimization disabled. (It is possible to change the compiler assumptions about SEH exceptions and C++ unwinding, but the default [and recommended setting] is to unwind objects when an SEH exception occurs.)
  • Any function using _alloca to dynamically allocate memory on the stack must use a frame pointer (and thus have FPO optimization disabled), as the displacement from esp for local variables and arguments can change at runtime and is not known to the compiler at compile time when code is being generated.
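
The _alloca case in particular can be sketched with the same kind of toy model. In the Python below, the function name, the starting address, and the 4-byte rounding are assumptions for illustration only; the real _alloca alignment and implementation are compiler-specific.

```python
def local_displacements(alloca_size):
    """Return (from_esp, from_ebp) for a 4-byte local at [ebp-04] after a
    simulated dynamic stack allocation of alloca_size bytes."""
    esp = 0x3000                 # arbitrary starting stack pointer
    esp -= 4                     # push ebp
    ebp = esp                    # mov ebp, esp
    esp -= 8                     # sub esp, 8: room for two 4-byte locals
    local_addr = ebp - 4         # first local

    # Assumption: round the request up to a multiple of 4 bytes.
    rounded = (alloca_size + 3) & ~3
    esp -= rounded               # the dynamic allocation moves esp at runtime

    return local_addr - esp, local_addr - ebp


if __name__ == "__main__":
    for size in (16, 100):
        print(size, local_displacements(size))
```

In this model the local sits 20 bytes above esp after a 16-byte allocation and 104 bytes above esp after a 100-byte one, but it is at [ebp-04] in both cases. Since the size is only known at runtime, only the ebp-relative displacement can be baked into the generated code.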

Because of these restrictions, many functions you may be writing will already have FPO optimization disabled, without you having explicitly turned it off. However, it is still likely that many of your functions that do not meet the above criteria have FPO optimization enabled, and thus do not use ebp to reference locals and stack arguments.

Now that you have a general idea of just what FPO optimization does, in the second half of this series I’ll cover why it is to your advantage to turn off FPO optimization globally when debugging certain classes of problems. (It is actually the case that most shipping Microsoft system code turns off FPO as well, so you can rest assured that a real cost-benefit analysis has been done between FPO and non-FPO optimized code, and it is overall better to disable FPO optimization in the general case.)

Update: Pavel Lebedinsky points out that the C++ support for SEH exceptions is disabled by default for new projects in VS2005 (and that it is no longer the recommended setting). For most programs built prior to VS2005 and using the defaults at that time, though, the above statement about C++ destructors causing SEH to be used for a function (and thus requiring the use of a frame pointer) still applies.

7 Responses to “Frame pointer omission (FPO) optimization and consequences when debugging, part 1”

  1. dispensa says:

    > The convention is to use the ebp register to access locals and stack arguments. Ebp is typically setup such that the first stack argument can be found at [ebp+08], with local variables typically at a negative displacement from ebp.

    I think it’s probably fair to say it’s more than convention; ebp (== base pointer) is one of the few registers whose segment default is SS (the only other one being esp?). The rest reference DS or CS.

    I know it’s not a problem on Windows, since the segments (other than FS/GS) all match, but in principle, these are implicit 48-bit addresses, and EDI, for example, can’t be used to reference the stack without an override.

    Now, with that completely theoretical assertion out of the way, do you run across a lot of generated code using register-indirect addressing into the stack with ESI/EDI/etc. as a base?

  2. Skywing says:

    No. I would assume this is primarily because of their use in inline memcmp/strcmp/memcpy.

  3. Pavel Lebedinsky says:

    As of VS 2005, the default is to not invoke local destructors for SEH exceptions:

    http://msdn2.microsoft.com/en-us/library/1deeycx5.aspx

    /EHa still works but generally is not recommended.

  4. Vladimir Scherbina says:

    Skywing,

    Did you manage to get /Oy to work? I did some tests in the past and they failed.

    Simple code that compares dword value with zero looks identical in both cases: when compiling with /Oy and w/o /Oy: (this is what I have in both cases)

    ; HRESULT DllCanUnloadNow(void)
    .text:10005650 public DllCanUnloadNow
    .text:10005650 DllCanUnloadNow proc near
    .text:10005650 push ebp
    .text:10005651 mov ebp, esp
    .text:10005653 xor eax, eax
    .text:10005655 cmp dword_10010078, 0
    .text:1000565C setnz al
    .text:1000565F pop ebp
    .text:10005660 retn
    .text:10005660 DllCanUnloadNow endp

  5. Skywing says:

    Vladimir: Try using “/Oy-”. Also, I’ve noticed that there are a lot of places where recent compilers (e.g. VS2005) will absolutely insist on using EBP when they don’t really *need* to per se, such as functions with array local variables. So, you might have a bit of trouble getting CL 14 / VS2005 to omit the code that uses EBP as a frame pointer in some cases (i.e. forcing CL to use direct ESP accesses).

  6. […] on Windows, it is still a problem that cannot be easily dismissed. Finally, optimizations such as Frame Pointer Omission can thwart attempts to perform a stack walk by following the […]
