Debugger tricks: API call logging, the quick’n’dirty way (part 3)

Previously, I introduced several ways to use the debugger to log API calls. Beyond what was described in that article, there are some other, more complicated examples that are worth reviewing. Additionally, there are certain limitations that should be considered when using the debugger instead of a dedicated API logging program.

Although logging breakpoints like I’ve previously described (i.e. displaying function input parameters and return values) are certainly handy, you’ve probably already come up with a couple of scenarios where breakpoints in the style like I’ve provided won’t give you what you need to track down a problem.

The most notable example of this is when you need to examine an out parameter that is filled by a function call, after the function call is made. This provides a problem, as it’s generally not reliable to access the function parameters on the stack after the function call has returned (in both stack and register based calling conventions in use on Windows, the called function is free to modify the parameter locations as it sees fit, and this is actually fairly common with optimizations enabled). As a result, what we really need is the ability to save some state across the function call, so that we can access some of the function’s arguments after the function returns.

Fortunately, this is doable within the debugger, albeit in a rather roundabout way. The key here is the usage of so-called user-defined pseudo-registers, which are conceptually extra platform-independent storage locations (accessed like regular registers in terms of the expression evaluator, hence the term pseudo-register). These pseudo-registers are essentially just variables in the conventional programming sense, although there are a limited number of them available (20 in the current release). As a result, there are some limitations on what can be accomplished using them, but for most circumstances, 20 is enough. If you find yourself needing to track more state than that, you should strongly consider writing a debugger extension in C instead of using the debugger script language.

(As an aside, at Driver DevCon a couple of years ago, I remember sitting in on a WinDbg-oriented session in which the presenter was at one point going over a large program written in the (then-relatively-new) expanded debugger scripting language, with additional support for conditionals and error handling. I still can’t but help think of debugger-script programs as combining the ugliest parts of Perl with cmd.exe-style batch scripts (although to be fair, the debugger expression evaluator is a bit more powerful than batch scripts, and it was also never originally intended to be used for more than simple expressions). To be honest, I would still strongly recommend against writing highly complex debugger-script programs where possible; they are something of a maintenance nightmare, among other things. For such circumstances, writing a debugger extension (or a program to drive the debugger entirely) is a better choice. I digress, however; back to the subject of call logging.)

The debugger’s user-defined pseudo-register facility provides an effective (if perhaps slightly awkward) means of storing state, and this can be used to save parameter values across a function call. For example, we might want to log all calls to ReadFile, such that we want a dump of the file data being read in. To accomplish this task, we’ll need to dump the contents of the output buffer (and use the bytes transferred count, another out parameter). This could be accomplished like so (in this case, for brevitiy, I am assuming that the program is using ReadFile in synchronous I/O mode):

0:000> bp kernel32!ReadFile "r @$t0 = poi(@esp+8) ; r @$t1 = poi(@esp+10) ; g @$ra ; .if (@eax != 0) { .printf \"Read %lu bytes: \\n\", dwo(@$t1) ; db @$t0 ldwo(@$t1) } .else { .echo Read failed! ; !gle } ; g "

The output of this command might be like so:

Read 22 bytes: 
0016ec3c  54 68 69 73 20 69 73 20-61 20 74 65 78 74 20 66
              This is a text f
0016ec4c  69 6c 65 2e 0d 0a

(Awkward wrapping done by me to avoid breaking the blog layout.)

This command is essentially a logical extension of yesterday’s example, with the addition of some state that is shared across the call. Specifically, the @$t0 and @$t1 user-defined pseudo-registers are used to save the lpBuffer ([esp+08h]) and lpNumberOfBytesRead ([esp+10h]) arguments to the ReadFile call across the function’s execution. When execution is stopped at the return address, the contents of the file data that were just read are dumped by dereferencing the values referred to by @$t0 and @$t1.

Although this sort of state-saving across execution can be useful, there are downsides. Firstly, this sort of breakpoint is fundamentally incompatible with multiple threads (at least in as much as multiple threads hitting the breakpoint in question simultaneously). This is because the debugger provides no provision for “expression-local”, or “thread-local” state – multiple threads hitting the breakpoint at the same time can step on eachothers toes, so to speak. (This problem can also occur with any sort of breakpoint that involves resuming execution until an implicit breakpoint created by a “g <address>” command, although it is arguably more severe with “stateful” breakpoints.)

This limitation in the debugger can be worked around in a limited fashion by making a breakpoint thread-specific via a thread specifier in the g command, although this is typically hardly convenient to do. Many call logging programs will account for multithreading natively and will not require any special work to accomodate multithreaded function calls. (Note that this problem is often not as severe as it might sound – in many cases, even in multithreaded programs, there is typically only one function that calls a function you’re interested in, or the liklihood of a thread collision is sufficiently small that it works anyway the vast majority of the time. However, in some circumstances, these style of breakpoints just do not work well if the function in question is called frequently from many threads and requires inspection of data after the function returns.)

Another significant limitation of using the debugger to do call logging (as opposed to a dedicated program) is that the debugger is typically very slow compared to a dediated program doing logging. The reason here is that for every breakpoint event, essentially all threads in the program are frozen, various state information is copied from the program to the debugger, and then the breakpoint expression is evaluated debugger-side. Additionally, unlike with a dedicated program, the results of the logging breakpoint are displayed in real time, instead of (say) being stored in a binary log buffer somewhere for later format and display. This means that even more overhead is incurred as the debugger UI needs to be updated on every breakpoint. As a result, if you set a conditional breakpoint on a frequently hit function, you may notice the program slow down significantly, perhaps even to the point of being unusable. Dedicated logging programs can employ a variety of techniques to circumvent these limitations of the debugger, which are primarily artifacts of the fact that the debugger is primarily designed to be a debugger and not a high-speed API monitor.

This is even more noticible in the kernel debugger case, as transitions to the debugger in in KD mode are very slow, such that even several transitions per second is enough to make a system all but unusable in practical terms. As a result, one needs to be extra careful in picking locations to set conditional logging breakpoints at in the kernel debugger (perhaps placing them in the middle of a function, in a specific interesting code path, rather than at the start so that all calls will be caught).

Given these limitations, it is worth doing a bit of analysis on the problem to determine if the debugger or a dedicated logging program is the best choice. Both approaches have strengths and weaknesses, and although the debugger is extremely flexible (and often very convenient), it isn’t necessarily the best choice in every conceivable scenario. In other words, use the best tool for the job. However, there are some circumstances where the only option is to use the debugger, such as kernel-mode call logging, so I would recommend at least having some basic knowledge of how to accomplish logging tasks with the debugger, even if you would normally always use a dedicated logging program. (Although, in the case of kernel mode debugging, again, the slowness of debugger transitions makes it important to pick “low-traffic” locations to breakpoint on.)

Still, an important part of being effective at debugging and solving problems is knowing your options and when (and when not) to use them. Using the debugger to perform call logging should just be one of many such options in your “debugging toolkit”.

4 Responses to “Debugger tricks: API call logging, the quick’n’dirty way (part 3)”

  1. nksingh says:

    Hi Skywing,

    Are there any tools you use for just detouring the APIs entirely through functions of your own choosing? It seems like this could be a good debugger extension: load a dll with api wrappers and selectively apply the patches or disable them when you’re done tracing that function.

  2. Skywing says:

    There are lots of libraries for doing that programmatically, sure – and it’s easy to make one yourself given some sort of disassembler library (of which there are also many). However, I don’t know of any designed to interact with the debugger on the fly like that. You would run into many of the same problems as with a conventional logging app with what you are describing – you need to have some way of telling the logging extension how the function is called (or at least what registers/locations you are interested in) and describe the appropriate conditions to it. Certainly doable, but probably a fair amount of work.

    When I need to do some sort of tracing or logging of “high-traffic” functions for only certain conditions, and it’s just too slow to do through the debugger, one option I might use is to just manually patch in some comparison code from the debugger. This is probably most often useful in kernel mode where you can’t just fall back to a dedicated logging program in many circumstances. For example, patch the start of a function with a “jmp <memory> (via the “a” command), where you then assemble some test code in the chosen scratch location that potentially breaks into the debugger if the condition you wanted is hit, or otherwise just passes through to the original function implementation.

    That’s really taking things into your own hands, though. In KD mode, I don’t know of a good analog to “.dvalloc” to allocate memory in the target, and so you’ll have to find somewhere to put your patch code. And, of course, your patch had better be just right the first time or you’re probably going to bring down the target. (Beware various annoying bugs in “a” as well – the extension has trouble assembling sometime instructions correctly, and will sometimes emit something that is just plain wrong without giving you any errors, to be unhappily discovered when your patch blows up when it is executed next.)

    Also, technologies like PatchGuard can often hinder this if they apply to the system you’re looking at if you aren’t careful.

  3. F says:

    Fuck, I wrote comment but it was destroyed when I hit Submit (no email) :(.

    Anyway, I think that detours are needed when high performance is necessary. I wanted to log communication between R3 app to VGA card. (app->driver for every _asm in out->vga), but it was impossible, because BP on deviceiocontrol slow it down, so app was no more able to communicate with VGA…

  4. […] way you might approach this is with a conditional breakpoint, but the SetLastError path is typically frequently hit, so this is often be problematic in terms of […]