Archive for the ‘Debugging’ Category

Use a custom symbol server in conjunction with IDA with Vladimir Scherbina’s IDA plugin

Tuesday, December 5th, 2006

Vladimir Scherbina has recently released a useful IDA plugin that enhances IDA’s now built-in support for loading symbols via the symbol server to allow custom symbol server paths. This is something I personally have been wanting for some time; IDA’s PDB loading mechanism overrides _NT_SYMBOL_PATH with a hardcoded value of the Microsoft symbol server. This breaks my little trick for injecting symbol server support into programs that do not already support it, which is fairly annoying. Now, with Vladimir’s plugin, you can have IDA use a custom symbol server without having to hack the PDB plugin and change its hardcoded string constant for the Microsoft symbol server path. (Plus, you can have IDA use your local downstream store cache as well – another disadvantage to how IDA normally loads symbols via PDB.)
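
For reference, the kind of path this lets you use from within IDA is the same sort of value you would normally put in _NT_SYMBOL_PATH; the server URL and downstream store location below are placeholders for illustration only:

srv*C:\SymbolCache*http://symbols.example.com/symbols;srv*C:\SymbolCache*http://msdl.microsoft.com/download/symbols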

Improve performance in kernel debugging and remote debugging with WinDbg by closing extra debugger windows

Tuesday, December 5th, 2006

Here’s a little known tidbit of information that you might find useful while debugging things:

All the extra windows that you may have open in WinDbg (disassembly, registers, memory, and so forth) actually slow the debugger down. Really.

This is typically only noticeable when there is a non-local connection between your debugger and the target. Primarily, then, this applies to kernel debugging and remote debugging.

These windows slow down the debugger because they represent extra information that the debugger must fetch each time the target breaks into the debugger. The debugger engine tends to follow a rule of “lazy fetching”; that is, it will try to avoid asking the target for extra information until it absolutely must do so. If you don’t have your disassembly window open, then, you’ll save the debugger from having to read memory from the disassembly target address on every debugger event that causes a breakin (until you actually explicitly request that information via the “u” command, for instance). There are similar speed consequences for having the registers window open (or setting your default register mask to show all registers, as usermode debugging does by default – this is, I suspect, why kernel debugging defaults to a very minimal register mask that is displayed on each prompt), and for having the memory window open. Keeping these windows open will require the debugger to go out of its way to request extra register information (and to read memory on each debugger event).

The result of having these extra windows open is that the debugger will just seem significantly slower when you are doing things like stepping or hitting breakpoints and continuing execution. If you’ve ever wondered why sometimes it takes the debugger a couple seconds or more to come back after each step or trace operation, this is a big contributor to that (especially in kernel debugging).

So, close those extra windows unless you really need them, as they really do slow things down in kernel debugging and remote debugging scenarios. You’ll find that your kernel debugging and remote debugging experiences are much more responsive without the debugger having all those extra round trip times being inserted on each trace/step/break event.

Oh, and you can also temporarily suspend the behavior of keeping these windows up to date, while still keeping them open, with a command that was recently introduced to WinDbg: “.suspend_ui flag”, where flag is 1 to suspend refreshing of the UI windows and 0 to resume it. This is mostly useful for scenarios where you still find the disassembly or memory windows useful, but are about to execute a large number of traces or steps and want to reduce overhead for the duration of that activity only.
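
For example, here is a minimal sketch of wrapping a large batch of steps with it (assuming the flag semantics above):

kd> .suspend_ui 1
kd> p 100
kd> .suspend_ui 0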

Analysis of a networking problem: The case of the mysterious SMB connection resets (or “How to not design a network protocol”)

Monday, December 4th, 2006

Recently, I had the unpleasant task of troubleshooting a particularly strange problem at work, in which a particular SMB-based file server would disconnect users if more than one user attempted to simultaneously initiate a file transfer. This would typically manifest as a file transfer (i.e. a drag and drop file copy) initiated by Explorer failing with a “The specified network name is no longer available.” (error #64) dialog box. Since we are a VPN/remote access company, and this particular problem was seemingly only occurring when using our VPN software, the SMB traffic in question was passing through a fair amount of custom code we had written, so we (naturally) assumed that the issue was caused by some sort of bug in our software that was upsetting either the Windows SMB client or the SMB server.

So, first things first; after doing initial troubleshooting, we obtained some packet captures of the problematic SMB server (and SMB clients), detailing just what was happening on the network when the problem occurred.

Unfortunately, in this particular case, the packet captures did not really end up providing a whole lot of useful information; they did, however, succeed in raising more questions. What we observed is that in the middle of an SMB datastream, the SMB server would just mysteriously send a TCP RST packet, thereby forcibly closing the TCP connection on which SMB was running. This corresponded exactly with one of the file share clients getting the dreaded error #64 dialog, but there was no clue as to what the problem was. In this particular case, there was no packet loss to speak of, and nothing else to indicate some kind of connectivity problem; the SMB server just simply sent an RST seemingly out of the blue to one of the SMB clients shortly after a different SMB client attempted to initiate a file transfer.

To make matters worse, there was no correlation at all as to what the SMB client whose connection got killed was doing when the connection got reset. The client could be either submitting a read request for more data, waiting for a previously sent read request to finish processing, or doing any other operation; the SMB server would just mysteriously close the connection.

In this particular case, the problem would also only occur when SMB was used in conjunction with our VPN software. When the SMB server was accessed over the LAN, the SMB connection would operate fine in the presence of multiple concurrent users. Additionally, when the SMB server was used in conjunction with alternative remote access methods other than our standard VPN system, the problem would mysteriously vanish.

By this time, this problem was starting to look like a real nightmare. The information we had said that there was some kind of problem that was preventing SMB from being used with our VPN software (which obviously would need to be fixed, and quickly), and yet gave no actual leads as to what might cause the problem to occur. According to logs and packet captures, the SMB server would just arbitrarily reset connections of users connecting to SMB servers when used in conjunction with our VPN software.

Fortunately, we did eventually manage to duplicate the problem in-house on our own internal test network. This eventually turned out to be key to solving the problem, though it did not initially provide any new information; it did, at least, spare us from having to bother the customer the problem was originally impacting while we troubleshot it. In the meantime, our industrious quality assurance group had engaged Microsoft to see whether there was any known issue with SMB that might explain the problem.

After spending a significant amount of time doing exhaustive code reviews of all of our code in the affected network path, and banging our collective heads on the wall while trying to understand just what might be causing the SMB server to kill off users of our VPN software, we eventually ended up hooking a kernel debugger up to an SMB server machine exhibiting the problem, to see whether anything useful could be learned by debugging the SMB server itself (which is a kernel mode driver known as srv.sys). After not getting anywhere with that initially, I decided to try to trace the problem from the root of its observable effects: the reset TCP connection. Through the contact that our quality assurance group had made with Microsoft Product Support Services (PSS), Microsoft had supplied us with a couple of hotfixes for tcpip.sys (the Windows TCP stack) for various issues that ultimately turned out to be unrelated to the underlying trouble with SMB. Since those hotfixes did not resolve our problem, we decided to take a closer look at what was happening inside the TCP state machine when the SMB connections were being reset.

This turned out to hit the metaphorical jackpot. I had set a breakpoint on every function in tcpip.sys that is responsible for flagging TCP connections for reset, and the call stack caught the SMB server (srv.sys) red-handed:

kd> k
ChildEBP RetAddr  
f4a4fa98 f4cad286 tcpip!SendRSTFromTCB
f4a4fac0 f4cb0ee3 tcpip!CloseTCB+0xbc
f4a4fad0 f4cb0ec7 tcpip!TryToCloseTCB+0x38
f4a4faf4 f4cace69 tcpip!TdiDisconnect+0x205
f4a4fb40 f4cabbed tcpip!TCPDisconnect+0xfd
f4a4fb5c 804e37f7 tcpip!TCPDispatchInternalDeviceControl+0x14d
f4a4fb6c f4c7d132 nt!IopfCallDriver+0x31
f4a4fb84 f4c7cd93 netbt!TdiDisconnect+0x10a
f4a4fbb4 f4c7d0b8 netbt!TcpDisconnect+0x40
f4a4fbd4 f4c7d017 netbt!DisconnectLower+0x42
f4a4fc14 f4c7d11f netbt!NbtDisconnect+0x339
f4a4fc44 f4c7ba5a netbt!NTDisconnect+0x4b
f4a4fc60 804e37f7 netbt!NbtDispatchInternalCtrl+0xb4
f4a4fc70 f43fb9c1 nt!IopfCallDriver+0x31
f4a4fc7c f441e8db srv!StartIoAndWait+0x1b
f4a4fcb0 f441f13e srv!SrvIssueDisconnectRequest+0x4d
f4a4fccc f43eed3e srv!SrvDoDisconnect+0x18
f4a4fce4 f440d3b5 srv!SrvCloseConnection+0xec
f4a4fd18 f44007ae srv!SrvCloseConnectionsFromClient+0x163
f4a4fd88 f43fba98 srv!BlockingSessionSetupAndX+0x274

As it turns out, the SMB server was explicitly disconnecting the pre-existing SMB client when the second SMB client tried to set up a session with the SMB server. This explains why, even though the pre-existing SMB client was operating normally (not breaking the SMB protocol, and running with no packet loss), it would mysteriously have its connection reset for apparently no good reason.
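
For what it’s worth, a logging breakpoint along the following lines (using the tcpip.sys routine at the top of the stack above) is enough to catch this in the act without halting the server on each hit; a minimal sketch:

kd> bp tcpip!SendRSTFromTCB "kv; g"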

Further analysis on packet captures revealed that there was always a correlation to one client sending an SMB_SESSION_SETUP_ANDX command to the SMB server and all other clients on the SMB server being abortively disconnected. After digging around a bit more in srv.sys, it became clear that what was happening is that the SMB server would explicitly kill all SMB connections from an IPv4 address that was opening a new SMB session (via SMB_SESSION_SETUP_ANDX), except for the new SMB TCP connection. It turns out that this behavior is engaged if the SMB client sets the “VcNumber” field of the SMB_SESSION_SETUP_ANDX request to a zero value (which the Windows SMB redirector, mrxsmb.sys, does), and the client enables extended security on the connection (this is typically true for modern Windows versions, say Windows XP and Windows Server 2003).

This explains the problem that we were seeing. In one of the configurations, this customer was set up to have remote clients NAT’d behind a single LAN IP address. This, combined with the SMB redirector sending a zero VcNumber field, resulted in the SMB server killing everyone else’s SMB connections (behind the NAT) when a new remote client opened an SMB connection through the NAT. It also fit with another piece of information that we eventually uncovered from running into this trouble with customers: one customer in particular had an old Windows NT 4.0 file server which always worked perfectly over the VPN, while their newer Windows Server 2003 boxes would tend to experience these random disconnects. This was because NT4 doesn’t support the extended security options in the SMB protocol that Windows Server 2003 does, further lending credence to this particular theory. (It turns out that the code path that disconnects users for having a zero VcNumber is only active when extended security is being negotiated on an SMB session.)

Additionally, someone else at work managed to dig up knowledge base article 301673, otherwise titled “You cannot make more than one client connection over a NAT device”. Evidently, the problem is actually documented (and has been since Windows 2000), though the listed workaround of using NetBIOS over TCP doesn’t seem to actually work. (As an aside, it’s pretty silly that, of all things, Microsoft is recommending that people fall back to NetBIOS. Haven’t we been trying to get rid of NetBIOS for years and years now…?)

Looking around a bit on Google for documentation about this particular problem, I ran into this article about SMB/CIFS which documented the shortcoming relating to SMB and NAT. Specifically:

Whenever a new transport-layer connection is created, the client is supposed to assign a new VC number. Note that the VcNumber on the initial connection is expected to be zero to indicate that the client is starting from scratch and is creating a new logical session. If an additional VC is given a VcNumber of zero, the server may assume that any existing connections with that same client are now bogus, and shut them down.

Why do such a thing?

The explanation given in the LANMAN documentation, the Leach/Naik IETF draft, and the SNIA doc is that clients may crash and reboot without first closing their connections. The zero VcNumber is the client’s signal to the server to clean up old connections. Reasonable or not, that’s the logic behind it. Unfortunately, it turns out that there are some annoying side-effects that result from this behavior. It is possible, for example, for one rogue application to completely disrupt SMB filesharing on a system simply by sending Session Setup requests with a zero VcNumber. Connecting to a server through a NAT (Network Address Translation) gateway is also problematic, since the NAT makes multiple clients appear to be a single client by placing them all behind the same IP address.

So, at least we (finally) found out just what was causing the mysterious resets. Apparently, from a protocol design standpoint, SMB is deliberately incompatible with NAT.

As someone who has designed network protocols before, this just completely blows my mind. I cannot think of any good reason that could possibly justify breaking NAT, especially given the incredible proliferation of NAT devices in every day life (it seems like practically everyone has a cable/DSL/wireless router type device that does NAT today, not to mention the increasing need to reuse IP addresses as the limited IPv4 address space fills up). Not to mention that the Windows file sharing protocol has to be one of the most widely used networking protocols in the world. Breaking NAT for that (evidently just for the sake of breaking NAT!) seems like an incredibly horrible, not-well-thought-out design decision. Normally, I am fairly impressed by how well Microsoft designs their software (particularly their kernel software), but this particular little part of the SMB protocol design is just uncharacteristically, completely wrong.

Even more unbelievably, the stated reason that Microsoft gave for this behavior is an optimization to handle the case where the user’s computer has crashed, rebooted, and reconnected to the server before the SMB server’s TCP stack has noticed that the crashed SMB client’s TCP connection has gone away. Evidently, conserving server resources in the case of a client crash is more important than being compatible with NAT. One has to wonder how unstable the Microsoft SMB redirector must have been at the time that this “feature” was added to the SMB protocol to make anyone in their right mind consider such an absolutely ridiculous, mind-bogglingly bad tradeoff.

To date, we haven’t had a whole lot of luck in trying to get Microsoft to fix this “minor” problem in SMB. I’ll post an update if we ever succeed, but as of now, things are unfortunately not looking very promising on that front.

SDbgExt 1.09 released (support for displaying x64 EH data)

Thursday, November 23rd, 2006

I’ve put out SDbgExt 1.09. This is an incremental release of my collection of WinDbg debugger extensions.

The 1.09 release primarily adds support for displaying exception handler data on x64. While there is “some” built-in debugger support for this (via the “.fnent” command), that support is extremely minimal. You are essentially required to dump the unwind data structures yourself and manually parse them out, which isn’t exactly fun. So, I added support for doing all of that hard work to SDbgExt, via the !fnseh SDbgExt extension (display function SEH data). This support is complementary to the !exchain command supplied by ext.dll for x86 targets.

The “!fnseh” command supports displaying most of the interesting fields of the unwind metadata (besides information on how the prologue works). It also properly supports chained unwind information records (both the documented and undocumented formats). There is also basic support for detecting and processing CL’s C/C++ exception scope tables, if a function uses C language exception handling (__try/__except/__finally).

Here are a couple of quick examples of it in action:

1: kd> !fnseh nt!CcPinRead
nt!CcPinRead L6a 2B,10 [ U ] nt!_C_specific_handler (C)
> fffff800012bf937 L2 (fffff800012fe4c0 -> fffff80001000000)
> fffff800012bf939 L16 (fffff800012fe4c0 -> fffff80001000000)
> fffff800012bf94f L5b (fffff800012fe4c0 -> fffff80001000000)
> fffff800012bf9aa L48 (fffff800012fe4c0 -> fffff80001000000)
> fffff800012c5199 Ld (fffff800012fe4c0 -> fffff80001000000)
> fffff800012c51a6 L58 (fffff800012fe4c0 -> fffff80001000000)
> fffff800012c51fe L1b (fffff800012fe4c0 -> fffff80001000000)
1: kd> !fnseh nt!CcCopyRead
nt!CcCopyRead Lae 3A,10 [E  ] nt!_C_specific_handler (C)
> fffff80001272c01 Lbf (fffff800012fe2e0 -> fffff800012c4c39)
> fffff80001272cc0 Lc (fffff800012fe2e0 -> fffff800012c4c39)
> fffff800012871f4 L2b (fffff800012fe2e0 -> fffff800012c4c39)
> fffff8000128721f L5a (fffff800012fe2e0 -> fffff800012c4c39)
> fffff800012961b1 L8 (fffff800012fe2c0 -> fffff800012c4b93)
> fffff800012961b9 L56 (fffff800012fe2c0 -> fffff800012c4b93)
> fffff800012c4aae Lcc (fffff800012fe2c0 -> fffff800012c4b93)
> fffff800012c4b7a L19 (fffff800012fe2c0 -> fffff800012c4b93)
1: kd> !fnseh nt!NtAllocateVirtualMemory
nt!NtAllocateVirtualMemory L5e 30,10 [E  ]
nt!_C_specific_handler (C)
> fffff8000103f74f L22 ( -> fffff8000103f771)
> fffff8000103f7f9 L16 ( -> fffff8000109adbf)
> fffff8000105ed3e L46 (fffff800010f3dc0 -> fffff8000105f173)
> fffff8000105f14d L22 ( -> fffff8000105f16f)
1: kd> !fnseh nt!KiSystemCall64
nt!KiSystemCall64 L390 50,0C [EU ]
nt!KiSystemServiceHandler (assembler/unknown)
0:000> !fnseh ntoskrnl + 00001180
401180 L2e 29,06 [  C] <none> (none)
 ntoskrnl!CcUnpinFileDataEx L32 13,05 [   ] <none> (none)

The basic output format for unwind information, as presented by the extension, is as follows:

EH-start-address LEH-effective-length prologue-size,unwind-code-count [unwind-flags (Exception handler, Unwind handler, Chained unwind information)] exception-handler (exception-handler-language)

Additionally, if the extension thinks that the function in question is using C language support, it will display each of the scope table entries as well (scope table entries divide up the various regions in a function that may have a __try or __except; there is typically only one lowest-level exception handler per function, with the scope table being used to implement multiple __try/__except clauses per function):

> EH-start-address LEH-effective-length (__except-filter-function (optional) -> __except-handler-function)

This information can be useful for tracking down exception filters and handlers to breakpoint on and the like, as EH registrations are completely isolated from the code itself on x64. Note that not all functions will have exception or unwind handlers, even though unwind information must be provided for every x64 function that modifies the stack or calls other functions.
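
For instance, to stop in the __except handler used by the nt!CcCopyRead scope entries shown above, you could simply set a breakpoint directly on the handler address taken from the output:

1: kd> bp fffff800012c4c39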

For more information on how x64 exception handling works under the hood, you might look at my article on the subject, or skape’s paper about x64 Windows binary analysis.

Frame pointer omission (FPO) optimization and consequences when debugging, part 1

Tuesday, November 21st, 2006

During the course of debugging programs, you’ve probably run into the term “FPO” once or twice. FPO refers to a specific class of compiler optimizations that, on x86, deal with how the compiler accesses local variables and stack-based arguments.

With a function that uses local variables (and/or stack-based arguments), the compiler needs a mechanism to reference these values on the stack. Typically, this is done in one of two ways:

  • Access local variables directly from the stack pointer (esp). This is the behavior if FPO optimization is enabled. While this does not require a separate register to track the location of locals and arguments, as is needed if FPO optimization is disabled, it makes the generated code slightly more complicated. In particular, the displacement from esp of locals and arguments actually changes as the function is executed, due to things like function calls or other instructions that modify the stack. As a result, the compiler must keep track of the actual displacement from the current esp value at each location in a function where a stack-based value is referenced. This is typically not a big deal for a compiler to do, but in hand written assembler, this can get a bit tricky.
  • Dedicate a register to point to a fixed location on the stack relative to local variables and stack-based arguments, and use this register to access locals and arguments. This is the behavior if FPO optimization is disabled. The convention is to use the ebp register to access locals and stack arguments. Ebp is typically set up such that the first stack argument can be found at [ebp+08], with local variables typically at a negative displacement from ebp.

A typical prologue for a function with FPO optimization disabled might look like this:

push   ebp               ; save away old ebp (nonvolatile)
mov    ebp, esp          ; load ebp with the stack pointer
sub    esp, sizeoflocals ; reserve space for locals
...                      ; rest of function

The main concept is that when FPO optimization is disabled, a function will immediately save away ebp (as the first operation touching the stack), and then load ebp with the current stack pointer. This sets up a stack layout like so (relative to ebp):

[ebp-01]   Last byte of the last local variable
[ebp+00]   Old ebp value
[ebp+04]   Return address
[ebp+08]   First argument...

Thereafter, the function will always use ebp to access locals and stack based arguments. (The prologue of the function may vary a bit, especially for functions using a variation of __SEH_prolog to set up an initial SEH frame, but the end result is always the same with respect to the stack layout relative to ebp.)
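
For contrast, here is a contrived sketch of what esp-relative (FPO-enabled) access looks like; note how the displacement of the same argument changes as the stack pointer moves:

mov    eax, [esp+4]      ; first argument is at [esp+4] on function entry
push   esi               ; pushing a register moves esp down by four bytes...
mov    esi, [esp+8]      ; ...so the same argument is now at [esp+8]
pop    esi
ret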

This does (as previously stated) mean that the ebp register is not available to the register allocator for other uses. However, this performance hit is usually not large enough to be a serious concern compared to a function compiled with FPO optimization turned on. Furthermore, there are a number of conditions that require a function to use a frame pointer, which you may hit anyway:

  • Any function using SEH must use a frame pointer, as when an exception occurs, there is no way to know the displacement of local variables from the esp value (stack pointer) at exception dispatching (the exception could have happened anywhere, and operations like making function calls or setting up stack arguments for a function call modify the value of esp).
  • Any function using automatic C++ objects with destructors must use SEH for compiler unwind support. This means that most C++ functions end up with FPO optimization disabled. (It is possible to change the compiler assumptions about SEH exceptions and C++ unwinding, but the default [and recommended setting] is to unwind objects when an SEH exception occurs.)
  • Any function using _alloca to dynamically allocate memory on the stack must use a frame pointer (and thus have FPO optimization disabled), as the displacement from esp for local variables and arguments can change at runtime and is not known to the compiler at compile time when code is being generated.

Because of these restrictions, many functions you may be writing will already have FPO optimization disabled, without you having explicitly turned it off. However, it is still likely that many of your functions that do not meet the above criteria have FPO optimization enabled, and thus do not use ebp to reference locals and stack arguments.

Now that you have a general idea of just what FPO optimization does, I’ll cover why it is to your advantage to turn off FPO optimization globally when debugging certain classes of problems in the second half of this series. (It is actually the case that most shipping Microsoft system code turns off FPO as well, so you can rest assured that a real cost/benefit analysis has been done between FPO and non-FPO optimized code, and that it is overall better to disable FPO optimization in the general case.)
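
(For reference, and assuming you are using the Microsoft compiler, the switch that controls this is /Oy; a sketch of a debug-friendly compile with FPO disabled might look like the following. More on the tradeoffs in part two.)

cl /Zi /Oy- /c myfile.c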

Update: Pavel Lebedinsky points out that the C++ support for SEH exceptions is disabled by default for new projects in VS2005 (and that it is no longer the recommended setting). For most programs built prior to VS2005 and using the defaults at that time, though, the above statement about C++ destructors causing SEH to be used for a function (and thus requiring the use of a frame pointer) still applies.

You can open a PE image as a dump file with WinDbg

Friday, November 17th, 2006

There is a little known feature of WinDbg, ntsd, cdb, kd, and anything else that uses DbgEng to open dump files.

It turns out that with anything powered by DbgEng, anywhere where you could open a dump file (user dump, kernel dump, etc), you can instead open a PE image (.exe/.dll/.sys/etc) and have the debugger treat it as a dump containing just the contents of the selected PE image.
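
For example, using the -z switch (the same switch you would use to open a crash dump from the command line; the path here is purely illustrative):

windbg -z C:\Windows\system32\kernel32.dll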

This is actually a relatively useful feature. When you open a PE image as a dump file, the debugger maps it as an image as if it were loaded in-memory as executable code (though it doesn’t actually run any code, just maps it as if it were an executable and not a data file). This gets you an in-memory representation of your exe/dll/sys/other PE file as if you were debugging a live process (or a dump) that had the image in question loaded.

Like a dump debugging session, this is essentially a read-only session; you can’t really modify anything, as there is no target to control. Additionally, there is no real register context either (or stack or heap), although things like initialized and zero filled global variables and executable code belonging to the module will be in-memory. (The preferred image base for the module is used in this situation for basing the requested PE module in the virtual address space constructed for the debugging session.)

After you have loaded the target, you can do anything that you would normally do with a dump for the most part, as far as examining symbols and disassembling the target go. If you need a disassembler with symbol support and can’t start a process or whatnot to contain a PE image, this particular trick is a great quick-n-dirty replacement for a more full-featured disassembler program.

Note that a side effect of opening a PE image in dump mode is that the symbol server is used to retrieve the binary (which might seem a bit strange, until you consider that for dump files, the normal case is that you don’t have the entire binary saved in memory; just enough header information to retrieve the binary from the symbol server). Therefore, make sure that your symbol path is setup correctly before trying this particular trick.

Win32 calling conventions review

Friday, November 10th, 2006

Recently, I’ve been posting about the Win32 calling conventions. Here’s a table of contents of the various posts I’ve made:

  1. Win32 calling conventions: Concepts
  2. Win32 calling conventions: Usage cases
  3. Win32 calling conventions: __cdecl in assembler
  4. Win32 calling conventions: __stdcall in assembler
  5. Win32 calling conventions: __fastcall in assembler
  6. Win32 calling conventions: __thiscall in assembler

Remember that when picking a calling convention to use, there are a number of factors to consider. There is no one calling convention that fits all cases (however, __stdcall is a good default if you are not sure).

Hopefully, you’ll have found this series to be enlightening, useful, and practically applicable.

Debugger flow control: Using conditional breakpoints (part 3)

Thursday, November 9th, 2006

Previously, I touched some more on when and why hardware breakpoints can be useful.

If you have been following along so far, you should already know the ups and downs of each flavor of breakpoint, and have at least a fair idea as to when you should prefer one to another. There is one other aspect of breakpoint management that I have yet to cover, though, and it is perhaps the most useful feature of breakpoints in WinDbg: conditional breakpoints.

Conditional breakpoints allow you to, as you might imagine, set conditions for breakpoints. That is, the debugger will only actually stop for you to investigate something when both the breakpoint is triggered, and its associated condition is met. These kinds of breakpoints are very useful if you need to stop on a certain function, but only if a certain argument has a certain value (for instance).
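
For instance, here is a sketch of a breakpoint that only stops when CreateFileW is asked for GENERIC_WRITE access, using the j (conditional execution) command; the @esp+8 offset assumes the usual x86 stdcall stack layout at function entry, and the specific condition is just an example:

0:001> bp kernel32!CreateFileW "j (poi(@esp+8) & 0x40000000) ''; 'gc'"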

However, in WinDbg (and the other DTW/DbgEng debuggers), the support for conditional breakpoints allows you to do much more than that. Specifically, you are permitted to define arbitrary commands that are automatically executed any time a breakpoint is hit. Aside from allowing you to create conditional breakpoints, this also allows you to perform a number of other highly useful tasks in a quick and automated fashion with the debugger. For example, you might want to alter arguments passed to a particular function, or you might want to log arguments (or a stack trace) to a particular function for future analysis.

For example, let’s say that you wanted to see who calls CreateFileW, which filename they provide, what the call stack of each caller is, and what the return value is, and then continue execution. Now, you could do this manually with a “plain” breakpoint, repeating a certain set of commands every time the breakpoint hits, but it would be far superior to automate the whole process.

With DbgEng’s conditional breakpoint support, this is easy. All you need to do to have a set of commands executed whenever a breakpoint is hit is to follow the breakpoint statement with those commands, enclosed in double quotes. (If the commands themselves require the use of double quotes, then you’ll have to escape those, using \”.)

From looking at CreateFile on MSDN, we can see that the first argument is a Unicode string describing the name of the file that we are to create.

Armed with this information, we can construct a breakpoint that will perform the logging that we are looking for.

Here’s what you might come up with:

0:001> bp kernel32!CreateFileW "du poi(@esp+4);kv;gu;? @eax; g"

Let’s break down this breakpoint a little bit. As usual, the semicolon character (;) is used to separate multiple debugger commands appearing on the same line. The first command is fairly straightforward: poi(@esp+4) reads the first stack argument (the file name pointer) off the stack, and the du command displays the zero-terminated Unicode string at that address.

Next, we take a full backtrace (kv). After that, we allow execution to continue until we reach the return address of CreateFileW with the gu command (“Go Up one call level”). Finally, we display the eax register (which happens to be the return value register for x86), and continue execution with g.

In action, you might see something like so. In this instance, I am attached to cmd.exe and have executed the command “type c:\config.sys”…

0013fa4c  "c:\\CONFIG.SYS"
ChildEBP RetAddr  
0013e6e4 4ad02f2a kernel32!CreateFileW
0013e730 4ad02e91 cmd!Copen_Work+0x157
0013e744 4ad0dbff cmd!Copen+0x12
0013f7a8 4ad0db62 cmd!TyWork+0x48
0013fc58 4ad0daac cmd!LoopThroughArgs+0x1dd
0013fc6c 4ad05aa2 cmd!eType+0x17
0013fe9c 4ad013eb cmd!FindFixAndRun+0x1f5
0013fee0 4ad0bbba cmd!Dispatch+0x137
0013ff44 4ad05164 cmd!main+0x216
0013ffc0 7c816fd7 cmd!mainCRTStartup+0x125
0013fff0 00000000 kernel32!BaseProcessStart+0x23
Evaluate expression: 132 = 00000084

Essentially, we have turned the debugger into something to inspect function calls and provide us with detailed information about what is happening, in a completely automated fashion.

You can also use this technique for more active intervention in the target as well – for instance, skipping over a function call entirely, modifying what a function does when it is called, or any number of things. Previous articles have, for instance, used conditional breakpoints to alter function behavior (such as making all new virtual memory allocations come from the high end of the address space instead of the low end).

Conditional breakpoints are an invaluable tool in your debugging (and reverse engineering) arsenal; you should absolutely consider them any time that you need any sort of automation to record or alter the behavior of the target, in situations where it is not practical to manually perform the work on every single breakpoint hit.

Next up in this series: Other flow control mechanisms with the debugger, such as stepping and tracing.

Debugger flow control: More on breakpoints (part 2)

Wednesday, November 8th, 2006

Last time, I discussed the two types of breakpoints that you’ll see in a debugger (hardware and software) at a high level. I didn’t really explain when it is best to use one instead of the other, though, besides a couple of hints relating to hardware breakpoints being limited in number and sometimes good for tracking down memory corruption issues.

By taking a closer look at how each of the two breakpoints work, we can get some idea as to when we’ll prefer one to another. Both types of breakpoints alter the target in some way, but to differing degrees.

The primary concern with software breakpoints is that they actually involve patching memory in the target to set the breakpoint. This is usually fine; the debugger uses it as its default breakpoint strategy when you give an end address to g, for instance. However, it begins to break down if the target both executes a region that you are setting a breakpoint on, and also reads that same region.

This particular concern is a real problem when you are dealing with self-modifying code, or certain protection schemes (such as code that attempts to checksum itself in memory). In these cases, you might accidentally break self-modifying code, or trip a protection scheme, simply by virtue of setting a breakpoint (since the act of setting a software breakpoint actually modifies the address space of the target).

In cases like this, hardware breakpoints can come to the rescue. Since setting an execute hardware breakpoint does not actually modify the underlying instruction, anything that reads the memory backing that instruction will see the real first byte of the instruction rather than a 0xCC opcode. Granted, you can only have four hardware breakpoints enabled at a time, but usually you can get by with that many – or at least with a “sliding window” of hardware breakpoints, assuming your breakpoints cover a well-defined execution sequence; in that case, you can have each breakpoint disable itself and enable the next one, conserving the number of active breakpoints (a rough sketch of this follows below).
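
Here is what such a sliding chain might look like; the addresses are hypothetical, and the breakpoint IDs are assumed to come out as 0 and 1:

0:000> ba e1 00401000 "bd 0; be 1; g"
0:000> ba e1 00401050 "bd 1; be 0; g"
0:000> bd 1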

There is also the other obvious advantage to hardware breakpoints which I touched on earlier: the ability to set a breakpoint on a memory fetch for a particular address. This obviously has a great deal of uses, whether you’re reverse engineering something or are tracking down a corruption problem. Memory-access breakpoints are an excellent way to very quickly figure out which piece of code is modifying a variable, without having to trace through an arbitrarily large set of code to find the access that you were looking for. One thing to consider about memory-access breakpoints on x86 and x64, though, is that there is only support for setting memory-access breakpoints on regions that are 1) a power of 2 in length, and 2) have a length that is less than or equal to the native pointer size (8 for x64, or 4 for x86). (If you are lucky enough [or perhaps unlucky enough, as Itanium isn’t exactly the most friendly thing to view from an assembler perspective] to be debugging on an Itanium platform, this restriction does not exist; you can set a length of any power of 2 between 1 byte and two gigabytes). As a result, you’ll have to plan where to set your breakpoints carefully, as on x86, you can only cover at most 16 bytes with this kind of “memory guard” access. You might or might not be able to use the same kind of “sliding breakpoint window” idea I mentioned above, if the memory locations you are setting breakpoints on are accessed in a particular sequence (or at least, the accesses that you are interested in).

Hardware breakpoints are typically less invasive than software breakpoints, but there are still ways that they can be interfered with. The most common case for this happening is if you try to set a hardware breakpoint while DLL initializers are being called during process startup (such as at the initial create process breakpoint). If you try to do this, you’ll get a warning from the debugger advising you that your breakpoints won’t stick:

0:000> ba e1 kernel32!CreateFileA
        ^ Unable to set breakpoint error
The system resets thread contexts after the process
breakpoint so hardware breakpoints cannot be set.
Go to the executable's entry point and set it then.
 'ba e1 kernel32!CreateFileA'

The reason why this is the case is that there is a context set that occurs between the initial process breakpoint being hit and the requested thread start address / process start address being executed. I’ll go into just how this works at process startup in a future posting, but to keep it simple, the basic idea is that an APC is queued to the new usermode thread that runs the loader component in NTDLL. One of the arguments to the APC is a context record describing the register context that was requested for the new thread by CreateProcess, CreateThread, and so forth. The loader component runs process (or thread) DLL initializers, and then calls NtContinue to continue execution at the specified context record, which kicks off execution at the user requested thread start address. We can see this in action easily by looking at the arguments that the APC dispatcher supplies to the loader initializer APC:

0:000> kv
ChildEBP RetAddr  Args to Child              
0013fb1c 7c93edc0 7ffdf000 ntdll!DbgBreakPoint
0013fc94 7c921639 0013fd30 ntdll!LdrpInitializeProcess+0xffa
0013fd1c 7c90eac7 0013fd30 ntdll!_LdrpInitialize+0x183
00000000 00000000 00000000 ntdll!KiUserApcDispatcher+0x7
0:000> .cxr 0013fd30 
eax=4ad05056 ebx=7ffdd000 ecx=00f2faa8 edx=00090000
esi=7c9118f1 edi=00011970
eip=7c810665 esp=0013fffc ebp=7c910570 iopl=0
         nv up ei pl nz na po nc
cs=001b  ss=0023  ds=0023  es=0023  fs=0038  gs=0000
             efl=00000200
kernel32!BaseProcessStartThunk:
7c810665 33ed            xor     ebp,ebp

If you have been paying attention so far, it should be clear why hardware breakpoints set at the initial process break-in do not appear to work like you might expect: when the APC that runs loader initializers returns, it restores a previously saved register context image via NtContinue. Since hardware breakpoints are part of the register context, they are wiped away when that context is restored, and so your breakpoints appear to simply disappear once DLL initializers have finished.

This limitation also implies that calling SetThreadContext on a thread can interfere with hardware breakpoints if care is not taken to preserve the value of the Dr series of registers. Indeed, some protection schemes utilize such a trick in an attempt to defeat hardware breakpoints.

Fortunately, it is easy to work around such limitations using the debugger. There is a little-used command called “.apply_dbp” that allows you to instruct the debugger to re-apply hardware breakpoints, either to the current register context, or to a saved register context image in memory (supplied by the /m Context argument). With this command, you can quickly restore your hardware breakpoints even after something attempts to trash them. Combined with a conventional breakpoint on, say, kernel32!SetThreadContext, this can be used to quickly re-enable the use of hardware breakpoints in such cases. You can also use this trick to persist hardware breakpoints in the process startup case, by using .apply_dbp /m <address-of-context-record-argument-from-APC-dispatcher> to apply any hardware breakpoints you set to the register context image that will eventually be restored by NtContinue. For instance, in the case of the example that I gave above, you might use the following to apply hardware breakpoints to the context that NtContinue will restore:

0:000> .apply_dbp /m 0013fd30 
Applied data breakpoint state

Next up, some more tricks that you can do to get the most out of controlling the target in the debugger.

Debugger flow control: Hardware breakpoints vs software breakpoints

Tuesday, November 7th, 2006

In debugging parlance, there are two kinds of breakpoints that you may run across – “hardware” breakpoints and “software” breakpoints. While the two overlap to a certain degree, it is important to know the differences between them, and when it is better to use a hardware or a software breakpoint.

For the purposes of this discussion, I’ll stick to using WinDbg on an x86 target. The same general concepts apply to other architectures (especially x64, which works nearly identically), and the commands to set breakpoints are the same, but details such as where and how many hardware or software breakpoints you may set vary slightly from platform to platform.

In most debugging scenarios, you have probably just used software breakpoints exclusively. Software breakpoints are set by the bp or bu commands (breakpoint and deferred breakpoint, respectively). These breakpoints are fairly simple and straightforward; they cause execution to halt in the debugger whenever a thread attempts to execute a piece of code that you set a breakpoint on. Typically, you may set any number of software breakpoints at the same time. Software breakpoints may only be targeted at code; there is no support for setting a “memory breakpoint” via a software breakpoint. Many features, such as stepping over a call or going to the return address of a function, also implicitly use a temporary software breakpoint that is removed once execution hits it the first time.

Hardware breakpoints, on the other hand, are much more powerful and flexible than software breakpoints. Unlike software breakpoints, you may use hardware breakpoints to set “memory breakpoints” – breakpoints that fire when any instruction attempts to read, write, or execute (depending on how you configure the breakpoint) a specific address. (There is also support for setting breakpoints on I/O port access, but I’ll not cover that feature here, as it is typically of very limited applicability for every-day debugging tasks.) Hardware breakpoints have some limitations, however; the main one being that the number of hardware breakpoints you may have active at once is extremely limited (on x86, you may only have four hardware breakpoints active at the same time).

Now that we have a basic overview of what the two breakpoint types are, let’s dig a bit deeper and see how they work under the hood, and when you might use them.

The way software breakpoints work is fairly simple. Speaking about x86 specifically, to set a software breakpoint, the debugger simply writes an int 3 instruction (opcode 0xCC) over the first byte of the target instruction. This causes an interrupt 3 to be fired whenever execution is transferred to the address you set a breakpoint on. When this happens, the debugger “breaks in” and swaps the 0xCC opcode byte back out for the original first byte of the instruction (saved away when the breakpoint was set), so that you can continue execution without immediately hitting the same breakpoint again. There is actually a bit more magic involved that allows you to continue execution from a breakpoint and not hit it immediately, yet keep the breakpoint active for future use; I’ll discuss this in a future posting.

Now, you might be tempted to say that this isn’t really how software breakpoints work, if you have ever tried to disassemble or dump the raw opcode bytes of anything that you have set a breakpoint on, because if you do that, you’ll not see an int 3 anywhere where you set a breakpoint. This is actually because the debugger tells a lie to you about the contents of memory where software breakpoints are involved; any access to that memory (through the debugger) behaves as if the original opcode byte that the debugger saved away was still there.

Now that we know how software breakpoints work at a high level, it’s time to talk about the other side of the story, hardware breakpoints.

Hardware breakpoints are, as you might imagine given the name, set with special hardware support. In particular, for x86, this involves a special set of perhaps little-known registers known as the “Dr” registers (for debug register). These registers allow you to set up to four addresses (the exact count is highly platform specific; four applies to x86) that, when read, read/written, or executed, will cause the processor to raise a special exception that stops execution and transfers control to the debugger.

Given that on x86, you can only have four hardware breakpoints active at once, why would anyone possibly want to use them?

Well, the main strength of hardware breakpoints is that you can use them to halt on non-execution accesses to memory locations. This is an extremely useful capability; for example, if you were debugging a memory corruption problem where an initial instance of corruption eventually causes a crash, your initial reaction would probably be something along the lines of “gee, if I knew who caused the corruption in the first place, this would be much, much easier to debug” – and this is exactly what hardware breakpoints let you do. In essence, you can use a hardware breakpoint to tell the processor to stop when a specific variable (address) is read or read/written. You can also use hardware breakpoints to break in on code execution, although in the typical case it is more common to use software breakpoints for that purpose, due to the relaxed restrictions on how many breakpoints you may have active at once.
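
As a concrete sketch (the module and symbol names are hypothetical), a four-byte write breakpoint on a global variable looks like this:

0:000> ba w4 mymodule!g_SuspectGlobal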

That’s the high level overview of the two main types of breakpoints you’ll encounter in a debugger. In some upcoming postings, I’ll go into some specifics as to how certain edge cases (such as stepping over a call) are implemented, and describe other situations where you’ll find it very useful to use one kind of breakpoint instead of another. I am also planning on discussing how some of the other debugger flow control features are really implemented (such as tracing / single step), and what the consequences of using each flow control method are on the debuggee.