Archive for August, 2006

Setting a default symbol path for all DbgHelp programs

Thursday, August 10th, 2006

If you’re like me, then you have lots of programs that you use on a regular basis which use DbgHelp to manage symbol access.  Unfortunately, each of these programs typically has its own convention for persisting a symbol path, and it gets very annoying to update the path in every single program whenever you need to change your symbol path.

Fortunately, there is a solution:  a little-used (nowadays) environment variable called “_NT_SYMBOL_PATH”.  DbgHelp automatically adds its value to the symbol search path, without any special configuration needed by programs using it (in most cases; it is possible to programmatically suppress this behavior, but most programs do not, as blocking it is a bad idea).
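
As a minimal sketch of one way a DbgHelp client picks this variable up (the module path below is just an example): when a program passes NULL for the UserSearchPath argument of SymInitialize, DbgHelp builds the search path itself from, among other things, _NT_SYMBOL_PATH.

#include <windows.h>
#include <dbghelp.h>

#pragma comment(lib, "dbghelp.lib")

int main()
{
    // Passing NULL for UserSearchPath lets DbgHelp build the default search
    // path itself, which includes the _NT_SYMBOL_PATH environment variable.
    if (!SymInitialize(GetCurrentProcess(), NULL, FALSE))
        return 1;

    // Load symbols for a module by path; the symbols are located through the
    // default search path set up above.
    DWORD64 Base = SymLoadModule64(GetCurrentProcess(), NULL,
                                   "C:\\Windows\\System32\\ntdll.dll",
                                   NULL, 0, 0);

    // ... SymFromAddr, SymGetLineFromAddr64, etc. against Base ...

    SymCleanup(GetCurrentProcess());
    return Base ? 0 : 1;
}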

How does this environment variable help you consolidate your symbol paths?  Well, there is an option in the System property sheet (under Control Panel) to set default environment variables for your user or the local machine.  You can use this to set a default value for _NT_SYMBOL_PATH for your user, which means that all of your programs using DbgHelp will now have a default symbol search path.

You can give _NT_SYMBOL_PATH any value that you could pass to “.sympath” or save in a WinDbg workspace, including symbol server references.
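
For example, a typical value (the downstream cache directory and the private symbol directory are just placeholders) combines a local cache with the Microsoft public symbol server, with any additional symbol paths of your own appended after a semicolon:

srv*C:\LocalSymbols*http://msdl.microsoft.com/download/symbols;C:\MyApp\Symbols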

Note that if you give _NT_SYMBOL_PATH a symbol server reference, you need to make sure that programs using DbgHelp can find symsrv.dll, which means placing it in your path if those programs do not already have a copy in their directories.

Checking if a binary exists on a symbol repository

Wednesday, August 9th, 2006

This question came up on the microsoft.public.windbg newsgroup and turned out to be more complicated to solve than it might appear, so I’m posting about it here.

Someone asked if there is a way to use the Debugging Tools for Windows (DTW) utilities to detect whether a binary is present on a symbol repository.  Now, you might naively assume that you can just run symchk.exe on the binary, and indeed if you try this, you might be fooled into thinking that it works.  However, it really doesn’t – all that symchk will do in that case is verify that the symbols for the binary are on the symbol repository.

If you need to make sure that the binary itself is there (i.e. so you can do dump file debugging using the symbol repository), then you need to use the “/id filename” command line parameter with symchk.  This tells symchk that you want to verify that symbols for a dump file exist on the symbol server.  Because DbgEng lets you load a single PE image (.dll/.exe/.sys/etc) as a dump file, and because dump files require loading the actual binary itself from the symbol path, this forces symchk to verify that the binary (and not just the symbols) exists on the symbol repository.
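
For example, an invocation along these lines (the binary path and symbol store locations are placeholders) verifies that the image itself can be pulled from the store:

symchk.exe /id C:\bin\mymodule.dll /s srv*C:\LocalSymbols*\\symserver\symbols

If the binary cannot be retrieved from the repository, symchk should report a failure for it instead of quietly passing as it would with a plain symbol check.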

Debugging a custom unhandled exception filter

Tuesday, August 8th, 2006

If you are working on a custom unhandled exception filter (perhaps to implement your own crash dump / error reporting mechanism), then you have probably run into the frustrating situation where you can’t easily debug the filter itself if it has a crash bug of its own.
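
For context, such a filter is typically installed with SetUnhandledExceptionFilter; a minimal sketch (the reporting logic is just a placeholder comment) looks something like this:

#include <windows.h>

static LONG WINAPI MyCrashFilter(EXCEPTION_POINTERS* ExceptionInfo)
{
    // Placeholder: write a minidump, log the exception code, upload a report,
    // and so on.  Any bug here runs in an already-crashed process, which is
    // what makes it so painful to debug.
    UNREFERENCED_PARAMETER(ExceptionInfo);
    return EXCEPTION_EXECUTE_HANDLER;   // terminate without the default error UI
}

int main()
{
    SetUnhandledExceptionFilter(MyCrashFilter);

    // ... the rest of the program; an unhandled exception now reaches
    //     MyCrashFilter (unless a debugger is attached - see below).
    return 0;
}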

The symptoms are that when a crash occurs, your crash reporting logic doesn’t run at all and the process just silently disappears.  When you attach a debugger and reproduce the crash to figure out what went wrong, the debugger keeps getting the exceptions (even if you continue after the first chance exception), and your unhandled exception filter is just never called.

Well, the reason for this is that UnhandledExceptionFilter tries to be clever and hides any last chance exception filter handling when a debugger is active.  Here’s why: the default implementation of the unhandled exception filter launches the JIT debugger and then restarts the exception.  When the JIT debugger attaches itself and the exception is restarted, you clearly want the exception to go to the debugger and not the unhandled exception filter, which would try to launch the JIT debugger again.

If, however, you have a custom unhandled exception filter that is crashing, then you probably don’t want this behavior so that you can debug the problem with the exception filter.  To disable this behavior and let the exception filter be called even if there is a debugger attached, you will need to patch kernel32!UnhandledExceptionFilter in a debugger.  If you unassemble it, you should eventually see a call to NtQueryInformationProcess that looks like this:

NtQueryInformationProcess(GetCurrentProcess(), 7, &var, 4, 0)

…followed by a comparison of the local variable that receives the result against zero.  You will need to patch the code so that it treats the value returned by NtQueryInformationProcess as if it were zero, perhaps by turning the conditional jump that follows the comparison into an unconditional jmp (or nop-ing it out, depending on the jump’s sense) with the “eb” or “a” commands.

This comparison is checking for something called the “process debug port”, which is a handle to an LPC port that is used to communicate with the debugger attached to the current process.  The kernel returns a null handle value if there is no debugger attached, which is how UnhandledExceptionFilter knows whether to work its magic and forward the exception on to the debugger or call the registered unhandled exception filter.
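
As a rough illustration of what that check amounts to (this is a sketch of the equivalent logic, not a disassembly of the actual routine; information class 7 is ProcessDebugPort):

#include <windows.h>
#include <winternl.h>

typedef NTSTATUS (NTAPI *NtQueryInformationProcess_t)(
    HANDLE, PROCESSINFOCLASS, PVOID, ULONG, PULONG);

// Returns true if a user mode debugger is attached, which is the condition
// under which UnhandledExceptionFilter skips your filter.
static bool IsDebuggerAttachedViaDebugPort()
{
    NtQueryInformationProcess_t QueryProcess =
        (NtQueryInformationProcess_t)GetProcAddress(
            GetModuleHandleW(L"ntdll.dll"), "NtQueryInformationProcess");

    if (!QueryProcess)
        return false;

    ULONG_PTR DebugPort = 0;   // pointer-sized; the "4" above is the 32-bit size

    // ProcessDebugPort is information class 7.
    NTSTATUS Status = QueryProcess(GetCurrentProcess(), (PROCESSINFOCLASS)7,
                                   &DebugPort, sizeof(DebugPort), NULL);

    return Status >= 0 && DebugPort != 0;   // NT_SUCCESS and a nonzero port
}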

After patching out the check, exceptions should be forwarded to your unhandled exception filter as if no debugger were attached, giving you a chance to inspect the operation of your crash handling code.

Save time when you are debugging by rebasing your DLLs

Monday, August 7th, 2006

If you are working with a process that has many of your DLLs loaded in it, and your DLLs tend to be loaded and unloaded dynamically, then you can sometimes save yourself a lot of trouble when debugging a problem by making sure that each of your DLLs has a unique base address.

That way, if you have a bug where one of your DLLs is called after it has been unloaded, you can easily figure out which DLL the call was supposed to go to, and which function it should have reached, by loading the DLL using the dump file loading support (start WinDbg as if you were going to debug a dump file, but select the .DLL in question instead of a .dmp file) and unassembling the address that was referenced.
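
Concretely, that looks something like this (the path and the address are made up); in this mode the DLL should be mapped at its preferred base, so the crash address from the customer’s machine can be looked up directly with “ln” (nearest symbol) and “u” (unassemble):

windbg -z c:\myapp\bin\mymodule.dll

0:000> ln 60171a2c
0:000> u 60171a2c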

On Windows Server 2003 and later, NTDLL maintains a list of the last few unloaded modules and their base addresses in user mode (visible with “lm” in WinDbg), which can make debugging this kind of problem a bit more manageable even if you don’t rebase your DLLs.  Still, rebasing is easy and improves load performance (and especially scalability under Terminal Server), so I would highly recommend going the rebasing route anyway.

If you aren’t on Windows Server 2003 and didn’t rebase your DLLs, then chances are the loader relocated the DLL at load time to some unpredictable location.  That makes it much more difficult to find the actual DLL being called (and which function / global / etc in the DLL was being referenced when the crash occurred) than if the DLL always loads at its preferred base address and you can simply look up the address in the DLL directly.
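
As for actually assigning unique bases, a sketch assuming you build with the Microsoft linker (the addresses and file names below are made up): either give each DLL its own /BASE address at link time, or run the rebase.exe tool from the Platform SDK over the built binaries, which assigns non-overlapping bases starting at the address you specify.

link /DLL /BASE:0x60000000 /OUT:first.dll  first.obj
link /DLL /BASE:0x60100000 /OUT:second.dll second.obj

rebase -b 0x60000000 first.dll second.dll third.dll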

Using the symbol proxy in cross-domain scenarios with UncAccessFilter

Friday, August 4th, 2006

If you have a symbol proxy set up and you need it to talk to a symbol server path that references a UNC share on a server that isn’t in the same domain as the IIS webserver hosting symproxy, then you may need to do some hackery to get the symbol proxy to work properly without prompting for credentials to use on the network.

Specifically, the problem here is that there is no way to tell IIS to map a UNC share with a particular username and/or password before processing a request unless the request itself points to a network share.

One way to work around this is to use a simple ISAPI filter that I wrote (UncAccessFilter) to make sure any required UNC paths are mapped before the symproxy ISAPI filter is called.  After installing the ISAPI filter in the usual way, make sure that it is prioritized above the symproxy filter.  To configure it, you will need to manually set up some values in the registry.

Create the key “HKEY_LOCAL_MACHINE\Software\Valhalla’s Legends\Skywing\UncAccessFilter” and ensure that the account web requests run under has read access to it.  You will probably want to restrict read access to the web access user, administrators, and the system account, because passwords will be stored in this key (be aware of this as a potential security risk if someone gets access to the registry key, as the passwords are not obfuscated in any way).  Then, for each share, create a REG_SZ value whose name is the share path you want to map (e.g. \\fileserver\fileshare) and whose contents are of the format “username;password”, for instance, “fileserver\symbolproxyuser;mypassword”.

To debug the filter, you can create a REG_DWORD value in that key named “DebugEnabled” and set it to 1, in which case the IIS worker process the ISAPI filter is running in will emit some diagnostic OutputDebugString messages about the operations it is performing if you have a debugger attached to the process.  Assuming you configured the filter properly, on startup you should see a series of messages listing the configured UNC shares (you may need to attach to the svchost process that creates the w3wp worker processes and use “.childdbg 1” to catch this message for new worker processes on startup).
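
Put together, the configuration from a command prompt might look like this (the share, account name, and password are of course placeholders for your own):

reg add "HKLM\Software\Valhalla's Legends\Skywing\UncAccessFilter" /v "\\fileserver\fileshare" /t REG_SZ /d "fileserver\symbolproxyuser;mypassword"
reg add "HKLM\Software\Valhalla's Legends\Skywing\UncAccessFilter" /v DebugEnabled /t REG_DWORD /d 1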

If you are using the prebuilt binaries, then make sure to install the VC++ 8 runtimes on the IIS server first.  Note that the prebuilt binaries are 32-bit only at this time; you’ll need to rebuild the ISAPI filter from source if you want to use the filter in 64-bit mode.

Be aware that the ISAPI filter is fairly simple and is not extraordinarily robust (and may be a bit slow if you have high traffic volumes, since it enumerates mapped network shares on every incoming request).  Additionally, be aware that if one of the servers referenced in the registry is down, it can make web requests that you have configured to be filtered by UncAccessFilter take a long time as the filter tries unsuccessfully to reconnect to the configured share on that server.  However, when properly configured, it should get the job done well enough in most circumstances.

Note that if you can get away with using the same account for all of your shares, a better solution is to simply change the account the web application associated with the symbol proxy is running under.  If you need to use multiple accounts, however, this doesn’t really do what you need.

Update: It would help if I had posted the download URL.

Remote debugging review

Thursday, August 3rd, 2006

Over the past week or two, I’ve written about some of the various remote debugging options available to you through the Debugging Tools for Windows (DTW) package.  I’ve covered most of the major debugging mechanisms available at this point and given brief descriptions of their strengths and weaknesses.

Here’s an indexed listing of the posts on this topic so far:

  1. Overview of WinDbg remote debugging
  2. Remote debugging with remote.exe
  3. Remote debugging with KD and NTSD
  4. Remote debugging with -server and -remote
  5. Reverse debugging -server and -remote
  6. Securing -server and -remote remote debugging sessions
  7. Remote debugging with process servers (dbgsrv)
  8. Activating process servers and connecting to them
  9. Remote debugging with kdsrv.exe
  10. Remote debugging review

At this point, you should be able to use all of the above remoting mechanisms in their basic use cases.  There are a couple of obscure features that I did not cover, such as doing -server/-remote over serial ports, but between my posts and the documentation you should be able to figure out what to do if you ever find a use for such esoterica (let me know if you do!).  What remains is some general advice on which remoting mechanism is best for a particular problem.

In general, the most important factors in choosing a remoting mechanism are:

  • Available bandwidth and latency between your computer and the remote system.  Some remoting mechanisms, like dbgsrv, perform very poorly without a high bandwidth, low latency link.
  • Whether symbol access needs to be done on the client or the debugging target.  This consideration is important if you are debugging a problem on a customer site.
  • What types of targets you need to support.  Some mechanisms, such as process servers, do not support all target types (for instance, lack of dump file debugging support).
  • Whether you need special support for working through a firewall (i.e. reverse connection support).
  • Ease of use with respect to setting up the remoting session.

These are the general factors I use to decide which remoting mechanism to use.  For example, in ideal cases, such as debugging a problem on a LAN or on a virtual machine hosted on the same computer, I will almost always use a process server for remote debugging, simply because it lets me keep my own WinDbg workspace settings and symbol access without having to set up anything on the target computer.  Over the Internet, process servers are usually too slow, so I am often forced to fall back to -server/-remote style remoting.

Taking into account the guidelines I mentioned above, here are the major scenarios that I find useful for each particular remoting mechanism:

  • Process servers and smart clients (dbgsrv).  This is the remote debugging mechanism of choice for remotely debugging things on virtual machines, on a LAN or other fast connection, or even on the same computer (which can come in handy for certain Wow64 debugging scenarios, or cross-session debugging under Terminal Server prior to Windows XP).  Process server debugging is also useful for debugging early-start system services remotely, where the infrastructure to do symbol access (which touches many system components, for things like authentication support) is not yet available – for this scenario, you can use the “-cs command-line” parameter with dbgsrv to start a target process suspended when you launch dbgsrv, which is handy for using Image File Execution Options to have dbgsrv act as a debugger for early-start services.  This can be more reliable than -server and -remote when symbol access is involved: if you are debugging certain services, the debugger might deadlock (and you would lose the debugging session) if it has to talk to the very service you are debugging in order to complete a network symbol access request.
  • -server and -remote.  If I am doing debugging over the Internet, I’ll usually use this mechanism as it’s relatively quick even over lower quality connections.  This mechanism is also useful for collaborating with a debugging session (for instance, if you want to show someone how to perform a particular debugging task), as you can have multiple users connect to the same debugging session.  Additionally, -server/-remote are handy if you have a large dump file on a customer site and you want to debug it remotely instead of copying it to your computer, but would like to do so from the context of your local computer so that you have easier access to source code and/or documentation.  Finally, -server/-remote support remote kernel debugging where process servers do not.
  • KdSrv.exe.  If you need to do remote kernel debugging over a LAN, this is the mechanism of choice.  Be aware that kernel debugging is even more latency and bandwidth sensitive than process servers, making this mechanism useless unless you have a very fast, LAN-like connection to the target.  If these conditions hold true, KdSrv.exe provides the main benefits that a process server does for user mode debugging: local symbol access, local debugger extensions, and allowing the debugger to use your workspace settings on the local computer as opposed to setting up your UI on a remote system.
  • NTSD through KD.  This is useful in a couple of specialized scenarios, such as debugging very early start processes or performing user mode debugging in conjunction with the kernel debugger.  While controlling NTSD through KD is much less convenient than through a conventional remote debugging session, you won’t have to worry about your session going away or getting disconnected while the remote system is frozen in the kernel debugger.  In particular, this is useful for doing things like debugging things that make calls to the LSA from kernel mode, or other situations where kernel mode code you are debugging is extensively interacting with user mode code.
  • Remote.exe.  I have never really found a situation that justifies the use of this as a preferred remoting mechanism, as its capabilities are far eclipsed by the other remoting services available and the benefits (low network utilization) are relatively minimal compared to -server/-remote in today’s world of cable modem and xDSL connections.

If you are debugging a problem on a customer site, you will likely find reverse connection debugging highly useful.  All of the modern remote debugging mechanisms support reverse connections except NTSD over KD, for obvious reasons.

Another consideration to take into account when selecting which mechanism to use is that you can mix and match multiple remoting mechanisms within a debugging session if it makes sense to do so.  For instance, you can start a process server, connect to it with ntsd, and launch a -server/-remote style server with “.server” that you then connect to with WinDbg.  This capability is usually not terribly useful, but there are a couple of instances where it can come in handy.
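
Concretely, such a chained setup might look like this (machine names, ports, and the process id are all made up):

on the target:        dbgsrv -t tcp:port=4000
on your machine:      ntsd -premote tcp:server=targetbox,port=4000 -p 1234
in that ntsd session: .server tcp:port=5000
a second debugger:    windbg -remote tcp:server=yourbox,port=5000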

That’s all for this series on remote debugging.  I may come back and revisit this topic again in the future, but for the moment, I’ll be focusing on some different subjects for upcoming posts.

Why you shouldn’t touch things in DllMain

Wednesday, August 2nd, 2006

One topic that comes up on the Microsoft newsgroups every once in a while is whether it is really that bad to do complicated things in DllMain.

The answer I almost always give is yes, you should always stay away from that.

This is a particularly insidious topic, as many people do things in DllMain anyway, despite MSDN’s warnings to the contrary, see that it seems to work on their computer, and ship it in their product / program / whatever.  Unfortunately, this often ends up with hard to debug problems that only fail on a particular customer computer – the kind that you really don’t want to get stuck debugging remotely.  The reason for this is that many of the things that can go wrong in DllMain are environment specific: whether a particular DLL that you end up calling from your DllMain had already been loaded before your DLL was loaded will often make the difference.

If you dynamically load a DLL in DllMain and it has not already been loaded, you will get back a valid HMODULE, but in reality the initializer function for the new DLL will not be called until after your DllMain returns.  However, if the DLL had already been loaded by something else and your LoadLibrary call just incremented a reference count, then DllMain has already been called for that DLL.  Where this gets ugly is if you call a function that relies on some state set up by its DllMain, but on your development/test boxes the DLL in question had already been loaded for some other reason.  If, on a customer computer, you end up being the first to load the DLL, you’ll see mysterious corruption and/or crashes resulting from this which never repro in the lab for you.
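
As a sketch of the anti-pattern (the DLL name and export are made up), this is the kind of code whose behavior depends on who happened to load what first:

#include <windows.h>

// DON'T do this - shown only to illustrate the load-order dependency.
BOOL WINAPI DllMain(HINSTANCE hinstDll, DWORD Reason, LPVOID Reserved)
{
    UNREFERENCED_PARAMETER(hinstDll);
    UNREFERENCED_PARAMETER(Reserved);

    if (Reason == DLL_PROCESS_ATTACH)
    {
        // If foo.dll was not already loaded, its own DllMain has not run yet
        // by the time LoadLibrary returns here, so the call below may operate
        // on uninitialized state.  If foo.dll happened to already be loaded
        // (as it might be on your test machines), everything "works".
        HMODULE Foo = LoadLibraryW(L"foo.dll");           // hypothetical DLL
        if (Foo)
        {
            typedef BOOL (WINAPI *FooInit_t)(void);
            FooInit_t FooInitializeSomething =
                (FooInit_t)GetProcAddress(Foo, "FooInitializeSomething");
            if (FooInitializeSomething)
                FooInitializeSomething();                 // hypothetical export
        }
    }

    return TRUE;
}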

So, stay away from complicated things in DllMain.  There are other reasons too, but this is the big one for current OS releases (of course, Vista and future versions may add other things that can go wrong if you break the rules).

If you are interested, Michael Grier has an excellent series on this topic to help you understand just what can go wrong in DllMain.

Don’t forget to turn off your debug prints when you ship your product

Tuesday, August 1st, 2006

One thing that really annoys me when I am debugging a problem is when people ship their products with debug prints on in the release versions.

This just sucks, it really does.  It’s hard to pay attention to debug prints for things that matter if half of the third party software on your computer is compiled with debug prints enabled.  One particularly annoying offender is the HGFS (host-guest filesystem) network filesystem provider shipped by VMware.  Now, I love VMware, but it’s really, really annoying that every single VM in existence with VMware Tools installed has built-in debug print spam from every process that touches the network provider stack.

So, change those DbgPrint calls to KdPrint if you are working on a driver, and if you’re in user mode, make sure that OutputDebugString calls aren’t compiled in if you are in release mode.  Alternatively, leave them there but make sure that they are off by default unless you set a special configuration or registry parameter.
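
For the user mode case, a minimal sketch of the sort of wrapper I mean (the macro name is arbitrary, and Message must be a string literal in this simple form):

#include <windows.h>

// Compiled away entirely in release builds; no OutputDebugString spam ships.
#ifdef _DEBUG
#define MYAPP_TRACE(Message) OutputDebugStringA("myapp: " Message "\n")
#else
#define MYAPP_TRACE(Message) ((void)0)
#endif

int main()
{
    MYAPP_TRACE("starting up");   // emits only in debug builds
    return 0;
}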