Common WinDbg problems and solutions

When you’re debugging a program, the last thing you want to have to deal with is the debugger not working properly. It’s always frustrating to get sidetracked on secondary problems when you’re trying to focus on tracking down a bug, and especially so when problems with your debugger cause you to lose a repro or burn excessive amounts of time waiting around for the debugger to finish doing who knows what that is taking forever.
This is something that I get a fair amount of questions about from time to time, and so I’ve compiled a short list of some common issues that one can easily get tripped up by (and how to avoid or solve them).

  1. I’m using ntsd and I can’t get symbols to load, or most of the debugger extension commands (!commands) don’t work. This usually means that you launched the ntsd that ships with the operating system (prior to Windows Vista), which is much older than the one shipping with the debugger package. Because it is in the system directory, it will be in your executable search path.

    To fix this problem, use the ntsd executable in the debugger installation directory.

  2. WinDbg takes a very long time to process module load events, and it is using max processor time (spinning) on one CPU. This typically happens if you have many unqualified breakpoints that track module load events (created via bu) saved in your workspace. This problem is especially noticible when you are working with programs that have a very large number of decorated C++ symbols, such as debug builds of programs that make heavy use of the STL or other template classes. Unqualified breakpoints are expensive in general due to forcing immediate symbol loads of all modules, but moreover they also force the debugger to undecorate and perform pattern matches against every symbol in a module that is being loaded, for every unresolved breakpoint.

    If you allow a large number of unqualified breakpoints to become saved in a default workspace, this can make the debugger appear to be extremely slow no matter what program you are debugging.

    To avoid getting bitten by this problem, don’t use unqualified breakpoints (breakpoints without a modulename! prefix on their address expression) unless absolutely necessary. Also, it’s typically a good idea to clear all your breakpoints before you save your workspace if you don’t need them to be saved for your next debugging session with that debugger workspace (by default, bu breakpoints are persisted in the debugger workspace, unlike bp breakpoints which go away after every debugging session). If you are in the habit of saving the workspace every time you attach to a running process, and you often use bu breakpoints, this will tend to clutter up the user default workspace and can quickly lead to very poor debugger performance if you’re not careful.

    You can use the bc command to delete breakpoints (bc * to remove all breakpoints), although you will need to save the workspace to persist the changes. If the problem has gotten to the point where it’s not possible to even get past module loading in a reasonable amount of time so that you can use bc * to clear out saved breakpoints, you can remove the contents of the HKCU\Software\Microsoft\Windbg\Workspaces registry key and subkeys to return WinDbg to a pristine state. This will wipe out your saved debugger window positions and other saved debugger settings, so use it as a last resort.

  3. WinDbg takes a very long time to process module load events, but it is not consuming a lot of processor time. This typically means that your symbol path includes either a broken HTTP symbol store link or a broken UNC symbol store path. A non-responsive path in your symbol path will cause any operation that tries to load symbols for a module to take a long time to complete as a network timeout will be occuring over and over again.

    Use !sym noisy, followed by .reload /f to determine what part of your symbol path is not working correctly. Then, fix or remove the offending part of the symbol path.

    This problem can also occur when you are debugging a program that is in the packet path for packets destined to a location on the symbol path. In this case, the typical workaround I recommend is to set an empty symbol path, attach to the process in question, write a dump file, and then detach from the process. Then, restore the normal symbol path and open the dump file in the debugger, and issue a .reload /f command to force all symbols to be pre-cached ahead of time. After all symbols are pre-cached in the downstream store cache, change the symbol path to only reference the downstream store cache location and not any UNC or HTTP symbol server paths, and attach the debugger to the process in the packet path for symbol server access.

  4. WinDbg refuses to load symbols for a module that I know the symbol server has symbols for. This issue can occur if WinDbg has previously tried (and failed) to download symbols for a module. There appears to be a bug in dbghelp’s symbol server support which can sometimes result in partially downloaded PDB files being left in the downstream store cache. If this happens, future attempts to access symbols for the module will fail with an error saying that symbols for the module cannot be found.

    If you turn on noisy symbol loading (!sym noisy), a more descriptive error is typically given. If you see a complaint about E_PDB_CORRUPT, then you are probably falling victim to this issue. The debugger output that indicates this problem would look like something along the lines of this:

    DBGHELP: c:\symbols­\ntdll.pdb­\2744327E50A64B24A87BDDCFC7D435A02­\ntdll.pdb – E_PDB_CORRUPT

    If you encounter this problem, simply delete the .pdb named in the error message and retry loading symbols via the .reload /f <modulename> command.

  5. WinDbg hangs and never comes back when I attach to a specific process, such as an svchost instance. If you’re sure that you aren’t experiencing a problem with a broken symbol path or unqualified module load tracking breakpoints being saved in your workspace, and the debugger never comes back when attaching to a certain process (or almost always hangs after the first command when attaching to the process in question), then the process you are debugging may be in a code path responsible for symbol loading.

    This problem is especially common if you are debugging an svchost instance, as there are a lot of important but unrelated pieces of code running in the various svchost instances, some of which are critical for network symbol server support to work. If you are debugging a process in the critical path for network symbol server support, and you have a symbol path with a network component set, then you may cause the debugger to deadlock (hang forever) the first time you try and load symbols.

    One example of a situation that can cause this is if you are debugging code in the same svchost instance as the DNS cache service. In this case, when you try to load symbols and you have an HTTP symbol server link in your symbol path, the debugger will deadlock because it will try and make an RPC call to the DNS cache service when it tries to resolve the hostname of the server referenced in your symbol path. Because the DNS cache service will never respond until the debugger resumes the process, and the debugger will never resume the process until it gets a response from the RPC request to the DNS cache service, your debugging session will hang indefinitely.

    Note that if you are simply debugging something in the packet path of a symbol server store, you will typically see the debugger become unresponsive for long periods of time but not hang completely. This is because the debugger can handle network timeouts (if somewhat slowly) and will eventually fail the request to the network symbol path. However, if the debugger tries to make an IPC request of some sort to the process being debugged, and the IPC request doesn’t have any built-in timeout (most local IPC mechanisms do not), then the debugger session will be lost for good.

    This problem can be worked around similarly to how I typically recommend users deal with slow module loading or failed symbol server accesses with a program in the packet path for a symbol server referenced in the symbol path. Specifically, it is possible to pre-cache all symbols for the process by creating a dump of the process from a debugger instance with an empty symbol path, and then detaching and opening the dump with the full symbol path and forcing a download of all symbols. Then, start a debugging session on the live process with a symbol path that references only the local downstream store into which symbols were being downloaded to in order to prevent any dangerous network accesses from happening.

    Another common way to get yourself into this sort of debugger deadlock problem is to use the clipboard to paste into WinDbg while you are debugging a program that has placed something into the clipboard. This results in a similar deadlock as WinDbg may get blocked on a DDE request to the clipboard owner, which will never respond by virtue of being debugged. In that case, the workaround is simply to be careful about copying or pasting text into or out of WinDbg.

  6. Remote debugging with -remote or .server is flaky or stops working properly after awhile. This can happen if all debuggers in the session aren’t running the same debugger version.

    Make sure that all peers in the remote debugging scenario are using the (same) latest debugger version. If you mix and match debugger versions with -remote, things will often break in strange and hard to diagnose ways in my experience (there doesn’t seem to be a whole lot of graceful support for backwards or forwards compatibility with respect to the debugger remoting protocol).

    Also, several recent releases of the debugger package didn’t work at all in remote debugging mode on Windows 2000. This is, as far as I know, fixed in the latest release.

Most of these problems are simple to fix or avoid once you know what to look for (although they can certainly burn a lot of time if you’re caught unaware, having done that myself while learning about these “gotchas”).

If you’re experiencing a weird WinDbg problem, you should also not be shy about debugging the malfunctioning debugger instance itself. Often, taking a stack trace of all threads in the problematic debugger instance will be enough to give you an idea of what sort of problem is holding things up (remember that the Microsoft public symbol server has symbols for the debugger binaries as well as the OS binaries).

14 Responses to “Common WinDbg problems and solutions”

  1. […] статья про типичные проблемы WinDbg на английском: Common WinDbg problems and solutions. Tags: Инструменты разработчика, Отладка posted by Not a kernel guy […]

  2. Pavel Lebedinsky says:

    #2 and other related problems (such as symbols for all modules being loaded when you make a typo and pass a non-existant symbol to a command like bp or ?) can be avoided by disabling unqualified loads, either in the GUI or using .symopt+ 100 (or -snul).

  3. Nate says:

    Nice summary. A few other things I wanted to mention (probably obvious to most.)

    When you’re stopped at a breakpoint in a remote kd session, the other system is not calling hlt and all CPUs are spinning. So you can build up heat quickly, i.e. with laptops. So don’t leave a machine in the debugger for hours.

    If you’re running windbg from the machine you’re using to build the binary or driver, your build may occasionally fail to write the PDB. That’s because windbg holds an open reference to the PDB and it thus can’t be overwritten by the linker. This usually happens when at a breakpoint. Just continue the execution before starting your build.

  4. Koby Kahane says:

    Nate, another thing you can do in the second case you mention is use the “.reload /u” command on the driver module to have the debugger unload the PDB for it. This may be more convenient than resuming execution in some cases.

  5. The duo of “!sym noisy and .reload /f” works very well. You have provided well versatile optimized solutions of handling the Wndbg. But the symbols problem are gain occuring in windows XP SP2.

    I would like to discuss that issue in detail. I will send you an email.

  6. Greg says:

    Excellent post. I’ve only recently plucked up the courage to tackle WinDbg, and while it’s nothing like as friendly as OllyDbg I’m really coming to like it. Your guide here has helped me through two tricky spots, though, so keep the the good stuff coming :).

  7. […] WinDbg to an svchost instance containing services in the symbol server lookup code path is risky business, as you can easily deadlock the debugger. Proceed with […]

  8. […] of the more annoying things that can happen while debugging processes that deal with network traffic is happening to attach to […]

  9. Thomson says:

    What is “Unqualified breakpoints” in second item?

  10. Skywing says:

    Thomson: Breakpoints that don’t specify a module name. For example, breakpoints missing a “module!” on their symbol name.

  11. Yuhong Bao says:

    “Another common way to get yourself into this sort of debugger deadlock problem is to use the clipboard to paste into WinDbg while you are debugging a program that has placed something into the clipboard. This results in a similar deadlock as WinDbg may get blocked on a DDE request to the clipboard owner, which will never respond by virtue of being debugged. In that case, the workaround is simply to be careful about copying or pasting text into or out of WinDbg.”
    What I usually do in this case is to continue the process, paste it into the WinDbg Scratch Pad, copy it from there, and then paste it into whatever area you want to paste it into.

  12. Kishore says:

    I was facing problem #2 and your solution helped me… Thanks a lot…

  13. Tian Hu says:

    Thanks you post . It’s very useful .
    I clear all the breakpoints(6) in my C++, the attach process become very very fast .
    before I clear the breakpoint , the attach process will spend over 2 hours.

  14. Deepak Dhamdhere says:

    I recently uninstalled SoftIce. Then I setup target machine for windbg using serial port. But when I boot up the target, it does not establish communication with windbg running on the target. Ctrl+Break don’t work. I have verified that serial communication link is ok.
    I suspect that SoftIce has left behind some driver or registry setting that prevents it from catching Ctrl+Break from host.
    Any idea how can I get windbg to work on this machine?