Archive for October, 2006

Debugging programs that block symbol server access

Tuesday, October 31st, 2006

One of the more frustrating things to debug in Windows is a program or service that sits in the path of symbol server requests. This is most often the case with a service running in a svchost group alongside a number of other services that are responsible for things like DNS, or for some internal support that WinInet relies on, and so forth. A deadlock against the target is also prone to happening if you are debugging something else that makes lots of calls out to WinInet, as WinInet has some cross-process state that allows the symbol server engine to get deadlocked waiting on the target to release a global mutex (or similar global synchronization / state).

If you naively try to simply attach a debugger and load symbols in such a situation, you’ll end up with a nasty surprise; the debugger will hang, and you’ll have to kill it (and whatever you were debugging) to recover. In the case of svchost groups, many of those services will not properly restart after just being abruptly killed, so you may even have to reboot.

Now, one obvious solution to this problem is to just turn off symbol server access altogether and work without symbols. This is obviously a major pain, though – nobody wants to debug without symbols if they actually have access to them, right?

There are a couple of other things that you can do to debug things in this scenario, however, that are a bit less painful than forgoing symbols:

First, you can use the kernel debugger to debug user mode processes. I have often seen Microsoft employees recommend this on the newsgroups. While this works (the kernel debugger is not affected by the state of any services that you are poking on the target computer), it is most certainly a royal pain to do. Using the kernel debugger means that breakpoints will often affect all processes (unless used via ba) and that you’ll have to deal with parts of the program being paged out, not to mention the fact that kernel debugger connections are typically much less responsive than a local user mode debugger.

Because of the inordinate amount of pain involved in using kd to debug a user mode process, I do not recommend ever going this route unless you absolutely positively have no other recourse for accurate debugging.

Therefore, I recommend a different procedure in this case:

  1. Disable symbol server entirely (remove all SRV* symbol server references from your symbol path) or make sure that you have loaded symbols for ntdll.dll.
  2. Attach to the process in question normally, but do not load symbols or issue any commands.
  3. Write a full user minidump out to disk somewhere. For example, .dump /ma c:\tmp.dmp.
  4. Detach from the process that blocks symbol server from working.
  5. Open the minidump you saved earlier, and issue a .reload /f command. This causes all symbols in the process to be downloaded from the symbol server if you do not already have them.
  6. Re-attach to the process that you wanted to debug, and set your symbol path to refer to the downstream store you used with symbol server, but without invoking symbol server. That is, if you had previously used SRV*C:\symbols* for your symbol path, set it to C:\symbols. This ensures that you will never try to hit the symbol server.
  7. Debug as normal.

After that, you’ll have everything in the process that could have symbols on the symbol server already downloaded into your downstream store. By turning off symbol server and just using the downstream store when you actually start your real debugging efforts, you’ll make sure that the debugger will never deadlock itself against the target. Best of all, you don’t need to download a very large symbol pack from Microsoft that might be missing files that have been hotfixed, since you are still preloading all of the symbols from the symbol server. This trick can, of course, also be used with your own internal symbol servers.
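The symbol path change in step 6 can be sketched in C. This is only a minimal sketch, assuming a path made of semicolon-separated elements where symbol server entries have the plain SRV*DownstreamStore*Server form. StripSymbolServer is a hypothetical helper name and not part of any debugger API, and the server URL in the usage note is a placeholder.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not a debugger API): rewrite a symbol path so
 * that each "SRV*store*server" element becomes just "store". Elements
 * are separated by ';' and anything else is copied unchanged.
 * Returns 0 on success, -1 if the output buffer is too small.
 */
int StripSymbolServer(const char *path, char *out, size_t outSize)
{
	size_t used = 0;

	assert(path != NULL && out != NULL && outSize > 0);
	out[0] = '\0';

	while (*path != '\0') {
		/* Find the end of the current ';'-separated element. */
		size_t len = strcspn(path, ";");
		const char *elem = path;
		size_t elemLen = len;

		/* "SRV*" prefix (any case): keep only the downstream store. */
		if (len >= 4 &&
		    (elem[0] == 'S' || elem[0] == 's') &&
		    (elem[1] == 'R' || elem[1] == 'r') &&
		    (elem[2] == 'V' || elem[2] == 'v') &&
		    elem[3] == '*') {
			elem += 4;
			elemLen = 0;
			while (elemLen < len - 4 && elem[elemLen] != '*')
				elemLen++;
		}

		/* Append the element, with a ';' between emitted elements. */
		if (elemLen > 0) {
			size_t need = elemLen + (used > 0 ? 1 : 0);
			if (used + need + 1 > outSize)
				return -1;
			if (used > 0)
				out[used++] = ';';
			memcpy(out + used, elem, elemLen);
			used += elemLen;
			out[used] = '\0';
		}

		path += len;
		if (*path == ';')
			path++;
	}
	return 0;
}
```

Given the input SRV*C:\symbols*http://example.invalid/symbols, the helper produces C:\symbols, which is the form of path that step 6 describes.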

Win32 calling conventions: __fastcall in assembler

Monday, October 30th, 2006

The __fastcall calling convention is the last major C-supported Win32 (x86) calling convention that I have not covered yet. (There still exists __thiscall, which I’ll discuss later).

__fastcall is, as you might guess from the name, a calling convention that is designed for speed. In this spirit, it attempts to borrow from many RISC calling conventions in that it tries to be register-based instead of stack based. Unfortunately, for all but the smallest or simplest functions, __fastcall typically does not end up being a particularly stellar thing performance-wise, for x86, primarily due to the (comparatively) extremely limited register set that x86 sports.

This calling convention has a great deal in common with the x64 calling convention that Win64 uses. In fact, aside from the x64-specific parts of the x64 calling convention, you can think of the x64 calling convention as a logical extension of __fastcall that is designed to take advantage of the expanded register set available with x64 processors.

What this boils down to is that __fastcall will try to pass the first two pointer-sized (or smaller) arguments in the ecx and edx registers. Any additional arguments are passed on the stack as per __stdcall.

In practice, the key things to look out for with a __fastcall function are thus:

  • The callee assumes a meaningful value in the ecx (or edx and ecx) registers. This is a tell-tale sign of __fastcall (although, you may sometimes see __thiscall make use of ecx for the this pointer).
  • No arguments are cleaned off the stack by the caller. Only __cdecl functions have the caller clean the stack.
  • The callee ends in a retn (args-2)*4 instruction. In general, this is the pattern that you will see with __fastcall functions that use the stack. For __fastcall functions where no stack parameters are used, the function typically ends in a ret instruction with no stack displacement argument.
  • The callee is a short function with very few arguments. These are the most likely cases where a smart programmer will use __fastcall, as otherwise, __fastcall does not tend to buy you very much over __stdcall.
  • Functions that interface directly with assembler. Having access to ecx and edx can be a handy shortcut for a C function that is being called by something that is obviously written in assembler.
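The retn rule from the list above can be written down as a small C helper. This is a sketch under the assumption that every argument is pointer-sized or smaller, so that exactly the first two go in registers; FastcallRetnBytes is a hypothetical name used only for illustration.

```c
#include <assert.h>
#include <limits.h>

/*
 * Hypothetical helper: number of bytes an x86 __fastcall callee removes
 * from the stack with "retn N". Assumes every argument is pointer-sized
 * (4 bytes) or smaller, so the first two travel in ecx and edx and each
 * remaining argument occupies one 4-byte stack slot.
 */
unsigned FastcallRetnBytes(unsigned argCount)
{
	const unsigned registerArgs = 2; /* ecx and edx */
	const unsigned slotSize = 4;     /* one x86 stack slot */

	/* Guard the multiplication below against overflow. */
	assert(argCount <= UINT_MAX / 4u);

	if (argCount <= registerArgs)
		return 0; /* plain "ret": nothing was passed on the stack */
	return (argCount - registerArgs) * slotSize;
}
```

For a three-argument function the result is 4, meaning a retn 4; for two or fewer arguments it is 0, meaning a plain ret.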

Taking these into account, let’s take a look at the same sample function and function call that we have been previously dealing with in our earlier examples, this time in __fastcall.

The function that we are going to call is declared as so:

int __fastcall FastcallFunction1(int a, int b, int c)
{
	return (a + b) * c;
}

This is consistent with our previous examples, save that it is declared __fastcall.

The function call that we shall make is as so:

FastcallFunction1(1, 2, 3);

With this code, we can expect the function call to look something like so in assembler:

push    3                 ; push 'c' onto the stack
push    2                 ; place a constant 2 on the stack
xor     ecx, ecx          ; move 0 into 'a' (ecx)
pop     edx               ; pop 2 off the stack and into edx.
inc     ecx               ; set 'a' -- ecx to 1 (0+1)
call    FastcallFunction1 ; make the call (a=1, b=2, c=3)

This is actually a bit different than we might expect. Here, the compiler has been a bit clever and used some basic optimizations with setting up constants in registers. These optimizations are extremely common and something that you should get used to seeing as simply constant assignments to registers, given how frequently they show up. In a future series, I’ll go into some more details as to common compiler optimizations like these, but that’s a tale for a different time.

Continuing with __fastcall, here’s what the implementation of FastcallFunction1 looks like in assembler:

FastcallFunction1 proc near

c= dword ptr  4

lea     eax, [ecx+edx] ; eax = a + b
imul    eax, [esp+4]   ; eax = (eax * c)
retn    4              ; return eax;
FastcallFunction1 endp

As you can see, in this particular instance, __fastcall turns out to be a big saver as far as instructions executed (and thus size, and to a lesser degree, speed) of the callee. This kind of benefit is usually restricted to extremely simple functions, however.

The main things, then, to consider if you are trying to identify if a function is __fastcall or not are thus:

  • Usage of the ecx (or ecx and edx) registers in the function without loading them with explicit values before-hand. This typically indicates that they are being used as argument registers, as with __fastcall.
  • The caller does not clean any arguments off the stack (no add esp instruction to clean the stack after the call). With __fastcall, the callee always cleans the arguments (if any).
  • A ret instruction (with no stack displacement argument) terminating the function, if there are two or fewer arguments that are pointer-sized or smaller. In this case, __fastcall has no stack arguments.
  • A retn (args-2)*4 instruction terminating the function, if there are three or more arguments to the function. In this case, there are stack arguments that must be cleaned off the stack via the retn instruction.

That’s all for __fastcall. More on other calling conventions next time…

Debugger commands review

Sunday, October 29th, 2006

This posting is a master list of all the other posts and post series that cover different WinDbg commands, whether they be built-in commands, extension commands, or even third-party extension commands.

  1. Using SDbgExt to aid your debugging and reverse engineering efforts (part 1). SDbgExt is the debugger extension that I maintain and make publicly available. This series provides a high-level overview of the different commands that it offers.
  2. SDbgExt extensions, part 2
  3. Useful WinDbg commands: .formats
  4. Using knf to track down excessive stack usage. This trick is discussed in a section of the “Beware of stack usage with the new network stack in Windows Vista” post.
  5. Removing kernel patching on the fly with the kernel debugger. This article discusses how you can use the !chkimg command to remove patches and hooks on loaded module code at runtime. (This particular command is also available and applicable to the user mode debuggers, and not just the kernel debugger.)
  6. Debugger flow control: More on breakpoints (part 2). This article explores some of the inner workings of the various breakpoints supported by WinDbg. In addition, it describes the .apply_dbp command that can be used to apply a set of hardware breakpoints to the current register context, or a saved register context image in-memory.
  7. SDbgExt 1.09 released (support for displaying x64 EH data). This article describes the !fnseh command in SDbgExt that can be used to view exception handlers and unwind handlers for x64 targets from the debugger.
  8. Useful debugger commands: .writemem and .readmem. This article covers the .writemem and .readmem commands that can be used to move large sections of raw data into or out of the debugger.

Upcoming topics…

Saturday, October 28th, 2006

Some of the things I’m planning on covering some time soon are:

  • Finish off the Win32 calling conventions series.
  • Continue the series about how the object namespace intersects with the Win32 API a bit further (Terminal Server session isolation, logon session isolation for mapped drives, etc).
  • Describe some of the common optimizer tricks that you might see when reverse engineering something (x86 assembler). For example, multiplication tricks with lea and that sort of thing.
  • Overview of how to use kernrate (the Microsoft profiler utility) to track down processor usage / performance problems.
  • Care and feeding of inline assembler on x86 (when to use it and when not to use it).
  • SEH on x86 (maybe)
  • A basic discussion on kernel debugging at some point in time…

I’m also working on organizing some of the post series into an easily accessible directory for quicker reference (note that some of the links in the Post Directory may not yet be active).

Let me know if there are any other topics that you might be interested in hearing about and I’ll see about posting about them at some point.

Things to watch out for if you hook functions on Windows Vista

Friday, October 27th, 2006

There are a couple of things that I have run into that you should keep in mind if you are hooking functions and are planning to run under Windows Vista.

First, watch out for things being moved around in memory. For example, in Windows Vista, the VirtualProtect function in kernel32 and the CreateProcessA function in kernel32 are now on the same page, for the x86 build [NOTE: this is subject to rapid change with hotfixes, and may not still be the case on RTM]. If you have some code that works conceptually like so:

DWORD  OldProt;
PVOID  MyCreateProcessA;
PUCHAR _CreateProcessA;
static ULONG MyHook;

MyHook = (ULONG)&MyCreateProcessA;

VirtualProtect(_CreateProcessA, 6,
	PAGE_READWRITE, &OldProt);

// [...] Disassembly and stub saving
//       code goes here...

// jmp dword ptr [MyHook]

_CreateProcessA[0] = 0xFF;
_CreateProcessA[1] = 0x25;
*(PULONG)(&_CreateProcessA[2]) = (ULONG)&MyHook;

VirtualProtect(_CreateProcessA, 6,
	OldProt, &OldProt);

… you’ll run into some strange crashes in Vista, because you might end up making the pages backing VirtualProtect’s implementation non-executable by accident. (Remember that memory protections only have page granularity.)

The solution? Use PAGE_EXECUTE_READWRITE for your “intermediate” states when hooking things.
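The page granularity problem can be checked with a little address arithmetic. The following is a minimal sketch, assuming 4 KB pages as on x86; PageBase and PatchTouchesPageOf are hypothetical helper names, and any addresses fed to them for experimentation are made up rather than taken from a real Vista build.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4 KB pages, as on x86. */
#define PAGE_SIZE_X86 0x1000u

/* Hypothetical helper: base address of the page containing addr. */
uint32_t PageBase(uint32_t addr)
{
	return addr & ~(PAGE_SIZE_X86 - 1u);
}

/*
 * Hypothetical helper: nonzero if changing the protection on
 * [patchAddr, patchAddr + patchLen) also changes the protection of the
 * page that holds 'other' (for example, a function you still need to
 * be able to execute while the patch is being written).
 */
int PatchTouchesPageOf(uint32_t patchAddr, uint32_t patchLen, uint32_t other)
{
	assert(patchLen > 0);

	uint32_t first = PageBase(patchAddr);
	uint32_t last = PageBase(patchAddr + patchLen - 1u);
	uint32_t target = PageBase(other);

	/* A patch can straddle two pages, so test the whole range. */
	return target >= first && target <= last;
}
```

If the check returns nonzero for the function you are patching and a function you call during patching, a non-executable intermediate protection will break that call.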

Secondly, watch out for AcLayers.dll and ShimEng.dll. These two DLLs are the core of Microsoft’s Application Compatibility Layer, which is the engine used to apply compatibility fixes at runtime to broken programs that would otherwise fail to work on Windows Vista. (This engine is also used if you select a particular compatibility layer in the property sheet for a shortcut to an executable or an executable.)

The thing to watch out for here is that AcLayers likes to do import table hooking on various kernel32 APIs. In particular, AcLayers tends to hook GetProcAddress and then occasionally redirect returned function pointers to point into AcLayers.dll and not kernel32.dll. If you have a program that assumes that any pointer that it retrieves from kernel32.dll via GetProcAddress will remain at the same address for any other process in the same session, this can result in some unpleasant surprises.

For instance, consider the classic case of wanting to inject some code to run before the main process entrypoint of a child process. You might do something like inject some code that calls kernel32!LoadLibraryA on some DLL your application supplies, and then kernel32!GetProcAddress to get the address of a function in that DLL. Then the patch code might invoke a function in your DLL and return to the initial program entrypoint of the child process. This is actually a fairly common paradigm if you need to modify some sort of behavior of a child process. Unfortunately, it can easily break if the parent process is under the influence of the dreaded application compatibility layer.

The main problem here is that when you, say, find the address of LoadLibraryA or GetProcAddress in kernel32, AcLayers.dll steps in and actually hands you the address of a stub function inside AcLayers.dll which filters requests to load DLLs or get function pointers. This is all well and fine with the parent process; AcLayers.dll is there and can do whatever its work is whenever you call GetProcAddress or LoadLibraryA.

The catch is what happens when you try to make a child process call LoadLibraryA on a DLL before it runs the main program entrypoint. In this case, instead of passing a pointer into kernel32 (which is guaranteed to be present and at the same base address in every Win32 process in the same session), you are passing a pointer into AcLayers.dll to the child process. The problem case is when AcLayers.dll is not loaded immediately into the child process. Here, your patch code in the newly created child process might try to call LoadLibraryA to get your custom DLL loaded. However, it actually tries to call an internal AcLayers.dll function – but AcLayers.dll isn’t actually loaded into the address space of the child process (or might have even been rebased), so your child process mysteriously crashes instantly. This typically manifests itself as nothing happening when you try to launch a child program, depending on computer configuration.

There is unfortunately no particularly elegant way to work around this particular problem that I have found. The best advice I have to offer here is to make sure that any kernel32.dll function pointer you pass to another process can never have been intercepted by AcLayers.dll. Perhaps the most fool-proof way to do this is to manually walk the export table of kernel32.dll and locate the address of the export that you are interested in, although this is not a particularly easy task.

The kernel object namespace and Win32, part 1

Thursday, October 26th, 2006

The kernel object namespace is partially exposed by various Win32 APIs. Everything that allows you to create a named object that returns a kernel handle is interacting with the kernel object namespace in some form or another, and many Win32 APIs internally use the object namespace under the hood.

The kernel object namespace is fairly similar to a filesystem; there are object directories, which contain named objects. Objects can be of various different types, such as a Device object (created by a kernel driver) or an Event object, a Semaphore object, and so forth. Additionally, there are symbolic link objects, which (like filesystem links on a UNIX-based system) allow you to create one name that simply refers to another named object in the system.

Until the introduction of Windows 2000, the part of the kernel object namespace that Win32 exposed was a fairly limited and simple subset of the full object namespace available to drivers and programs using the native system call interfaces.

First, file-related APIs interact with the \DosDevices object directory (otherwise known as \??). This is the object directory that holds anything that you might open with CreateFile() and related calls, such as drive letter links (say, C:), serial ports (COM1), other standard DOS devices, and custom devices created by kernel drivers. This is why, if you are a driver, you need to explicitly specify \DosDevices\DeviceName instead of that being automatically assumed (as it is in Win32, if you call CreateFile). Otherwise, the created object name will not be easily accessible to Win32.

Secondly, there is the \BaseNamedObjects object directory. This object directory is where named Event, Mutex, Semaphore, and Section (file mapping) objects are placed when created with the Win32 API.

\BaseNamedObjects is managed and created by the Base API server dll (basesrv.dll) running in the context of CSRSS at boot time. This means that, in particular, boot start drivers cannot rely on \BaseNamedObjects as being present early in the boot process (which can be a problem if you want to share a named event object with a user mode program, from a boot start driver). \DosDevices, however, is created by the kernel itself at boot time and is generally always accessible.

In general, that is the limit to how much of the kernel namespace is directly exposed to (and used to support) Win32 prior to Windows 2000. (This is technically not quite true. There is a little used pair of kernel32 APIs called DefineDosDevice and QueryDosDevice that allow limited manipulation of symbolic links based within the \DosDevices object directory. Using these APIs, you can discover the native target names of many of the internal symbolic links (for example, C: -> \Device\HarddiskVolume2). You can also create symbolic links based in \DosDevices that point to other parts of the NT object namespace with the DDD_RAW_TARGET_PATH flag using DefineDosDevice.)

Next time I’ll go into a bit more detail as to how some of the changes to the object manager namespace work with Windows 2000, and then Windows XP, which both introduce some significant changes to how Win32 interacts with object names (first with improved multi-session support for Terminal Server and Fast User Switching, and then with how mapped drive letters work with LSA logon sessions).

Beware of stack usage with the new network stack in Windows Vista

Tuesday, October 24th, 2006

In Windows Vista, much of the network stack that ships with the OS uses much more stack than in previous versions of the operating system.

From my experience, just indicating a UDP datagram up to NDIS can require you to have over 4K of kernel stack available on x86, or you risk taking a double fault and causing the system to bugcheck.

For example, here’s a portion of the stack that I ran into while debugging an unrelated problem at the Vista compatibility lab:

0: kd> k100
ChildEBP RetAddr  
818e6bdc 818ad19b RtlpBreakWithStatusInstruction
818e6c2c 818adc08 KiBugCheckDebugBreak+0x1c
818e6fdc 8184845e KeBugCheck2+0x5f4
818e6fdc 81871d35 KiTrap08+0x75
9c9cb084 8186dd14 SepAccessCheck+0x1e0
9c9cb0e0 81887907 SeAccessCheck+0x1a4
9c9cb51c 8715474c SeAccessCheckFromState+0xe4
9c9cb55c 871546d6 CompareSecurityContexts+0x47
9c9cb57c 87153b1a MatchValues+0xd4
9c9cb59c 87153aa7 CheckEqualConditionEnumMatch+0x3f
9c9cb63c 87153a1b MatchConditionOverlap+0x72
9c9cb660 87153774 FilterMatchEnum+0x6c
9c9cb674 8715948b FilterMatchEnumVisible+0x28
9c9cb6ac 87159520 IndexHashFastEnum+0x4d
9c9cb6f8 87158624 IndexHashEnum+0x139
9c9cb724 87159362 FeEnumLayer+0x7a
9c9cb7ac 87159b16 KfdGetLayerActionFromEnumTemplate+0x50
9c9cb7cc 8d6af9e4 KfdCheckAndCacheAcceptBypass+0x27
9c9cb8c4 8d6afc87 CheckAcceptBypass+0x146
9c9cb9a0 8d6b185d WfpAleAuthorizeReceive+0x82
9c9cba08 8d6ad542 WfpAleConnectAcceptIndicate+0x98
9c9cba74 8d6ad432 ProcessALEForTransportPacket+0xc5
9c9cbaf0 8d6ae6b3 ProcessAleForNonTcpIn+0x6f
9c9cbd28 8d6b0df0 WfpProcessInTransportStackIndication+0x2ab
9c9cbd78 8d6b0ae0 InetInspectReceiveDatagram+0x9a
9c9cbdfc 8d6b091c UdpBeginMessageIndication+0x33
9c9cbe44 8d6aecf3 UdpDeliverDatagrams+0xce
9c9cbe90 8d6aec40 UdpReceiveDatagrams+0xab
9c9cbea0 8d6acdd4 UdpNlClientReceiveDatagrams+0x12
9c9cbecc 8d6acba4 IppDeliverListToProtocol+0x49
9c9cbeec 8d6acad3 IppProcessDeliverList+0x2a
9c9cbf40 8d6ab443 IppReceiveHeaderBatch+0x1da
9c9cbfd0 8d6ac61d IpFlcReceivePackets+0xc06
9c9cc04c 8d6abf36 FlpReceiveNonPreValidatedNetBufferListChain
9c9cc074 8727b0b0 FlReceiveNetBufferListChain+0x104
9c9cc0a8 8726d737 ndisMIndicateNetBufferListsToOpen+0xab
9c9cc0d0 8726d6ae ndisIndicateSortedNetBufferLists+0x4a
9c9cc24c 871b53c3 ndisMDispatchReceiveNetBufferLists+0x129
9c9cc268 872802c4 ndisMTopReceiveNetBufferLists+0x2c
9c9cc2b4 b0a3fb4d ndisMIndicatePacketsToNetBufferLists+0xe9

From ndisMIndicatePacketsToNetBufferLists to where the system double faulted (in my case) inside of SeAccessCheck, a whopping 4656 bytes of kernel stack were consumed.
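The 4656 byte figure falls out of the ChildEBP column of the stack trace above: subtract the value for the innermost frame before the double fault (9c9cb084) from the value for ndisMIndicatePacketsToNetBufferLists (9c9cc2b4). A minimal sketch of that arithmetic follows; StackBytesBetween is a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: bytes of stack between two frames, given their
 * ChildEBP values from the "k" output. The x86 stack grows downward,
 * so the outer (older) frame has the higher address. Both values must
 * come from the same stack.
 */
uint32_t StackBytesBetween(uint32_t outerChildEbp, uint32_t innerChildEbp)
{
	assert(outerChildEbp >= innerChildEbp);
	return outerChildEbp - innerChildEbp;
}
```

Calling StackBytesBetween(0x9c9cc2b4, 0x9c9cb084) gives 0x1230, which is 4656 in decimal.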

So, now is the time to slim down your stack usage in your NDIS-related drivers, or you might be in for some unpleasant surprises when your drivers are used in conjunction with multiple third party IM drivers or the like (even better, you might investigate switching away from IM drivers and to the new filtering architecture). You should also be especially wary of any code that loops a packet that might potentially go back into tcpip.sys in a receive calling context (or any other context where you might have limited stack space available), as this can prove an unexpectedly expensive operation on Vista (and potentially beyond).

Oh, and a tip for finding stack hog functions with stack overflow problems: Use the ‘f’ flag with the ‘k’ command in WinDbg. For example:

0: kd> knf
 #   Memory  ChildEBP RetAddr  
00           818e6bdc 818ad19b RtlpBreakWithStatusInstruction
01        50 818e6c2c 818adc08 KiBugCheckDebugBreak+0x1c
02       3b0 818e6fdc 8184845e KeBugCheck2+0x5f4
03         0 818e6fdc 81871d35 KiTrap08+0x75

This has the debugger compute the stack (arguments + locals) usage at each call frame point for you, saving you a bit of work with the calculator.
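The Memory column in the knf output above is the difference between a frame’s ChildEBP and that of the frame listed before it. The sketch below redoes that subtraction to pick out the largest frame. It assumes all frames are on the same stack (the subtraction is meaningless across a stack switch such as the one a double fault causes), and FindStackHog is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: given ChildEBP values in "k" order (innermost
 * frame first), return the index of the frame with the largest Memory
 * value, i.e. the largest gap from the frame listed before it, and
 * store that gap in *bytes. Frame 0 has no predecessor; index 0 with
 * *bytes == 0 means no frame used any measurable stack.
 * Assumes every frame lives on the same stack.
 */
size_t FindStackHog(const uint32_t *childEbp, size_t count, uint32_t *bytes)
{
	size_t hog = 0;
	uint32_t largest = 0;

	assert(childEbp != NULL && bytes != NULL);

	for (size_t i = 1; i < count; i++) {
		/* Same subtraction that knf shows in its Memory column. */
		uint32_t used = childEbp[i] - childEbp[i - 1];
		if (used > largest) {
			largest = used;
			hog = i;
		}
	}
	*bytes = largest;
	return hog;
}
```

Fed the four ChildEBP values from the listing above, it reports frame 02 (KeBugCheck2) with 0x3b0 bytes.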

Debugging (or reverse engineering…) a real life Windows Vista compatibility problem: CreateIpForwardEntry in iphlpapi

Tuesday, October 24th, 2006

Since I’m at the Microsoft Vista compatibility lab, it only makes sense that I’ve fixed a few Vista compatibility bugs in our product today.

Some of these are real bugs, but I ran into one that is particularly infuriating: a completely undocumented, seemingly completely arbitrary restriction placed on a publicly documented API that has been around since Windows 98.

In this particular case, I was running into a problem where one of our products was unable to add routes on Vista. This worked fine on prior platforms we supported, and so I started looking into it as a compatibility problem. First things first, I narrowed the problem down to a particular API that was failing.

We have a function that wraps the various details about creating routes. The function in question went approximately like so:

// Add a route through the desired gateway.
// (AddRoute is a placeholder name for this wrapper.)

DWORD
AddRoute(
	__in unsigned long Network,
	__in unsigned long Mask,
	__in unsigned long Gateway
	)
{
	MIB_IPFORWARDROW Row;
	DWORD            Status, ForwardType;
	unsigned long    InterfaceIp, InterfaceIndex;

[...]	// (Code to determine the local
	// interface to add the route on)

	// Set up the IP forward row.

	ZeroMemory(&Row, sizeof(Row));

	Row.dwForwardDest    = Network;
	Row.dwForwardMask    = Mask;
	Row.dwForwardPolicy  = 0;
	Row.dwForwardNextHop = Gateway;
	Row.dwForwardIfIndex = InterfaceIndex;
	Row.dwForwardType    = ForwardType;
	Row.dwForwardProto   = PROTO_IP_NETMGMT;
	Row.dwForwardAge     = INFINITE;
	Row.dwForwardMetric1 = 0;

	// Create the route.

	if ((Status = CreateIpForwardEntry(&Row))
		!= NO_ERROR)
	{
		wprintf(L"Creation failed, %lu.\n",
			Status);
		return Status;
	}

[...]	// (More unrelated boilerplate code)

	return Status;
}

Essentially, the problem here was that CreateIpForwardEntry was failing. Checking logs, the error code logged was 0xA0.

Using the handy Microsoft error code lookup utility (err.exe), it was easy to determine what this error code means:

C:\>err a0
# for hex 0xa0 / decimal 160 :
  INTERNAL_POWER_ERROR                            bugcodes.h
  LLC_STATUS_BIND_ERROR                           dlcapi.h
  SQL_160_severity_15                             sql_err
# Rule does not contain a variable.
  ERROR_BAD_ARGUMENTS                             winerror.h
# One or more arguments are not correct.
# Too much incoming data%0
# 5 matches found for "a0"

The only error that makes sense in this context is ERROR_BAD_ARGUMENTS. Unfortunately, that is not really all that helpful. Checking the latest MSDN documentation for CreateIpForwardEntry, there is, of course, no mention of this error code whatsoever.

Additionally, looking at the Microsoft documentation, nothing immediately jumped to mind as to what the problem was.

Although the Microsoft people here for the Vista lab did offer to see about getting me in touch with someone in the product team who might have an explanation for this behavior, I eventually decided that I would just take a crack at digging into the internals of CreateIpForwardEntry and understand the problem myself in the meanwhile to see if I might be able to come up with a fix sooner. After searching around a bit on Google and not coming up with any good explanation for what was going wrong, I eventually decided to step into iphlpapi!CreateIpForwardEntry in the debugger and see just what was going wrong first-hand.

0:000> bu iphlpapi!CreateIpForwardEntry
breakpoint 0 redefined
0:000> g
Breakpoint 0 hit
eax=0012fd6c ebx=00000004 ecx=00000000 edx=00000000
esi=01040a0a edi=00000003
eip=751bdfc1 esp=0012fd58 ebp=0012fdb0 iopl=0
nv up ei pl nz ac pe nc
cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000
751bdfc1 8bff            mov     edi,edi

Looking at the disassembly of CreateIpForwardEntry, it’s clear that this function is now just a stub that forwards the call onto another function that performs the real work:

0:000> u @eip
751bdfc1 8bff       mov     edi,edi
751bdfc3 55         push    ebp
751bdfc4 8bec       mov     ebp,esp
751bdfc6 6a01       push    1
751bdfc8 ff7508     push    dword ptr [ebp+8]
751bdfcb e820ffffff call    CreateOrSetIpForwardEntry
751bdfd0 5d         pop     ebp
751bdfd1 c20400     ret     4

So, I pressed onward, stepping into iphlpapi!CreateOrSetIpForwardEntry:

0:000> tc
751bdfcb e820ffffff call    CreateOrSetIpForwardEntry
0:000> t
eax=0012fd6c ebx=00000004 ecx=00000000 edx=00000000
esi=01040a0a edi=00000003
eip=751bdef0 esp=0012fd48 ebp=0012fd54 iopl=0
nv up ei pl nz ac pe nc
cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000
751bdef0 8bff            mov     edi,edi

Looking at the disassembly, there appears to be only one place where the error code ERROR_BAD_ARGUMENTS is returned (disassembly truncated for better viewing):

0:000> uf @eip
751bdef0 8bff            mov     edi,edi
751bdef2 55              push    ebp
751bdef3 8bec            mov     ebp,esp
751bdef5 83ec48          sub     esp,48h
751bdef8 8365b800        and     dword ptr [ebp-48h],0
751bdefc 56              push    esi
751bdefd 6a2c            push    2Ch
751bdeff 8d45bc          lea     eax,[ebp-44h]
751bdf02 6a00            push    0
751bdf04 50              push    eax
751bdf05 e8f053ffff      call    memset
751bdf0a 8b7508          mov     esi,dword ptr [ebp+8]


; Convert the interface index we passed in with
; the pRoute structure (dwForwardIfIndex, at
; [esi+10h]) into an interface LUID, stored at
; [ebp-30].

751bdf36 8d45d0          lea     eax,[ebp-30h]
751bdf39 50              push    eax
751bdf3a ff7610          push    dword ptr [esi+10h]
751bdf3d e86590ffff      call    ConvertInterfaceIndexToLuid
751bdf42 85c0            test    eax,eax
751bdf44 7571            jne     751bdfb7

; Get the interface metric for the requested interface,
; and store it at [ebp+8].  We pass in the address of
; the LUID of the requested interface in order to make
; the check.

751bdf46 8d4508          lea     eax,[ebp+8]
751bdf49 50              push    eax
751bdf4a 8d45d0          lea     eax,[ebp-30h]
751bdf4d 50              push    eax
751bdf4e e802f4ffff      call    GetInterfaceMetric


; Load esi with pRoute->dwForwardMetric1

751bdf6c 8b7624          mov     esi,dword ptr [esi+24h]
751bdf6f 6a06            push    6
751bdf71 8945e0          mov     dword ptr [ebp-20h],eax
751bdf74 83c8ff          or      eax,0FFFFFFFFh
751bdf77 3b7508          cmp     esi,dword ptr [ebp+8]
751bdf7a 59              pop     ecx
751bdf7b 8d7de8          lea     edi,[ebp-18h]
751bdf7e f3ab            rep stos dword ptr es:[edi]
751bdf80 8945ec          mov     dword ptr [ebp-14h],eax
751bdf83 8945f0          mov     dword ptr [ebp-10h],eax
751bdf86 5f              pop     edi

; Check that esi is not less than [ebp+8]
; ... in other words, verify that
; pRoute->dwForwardMetric1 >= InterfaceMetric,
; where InterfaceMetric is set by GetInterfaceMetric()

751bdf87 7229            jb      751bdfb2 ; failure

751bdf89 2b7508          sub     esi,dword ptr [ebp+8]
751bdf8c 6a18            push    18h
751bdf8e 8d45e8          lea     eax,[ebp-18h]
751bdf91 50              push    eax
751bdf92 6a30            push    30h
751bdf94 8d45b8          lea     eax,[ebp-48h]
751bdf97 50              push    eax
751bdf98 6a10            push    10h
751bdf9a 6864331b75      push    751b3364
751bdf9f ff750c          push    dword ptr [ebp+0Ch]
751bdfa2 8975f4          mov     dword ptr [ebp-0Ch],esi
751bdfa5 6a01            push    1
751bdfa7 c645ff01        mov     byte ptr [ebp-1],1

; Call the NsiSetAllParameters internal API to create the
; route, and return its return value to the caller.

751bdfab e86857ffff      call    NsiSetAllParameters
751bdfb0 eb05            jmp     751bdfb7

751bdfb2 b8a0000000      mov     eax,0A0h

751bdfb7 5e              pop     esi
751bdfb8 c9              leave
751bdfb9 c20800          ret     8

From this annotated disassembly, we can conclude that there are only two possibilities that might result in this behavior. The first is that GetInterfaceMetric(InterfaceIndex, &InterfaceMetric) is returning an InterfaceMetric greater than the metric we are supplying. The second is that NsiSetAllParameters is returning ERROR_BAD_ARGUMENTS.
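The comparison in the annotated disassembly can be restated as a small C model. This is only a sketch of the logic visible above, not the actual iphlpapi source: the function name and parameters are hypothetical, and the call to NsiSetAllParameters that follows a successful check is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 160 (0xA0), the value loaded into eax at 751bdfb2. */
#define ERROR_BAD_ARGUMENTS 0xA0u

/* Hypothetical model of the metric check. Returns 0 when the check passes
 * and stores the adjusted metric; returns ERROR_BAD_ARGUMENTS otherwise. */
uint32_t check_route_metric(uint32_t forward_metric,
                            uint32_t interface_metric,
                            uint32_t *adjusted_metric)
{
    assert(adjusted_metric != NULL);

    /* cmp esi,[ebp+8] / jb: an unsigned "below" comparison. */
    if (forward_metric < interface_metric)
        return ERROR_BAD_ARGUMENTS;

    /* sub esi,[ebp+8]: the difference is what gets stored at [ebp-0Ch]. */
    *adjusted_metric = forward_metric - interface_metric;
    return 0;
}
```

With a supplied metric of 0 and an interface metric of 10, this model takes the failure path, which is the behavior being investigated.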

To test this theory, we need to examine the comparison at 751bdf87 to determine if that is taking the failure branch, and we need to check the return value of NsiSetAllParameters. This is fairly easy to do with a couple of breakpoints:

0:000> bu 751bdf87 
0:000> bu 751bdfb0 
0:000> g
Breakpoint 1 hit
eax=ffffffff ebx=00000004 ecx=00000000 edx=7707e524
esi=00000000 edi=00000003
eip=751bdf87 esp=0012fcf8 ebp=0012fd44 iopl=0
nv up ei ng nz ac pe cy
cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000
751bdf87 7229            jb      751bdfb2 [br=1]

Our first breakpoint, the one on the comparison with the “Interface Metric” and the route metric we supplied in pRoute->dwForwardMetric1, was the one that hit first (as expected). Looking at the register context supplied by WinDbg, though, we can clearly see that the program is going to take the branch and head down the code path that returns ERROR_BAD_ARGUMENTS. Problem identified!
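Reading the branch outcome from the register dump comes down to one flag: jb is taken when the carry flag is set, and WinDbg prints that as "cy" (versus "nc" when it is clear). As a rough sketch, with hypothetical helper names, the same check could be automated against a captured flag line:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if the space-separated WinDbg flag line contains the given
 * mnemonic as a whole token (for example "cy" for carry set), else 0. */
int has_flag(const char *flag_line, const char *mnemonic)
{
    size_t len;
    const char *p;

    assert(flag_line != NULL && mnemonic != NULL);
    len = strlen(mnemonic);
    p = flag_line;

    while (*p != '\0') {
        size_t tok;

        while (*p == ' ')
            p++;
        tok = strcspn(p, " ");          /* length of the next token */
        if (tok > 0 && tok == len && strncmp(p, mnemonic, len) == 0)
            return 1;
        p += tok;
    }
    return 0;
}

/* jb (jump if below) is taken exactly when the carry flag is set. */
int jb_taken(const char *flag_line)
{
    return has_flag(flag_line, "cy");
}
```

Passing the flag line from the breakpoint above ("nv up ei ng nz ac pe cy") reports that the branch is taken, in agreement with the [br=1] annotation.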

There still remains the issue of solving the problem, though. Looking at [ebp+8], it appears that the undocumented iphlpapi!GetInterfaceMetric returned 10:

0:000> ? dwo(@ebp+8)
Evaluate expression: 10 = 0000000a

This makes sense. We supplied a metric of 0, which is obviously less than 10. Unfortunately, now we need a good way to determine whether we should use a zero metric (for previous OS versions) or a different metric (for Vista), assuming we want our route to take precedence for a particular network/mask value.

MSDN doesn’t turn up any hits on GetInterfaceMetric, and neither does Google. Well, that sucks – it looks like, for Vista, unless I want to hardcode 10, I’ll have to go off into undocumented land to use a publicly documented API. There seems to be something a bit ironic about that to me, but, nonetheless, the problem remains to be solved.

Update: There is a (minimally) documented solution that was very recently made available. See the bottom of the post for details.

So, all that we need to do is reverse engineer the parameters to this undocumented GetInterfaceMetric function and call it, right?

Well, no, not exactly – things actually get worse. It turns out that GetInterfaceMetric is not even exported from iphlpapi.dll – it’s a purely internal function!

The only other option at this point, aside from hardcoding 10 as a minimum metric, is to reimplement all of the functionality of GetInterfaceMetric ourselves. Taking a look at GetInterfaceMetric, things look unfortunately rather complicated:

0:000> uf iphlpapi!GetInterfaceMetric
751bd355 8bff            mov     edi,edi
751bd357 55              push    ebp
751bd358 8bec            mov     ebp,esp
751bd35a 6a1c            push    1Ch
751bd35c 6a04            push    4
751bd35e ff750c          push    dword ptr [ebp+0Ch]
751bd361 6a00            push    0
751bd363 6a08            push    8
751bd365 ff7508          push    dword ptr [ebp+8]
751bd368 6a07            push    7
751bd36a 6864331b75      push    NPI_MS_IPV4_MODULEID
751bd36f 6a01            push    1
751bd371 e88f5fffff      call    NsiGetParameter
751bd376 5d              pop     ebp
751bd377 c20800          ret     8

NPI_MS_IPV4_MODULEID is a global variable of some sort in iphlpapi:

0:000> db iphlpapi!NPI_MS_IPV4_MODULEID l8
751b3364  18 00 00 00 01 00 00 00  ........

Using the x command with ascending order, we can make an educated guess as to the size of this global by enumerating all symbols in iphlpapi in address space order:

0:000> x /a iphlpapi!*
751b3364 iphlpapi!NPI_MS_IPV4_MODULEID = <no type information>
751b3381 iphlpapi!NsiAllocateAndGetTable = <no type information>

So, we know that NPI_MS_IPV4_MODULEID must be no more than 0x1d bytes long. Taking a look around NPI_MS_IPV4_MODULEID, we see that past 0x18 bytes in, there appears to be code (nop instructions), making it likely that the global is 0x18 bytes long.

0:000> db 751b3364 
751b3364  18 00 00 00 01 00 00 00-00 4a 00 eb 1a 9b d4 11
751b3374  91 23 00 50 04 77 59 bc-90 90 90 90 90 ff 25 94

(The repeated 90 90 90 90 bytes are a typical sign of code. 90 is the opcode for the nop instruction on x86, which the compiler typically uses for padding out function start offsets for alignment.)
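The size estimate can be double-checked with a little arithmetic on the values from the debugger output. In the sketch below, the helper names are hypothetical, and treating the first dword as a length field is an assumption on my part: it happens to hold 0x18, which would be consistent with a length-prefixed structure, but nothing here confirms that.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The 0x18 bytes shown by db at 751b3364, copied from the dump above. */
static const uint8_t module_id_bytes[0x18] = {
    0x18, 0x00, 0x00, 0x00, 0x01, 0x00, 0x00, 0x00,
    0x00, 0x4a, 0x00, 0xeb, 0x1a, 0x9b, 0xd4, 0x11,
    0x91, 0x23, 0x00, 0x50, 0x04, 0x77, 0x59, 0xbc
};

/* Reads a little-endian 32-bit value, as x86 stores it in memory. */
uint32_t read_le32(const uint8_t *p)
{
    assert(p != NULL);
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Upper bound on a symbol's size: the gap to the next symbol in memory. */
uint32_t max_symbol_size(uint32_t symbol_addr, uint32_t next_symbol_addr)
{
    return next_symbol_addr - symbol_addr;
}

/* Assumption: the first dword of the global is a length field. */
uint32_t module_id_declared_length(void)
{
    return read_le32(module_id_bytes);
}
```

The gap between the two symbols works out to 0x1d, and the first dword reads back as 0x18, which lines up with where the nop padding begins.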

Given this, we should be able to replicate the behavior of GetInterfaceMetric, as the only function it calls, NsiGetParameter, is exported by nsi.dll (of course, it isn’t documented…). From the above disassembly, we can see that NsiGetParameter takes a ulong-sized argument (constant 0x1), a pointer argument (address of NPI_MS_IPV4_MODULEID), a ulong-sized argument (constant 0x7), a pointer that is the address of the interface LUID (argument 1 of GetInterfaceMetric, which we saw earlier), a ulong-sized argument (constant 0x8), a ulong or pointer-sized argument (constant 0x0), a pointer-sized argument (address of a ULONG containing the “interface metric”), a ulong-sized argument (constant 0x4), and (finally!) a ulong-sized argument (constant 0x1c). I would surmise that the 0x8 and 0x4 constants are the sizes of the LUID and output buffer, though I haven’t bothered to confirm that at this point.

From our knowledge of __stdcall, we can quickly identify NsiGetParameter as __stdcall by looking at the disassembly of GetInterfaceMetric and noticing the behavior after the function call: the caller does not remove the arguments from the stack, so it must be assuming that the callee (NsiGetParameter) performs that task.
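The argument recovery described above is mechanical: arguments are pushed right to left, so reversing the push sequence gives the parameter order, and for a __stdcall function the immediate on its ret instruction tells you how many bytes of arguments it takes. A minimal sketch of both steps follows; the helper names are made up for illustration, and the operands are kept as strings exactly as they appear in the disassembly.

```c
#include <assert.h>
#include <stddef.h>

/* Operands in the order they are pushed before "call NsiGetParameter". */
static const char *const pushes[] = {
    "1Ch", "4", "[ebp+0Ch]", "0", "8",
    "[ebp+8]", "7", "NPI_MS_IPV4_MODULEID", "1"
};

#define PUSH_COUNT (sizeof(pushes) / sizeof(pushes[0]))

/* Returns the operand passed as the given zero-based argument, or NULL if
 * the index is out of range. The last push is the first argument. */
const char *argument_operand(size_t arg_index)
{
    if (arg_index >= PUSH_COUNT)
        return NULL;
    return pushes[PUSH_COUNT - 1 - arg_index];
}

/* For a __stdcall function on 32-bit x86, "ret N" releases N bytes of
 * arguments, and each stack slot is 4 bytes wide. */
unsigned stdcall_argument_count(unsigned ret_immediate)
{
    assert(ret_immediate % 4u == 0);
    return ret_immediate / 4u;
}
```

Applied to GetInterfaceMetric itself, the ret 8 at 751bd377 gives two arguments, matching the LUID pointer and the output pointer seen at [ebp+8] and [ebp+0Ch].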

Given all of this, we can make our own function that implements GetInterfaceMetric. Now, just to be clear, I would not recommend actually using this, unless Microsoft fails to provide a documented mechanism to determine the minimum metric permitted for CreateIpForwardEntry (or removes the restriction) prior to Vista RTM. I am going to try and do whatever I can to see what ISVs are supposed to do with this particular problem (and whether it can be fixed before RTM) before this week is up, but in the event that I don’t get anywhere, I’ll have a backup plan (as ugly and hackish as it may be) – better than not being able to manipulate the route table, period, on Vista.

Anyway, the basic idea is that we call ConvertInterfaceIndexToLuid on the InterfaceIndex that we already have from iphlpapi, to convert this into a NET_LUID structure (new to Vista). It does so happen that ConvertInterfaceIndexToLuid is a documented API, which makes that the easy part.

Then, we simply replicate the call that we saw in GetInterfaceMetric inside iphlpapi.dll. For brevity, I am not posting the entire source code for my implementation of GetInterfaceMetric inline; you can, however, download it. With this reverse engineered implementation, all that is left is to call it to get the minimum metric for the interface we are about to add a route on, and place that metric in the MIB_IPFORWARDROW that we pass to CreateIpForwardEntry.

I’ll post back when I hear from Microsoft as to the official word as to how one is to handle this situation; I fully expect that there will be a documented API (or the restriction will go away) before RTM, at this point, given that this is a rather bad compatibility bug that breaks a long-existing documented API in such a way that requires you to go into undocumented hackery to continue to use it (especially since there is no other good way that I know of to replicate the functionality of the API in question).

Update: You can use the GetIpInterfaceEntry routine (new to Vista, in iphlpapi) to find the minimum metric for an interface. Note that you will very likely need to search on MSDN to find information on this function, as it’s not been included in recent SDKs to my knowledge.
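For the documented route, something along these lines should work. Treat it as a sketch under stated assumptions rather than a tested implementation: the helper names are hypothetical, I am assuming that the Metric member of MIB_IPINTERFACE_ROW is the value CreateIpForwardEntry compares against, and the Windows-only portion compiles only when _WIN32 is defined (linking against iphlpapi.lib).

```c
#include <assert.h>
#include <stddef.h>

/* Picks the metric to request: never below the interface's minimum. */
unsigned long clamp_route_metric(unsigned long desired,
                                 unsigned long interface_minimum)
{
    return desired < interface_minimum ? interface_minimum : desired;
}

#ifdef _WIN32
#include <winsock2.h>
#include <ws2ipdef.h>
#include <iphlpapi.h>

/* Queries the IPv4 interface row for if_index and, on success (NO_ERROR),
 * stores its Metric member in *metric. Returns the API's status code. */
unsigned long query_interface_metric(unsigned long if_index,
                                     unsigned long *metric)
{
    MIB_IPINTERFACE_ROW row;
    unsigned long status;

    assert(metric != NULL);

    InitializeIpInterfaceEntry(&row);
    row.Family = AF_INET;            /* IPv4 interface properties */
    row.InterfaceIndex = if_index;

    status = GetIpInterfaceEntry(&row);
    if (status == NO_ERROR)
        *metric = row.Metric;
    return status;
}
#endif
```

The clamped value would then go into dwForwardMetric1 of the MIB_IPFORWARDROW before calling CreateIpForwardEntry, so the same code path can still request a zero metric on earlier OS versions where no minimum applies.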

(Note: Some of the debugger output was slightly modified or truncated by me to keep the formatting sane.)

Useful WinDbg commands: .formats

Monday, October 23rd, 2006

One of the many things that you end up having to do while debugging a program is displaying data types. While you probably know many of the basic commands like db, da, du, and so forth, one perhaps little-used command is useful for displaying a four or eight byte quantity in a number of different data types: the “.formats” command. This command is useful for viewing various primitive/built-in data types, where you cannot display as a structure via the “dt” command.

In particular, you can use .formats to translate a number of different data types into readable values, including floating point or various time formats (time_t if you provide a 32-bit value, or FILETIME if you give a 64-bit value). For instance:

0:001> .formats 41414141
Evaluate expression:
  Hex:     41414141
  Decimal: 1094795585
  Octal:   10120240501
  Binary:  01000001 01000001 01000001 01000001
  Chars:   AAAA
  Time:    Fri Sep 10 01:53:05 2004
  Float:   low 12.0784 high 0
  Double:  5.40901e-315
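
To make a couple of those interpretations concrete, here is a rough C equivalent of the Time and Float (low) fields for a 32-bit value. The function names are hypothetical, and the float conversion assumes a 32-bit IEEE 754 float. Note that the debugger prints local time, so the UTC result here is offset from the output above by the machine's time zone (which, judging by the 64-bit example, was GMT-4).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

/* Reinterprets a 32-bit pattern as a single-precision float. */
float bits_as_float(uint32_t bits)
{
    float f;

    assert(sizeof f == sizeof bits);
    memcpy(&f, &bits, sizeof f);   /* copy the bytes; no numeric conversion */
    return f;
}

/* Interprets a 32-bit value as a time_t (seconds since 1970-01-01 UTC) and
 * stores the broken-down UTC time in *out. Returns 1 on success, 0 if the
 * C library could not convert the value. */
int bits_as_utc(uint32_t bits, struct tm *out)
{
    time_t t = (time_t)bits;
    struct tm *utc;

    assert(out != NULL);
    utc = gmtime(&t);
    if (utc == NULL)
        return 0;
    *out = *utc;
    return 1;
}
```

For 0x41414141, the float comes out at roughly 12.0784 and the UTC time as 05:53:05 on September 10, 2004, four hours ahead of the local time the debugger displayed.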

The command also supports 64-bit filetime quantities:

0:001> .formats 01010101`01010101
Evaluate expression:
  Hex:     01010101`01010101
  Decimal: 72340172838076673
  Octal:   0004010020040100200401
  Binary:  00000001 00000001 00000001 00000001
           00000001 00000001 00000001 00000001
  Chars:   ........
  Time:    Sun Mar 28 21:14:43.807 1830 (GMT-4)
  Float:   low 2.36943e-038 high 2.36943e-038
  Double:  7.7486e-304
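
The 64-bit Time field treats the value as a FILETIME, which counts 100-nanosecond intervals since January 1, 1601 (UTC). A small sketch of that arithmetic, using a hypothetical struct and function name, splits the raw value into whole days since 1601 plus a UTC time of day:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Whole days since 1601-01-01 plus the UTC time of day. */
struct filetime_parts {
    uint64_t days;
    unsigned hour;
    unsigned minute;
    unsigned second;
    unsigned millisecond;
};

/* Splits a FILETIME value (100 ns ticks since 1601-01-01 UTC). */
void split_filetime(uint64_t filetime, struct filetime_parts *out)
{
    const uint64_t ticks_per_second = 10000000u;
    uint64_t seconds;
    uint64_t second_of_day;

    assert(out != NULL);

    seconds = filetime / ticks_per_second;
    second_of_day = seconds % 86400u;

    out->days = seconds / 86400u;
    out->hour = (unsigned)(second_of_day / 3600u);
    out->minute = (unsigned)((second_of_day % 3600u) / 60u);
    out->second = (unsigned)(second_of_day % 60u);
    /* 10,000 ticks of 100 ns make one millisecond. */
    out->millisecond = (unsigned)((filetime % ticks_per_second) / 10000u);
}
```

For 01010101`01010101 this gives 01:14:43.807 UTC, which is 21:14:43.807 on the previous evening at GMT-4, in line with the debugger output.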

.formats is primarily useful for saving you a bit of time poking around in a calculator to translate times, or convert perhaps an overwritten eip into text if you are examining a stack buffer string overflow. In conjunction with db and dt, you should be able to format most any data you’ll come across in a debugging session into a readable format (provided you have symbols, of course, in the case of complex user-defined data types).

I don’t think that is what they really meant to say…

Sunday, October 22nd, 2006

While trying to identify just what kind of device Steve’s Mac appears as when plugged into my laptop over a 1394 cable, I ran into this charming result from Google:

Performance of 1394 devices may decrease after you install Windows ...

I don’t think that is what they really meant to say. I suppose Google summaries can be bad at times…

This is but one of many strange or poorly designed things I’ve encountered in the past few days…