Archive for June, 2007

The No-Execute hall of shame…

Friday, June 15th, 2007

One of the things that I do when I set up Windows on a new (modern) box that I am going to use for more than just temporary testing is to enable no-execute (“NX”) support by default. Preferable, I use NX in “always on” mode, but sometimes I use “opt-out” mode if I have to. It is possible to bypass NX in opt-out (or opt-in) mode with a “ret2libc”-style attack, which diminishes the security gain in an unfortunately non-trivial way in many cases. For those unclear on what “alwayson”, “optin”, “optout”, and “alwaysoff” mean in terms of Windows NX support, they describe how NX is applied across processes. Alwayson and alwaysoff are fairly straightforward; they mean that all processes either have NX forced or forced off, unconditionally. Opt-in mode means that only programs that mark themselves as NX-aware have NX enabled (or those that the administrator has configured in Control Panel), and opt-out mode means that all programs have NX applied unless the administrator has explicitly excluded them in Control Panel.

(Incidentally, the reason why the Windows implementation of NX was vulnerable to a ret2libc style attack in the first place was for support for extremely poorly written copy protection schemes, of all things. Specifically, there is code baked into the loader in NTDLL to detect SecuROM and SafeDisc modules being loaded on-the-fly, for purposes of automagically turning off NX for these security-challenged copy protection mechanisms. As a result of this, exploit code can effectively do the same thing that the user mode loader does in order to disable NX, before returning to the actual exploit code (see the Uninformed article for more details on how this works “under the hood”). I really just love how end user security is compromised for DRM/copy protection mechanisms as a result of that. Such is the price of backwards compatibility with shoddy code, and the myriad of games using such poorly written copy protection systems…)

As a result of this consequence of “opt-out” mode, I would recommend forcing NX to being always on, if possible (as previously mentioned). To do this, you typically need to edit boot.ini to enable always on mode (/noexecute=alwayson). On systems that use BCD boot databases, e.g. Vista or later, you can use this command to achieve the same effect as editing boot.ini on downlevel systems:

bcdedit /set {current} nx alwayson

However, I’ve found that you can’t always do that, especially on client systems, depending on what applications you run. There are still a lot of things out there which break with NX enabled, unfortunately, and “alwayson” mode means that you can’t exempt applications from NX. Unfortunately, even relatively new programs sometimes fall into this category. Here’s a list of some of the things I’ve ran into that blow up with NX turned on by default; the “No-Execute Hall of Shame”, as I like to call it, since not supporting NX is definitely very lame from a security perspective – even more so, since it requires you to switch from “Alwayson” to “Optout” (so you can use the NX exclusion list in Control Panel) which is in an of itself a security risk even for unrelated programs that do have NX enabled by default). This is hardly an exclusive list, but just some things that I have had to work around in my own personal experience.

  1. Sun’s Java plug-in for Internet Explorer. (Note that to enable NX for IE if you are in “Optout” mode, you need to go grovel around in IE’s security options, as IE disables NX for itself by default if it can – it opts out, in other words). Even with the latest version of Java (“Java Platform 6 update 1”, which also seems to be be known as 1.6.0), it’s “bombs away” for IE as soon as you try to go to a Java-enabled webpage if you have Sun’s Java plugin installed.

    Personally, I think that is pretty inexcusable; it’s not like Sun is new to JIT code generation or anything (or that Java is anything like the “new kid on the block”), and it’s been the recommended practice for a very long time to ensure that code you are going to execute is marked executable. It’s even been enforced by hardware for quite some time now on x86. Furthermore, IE (and Java) are absolutely “high priority” targets for exploits on end users (arguably, IE exploits on unpatched systems are one of the most common delivery mechanisms for malware), and preventing IE from running with NX is clearly not a good thing at all from a security standpoint. Here’s what you’ll see if you try to use Java for IE with NX enabled (which takes a bit of work, as previously noted, in the default configuration on client systems):

    (106c.1074): Access violation - code c0000005 (first chance)
    First chance exceptions are reported before any
    exception handling.
    This exception may be expected and handled.
    eax=6d4a4a79 ebx=00070c5c ecx=0358ea98
    edx=76f10f34 esi=0ab65b0c edi=04da3100
    eip=04da3100 esp=0358ead4 ebp=0358eb1c
    iopl=0         nv up ei pl nz na pe nc
    cs=001b  ss=0023  ds=0023  es=0023  fs=003b
    gs=0000             efl=00210206
    04da3100 c74424040c5bb60a mov dword ptr [esp+4],0AB65B0Ch
    0:005> k
    ChildEBP RetAddr  
    WARNING: Frame IP not in any known module. Following
    frames may be wrong.
    0358ead0 6d4a4abd 0x4da3100
    0358eb1c 76e31ae8 jpiexp+0x4abd
    0358eb94 76e31c03 USER32!UserCallWinProcCheckWow+0x14b
    0:005> !vprot @eip
    BaseAddress:       04da3000
    AllocationBase:    04c90000
    AllocationProtect: 00000004  PAGE_READWRITE
    RegionSize:        000ed000
    State:             00001000  MEM_COMMIT
    Protect:           00000004  PAGE_READWRITE
    Type:              00020000  MEM_PRIVATE
    0:005> .exr -1
    ExceptionAddress: 04da3100
       ExceptionCode: c0000005 (Access violation)
      ExceptionFlags: 00000000
    NumberParameters: 2
       Parameter[0]: 00000008
       Parameter[1]: 04da3100
    Attempt to execute non-executable address 04da3100
    0:005> lmvm jpiexp
    start    end        module name
        CompanyName:  JavaSoft / Sun Microsystems
        ProductName:  JavaSoft / Sun Microsystems
                      -- Java(TM) Plug-in
        InternalName: Java Plug-in für Internet Explorer

    (Yes, the internal name is in German. No, I didn’t install a German-localized build, either. Perhaps whoever makes the public Java for IE releases lives in Germany.)

  2. NVIDIA’s Vista 3D drivers. From looking around in a process doing Direct3D work in the debugger on an NVIDIA system, you can find sections of memory laying around that are PAGE_EXECUTE_READWRITE, or writable and executable, and appear to contain dynamically generated SSE code. Leaving memory around like this diminishes the value of NX, as it provides avenues of attack for exploits that need to store some code somewhere and then execute it. (Fortunately, technologies like ASLR may make it difficult for exploit code to land in such dynamically allocated executable and writable zones in one shot. An example of this sort of code is as follows (observed in a World of Warcraft process):
    0:000> !vprot 15a21a42 
    BaseAddress:       0000000015a20000
    AllocationBase:    0000000015a20000
    AllocationProtect: 00000040  PAGE_EXECUTE_READWRITE
    RegionSize:        0000000000002000
    State:             00001000  MEM_COMMIT
    Protect:           00000040  PAGE_EXECUTE_READWRITE
    Type:              00020000  MEM_PRIVATE
    0:000> u 15a21a42
    15a21a42 8d6c9500       lea     ebp,[rbp+rdx*4]
    15a21a46 0f284d00       movaps  xmm1,xmmword ptr [rbp]
    15a21a4a 660f72f108     pslld   xmm1,8
    15a21a4f 660f72d118     psrld   xmm1,18h
    15a21a54 0f5bc9         cvtdq2ps xmm1,xmm1
    15a21a57 0f590dd0400b05 mulps   xmm1,xmmword ptr
               [nvd3dum!NvDiagUmdCommand+0xd6f30 (050b40d0)]
    15a21a5e 0f59c1         mulps   xmm0,xmm1
    15a21a61 0f58d0         addps   xmm2,xmm0
    0:000> lmvm nvd3dum
    start             end                 module name
    04de0000 0529f000   nvd3dum
        Loaded symbol image file: nvd3dum.dll
        CompanyName:      NVIDIA Corporation
        ProductName:      NVIDIA Windows Vista WDDM driver
        InternalName:     NVD3DUM
        OriginalFilename: NVD3DUM.DLL
        FileDescription:  NVIDIA Compatible Vista WDDM
              D3D Driver, Version 158.28 

    Ideally, this sort of JIT’d code would be reprotected to be readonly after it is generated. While not as severe as leaving the entire process without NX, any “NX holes” like this are to be avoided when possible. (Simply slapping a “PAGE_EXECUTE_READWRITE” on all of your allocations and calling it done is not the proper solution, and compromises security to a certain extent, albeit significantly less so than disabling NX entirely.)

  3. id Software’s Quake 3 crashes out immediately if you have NX enabled. I suppose it can be given a bit of slack given its age, but the SDK documentation has always said that you need to protect executable regions as executable.

Programs that have been “recently rescued” from the No-Execute hall of shame include:

  1. DosBox, the DOS emulator (great for running those old games on new 64-bit computers, or even on 32-bit computers where DosBox has superior support for hardware, such as sound cards, compared to NTVDM. The current release version (0.70) of DosBox doesn’t work with NX if you have the dynamic code generation core enabled. However, after speaking to one of the developers, it turns out that a version that properly protects executable memory was already in the works (in fact, the DosBox team was kind enough to give me a prerelease build with the fix in it, in leui of my trying to get a DosBox build environment working). Now I’m free to indulge in those oldie-but-goodie classic DOS titles like “Master of Magic” from time to time without having to mess around with excluding DosBox.exe from the NX policy.
  2. Blizzard Entertainment’s World of Warcraft used to crash immediately on logging on to the game world if you had NX enabled. It seems as if this has been fixed for the most recent patch level, though.
  3. Another of Blizzard’s games, Starcraft, will crash with NX enabled unless you’ve got a recent patch level installed. The reason is that Starcraft’s rendering engine JITs various rendering operations into native code on the fly for increased performance. Until relatively recently in Starcraft’s lifetime, none of this code was NX-aware and blithely outputted non-executable rendering code which it then tried to run. In fact, Blizzard’s Warcraft II: Edition has not been patched to repair this deficiency and is thus completely unusable with NX enabled as a result of the same underlying problem.

If you’re writing a program that generates code on the fly, please use the proper protection attributes for it – PAGE_READWRITE during generation, and PAGE_EXECUTE_READONLY after the code is ready to use. Windows Server already ships with NX enabled by default (although in “opt-out” mode), and it’s already a matter of time before client SKUs of Windows do the same.

Much ado about signed drivers, and yet…

Wednesday, June 13th, 2007

Today, I was installing a Srv03 x64 VM for testing purposes. This is something I’ve done countless times before – install/reinstall/blow away Windows VMs for testing purposes with the standard Windows setup, nothing all that interesting. However, I ran into something extremely bizzare this time around:

Windows Server 2003 x64 Setup - Unsigned Driver Warning

The really weird thing here is that this VM was completely clean; brand new, empty hard disk which had only been just now connected to standard installation media (no third party OEM drivers slipstreamed onto the media or anything either).

Apparently, there’s an unsigned “in-box” driver lurking on the Srv03 x64 installation discs. Nice. The only thing that I can think of that was different about this VM from all the others I’ve made is that I created a serial port (redirected to a named pipe as usual) in the VM before setup, instead of after, and that somehow Windows might have thought that I had one of those old-style serial-port-controlled UPS batteries connected to the box. Still, I’ve been around enough Srv03 installs on “real metal” to know that it’s weird for this to happen; I’ve never seen it on another install, VM or physical hardware.

Given the fuss that Microsoft makes about signed drivers, seeing what appears to be an in-box driver that isn’t signed is, to say the least, amusing. (Note that there are two types of signing; cat file signing, to sign the installation package – this is what suppresses the unsigned driver popup – and driver binary signing, which allows a driver to load on an x64 system but still generates the unsigned driver popup if you install the driver via an INF. I can only assume that for whatever reason, this driver doesn’t have a valid catalog signature, or it wouldn’t load at all. I’ll see if I can confirm this when the VM finishes installing, though…)

Selectively suppress Wow64 filesystem redirection on Vista x64 with ‘Sysnative’

Friday, June 8th, 2007

One of the features implemented by the Wow64 layer on x64 editions of Windows for 32-bit programs is filesystem redirection, which operates similarly to registry redirection in that it provides a Wow64 “sandbox”, for file I/O relating to the System32 directory.

The reasoning behind this is that many programs have the “System32” directory name hardcoded, and more importantly, unlike the transition between 16-bit Windows and Win32, virtually all of the DLLs implementing the Windows API retained their same name (and path) across the 32-bit to 64-bit transition.

Clearly, this creates a conflict, as there can’t be both 32-bit and 64-bit DLLs with the same name and path on the same system at the same time. The solution that Microsoft devised, filesystem redirection, is a layer that intercepts accesses to the system32 directory made by 32-bit applications, and re-points the accesses to the SysWOW64 directory, which functions as the Wow64 version of the system32 directory (where most of the core Wow64 system DLLs reside).

Normally, this works fairly well, but there are times when a program needs to access the real System32 directory. For this purpose, Microsoft has provided a set of APIs to enable and disable the filesystem redirection on a per-thread basis. This is all well and dandy, but sometimes you may find yourself needing to turn off Wow64 filesystem redirection in a place where you can’t modify a program easily.

Until Vista, there was no easy way to do this. Fortunately, Vista adds a “backdoor” to Wow64 filesystem redirection, in the form of a pseudo-directory named “Sysnative” that is present under the Windows directory (e.g. C:\Windows\Sysnative). This pseudo-directory is not visible in directory listings, and does not exist for native 64-bit processes. For Wow64 processes, however, it can be used to access the 64-bit system32 directory.

Why would you need to do that in the first place? Well, there are lots of reasons, most of them dealing with things like where there is only a 64-bit version of a program that you need to launch (or the 64-bit version is necessary, such as if you need to communicate between a 32-bit process and a 64-bit process by launching another 64-bit process or something of the sort). Prior to Vista, situations like this were a pain, as there was no built-in way to access the 64-bit system32 directory. The “sysnative” pseudo-directory allows the problem to be addressed in a general fashion without requiring code changes, as Wow64 programs can address the 64-bit system32 via the special path (of course, this does require the user supplying such a path, but there is little working around this given the fact that 64-bit and 32-bit system DLLs have identical names).

Sysnative is not a universal solution, however. Some functionality does not work well with it, as sysnative does not appear in directory listings; for instance, the common controls open/save file dialogs won’t allow you to navigate through sysnative due to this issue.

Why doesn’t the publicly available kernrate work on Windows x64? (and how to fix it)

Monday, June 4th, 2007

Previously, I wrote up an introduction to kernrate (the Windows kernel profiler). In that post, I erroneously stated that the KrView distribution includes a version of kernrate that works on x64. Actually, it supports IA64 and x86, but not x64. A week or two ago, I had a problem on an x64 box of mine that I wanted to help track down using kernrate. Unfortunately, I couldn’t actually find a working kernrate for Windows x64 (Srv03 / Vista).

It turns out that nowhere is there a published kernrate that works on any production version of Windows x64. KrView ships with an IA64 version, and the Srv03 resource kit only ships with an x86 version only. (While you can run the x86 version of kernrate in Wow64, you can’t use it to profile kernel mode; for that, you need a native x64 version.)

A bit of digging turned up a version of kernrate labeled ‘AMD64’ in the Srv03 SP0 DDK (the 3790.1830 / Srv03 SP1 DDK doesn’t include kernrate at all, only documentation for it, and the 6000 WDK omits even the documentation, which is really quite a shame). Unfortunately, that version of kernrate, while compiled as native x64 and theoretically capable of profiling the x64 kernel, doesn’t actually work. If you try to run it, you’ll get the following error:

NtQuerySystemInformation failed status c0000004

So, no dice even with the Srv03 SP0 DDK version of kernrate, or so I thought. Ironically, the various flavors of x86 (including 3790.0) kernrate I could find all worked on Srv03 x64 SP1 (3790.1830) / Vista x64 SP0 (6000.0), but as I mentioned earlier, you can’t use the profiling APIs to profile kernel mode from a Wow64 program. So, I had a kernrate that worked but couldn’t profile kernel mode, and a kernrate that should have worked and been able to profile kernel mode except that it bombed out right away.

After running into that wall, I decided to try and get ahold of someone at Microsoft who I suspected might be able to help, Andrew Rogers from the Windows Serviceability group. After trading mails (and speculation) back and forth a bit, we eventually determined just what was going on with that version of kernrate.

To understand what the deal is with the 3790.0 DDK x64 version of kernrate, a little history lesson is in order. When the production (“RTM”) version of Windows Server 2003 was released, it was supported on two platforms: x86, and IA64 (“Windows Server 2003 64-bit”). This continued until the Srv03 Service Pack 1 timeframe, when support for another platform “went gold” – x64 (or AMD64). Now, this means that the production code base for Srv03 x64 RTM (3790.1830) is essentially comparable to Srv03 x86 SP1 (3790.1830). While normally, there aren’t “breaking changes” (or at least, these are tried to kept to a minimum) from service pack to service pack, Srv03 x64 is kind of a special case.

You see, there was no production / RTM release of Srv03 x64 3790.0, or a “Service Pack 0” for the x64 platform. As a result, in a special case, it was acceptable for there to be breaking changes from 3790.0/x64 to 3790.1830/x64, as pre-3790.1830 builds could essentially be considered beta/RC builds and not full production releases (and indeed, they were not generally publicly available as in a normal production release).

If you’re following me this far, you’re might be thinking “wait a minute, didn’t he say that the 3790.0 x86 build of kernrate worked on Srv03 3790.1830?” – and in fact, I did imply that (it does work). The breaking change in this case only relates to 64-bit-specific parts, which for the most part excludes things visible to 32-bit programs (such as the 3790.0 x86 kernrate).

In this particular case, it turns out that part of the SYSTEM_BASIC_INFORMATION structure was changed from the 3790.0 timeframe to the 3790.1830 timeframe, with respect to x64 platforms.

The 3790.0 structure, as viewed from x64 builds, is approximately like this:

typedef struct _SYSTEM_BASIC_INFORMATION_3790 {
    ULONG Reserved;
    ULONG TimerResolution;
    ULONG PageSize;
    ULONG_PTR NumberOfPhysicalPages;
    ULONG_PTR LowestPhysicalPageNumber;
    ULONG_PTR HighestPhysicalPageNumber;
    ULONG AllocationGranularity;
    ULONG_PTR MinimumUserModeAddress;
    ULONG_PTR MaximumUserModeAddress;
    KAFFINITY ActiveProcessorsAffinityMask;
    CCHAR NumberOfProcessors;

However, by the time 3790.1830 (the “RTM” version of Srv03 x64 / XP x64) was released, a subtle change had been made:

	ULONG Reserved;
	ULONG TimerResolution;
	ULONG PageSize;
	ULONG NumberOfPhysicalPages;
	ULONG LowestPhysicalPageNumber;
	ULONG HighestPhysicalPageNumber;
	ULONG AllocationGranularity;
	ULONG_PTR MinimumUserModeAddress;
	ULONG_PTR MaximumUserModeAddress;
	KAFFINITY ActiveProcessorsAffinityMask;
	CCHAR NumberOfProcessors;

Essentially, three ULONG_PTR fields were “contracted” from 64-bits to 32-bits (by changing the type to a fixed-length type, such as ULONG). The cause for this relates again to the fact that the first 64-bit RTM build of Srv03 was for IA64, and not x64 (in other words, “64-bit Windows” originally just meant “Windows for Itanium”, something that is to this day still unfortunately propagated in certain MSDN documentation).

According to Andrew, the reasoning behind the difference in the two structure versions is that those three fields were originally defined as either a pointer-sized data type (such as SIZE_T or ULONG_PTR – for 32-bit builds), or a 32-bit data type (such as ULONG – for 64-bit builds) in terms of the _IA64_ preprocessor constant, which indicates an Itanium build. However, 64-bit builds for x64 do not use the _IA64_ constant, which resulted in the structure fields being erroneously expanded to 64-bits for the 3790.0/x64 builds. By the time Srv03 x64 RTM (3790.1830) had been released, the page counts had been fixed to be defined in terms of the _WIN64 preprocessor constant (indicating any 64-bit build, not just an Itanium build). As a result, the fields that were originally 64-bits long for the prerelease 3790.0/x64 build became only 32-bits long for the RTM 3790.1830/x64 production build. Note that due to the fact the page counts are expressed in terms of a count of physical pages, which are at least 4096 bytes for x86 and x64 (and significantly more for large physical pages on those platforms), keeping them as a 32-bit quantity is not a terribly limiting factor on total physical memory given today’s technology, and that of the foreseeable future, at least relating to systems that NT-based kernels will operate on).

The end result of this change is that the SYSTEM_BASIC_INFORMATION structure that kernrate 3790.0/x64 tries to retrieve is incompatible with the 3790.1830/x64 (RTM) version of Srv03, hence the call failing with c0000004 (otherwise known as STATUS_INFO_LENGTH_MISMATCH). The culimination of this is that the 3790.0/x64 version of kernrate will abort on production builds of Srv03, as the SYSTEM_BASIC_INFORMATION structure format is incompatible with the version kernrate is expecting.

Normally, this would be pretty bad news; the program was compiled against a different layout of an OS-supplied structure. Nonetheless, I decided to crack open kernrate.exe with IDA and HIEW to see what I could find, in the hopes of fixing the binary to work on RTM builds of Srv03 x64. It turned out that I was in luck, and there was only one place that actually retrieved the SYSTEM_BASIC_INFORMATION structure, storing it in a global variable for future reference. Unfortunately, there were a great many references to that global; too many to be practical to fix to reflect the new structure layout.

However, since there was only one place where the SYSTEM_BASIC_INFORMATION structure was actually retrieved (and even more, it was in an isolated, dedicated function), there was another option: Patch in some code to retrieve the 3790.1830/x64 version of SYSTEM_BASIC_INFORMATION, and then convert it to appear to kernrate as if it were actually the 3790.0/x64 layout. Unfortunately, a change like this involves adding new code to an existing binary, which means that there needs to be a place to put it. Normally, the way you do this when patching a binary on-disk is to find some padding between functions, or at the end of a PE section that is marked as executable but otherwise unused, and place your code there. For large patches, this may involve “spreading” the patch code out among various different “slices” of padding, if there is no one contiguous block that is long enough to contain the entire patch.

In this case, however, due to the fact that the routine to retrieve the SYSTEM_BASIC_INFORMATION structure was a dedicated, isolated routine, and that it had some error handling code for “unlikely” situations (such as failing a memory allocation, or failing the NtQuerySystemInformation(…SystemBasicInformation…) call (although, it would appear the latter is not quite so “unlikely” in this case), a different option presented itself: Dispense with the error checking, most of which would rarely be used in a realistic situation, and use the extra space to write in some code to convert the structure layout to the version expected by kernrate 3790.0. Obviously, while not a completely “clean” solution per se, the idea does have its merits when you’re already into patch-the-binary-on-disk-land (which pretty much rules out the idea of a “clean” solution altogether at that point).

The structure conversion is fairly straightforward code, and after a bit of poking around, I had a version to try out. Lo and behold, it actually worked; it turns out that the only thing that was preventing the prerelease 3790.0/x64 kernrate from doing basic kernel profiling on production x64 kernels was the layout of the SYSTEM_BASIC_INFORMATION structure.

For those so inclined, the patch I created effectively rewrites the original version of the function, as shown below after translated to C:

 NTSTATUS                       Status;


 if (!Sbi)
  fprintf(stderr, "Buffer allocation failed"
   " for SystemInformation in "

 if (!NT_SUCCESS((Status = NtQuerySystemInformation(
  fprintf(stderr, "NtQuerySystemInformation failed"
   " status %08lx\\n",

  Sbi = 0;

 return Sbi;

Conceptually represented in C, the modified (patched) function would appear something like the following (note that the error checking code has been removed to make room for the structure conversion logic, as previously mentioned; the substantial new changes are colored in red):

(Note that the patch code was not compiler-generated and is not truly a function written in C; below is simply how it would look if it were translated from assembler to C.)


 Sbi     = (PSYSTEM_BASIC_INFORMATION_3790)malloc(


 Sbi->NumberOfProcessors           =
 Sbi->ActiveProcessorsAffinityMask =
 Sbi->MaximumUserModeAddress       =
 Sbi->MinimumUserModeAddress       =
 Sbi->AllocationGranularity        =
 Sbi->HighestPhysicalPageNumber    =
 Sbi->LowestPhysicalPageNumber     =
 Sbi->NumberOfPhysicalPages        =

 return Sbi;

(The structure can be converted in-place due to how the two versions are laid out in memory; the “expected” version is larger than the “real” version (and more specifically, the offsets of all the fields we care about are greater in the “expected” version than the “real” version), so structure fields can safely be copied from the tail of the “real” structure to the tail of the “expected” structure.)

If you find yourself in a similar bind, and need a working kernrate for x64 (until if and when Microsoft puts out a new version that is compatible with production x64 kernels), I’ve posted the patch (assembler, with opcode diffs) that I made to the “amd64” release of kernrate.exe from the 3790.0 DDK. Any conventional hex editor should be sufficient to apply it (as far as I know, third parties aren’t authorized to redistribute kernrate in its entirety, so I’m not posting the entire binary). Note that DDKs after the 3790.0 DDK don’t include any kernrate for x64 (even the prerelease version, broken as it was), so you’ll also need the original 3790.0 DDK to use the patch. Hopefully, we may see an update to kernrate at some point, but for now, the patch suffices in a pinch if you really need to profile a Windows x64 system.

What’s the difference between the Wow64 and native x86 versions of a DLL?

Friday, June 1st, 2007

Wow64 includes a complete set of 32-bit system DLLs implementing the Win32 API (for use by Wow64 programs). So, what’s the difference between the native 32-bit DLLs and their Wow64 versions?

On Windows x64, not much, actually. Most of the DLLs are actually the 32-bit copies from the 32-bit version of the operating system (not even recompiled). For example, the Wow64 ws2_32.dll on Vista x64 is the same file as the native 32-bit ws2_32.dll on Vista x86. If I compare the wow64 version of ws2_32.dll on Vista with the x86 native version, they are clearly identical:

32-bit Vista:

C:\\Windows\\System32>md5sum ws2_32.dll
d99a071c1018bb3d4abaad4b62048ac2 *ws2_32.dll

64-bit Vista (Wow64):

C:\\Windows\\SysWOW64>md5sum ws2_32.dll
d99a071c1018bb3d4abaad4b62048ac2 *ws2_32.dll

Some DLLs, however, are not identical. For instance, ntdll is very different across native x86 and Wow64 versions, due to the way that Wow64 system calls are translated to the 64-bit user mode portion of the Wow64 layer. Instead of calling kernel mode directly, Wow64 system services go through a translation layer in user mode (so that the kernel does not need to implement 32-bit and 64-bit versions of each system service, or stubs thereof in kernel mode). This difference can clearly be seen when comparing the Wow64 ntdll with the native x64 ntdll, with respect to a system service:

If we look at the native x86 ntdll, we see the expected call through the SystemCallStub pointers in SharedUserData:

0:000> u ntdll!NtClose
mov     eax,30h
mov     edx,offset SharedUserData!SystemCallStub
call    dword ptr [edx]
ret     4

However, an examination of the Wow64 ntdll shows something different; a call is made through a field at offset +C0 in the 32-bit TEB:

0:000> u ntdll!NtClose
mov     eax,0Ch
xor     ecx,ecx
lea     edx,[esp+4]
call    dword ptr fs:[0C0h]
ret     4

If we examine the 32-bit TEB, this field as marked as “WOW32Reserved” (kind of a misnomer, since it’s actually reserved for the 32-bit portion of Wow64):

0:000> dt ntdll!_TEB
   +0x000 NtTib            : _NT_TIB
   +0x0c0 WOW32Reserved    : Ptr32 Void

As I had previously mentioned, this all leads up to a transition to 64-bit mode in user mode in the Wow64 case. From there, the 64-bit portion of the Wow64 layer reformats the request into a 64-bit system call, makes the actual kernel transition, then repackages the results into what a 32-bit system call would return.

Thus, one case where a Wow64 DLL will necessarily differ is when that DLL contains a system call stub. The most common case of this is ntdll, of course, but there are a couple of other modules with inline system call stubs (user32.dll and gdi32.dll are the most common other modules, though winsrv.dll (part of CSRSS) also contains stubs, however it has no Wow64 version).

There are other cases, however, where Wow64 DLLs differ from their native x86 counterparts. Usually, a DLL that makes an LPC (“Local Procedure Call”) call to another component also has a differing version in the Wow64 layer. This is because most of the LPC interfaces in 64-bit Windows have been “widened” to 64-bits, where they included pointers (most notably, this includes the LSA LPC interface for use in implementing code that talks to LSA, such as LogonUser or calls to authentication packages). As a result, modules like advapi32 or secur32 differ on Wow64.

This has, in fact, been the source of some rather insidious bugs where mistakes have been made in the translation layer. For instance, if you call LsaGetLogonSessionData on Windows Server 2003 x64, from a Wow64 process, you may unhappily find out that if your program accesses any fields beyond “LogonTime” in the SECURITY_LOGON_SESSION_DATA structure returned by LsaGetLogonSessionData, it will crash. This is the result of a bug in the Wow64 version of Secur32.dll (on Windows Server 2003 x64 / Windows XP x64), where part of the structure was not properly converted to its 32-bit counterpart after the 64-bit version is read “off the wire” from an LSA LPC call. This particular problem has been fixed in Windows Vista x64 (and can be worked around with some clever hackery if you take a look at how the conversion layer in Secur32 works for Srv03 x64). Note that as far as I know from having spoken to PSS, Microsoft has no plans to fix the bug for pre-Vista platforms (and it is still broken in Srv03 x64 SP2), so if you are affected by that problem then you’ll have to live with implementing a hackish workaround if you can’t rebuild your app as native 64-bit (the problem only occurs for 32-bit processes on Srv03 x64, not 32-bit processes on Srv03 32-bit, or 64-bit processes on Srv03 x64).

There are a couple of other corner cases which result in necessary differences between Wow64 modules and their native 32-bit counterparts, though these are typically few and far between. Most notably absent in the list of things that require code changes for a Wow64 DLL are modules that directly talk to 64-bit drivers via IOCTLs. For example, even “low-level” modules such as mswsock.dll (the AFD Winsock2 Tcpip service provider which implements the user mode to kernel mode translation layer from Winsock2 service provider calls to AFD networking calls in kernel mode) that communicate heavily with drivers via IOCTLs are unchanged in Wow64. This is due to direct support for Wow64 processes in the kernel IOCTL interface (requiring changes to drivers to be aware of 32-bit callers via IoIs32bitProcess), such that 32-bit and 64-bit processes can share the same IOCTL codes even though such structures are typically “widened” to 64-bits for 64-bit processes.

Due to this approach to Wow64, most of the code that implements the 32-bit Win32 API on x64 computers is identical to their 32-bit counterparts (the same binary in the vast majority of cases). For DLLs that do require a rebuild, typically only small portions (system call stubs and LPC stubs in most cases) require any code changes. The major benefits to this design are savings with respect to QA and development time for the Wow64 layer, as most of the code implementing the Win32 API (at least in user mode) did not need to be re-engineered. That’s not to say that Microsoft hasn’t tested it, but it’s a far cry from having to modify every single module in a non-trivial way.

Additionally, this approach has the added benefit of preserving the vast majority of implementation details that have slowly sept into the Win32 programming model, as much as Microsoft has tried to prevent such from happening. Even programs that use undocumented system calls from NTDLL should generally work on the 64-bit version of that operating system via the Wow64 layer, for instance.

In a future posting later on down the line, I’ll be expanding on this high level overview of Wow64 and taking a more in depth look at the guts of Wow64 (of course, subject to change between OS releases without warning, as usual).