Archive for July, 2007

Why you shouldn’t touch Change­Window­Message­Filter with a 10-ft pole…

Tuesday, July 31st, 2007

One of the things introduced with Windows Vista is the concept of something called “user interface privilege isolation”, or an attempt to allow multiple processes to coexist on one desktop even if they are running at different privilege levels, without compromising security. This new change to the security architecture with respect to how the user interface operates is also a significant part of how UAC and Internet Explorer Protected Mode can claim to be reasonably secure, despite displaying user interfaces from differing security contexts on the same desktop.

(For reference, the Windows GUI model was designed back in the 16-bit cooperative multitasking days, where every task (yes, task – there weren’t processes or threads in Windows in those days) completely trusted every other task. Although things have improved somewhat since then, there are a number of assumptions that are so fundamental to how the Windows GUI model works that it is virtually impossible to eliminate all communication between GUI programs via window messages and still have the system function, which leads to many of the difficulties in securing user interface communications in Windows. As a result, the desktop has traditionally been considered the “security barrier” in terms of the Windows UI, such that processes of differing privilege levels should be isolated on their own desktops. Of course, since only one desktop is displayed locally at a time, this is less than convenient from an end user’s perspective.)

Most of the changes that were made in Vista relate to restrictions on just how processes from different integrity levels or user accounts can communicate with each other using the window messaging system. Window messages are typically split up into several different categories:

  1. System marshalled messages, or messages that come before WM_USER. These are messages that for compatibility with 16-bit Windows (and convenience), are automagically marshalled cross process. For example, this is why you can send the LB_ADDSTRING message to a window located in a different process, even though that message takes a pointer to a string which would obviously not be valid in the remote address space. The windowing system understands the semantics of all system marshalled messages and can thus, as the name implies, marshal them cross-process. This also means that in many circumstances, the system can also perform parameter validation in the cross process case, so that for instance a null pointer passed to LB_ADDSTRING won’t cause the remote process to crash. The system marshalled range also includes messages like WM_DESTROY, WM_QUIT, and the like. Most of the “built in” controls like the Edit and ListBox controls are “grandfathered in” to the system marshalled range, for compatibility reasons, though the newer common controls are not specially handled like this.
  2. The private window class message range, from WM_USER to 0x7FFF. These messages are specific to a particular custom window class (note that common controls, such as Rich Edit, are “custom” window classes even though you might think of them as being built in to the operating system; the windowing system ostensibly has no special knowledge of these controls, unlike the “built-in” controls like ListBox or Edit). Because the format and semantics of these messages are specific to a program-supplied window class, the window manager cannot interpret, marshal, or validate these parameters cross-process.
  3. The private application message range, from WM_APP to 0xBFFF. These window messages are specific to a program (and not a window class), though in practice it is very common for programmers to incorrectly interchange WM_USER and WM_APP for custom window classes that are internal to an application. These window messages are again completely opaque to the window manager and are essentially treated the same as private window class messages. Their intended use is to allow things like application customized subclasses of window classes to communicate with each other (e.g. in the case where the application hooks a window procedure).
  4. The registered message range, from 0xC000 to 0xFFFF. Window messages here are dynamically assigned similarly to how atoms work in Windows. Specifically, a program passes an arbitrary string to the RegisterWindowMessage routine, which hands the application back a value in the registered message range. Any other program calling RegisterWindowMessage in the same session will receive the same window message value. These messages are, like class- and application-defined messages, opaque to the system. They are used when programs need to communicate cross-process with custom window messages, without the use of some sort of central registry where window messages would have to be permanently registered with Microsoft in order to receive a unique, non-conflicting identifier. By using an arbitrary string value (easy to make unique) and dynamically reserving a numeric message identifier at runtime, no registry is required and programs following the “standard” for that window message can still communicate with each other. Registered window messages are not marshalled or interpreted by the windowing system.

Now, in Windows Vista, only a subset of the system marshalled messages can be sent cross process when the two processes have different integrity levels, and this subset of messages is heavily validated by the system, such that if a program receives a message in the system marshalled range, it can ostensibly “trust” it. Since the windowing system cannot validate custom messages (in any of the other three categories), these are all silently dropped by default in this scenario. This is typically a good thing (consider that many common control messages have pointers in their contracts and lie in the WM_USER range, making it very bad for an untrusted program to be able to send them to a privileged application). However, sometimes, one does need to send custom messages cross process. Vista provides a mechanism for this in the Change­Window­Message­Filter function, which is essentially a way to “poke a hole” in the window message “firewall” that exists between cross-integrity-level processes in Vista.

Now, this might seem like a great approach at first – after all, you’ll only use Change­Window­Message­Filter when you’re sure you can completely validate a received message even if it is from an untrusted source, such that there’s no way something could go wrong, right?

Well, the problem is that even if you do this, you are often opening your program up to attack unintentionally. Consider for a moment how custom window messages are typically used; virtually all the common controls in existence have “dangerous” messages in the custom class message range (e.g. WM_USER and friends). Additionally, many programs and third party libraries confuse WM_USER and WM_APP, such that you may have programs communicating cross process via both WM_USER and WM_APP, using “dangerous” messages that are used to make sensitive decisions or that include pointer parameters.

This means that in reality, you can’t really use Change­Window­Message­Filter for a specific window message unless you are absolutely sure that nobody else in your process is listening for that message and can’t be exploited if they receive a malformed (or specially crafted) message. Right away, this pretty much excludes all WM_USER messages, and even WM_APP is highly questionable in my opinion due to how frequently components mix up WM_USER and WM_APP.

Well, that’s still not so bad, is it? After all, you can just ensure that any third party modules you use don’t do stupid things with custom window messages, if you’ve got source code to them, right? Well, aside from the fact that this is a pretty implausible scenario in reality, the problem is even worse. There is rampant use of Change­Window­Message­Filter in operating system shipped libraries that your program is already using on Vista, which you have absolutely no control over. For instance, if one looks at Shell32.dll with a disassembler on Vista, one might see this in, say, the SHChangeNotifyRegister function:

mov     edx, 1          ; MSGFLT_ADD
mov     ecx, 401h       ; WM_USER + 1
call    cs:__imp_ChangeWindowMessageFilter

In other words, SHChangeNotifyRegister just promised to the window message firewall that everybody in the entire process fully validates the custom class message WM_USER + 1. Yow! The WM_USER range is the worst one to be doing that in, especially with a low WM_USER value, because practically everything that uses a custom window class will use a WM_USER message for something, and the chances of a collision with the set of all custom window classes rise dramatically that close to the start of the custom window class range. Now, just for kicks, let’s take a look at the SDK headers and see if any of the built in common controls have any interesting messages that match WM_USER + 1, or 0x401:

AclUI.h(142):#define PSPCB_SI_INITDIALOG (WM_USER + 1)
CommCtrl.h(1458):#define TB_ENABLEBUTTON (WM_USER + 1)
CommCtrl.h(2562):#define TTM_ACTIVATE (WM_USER + 1)
CommCtrl.h(6230):#define CBEM_INSERTITEMA (WM_USER + 1)
CommDlg.h(765):#define WM_CHOOSEFONT_GETLOGFONT (WM_USER + 1)
[…]

Let’s look at the documentation for some of those window messages in MSDN. Hmm, there’s CBEM_INSERTITEM[A|W], which takes, in lParam, “A pointer to a COMBOBOXEXITEM structure…”. Oh, and there’s also WM_CHOOSEFONT_GETLOGFONT, which uses lParam for a “pointer to a LOGFONT structure…”. Hmm, let me see. Anybody who calls SHChangeNotifyRegister is promising that they are not using any ComboBoxEx controls, any choose-font dialogs, or any of the other many hits for dangerous WM_USER + 1 messages that I didn’t list for space reasons. Oh, and that’s just the built-in controls – what happens if there’s a third party control thrown in there? What if we’re running in Internet Explorer and there’s a custom ActiveX control showing its own user interface there, blissfully unaware that some other code in the process called SHChangeNotifyRegister? That means that anybody, anywhere, who calls SHChangeNotifyRegister on Vista (or uses a library or function that calls SHChangeNotifyRegister internally, which is probably not going to be documented at all, being an implementation detail) just reinvented the shatter attack with their program (congratulations!), probably without even realizing it – how would they, if they didn’t take the time to disassemble the API instead of trusting that it just works?

Now, I might be coming off a bit harsh on Microsoft here, but that’s kind of my point. Microsoft puts a lot of effort into security, and they’re the ones who designed and implemented the new security improvements in Vista. Sadly, this is hardly an isolated incident in Vista – with a little bit of looking, it’s very easy to find numerous other examples of system libraries that are loaded and called all over the place making this sort of error, which represents in my opinion a fundamental lack of understanding of how user interface security works.

Now, if the company that designed and implemented Change­Window­Message­Filter is using it wrong, how many third party developers out there – who just want to get their program working under Vista in the quickest way possible, with the minimum amount of effort and money spent – will do the right thing? MSDN doesn’t even document this entire class of problems with Change­Window­Message­Filter as far as I can tell, so to be honest I think I can be fairly confident and say “virtually nobody”. It takes someone with a fairly good understanding of how the window messaging system impacts security to recognize and grasp this problem, and that only includes people who are even thinking about security in the first place when they see a function like Change­Window­Message­Filter, who I’m betting are already the vast minority.

This function is one that is all but impossible to use correctly, aside from just maybe messages in the registered message range, which are already expected to be cross process and not fully trusted (and I’m still skeptical that people will, on average, get it right even in that case).

There are almost certainly other loopholes in the UIPI architecture that have yet to be discovered, if for no other reason than that it is a security bolt-on to an architecture that was designed for a single shared address space in a cooperative multitasking system. I wouldn’t say that this is so much the fault of the UIPI folks, but rather a fact that it’s just going to be ridiculously hard to make it completely safe to run programs with multiple privilege levels on the same desktop. And, remember, the “good guys” have to get it right 100% of the time from a security perspective, while the “bad guys” only need to find that one case out of 1000 that got missed in order to break the system.

So, do yourself a favor and stick to the desktop as a security barrier; the 16-bit Windows-derived window messaging system was just not designed to support programs at different privilege levels on the same desktop.

Be careful about using the built-in “low privilege” service accounts…

Wednesday, July 25th, 2007

One of the security enhancements in the Windows XP and Windows Server 2003 timeframe was to move a number of the built-in services that ship with the OS to run as a more restricted user account than LocalSystem. Specifically, two new built-in accounts akin to LocalSystem were introduced exclusively for use with services: The local service and network service accounts. These are essentially slightly more powerful than plain user accounts, but not powerful enough such that a compromise will mean the entire system is a write-off.

The intention here was to reduce the attack surface of the system as a whole, such that if a service that is running as LocalService or NetworkService is compromised, then it cannot be used to take over the system as a whole.

(For those curious, the difference between LocalService and NetworkService is only evident in domain scenarios. If the computer is joined to a domain, LocalService authenticates as a guest on the network, while NetworkService (like LocalSystem) authenticates as the computer account.)

Now, reducing the amount of code running as LocalSystem is a great thing pretty much all around, but there are some sticking points with the way the two built-in service accounts work that aren’t really covered in the documentation. Specifically, there are a whole lot of other services that run as either LocalService or NetworkService nowadays, and by virtue of the fact that they all run in the same security context, they can be compromised as one unit. In other words, if you compromise one LocalService process, you can attack all other LocalService processes, because they are running under the same security context.

Think about that for a minute. That effectively means that the attack surface of any LocalService process can in some sense be considered the sum of the attack surface of all LocalService processes on the same computer. Moreover, that means that as you offload more and more services to run as LocalService, the problem gets worse. (Although, it’s still better than the situation when everybody ran as LocalSystem, certainly.)

Windows Vista improves on this a little bit; in Vista, LocalService and NetworkService processes do have a little bit of protection from each other, in that each service instance is assigned a unique SID that is marked as the owner for the process object (even though the process is running as LocalService or NetworkService). Furthermore, the default DACL for processes running as LocalService or NetworkService only grants access to administrators and the service-unique SID. This means that in Vista, one compromised LocalService process can’t simply use OpenProcess and WriteProcessMemory (or the like) to take complete control over another service process in Vista.

You can easily see this in action in the kernel debugger. Here’s what things look like in Vista:

kd> !process fffffa80022e0c10
PROCESS fffffa80022e0c10
[...]
    Token   fffff88001e3e060
[...]
kd> !token fffff88001e3e060
_TOKEN fffff88001e3e060
TS Session ID: 0
User: S-1-5-19
Groups:
[...]
 10 S-1-5-5-0-107490
    Attributes - Mandatory Default Enabled Owner LogonId 
[...]
kd> !object fffffa80022e0c10
Object: fffffa80022e0c10  Type: (fffffa8000654840) Process
    ObjectHeader: fffffa80022e0be0 (old version)
    HandleCount: 5  PointerCount: 96
kd> dt nt!_OBJECT_HEADER fffffa80022e0be0
[...]
   +0x028 SecurityDescriptor : 0xfffff880`01e14c26 
kd> !sd 0xfffff880`01e14c20
->Revision: 0x1
[...]
->Dacl    : ->Ace[0]: ->AceType: ACCESS_ALLOWED_ACE_TYPE
->Dacl    : ->Ace[0]: ->AceFlags: 0x0
->Dacl    : ->Ace[0]: ->AceSize: 0x1c
->Dacl    : ->Ace[0]: ->Mask : 0x001fffff
->Dacl    : ->Ace[0]: ->SID: S-1-5-5-0-107490

->Dacl    : ->Ace[1]: ->AceType: ACCESS_ALLOWED_ACE_TYPE
->Dacl    : ->Ace[1]: ->AceFlags: 0x0
->Dacl    : ->Ace[1]: ->AceSize: 0x18
->Dacl    : ->Ace[1]: ->Mask : 0x00001400
->Dacl    : ->Ace[1]: ->SID: S-1-5-32-544

Looking at winnt.h, we can see that S-1-5-5-X-Y corresponds to a logon session SID. In Vista, each LocalService/NetworkService service process gets its own logon session SID.

By making the process owned by a different user than it is running as, and not allowing access to the user that the service is running as (but instead to the logon session), the service is provided some measure of protection against processes in the same user context. This may not provide complete protection, though, as in general, any securable objects such as files or registry keys that contain an ACE matching against LocalService or NetworkService will be at the mercy of all such processes. To Microsoft’s credit, however, the default DACL in the token for such LocalService/NetworkService services doesn’t grant GenericAll to the user account for the service, but rather to the service SID (another concept that is unique to Vista and future systems).

Furthermore, it seems like many of the ACLs that previously referred to LocalService/NetworkService are being transitioned to use service SIDs instead, which may over time make LocalService/NetworkService viable once again – after all the third party software in the world that makes security decisions based on those two SIDs is updated (hmm…), and after the remaining ACLs that refer to the old generalized SIDs and have fallen through the cracks are updated. (Check out AccessEnum from SysInternals to see where those ACLs have slipped through the cracks in Vista – there are at least a couple of places in WinSxS that grant LocalService or NetworkService write access on my machine, and that isn’t even considering the registry or the more ephemeral kernel object namespace yet.)

In Windows Server 2003, things are pretty bleak with respect to isolation between LocalService/NetworkService services. Service processes can effectively gain full access to each other, as shown by their default security descriptors. The default security descriptor doesn’t allow direct access, but it does allow one to rewrite it and grant oneself access, as the owner field matches LocalService:

lkd> !process fffffadff39895c0 1
PROCESS fffffadff3990c20
[...]
    Token                             fffffa800132b9e0
[...]
lkd> !token fffffa800132b9e0
_TOKEN fffffa800132b9e0
TS Session ID: 0
User: S-1-5-19
Groups:
[...]
 07 S-1-5-5-0-44685
    Attributes - Mandatory Default Enabled LogonId
[...] 
lkd> !object fffffadff3990c20
Object: fffffadff3990c20  Type: (fffffadff4310a00) Process
    ObjectHeader: fffffadff3990bf0 (old version)
    HandleCount: 3  PointerCount: 21
lkd> dt nt!_OBJECT_HEADER fffffadff3990bf0
[...]
   +0x028 SecurityDescriptor : 0xfffffa80`011441ab 
[...]
lkd> !sd 0xfffffa80`011441a0
->Revision: 0x1
->Sbz1    : 0x0
->Control : 0x8004
            SE_DACL_PRESENT
            SE_SELF_RELATIVE
->Owner   : S-1-5-19
[...]
->Dacl    : ->Ace[0]: ->AceType: ACCESS_ALLOWED_ACE_TYPE
->Dacl    : ->Ace[0]: ->AceFlags: 0x0
->Dacl    : ->Ace[0]: ->AceSize: 0x1c
->Dacl    : ->Ace[0]: ->Mask : 0x001f0fff
->Dacl    : ->Ace[0]: ->SID: S-1-5-5-0-44685

->Dacl    : ->Ace[1]: ->AceType: ACCESS_ALLOWED_ACE_TYPE
->Dacl    : ->Ace[1]: ->AceFlags: 0x0
->Dacl    : ->Ace[1]: ->AceSize: 0x14
->Dacl    : ->Ace[1]: ->Mask : 0x00100201
->Dacl    : ->Ace[1]: ->SID: S-1-5-18

Again looking at winnt.h, we clearly see that S-1-5-19 is LocalService. So, there is absolutely no protection at all from one compromised LocalService process attacking another, at least in Windows Server 2003.

Note that if you are marked as the owner of an object, you can rewrite the DACL freely by requesting a handle with WRITE_DAC access and then modifying the DACL field with a function like SetKernelObjectSecurity. From there, all you need to do is re-request a handle with the desired access, after modifying the security descriptor to grant yourself said access. This is easy to verify experimentally by writing a test service that runs as LocalService and requesting WRITE_DAC in an OpenProcess call for another LocalService service process.

To make matters worse, nowadays most services run in shared svchost processes, which means that if one service in an svchost process is compromised, every service in that process is a write-off.

I would recommend seriously considering using dedicated unique user accounts for your services in certain scenarios as a result of this unpleasant mess. In the case where you have a security sensitive service that doesn’t need high privileges (i.e. it doesn’t require LocalSystem), it is often the wrong thing to do to just stuff it in with the rest of the LocalService or NetworkService services due to the vastly increased attack surface over running as a completely isolated user account, even if setting up a unique user account is a pain to do programmatically.

Note that although Vista attempts to mitigate this problem by ensuring that LocalService/NetworkService services cannot directly interfere with each other in the most obvious sense of opening each other’s processes and writing code into each other’s address spaces, this is really only a small measure of protection due to the problem that one LocalService process’s data files are at the mercy of every other LocalService process out there. I think that it would be extremely unwise to stake your system security on there being no way to compromise one LocalService process from another in Vista, even with its mitigations; it may be slightly more difficult, but I’d hardly write it off as impossible.

Given all of this, I would steer clear of NetworkService and LocalService for sensitive but unprivileged processes (and yes, I would consider such a thing a real scenario, as you don’t need to be a computer administrator to store valuable data on a computer; you just need there to not be an untrusted (or compromised) computer administrator on the box).

One thing I am actually kind of curious about is what the SWI rationale is for even allowing the svchost paradigm by default, given how it tends to (negatively, from the perspective of system security) multiply the attack surface of all svchost’d processes. Using svchosts completely blows away the security improvements Vista makes to LocalService / NetworkService, as far as I can tell. Even though there are some services that are partitioned off in their own svchosts, there’s still one giant svchost group in Vista that has something on the order of 20 services in it (ugh!). Not to mention that svchosts make debugging a nightmare, but that’s a topic for another posting.

Update: Andrew Rogers pointed out that I originally posted the security descriptor for a LocalSystem process in the Windows Server 2003 example, instead of for a LocalService process. Whoops! It actually turns out that contrary to what I originally wrote, the DACL on LocalService processes on Windows Server 2003 doesn’t explicitly allow access to LocalService, but LocalService is still named as the object owner, so it is trivial to gain that access anyway, as previously mentioned (at least for Windows Server 2003).

Debugger tricks: Break on a specific Win32 last error value in Windows Vista

Tuesday, July 24th, 2007

Often times, one type of problem that you might want to track down in a debugger (aside from a crash) is a particular function failing in a certain way. In the case of most Win32 functions, you’ll often get some sort of (hopefully meaningful) last error code. Sometimes you might need to know why that error is returned, or where it originated from (in the case of a last error value that is propagated up through several functions).

One way you might approach this is with a conditional breakpoint, but the SetLastError path is typically frequently hit, so this is often problematic in terms of performance, even in user mode debugging on the local computer.

On Windows Vista, there is an undocumented hook inside of NTDLL (which is now responsible for the bulk of the logic behind SetLastError) that allows you to configure a program to break into the debugger when a particular error code is being set as the last error. This is new to Vista, and as it is not documented (at least not anywhere that I can see), it might not be around indefinitely.

For the moment, however, you can set ntdll!g_dwLastErrorToBreakOn to a non-zero value (via the ed command in the debugger) to ask NTDLL to execute a breakpoint when it sees that last error value being set. Obviously, this won’t catch things that modify the field in the TEB directly, but anything using SetLastError or RtlSetLastWin32Error will be checked against this value (in the context of the debuggee).

For example, you might see something like this if you ask NTDLL to break on error 5 (ERROR_ACCESS_DENIED) and then try to open a file or directory that you don’t have access to:

0:002> ed ntdll!g_dwLastErrorToBreakOn 5
0:002> g

[...] Perform an operation to cause ERROR_ACCESS_DENIED

(1864.2774): Break instruction exception
  - code 80000003 (first chance)
ntdll!DbgBreakPoint:
00000000`76d6fdf0 cc              int     3
0:004> k
Call Site
ntdll!DbgBreakPoint
ntdll! ?? ::FNODOBFM::`string'+0x377b
kernel32!BaseSetLastNTError+0x16
kernel32!CreateFileW+0x325
SHELL32!CEnumFiles::InitAndFindFirst+0x7a
SHELL32!CEnumFiles::InitAndFindFirstRetry+0x3e
SHELL32!CFileSysEnum::_InitFindDataEnum+0x5e
SHELL32!CFileSysEnum::Init+0x135
SHELL32!CFSFolder::EnumObjects+0xd3
SHELL32!_GetEnumerator+0x189
SHELL32!CEnumThread::_RunEnum+0x6d
SHELL32!CEnumThread::s_EnumThreadProc+0x13
SHLWAPI!WrapperThreadProc+0xfc
kernel32!BaseThreadInitThunk+0xd
ntdll!RtlUserThreadStart+0x1d

(The debugger is slightly confused about symbol names in NTDLL due to the binary being reorganized into function chunks, but “ntdll! ?? ::FNODOBFM::`string’+0x377b” is part of ntdll!RtlSetLastWin32Error.)

Sometimes, it can be useful to add “debugger knobs” like this to your program that can be used to enable special diagnostics behavior that might be useful while debugging something. Several other components provide options like this; for example, there’s a global variable in NTDLL named ntdll!ShowSnaps that you can set to 1 in order to enable a large volume of debug print spew about the symbol import resolution process when the loader is resolving imported modules and symbols.

(Incidentally, debugger-settable global variables like ntdll!ShowSnaps are a good example of a correct way of using debug prints in release builds, though there are certainly many other good ways to do so.)

Update: Andrew Rogers points out that g_dwLastErrorToBreakOn existed on Srv03 as well, though it was resident in kernel32 (kernel32!g_dwLastErrorToBreakOn) and not NTDLL in that timeframe. When the last error logic was moved entirely to NTDLL in the Vista timeframe, the last error breakpoint hook moved with it.

Update: Pavel Lebedinsky points out that I neglected to mention that because the internal BaseSetLastNTError routine in kernel32 on Srv03 doesn’t go through kernel32!SetLastError, the functionality available on Srv03 is generally much less useful (it only catches things external to kernel32) than on Vista. To be clear, that omission is my fault for not making this point known, and not a case of Andrew getting it wrong.

An introduction to DbgPrintEx (and why it isn’t an excuse to leave DbgPrints on by default in release builds)

Saturday, July 21st, 2007

One of the things that was changed around the Windows XP era or so in the driver development world was the introduction of the DbgPrintEx routine. This routine was introduced to combat the problem of debug spew from all sorts of different drivers running together by allowing debug prints to be filtered by a “component id”, which is supposed to be unique per class of driver. By allowing a user to filter which debug prints are being displayed by driver, on the host-side instead of the debugger side, a better debugging experience can be provided when the system is fairly “crowded” as far as debug prints go. This is especially common on checked builds of the operating system.

Additionally, DbgPrintEx provides an additional mechanism to filter output host-side – a severity level. The way the filtering system works is that each component has an associated set of allowed severity levels, such that only a message with a severity level that is “allowed” will actually be transmitted to the debugger when the DbgPrintEx call is made. This allows debug prints to be filtered on a (component, severity) basis, such that especially verbose debug prints can be turned off at runtime without requiring a rebuild or patching the binary on the fly (which might be a problem if the binary in question is the kernel itself and you’re running a 64-bit build). In order to edit the allowed severity of a component at runtime, you typically modify one of the nt!Kd_<component>_Mask global variables in the kernel with the debugger, setting the global corresponding to the desired component to the desired severity mask.

With respect to older drivers, the old DbgPrint call still works, but it is essentially repackaged into a DbgPrintEx call, with hardcoded (default) component and severity values. This means that you’ll still be able to get output from DbgPrint, but (obviously) you can’t take advantage of the extra host-side filtering. Host-side filtering is much preferable to .ofilter-style filtering (which occurs debugger-side), as debug prints that are transferred “over the wire” to the debugger incur a significant performance penalty – the system is essentially suspended while the transfer occurs, and if you have a large amount of debug print spew, this can quickly make the system unusably slow.

Windows Vista takes this whole mechanism one step further and turns off plain DbgPrint calls by default, by setting the default severity level for the DbgPrint-assigned component value such that DbgPrint calls are not transmitted to the debugger. This can be overridden at runtime by modifying Kd_DEFAULT_Mask, but it’s an extra step that must be taken (and one that may be confusing if you don’t know about the default behavior change in Vista, as your debug prints will seemingly just never work).

However, just because DbgPrintEx now provides a way to filter debug prints host-side, that doesn’t mean that you can just go and turn on all your debug prints for release builds by default. Among other things, it’s possible that someone else is using the same component id as your driver (remember, component ids are for classes of devices, such as “network driver”, or “video driver”). Furthermore, DbgPrintEx calls still do incur some overhead, even if they aren’t transmitted to the debugger on the other end (however as long as the debug print is masked off by the allowed severities mask for your component, the overhead is fairly minimal).

Still, the limited supply of component ids is a significant enough problem that you don’t want to leave debug prints enabled unconditionally; otherwise, anyone with the same component id who wants to debug their driver will have all of your debug print spew mixed in with their own.

Also, there is an option to turn on all debug prints globally, which can sometimes come in handy – and which means that if every driver in the system has debug prints on by default, the result is a lot of useless spew. The global override is a severity mask specified via the nt!Kd_WIN2000_Mask global, which is checked before any component-specific masks. (If Kd_WIN2000_Mask allows the debug print, it is short-circuited to being allowed without considering component-specific masks. This makes it easy to grab certain severities of debug print messages from many components at the same time, without having to go and manually poke around with severity levels on every component you’re interested in.)
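For example, to capture just error-severity prints (DPFLTR_ERROR_LEVEL, bit 0 of the mask) from every component at once, only the low bit of the global mask needs to be set:

```
kd> ed nt!Kd_WIN2000_Mask 1
```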

Unfortunately, this is already a problem on Vista RTM (even free builds) – there are a couple of in-box Microsoft drivers that are guilty of this sort of thing, making Kd_WIN2000_Mask less than useful on Vista. Specifically, cdrom.sys likes to print debug messages like this every second:

Will not retry; Sense/ASC/ASCQ of 02/3a/00

That’s hardly the worst of it, though. Try starting a manifested program on Vista with all debug print severities turned on in Kd_WIN2000_Mask and you’ll get pages of debug spew (no, I’m not kidding – try it with iexplore.exe and you’ll see what I mean). In that respect, shame on the SxS team for setting a bad example by polluting the world with debug prints that are useless to most of us (and to a lesser extent, cdrom.sys also gets a “black lump of coal”, so to speak). Maybe these two components will be fixed for Srv08 RTM or Vista SP1 RTM, if we’re lucky.

So, take this opportunity to check and make sure that none of your drivers ship with debug prints turned on by default – even if you do happen to use DbgPrintEx. Somebody who has to debug a problem on a computer where your driver is installed will be all the happier as a result.

Debugger tricks: API call logging, the quick’n’dirty way (part 3)

Friday, July 20th, 2007

Previously, I introduced several ways to use the debugger to log API calls. Beyond what was described in that article, there are some other, more complicated examples that are worth reviewing. Additionally, there are certain limitations that should be considered when using the debugger instead of a dedicated API logging program.

Although logging breakpoints like those I’ve previously described (i.e. displaying function input parameters and return values) are certainly handy, you’ve probably already come up with a couple of scenarios where breakpoints in that style won’t give you what you need to track down a problem.

The most notable example of this is when you need to examine an out parameter that is filled by a function call, after the function call is made. This poses a problem, as it’s generally not reliable to access the function parameters on the stack after the function call has returned (in both the stack-based and register-based calling conventions in use on Windows, the called function is free to modify the parameter locations as it sees fit, and this is actually fairly common with optimizations enabled). As a result, what we really need is the ability to save some state across the function call, so that we can access some of the function’s arguments after the function returns.

Fortunately, this is doable within the debugger, albeit in a rather roundabout way. The key here is the usage of so-called user-defined pseudo-registers, which are conceptually extra platform-independent storage locations (accessed like regular registers in terms of the expression evaluator, hence the term pseudo-register). These pseudo-registers are essentially just variables in the conventional programming sense, although there are a limited number of them available (20 in the current release). As a result, there are some limitations on what can be accomplished using them, but for most circumstances, 20 is enough. If you find yourself needing to track more state than that, you should strongly consider writing a debugger extension in C instead of using the debugger script language.
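To illustrate the mechanics, the user-defined pseudo-registers are assigned with the r command and then referenced like any other register in expressions; for example (assuming a 32-bit target, where the first stack argument of a __stdcall function lives at esp+4 on function entry):

```
0:000> r @$t0 = poi(@esp+4)
0:000> db @$t0
```

Here, @$t0 captures the first argument so that it can still be referenced later, even after the stack location itself has been overwritten.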

(As an aside, at Driver DevCon a couple of years ago, I remember sitting in on a WinDbg-oriented session in which the presenter was at one point going over a large program written in the (then-relatively-new) expanded debugger scripting language, with additional support for conditionals and error handling. I still can’t help but think of debugger-script programs as combining the ugliest parts of Perl with cmd.exe-style batch scripts (although to be fair, the debugger expression evaluator is a bit more powerful than batch scripts, and it was also never originally intended to be used for more than simple expressions). To be honest, I would still strongly recommend against writing highly complex debugger-script programs where possible; they are something of a maintenance nightmare, among other things. For such circumstances, writing a debugger extension (or a program to drive the debugger entirely) is a better choice. I digress, however; back to the subject of call logging.)

The debugger’s user-defined pseudo-register facility provides an effective (if perhaps slightly awkward) means of storing state, and this can be used to save parameter values across a function call. For example, we might want to log all calls to ReadFile, along with a dump of the file data being read in. To accomplish this task, we’ll need to dump the contents of the output buffer after the call returns (and use the bytes transferred count, another out parameter). This could be accomplished like so (in this case, for brevity, I am assuming that the program is using ReadFile in synchronous I/O mode):

0:000> bp kernel32!ReadFile "r @$t0 = poi(@esp+8) ; r @$t1 = poi(@esp+10) ; g @$ra ; .if (@eax != 0) { .printf \"Read %lu bytes: \\n\", dwo(@$t1) ; db @$t0 ldwo(@$t1) } .else { .echo Read failed! ; !gle } ; g "

The output of this command might be like so:

Read 22 bytes: 
0016ec3c  54 68 69 73 20 69 73 20-61 20 74 65 78 74 20 66
              This is a text f
0016ec4c  69 6c 65 2e 0d 0a
              ile...

(Awkward wrapping done by me to avoid breaking the blog layout.)

This command is essentially a logical extension of yesterday’s example, with the addition of some state that is shared across the call. Specifically, the @$t0 and @$t1 user-defined pseudo-registers are used to save the lpBuffer ([esp+08h]) and lpNumberOfBytesRead ([esp+10h]) arguments to the ReadFile call across the function’s execution. When execution is stopped at the return address, the contents of the file data that were just read are dumped by dereferencing the values referred to by @$t0 and @$t1.

Although this sort of state-saving across execution can be useful, there are downsides. Firstly, this sort of breakpoint is fundamentally incompatible with multiple threads (at least insofar as multiple threads may hit the breakpoint in question simultaneously). This is because the debugger provides no provision for “expression-local” or “thread-local” state – multiple threads hitting the breakpoint at the same time can step on each other’s toes, so to speak. (This problem can also occur with any sort of breakpoint that resumes execution until an implicit breakpoint created by a “g <address>” command, although it is arguably more severe with “stateful” breakpoints.)

This limitation in the debugger can be worked around to a limited extent by making a breakpoint thread-specific via a thread specifier on the breakpoint command, although this is hardly convenient to do. Many call logging programs account for multithreading natively and do not require any special work to accommodate multithreaded function calls. (Note that this problem is often not as severe as it might sound – in many cases, even in multithreaded programs, there is typically only one function that calls the function you’re interested in, or the likelihood of a thread collision is sufficiently small that things work anyway the vast majority of the time. However, in some circumstances, this style of breakpoint just does not work well – namely, when the function in question is called frequently from many threads and requires inspection of data after it returns.)
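For reference, a thread specifier is just a thread prefix on the breakpoint command itself; for example, the following (hypothetical) breakpoint only fires when thread 2 calls kernel32!ReadFile, so other threads can’t disturb the state saved in @$t0:

```
0:000> ~2 bp kernel32!ReadFile "r @$t0 = poi(@esp+8) ; g @$ra ; db @$t0 ; g"
```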

Another significant limitation of using the debugger to do call logging (as opposed to a dedicated program) is speed: the debugger is typically very slow compared to a dedicated program doing logging. The reason is that for every breakpoint event, essentially all threads in the program are frozen, various state information is copied from the program to the debugger, and then the breakpoint expression is evaluated debugger-side. Additionally, unlike with a dedicated program, the results of the logging breakpoint are displayed in real time, instead of (say) being stored in a binary log buffer somewhere for later formatting and display. This means that even more overhead is incurred, as the debugger UI needs to be updated on every breakpoint. As a result, if you set a conditional breakpoint on a frequently hit function, you may notice the program slow down significantly, perhaps even to the point of being unusable. Dedicated logging programs can employ a variety of techniques to circumvent these limitations, which are primarily artifacts of the fact that the debugger is designed to be a debugger and not a high-speed API monitor.

This is even more noticeable in the kernel debugger case, as transitions to the debugger in KD mode are very slow, such that even several transitions per second are enough to make a system all but unusable in practical terms. As a result, one needs to be extra careful in picking locations to set conditional logging breakpoints at in the kernel debugger (perhaps placing them in the middle of a function, in a specific interesting code path, rather than at the start, where all calls would be caught).

Given these limitations, it is worth doing a bit of analysis on the problem to determine if the debugger or a dedicated logging program is the best choice. Both approaches have strengths and weaknesses, and although the debugger is extremely flexible (and often very convenient), it isn’t necessarily the best choice in every conceivable scenario. In other words, use the best tool for the job. However, there are some circumstances where the only option is to use the debugger, such as kernel-mode call logging, so I would recommend at least having some basic knowledge of how to accomplish logging tasks with the debugger, even if you would normally always use a dedicated logging program. (Although, in the case of kernel mode debugging, again, the slowness of debugger transitions makes it important to pick “low-traffic” locations to breakpoint on.)

Still, an important part of being effective at debugging and solving problems is knowing your options and when (and when not) to use them. Using the debugger to perform call logging should just be one of many such options in your “debugging toolkit”.

Debugger tricks: API call logging, the quick’n’dirty way (part 2)

Thursday, July 19th, 2007

Last time, I put forth the notion that WinDbg (and the other DTW debuggers) are a fairly decent choice for API call logging. This article expands on just how to do this sort of logging via the debugger, by starting out with a simple logging breakpoint and expanding on it to be more intelligent.

As previously mentioned, it is really not all that difficult to use the debugger to perform call logging. The basic idea involved is to just set a “conditional” breakpoint (e.g. via the bp command) at the start of a function you’re interested in. From there, the breakpoint can have commands to display input parameters. However, you can also get a bit more clever in some scenarios (e.g. displaying return values, values in output parameters, and the like), although there are some limitations to this that may or may not be a problem based on the characteristics of the program that you’re debugging.

To give a simple example of what I mean, there’s the classic “show all files opened via Win32 CreateFile as they are opened”. In order to do this, the way to go is to set a breakpoint on kernel32!CreateFileW. (Remember that most of the “A” Win32 APIs thunk to the “W” APIs, so you can often set a breakpoint on just the “W” version to get both. Of course, this is not always true (and some bizarre APIs like WinInet actually thunk “W” to “A”), but as a general rule of thumb, it’s more often the case than not.) The breakpoint needs to be imbued with the knowledge of how to display the first argument, based on the calling convention of the routine in question. Since CreateFile is __stdcall, that would be [esp+4] (for x86) or rcx (for x64).

At its most basic, the breakpoint command might look like so:

0:001> bp kernel32!CreateFileW "du poi(@esp+4) ; gc"

(Note that the gc command is similar to g, except that it is designed especially for use in conditional breakpoints. If you trace into a breakpoint that controls execution with gc, it will resume executing the same way the user was controlling the program instead of unconditionally resuming normally. The difference between a breakpoint using g and one using gc is that if you trace into a gc breakpoint, you’ll trace to the next instruction, whereas if you trace into a g breakpoint, control will resume full speed and you’ll lose your place.)

The debugger output for this breakpoint (when hit) lists the names passed to kernel32!CreateFileW, like so (if I were setting this breakpoint in cmd.exe, and then did “type C:\readme.txt”, this might come up in the debugger output):

00657ff0  "C:\\readme.txt"

(Note that as the breakpoint displays the string passed to the function, it will be a relative path if the program uses the relative path.)

Of course, we can do slightly more complicated things as well. For instance, it might be a good idea to display the returned handle and the last error code. This could be done by having the breakpoint go to the return point of the function after it dumps the first parameter, and then display the additional information. To do this, we might use the following breakpoint:

0:001> bp kernel32!CreateFileW "du poi(@esp+4) ; g @$ra ; !handle @eax f ; !gle ; g"

The gist of this breakpoint is to display the returned handle (and last error status) after the function returns. This is accomplished by directing the debugger to resume execution until the return address is hit, and then operate on the return value (!handle @eax f) and last error status (!gle). (The @$ra symbol is a pseudo-register that refers to the current function’s return address in a platform-independent fashion. Essentially, the g @$ra command runs the program until the return address is hit.)

The output from this breakpoint might be like so:

0016f0f0  "coffbase.txt"
Handle 60
  Type         	File
  Attributes   	0
  GrantedAccess	0x120089:
         ReadControl,Synch
         Read/List,ReadEA,ReadAttr
  HandleCount  	2
  PointerCount 	3
  No Object Specific Information available
LastErrorValue: (Win32) 0 (0) - The operation
  completed successfully.
LastStatusValue: (NTSTATUS) 0 - STATUS_WAIT_0

However, if we fail to open the file, the results are less than ideal:

00657ff0  "c:\\readme.txt"
Handle 4
  Type         	Directory
[...] enumeration of all handles follows [...]
21 Handles
Type           	Count
Event          	3
File           	4
Directory      	3
Mutant         	1
WindowStation  	1
Semaphore      	2
Key            	6
Thread         	1
LastErrorValue: (Win32) 0x2 (2) - The system
  cannot find the file specified.
LastStatusValue: (NTSTATUS) 0xc0000034 -
  Object Name not found.

What went wrong? Well, the !handle command expanded into essentially “!handle -1 f“, since CreateFile returned INVALID_HANDLE_VALUE (-1). This mode of the !handle extension enumerates all handles in the process, which isn’t what we want. However, with a bit of cleverness, we can improve upon this. A second take at the breakpoint might look like so:

0:001> bp kernel32!CreateFileW "du poi(@esp+4) ; g @$ra ; .if (@eax != -1) { .printf \"Opened handle %p\\n\", @eax ; !handle @eax f } .else { .echo Failed to open file, error: ; !gle } ; g"

Although that command might appear a bit intimidating at first, it’s actually fairly straightforward. Like the previous version of this breakpoint, it displays the filename passed to kernel32!CreateFileW, and then resumes execution until CreateFile returns. Then, depending on whether the function returned INVALID_HANDLE_VALUE (-1), either the returned handle or the last error status is displayed. The output from the improved breakpoint might be something like this (with an example of successfully opening a file and then failing to open a file):

Success:

0016f0f0  "coffbase.txt"
Opened handle 00000060
Handle 60
  Type         	File
  Attributes   	0
  GrantedAccess	0x120089:
         ReadControl,Synch
         Read/List,ReadEA,ReadAttr
  HandleCount  	2
  PointerCount 	3
  No Object Specific Information available

Failure:

00657ff0  "C:\\readme.txt"
Failed to open file, error:
LastErrorValue: (Win32) 0x2 (2) - The system
  cannot find the file specified.
LastStatusValue: (NTSTATUS) 0xc0000034 -
  Object Name not found.

Much better. A bit of intelligence in the breakpoint allowed us to skip the undesirable behavior of dumping the entire process handle table in the failure case, and we could even skip displaying the last error code in the success case.

As one can probably imagine, there’s a whole other range of possibilities here once one considers the flexibility offered by conditional breakpoints. However, there are some downsides to this approach that must be considered as well. More on a couple of other, more advanced conditional breakpoints for logging purposes in a future posting (as well as a careful look at some of the limitations and disadvantages of using the debugger instead of a specialized program, and some “gotchas” you might run into with this sort of approach).

Debugger tricks: API call logging, the quick’n’dirty way (part 1)

Wednesday, July 18th, 2007

One common task that you may be faced with while debugging a problem is logging information about calls to a function (or functions). If you want to know about a function in a program that you have source code to, you can often just add some sort of debug print and rebuild the program, but sometimes this isn’t practical. For example, you might not always be able to reproduce a problem, so it might not be viable to restart with a debug-ified build, because you might blow away your repro. Or, more importantly, you might need to log calls to functions that you don’t have source code to (or aren’t building as part of your program, or otherwise don’t want to modify).

For example, you might want to log calls to various Windows APIs in order to gain information about a problem that you are troubleshooting. Now, depending on what you’re doing, you might be able to do this by adding debug prints before and after every single call to the particular API. However, this is often less than convenient, and if you aren’t the immediate caller of the function you want to log, then you’re not going to be able to take that route anyway.

There are a number of API spy/API logging packages out there (and the Debugging Tools for Windows distribution even ships with one, called Logger, though it tends to be fairly fragile – personally, I’ve had it crash out on me more often than I’ve had it actually work). Although you might be able to use one of those, a big limitation of “shrink-wrapped” logging tools is that they won’t know how to properly log calls to custom functions, or functions that are otherwise not known to the logging tool. The better logging tools out there are user-extensible to a certain extent, in that they typically provide some sort of scripting or programming language that allows the user (i.e. you) to describe function parameters and calling conventions, so that calls can be logged.

However, it can often be difficult (or even impossible) to describe many types of functions to these tools – such as functions that take pointers to structures that contain pointers to other structures, or other such non-trivial constructs. As a result, in many circumstances, I tend to recommend against using so-called “shrink-wrapped” API logging tools for logging calls to functions.

In the event that it’s not a feasible solution to implement debug prints in source code, though, it would appear on the surface that this leaves one without a usable solution for logging calls. Not so, in fact – it turns out that with some careful use of so-called “conditional breakpoints”, you can often use the debugger (e.g. WinDbg/ntsd/cdb/kd, which is what I shall be referring to for the rest of this article) to provide this sort of call logging. Using the debugger has many advantages; for instance, you can do this sort of API logging “on the fly”, and in situations where you can attach the debugger after the process has started, you don’t even need to start the program specially in order to log it. Even better, however, is that the debugger has extensive support for displaying data in meaningful forms to the user.

If you think about it, displaying data to the user is in fact one of the principal functions of the debugger. It’s also one of the major reasons why the debugger is highly extensible via extensions, such that complicated data structures can be displayed and interpreted in a meaningful fashion. By using the debugger to perform your API logging, you can take advantage of the rich functionality for displaying data that is already baked into the debugger (and its extensions, and even any custom extensions of your own that you have written) to double as a call logging facility.

Even better, because the debugger can read and display many data types in a meaningful fashion based off of symbol files (if you have private symbols, such as for programs you compile or provide), for data types that don’t have specific debugger extensions for displaying them (like !handle, !error (for error codes), !devobj, and so forth), you can often utilize the debugger’s ability to format data based off of type information in symbols. This is typically done via the dt command, and often provides a workable display for most custom data types without having to do any sort of complicated “training” like you might have to do with a logging program. (Some data structures, such as trees and lists, may need more intelligence than what is provided in dt for displaying all parts of the data structure. This is typically true for “container” data types, although even for those types, you can still often use dt to display the actual members within the container in a meaningful fashion.) Utilizing the information contained within symbol files (via the debugger) for API logging also frees you from having to ensure that your logging program’s definitions for all of your structures and other types are in sync with the program you are debugging, as the debugger automagically receives the correct definitions based on symbols (and if you are using a symbol server that includes indexed versions of your own internal symbols, the debugger will even be able to find the symbols on its own).
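For example, no “training” at all is needed to get a formatted dump of a structure for which symbols are available; a single dt command displays every member by name and type, pulled straight from the symbol file (here, the process environment block, located via the @$peb pseudo-register):

```
0:000> dt ntdll!_PEB @$peb
```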

Another plus to this approach is that, provided you are reasonably familiar with the debugger, you probably won’t have to learn a new description language like you might if you were using an API logging program. This is because you’re probably already familiar with many of the commands the debugger makes available for displaying data, from every-day debugger usage. (Even if you aren’t all that familiar with the debugger, there is extensive documentation that ships with the debugger by default which describes how to format and display data via various debugger commands. Additionally, there are many examples describing how to use most of the important or useful debugger commands out there on the Internet.)

Okay, enough about why you might want to consider using the debugger to perform call logging. Next time, a quick look and walkthrough describing how you can do this (it’s really quite simple, as alluded to previously), along with some caveats and gotchas that you might want to watch out for along the way.

Silly debugger tricks: Using KD to reset a forgotten administrator password

Wednesday, July 11th, 2007

One particularly annoying occurrence that has happened to me on a couple of occasions is losing the password to a long-forgotten test VM that I need to thaw for some reason or another, months after the last time I used it. (If you follow good password practices and use differing passwords for accounts, you might find yourself in this position.)

Normally, you’re kind of sunk if you’re in this position, which is the whole idea – no administrator password means no administrative access to the box, right?

The officially supported solution in this case, assuming you don’t have a password reset disk (does anyone actually use those?) is to reformat. Oh, what fun that is, especially if you just need to grab something off of a test system and be done with it in a few minutes.

Well, with physical access (or the equivalent if the box is a VM), you can do a bit better with the kernel debugger. It’s a bit embarrassing having to “hack” (and I use that term very loosely) into your own VM because you don’t remember which throwaway password you used 6 months ago, but it beats waiting around for a reformat (and in the case of a throwaway test VM, it’s probably not worth the effort anyway compared to cloning a new one, unless there was something important on the drive).

(Note that as far as security models go, I don’t really think that this is much of a security risk. After all, to use the kernel debugger, you need physical access to the system, and if you have that much, you could always just use a boot CD, swap out hard drives, or a thousand other different things. This is just more convenient if you’ve got a serial cable and a second box with a serial port, say a laptop, and you just want to reset the password for an account on an existing install.)

This is, however, perhaps an instructive reminder in how much access the kernel debugger gives you over a system – namely, the ability to do whatever you want, like bypass password authentication.

The basic idea behind this trick is to use the debugger to disable the password check used at interactive logon inside LSA.

The first step is to locate the LSA process. The typical way to do this is to use the !process 0 0 command and look for a process name of LSASS.exe. The next step requires that we know the EPROCESS value for LSA, hence the enumeration. For instance:

kd> !process 0 0
**** NT ACTIVE PROCESS DUMP ****
PROCESS fffffa80006540d0
    SessionId: none  Cid: 0004    Peb: 00000000
      ParentCid: 0000
    DirBase: 00124000  ObjectTable: fffff88000000080
      HandleCount: 545.
    Image: System
[...]
PROCESS fffffa8001a893a0
    SessionId: 0  Cid: 025c    Peb: 7fffffda000
     ParentCid: 01ec
    DirBase: 0cf3e000  ObjectTable: fffff88001b99d90
     HandleCount: 822.
    Image: lsass.exe

Now that we’ve got the LSASS EPROCESS value, the next step is to switch to it as the active process. This is necessary as we’re going to need to set a conditional breakpoint in the context of LSA’s address space. For this task, we’ll use the .process /p /r eprocess-pointer command, which changes the debugger’s process context and reloads user mode symbols.

kd> .process /p /r fffffa8001a893a0
Implicit process is now fffffa80`01a893a0
.cache forcedecodeuser done
Loading User Symbols
.....

Next, we set up a breakpoint on a particular internal LSA function that is used to determine whether a given password is accepted for a local account logon. The breakpoint changes the function to always return TRUE, such that all local account logons will succeed if they get to the point of a password check. After that, execution is resumed.

kd> ba e1 msv1_0!MsvpPasswordValidate
   "g @$ra ; r @al = 1 ; g"
kd> g

We can dissect this breakpoint to understand better just what it is doing:

  • Set a break on execute hardware breakpoint on msv1_0!MsvpPasswordValidate. Why did I use a hardware breakpoint? Well, they’re generally more reliable when doing user mode breakpoints from the kernel debugger, especially if what you’re setting a breakpoint on might be paged out. (Normal breakpoints require overwriting an instruction with an “int 3”, whereas a hardware breakpoint simply programs an address into the processor such that it’ll trap if that address is accessed for read/write/execute, depending on the breakpoint type.)
  • The breakpoint has a condition (or command) attached to it. Specifically, this command runs the target until it returns from the current function (“g @$ra” continues the target until the return address is hit; @$ra is a special platform-independent pseudo-register that refers to the return address of the current function). Once the function has returned, the al register is set to 1 and execution is resumed. MsvpPasswordValidate returns a BOOLEAN value (in other words, an 8-bit value), which is stored in al (the low 8 bits of the eax or rax register, depending on whether you’re on x86 or x64). IA64 targets don’t store return values in this fashion, so the breakpoint is x86/x64-specific.

Now, log on to the console. Make sure to use a local account and not a domain account, so the authentication is processed by the Msv1_0 package. Also, non-console logons might not run through the Msv1_0 package, and may not be affected. (For example, Network Level Authentication (NLA) for RDP in Vista/Srv08 doesn’t seem to use Msv1_0, even for local accounts. The console will still allow you to log in, however.)

From there, you can simply reset the password for your account via the Computer Management console. Be warned that this will wipe out EFS keys and the like, however. To restore password checking to normal, either reboot the box without the kernel debugger, or use the bc* command to disable the breakpoint you set.

(For the record, I can’t really take credit for coming up with this trick, but it’s certainly one I’ve found handy in a number of scenarios.)

Now, one thing that you might take away from this article, from a security standpoint, is that it is important to provide physical security for critical computers. To be honest, if someone really wants to get into a box they have physical access to, this is probably not even the easiest way; it would be simpler to just pop in a bootable CD or floppy and load a different operating system. As a result, as previously mentioned, I wouldn’t exactly consider this a security hole, as it already requires physical access in order to be effective. It is, however, a handy way to reset passwords for your own computers or VMs in a pinch, if you happen to know a little bit about the debugger. Conversely, it’s not really a supported “solution” (more of a giant hack at best), so use it with care (and don’t expect PSS to bail you out if you break something by poking around in the kernel debugger). It may break without warning on future OS versions (and there are many cases that won’t be caught by this trick, such as domain accounts that use the Kerberos provider to process authentication).

Update: I forgot to mention the very important fact that you can turn on the kernel debugger from the “F8” boot menu when booting the system, even if you don’t have kernel debugging enabled in the boot configuration or boot.ini. This will enable kernel debugging on the highest numbered COM port, at 19200bps. (On Windows Vista, this also seems capable of auto-selecting 1394 if your machine had a 1394 port, if memory serves. I don’t know offhand whether that translates to downlevel platforms, though.)

The beginning of the end of the single-processor era

Tuesday, July 10th, 2007

I came across a quote on CNet that stuck with me yesterday:

It’s hard to see how there’s room for single-core processors when prices for nearly half of AMD’s dual-core Athlon 64 X2 chips have crept well below the $100 mark.

I think that this sentiment is especially true nowadays (at least for conventional PC-style computers – not counting embedded things). Multiprocessing (at least pseudo-multiprocessing, in the form of Intel’s HyperThreading) has been available on end-user computers for some time now. Furthermore, true multiprocessing, in the form of multi-core chips, is now mainstream. What I mean by that is that by now, most of the computers you’ll get from Dell, Best Buy, and the like will be MP, whether via HyperThreading or multiple cores.

To give you an idea, I recently got a 4-way server (a single quad core chip) for ~$2300 or so (though it was also reasonably equipped other than in the CPU department). At work, we got an 8-way box (2x dual core chips) for under ~$3000 as well, for running VMs for our quality assurance department. Just a few years ago, getting an 8-way box “just like that” would have been unheard of (and ridiculously expensive), and yet here we are, with medium-level servers that Dell ships coming with that kind of multiprocessing “out of the box”.

Even laptops are coming with multicore chips in today’s day and age, and laptops have historically not been exactly performance leaders due to size, weight, and battery life constraints. All but the most entry-level laptops Dell ships nowadays are dual core, for instance (and this is hardly limited to Dell either; Apple is shipping dual-core Intel Macs as well for their laptop systems, and has been for some time in fact.)

Microsoft seems to have recognized this as well; for instance, there is no single processor kernel shipping with Windows Vista, Windows Server 2008, or future Windows versions. That doesn’t mean that Windows doesn’t support single processor systems, just that there is no longer an optimized single processor kernel (e.g. one that replaces spinlock acquisitions with a simple KeRaiseIrql(DISPATCH_LEVEL) call) anymore. The reason is that for new systems, which are expected to be the vast majority of Vista/Server 2008 installs, multiprocessing capability is just so common that it’s not worth maintaining a separate kernel and HAL just for the odd single processor system’s benefit anymore.
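To illustrate the sort of uniprocessor optimization being given up, here is a rough pseudocode sketch (not actual Windows source) of what spinlock acquisition can collapse to when there is only one processor, versus what the multiprocessor kernel must do:

```
// Uniprocessor kernel: no other processor can contend for the lock,
// so "acquiring" it only requires blocking dispatching on this CPU.
KeAcquireSpinLock(Lock, OldIrql):
  *OldIrql = current-irql;
  raise-irql-to(DISPATCH_LEVEL);

// Multiprocessor kernel: must additionally spin on the lock bit,
// as another processor may actually hold the lock.
KeAcquireSpinLock(Lock, OldIrql):
  *OldIrql = current-irql;
  raise-irql-to(DISPATCH_LEVEL);
  while (interlocked-test-and-set(Lock) == already-held)
    spin-wait;
```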

What all this means is that if, as a developer, you haven’t really been paying attention to the multiprocessor scene, now’s the time to start – it’s a good bet that within a few years, even on very low end systems, single processor boxes are going to become very rare. For intensive applications, the capability to take advantage of MP is going to start being a defining point, especially as chip makers have realized that they can’t just increase clock rates indefinitely and have accordingly begun to transition to multiprocessing as an alternative way to increase performance.

Microsoft isn’t the only company that’s taking notice of MP becoming mainstream, either. For instance, VMware now fully supports multiprocessor virtual machines (even on its free VMware Server product), as a way to boost performance on machines with true multiprocessing capability. (And to their credit, it’s actually not half-bad as long as you aren’t loading the processors down completely, at which point it seems to turn into a slowdown – perhaps due to VMs competing with each other for scheduling while waiting on spinlocks, though I didn’t dig in deeper.)

(Sorry if I sound a bit like Steve when talking about MP, but it really is here, now, and now’s the time to start modifying your programs to take advantage of it. That’s not to say that we’re about to see 100+ core computers becoming mainstream tomorrow, but small-scale multiprocessing is very rapidly becoming the standard in all but the lowest cost systems.)

Reversing the V740, part 4: Implementing a solution

Friday, July 6th, 2007

In the previous post in this series, I described some of the functionality in place in the V740’s abstraction module for the Verizon connection manager app, and the fact that as it was linked to a debug build of the Novatel SDK, reversing relevant portions of it would be (relatively) easy (especially due to the numerous debug prints hinting at function names throughout the module).

As mentioned last time, while examining the WmcV740 module, I came across some functions that appeared as if they might be of use (and one in particular, Diag_Call_End, that, assuming my theory panned out, would instruct the device to enter dormant mode – and, from there, potentially re-acquire an EVDO link if available).

However, several obstacles remained in the way: First, there were no callers of this function in particular, potentially complicating the process of determining valid input arguments if the purpose of any arguments was not immediately obvious from the function implementation. Second, the function in question wasn’t in the export table of the DLL, so there existed no (clean) way to resolve its address.

The first problem turned out to be a fairly trivial one, as very basic analysis of the function determined that it didn’t even take any arguments. It does use some global state; however, that global state is already initialized by the exported functions that set up the abstraction layer module, meaning that the function itself should be fairly straightforward to use.

From an implementation standpoint, the function looked much like many of the other diagnostics routines shipped with the Novatel SDK. The function is essentially a very thin wrapper around the communications protocol used to talk to the device firmware, and doesn’t really add a lot of “value” on top of that, other than managing the transmission of a request to the firmware and the reception of a response. In pseudocode, the function is roughly laid out as follows:

Diag_Call_End()
{
  DebugPrint(severity, "Diag_Call_End: Begin\n");

  acquire-lock;
  pre-send-serial-port-setup;

  initialize-packet;

  //
  // Set the packet opcode.  All other
  // packet parameters are defaults.
  //
  TxPacket.Cmd = NVTL_CMD::DIAG_CALL_END;

  //
  // Transmit the request to the firmware
  //
  Error = Diag_Send_Tx_Packet(&TxPacket, PACKET_SIZE);

  if (Error)
   handle-error;

  //
  // Receive the response.
  //
  Error = GetResponse(NVTL_CMD::DIAG_CALL_END);

  if (Error)
   handle-error;

  if (Bad response format)
   handle-error;

  //
  // Clean up and return
  //
  post-send-serial-port-cleanup;
  free-memory(response-buf);

  release-lock;

  DebugPrint(severity, "Diag_Call_End: "
   "End: RetVal: %d\n", return-status);
  return success;
}

It’s a good thing the debug prints were there, as there isn’t really anything to go on besides them. All this function does, from the perspective of the code in the DLL, is simply set up a (very simple) request packet, send it to the firmware, receive the response, and return to the caller. This same structure is shared by most of the other Diag_* functions in the module which communicate to the firmware; in general, all those functions do is translate C arguments into the over-the-wire protocol, call the functions to send the packet and wait for a response, and then unpackage the response data back into return data for the C caller (if applicable). The firmware is responsible for doing all the real work behind the requests. Putting it another way, think of the SDK functions embedded in the WMC module as RPC stubs, the driver that creates the virtual serial port as the RPC runtime library, and the firmware on the card as the RPC server (although the whole protocol and data repackaging process is far simpler than RPC).

Now, because most (really, all) of the logic for implementing particular requests resides in the device firmware on the card, the actual implementation is for the most part a “black box” – we can see the interface (and sometimes have examples for how it is called, if a certain SDK function is actually used), but we can’t really see what a particular request will do, other than observe side effects that the client code calling that function (if any) appears to depend upon.

Normally, that’s a pretty unpleasant situation to be in from a reversing standpoint, but the debug prints at least give us a fighting chance here. Thanks to them, as opposed to an array of unnamed functions that send different unknown bit patterns to an opaque firmware interface, at least we know what a particular firmware call wrapper function is ostensibly supposed to do (assuming the name isn’t too cryptic – we’ll probably never know what some things like “nw_nw_dtc_sms_so_get” actually refer to, exactly).

Back to the problem at hand, however. After analyzing Diag_Call_End, it’s pretty clear that the function doesn’t take any arguments and simply returns an error code (or success) indicator to the caller. All of the global state depended upon by the function is the “standard stuff” that is shared by anything using the firmware comms interface, including functions that we can observe being called indirectly by the connection manager app, so it’s a good bet that we should be able to just call the function and see what happens.

However, there’s still the minor snag that Diag_Call_End isn’t exported from WmcV740.dll. There are a couple of different approaches we could take to solve this problem, with varying degrees of complexity, depending on our requirements. For example, in an attempt to provide some level of automatic compatibility with future (or previous) releases, we might implement some kind of code fingerprinting that could be used to scan the DLL for the start of this particular function. In this instance, however, I decided it wasn’t really worth the trouble; for one, WmcV740.dll is fairly well self-contained and doesn’t depend on anything other than the driver that sets up the virtual serial port (and the device, of course), and from examining debug prints in the DLL, it became clear that it was designed to support multiple firmware revisions (and even multiple devices). Given this, it seemed an acceptable limitation to tie a program to this particular version of WmcV740.dll and trust that it will remain backwards/forwards compatible enough with any device firmware updates I apply (if any). Because the DLL is self-contained, the connection manager software could even conceivably be updated after placing a copy of the DLL in a different location, since it isn’t tied into the rest of the connection manager software in any meaningful way.

As a result of these factors, I settled on just hardcoding offsets from the module base address to the start of the function in question that I wanted to call. Ugly, yes, but in this particular instance, it seemed like the most reasonable compromise. Recall that in Win32, the HMODULE value returned by LoadLibrary is really the base address of the given module, making it trivially easy to locate a loaded module’s base address in memory. From there, it was just a matter of adding the offsets to the module base to form complete pointer values, casting these to function pointers, and making the call.

After all of that, all that’s left is to try the function out. This involves loading the WMC module, calling a standard export, WMC_Startup, to initialize it, and then just making the call to the non-exported Diag_Call_End.

As luck would have it, the function call did exactly what I had hoped – it caused the device to enter dormant mode if there was an active data session. The next time link activity occurred, the call was picked back up (and if the call had failed over to 1xRTT and an EVDO link could be re-acquired, the call would be automatically upgraded back to EVDO). Not quite as direct as simply commanding the card to re-scan for EVDO, but it did get the job done, if in a slightly round-about fashion.

From there, all that remained was to add an automated component to this – periodically ask the card whether it was in 1xRTT or EVDO mode, and if the former, push the metaphorical “end call” button every so often to try and coax the card into switching over to EVDO. This information is readily available via the standard WMC abstraction layer (which was fairly well understood at this point), albeit with a caveat: the card appears to not even try to scan for an EVDO link after it has failed over to 1xRTT (or if it does, it doesn’t make this fact known to anything on the other end of the firmware comms interface, as far as I could tell). This means that it’s not easy to distinguish between the device being in 1xRTT mode because there really is no EVDO coverage locally, period, and it being there because you went under a bridge (or into an elevator, or whatever) for a moment, temporarily lost signal, and the device picked the wrong network up when it re-acquired signal.

Still, all things considered, the solution is workable (if a major hack in terms of architecture). For those in a similar predicament, I’ve posted the program that I wrote to periodically try to re-acquire an EVDO link based on the information I arrived at while working on this series. It’s a console app that will display basic signal strength statistics over time, and will (as previously mentioned) automatically place the device into dormant mode every so often while you’re on 1xRTT, in an attempt to re-acquire EVDO access after a signal loss event. To use it, you’ll need the VC2005 SP1 CRT installed, and you’ll also need WmcV740.dll version 1.0.6.6 (exact match required for the dormant mode functionality to operate), which comes with the current version of VZAccess Manager for the V740 for Windows Vista (at the time of this writing, that’s 6.1.8). Other versions may work if they include the exact same version of WmcV740.dll. You’ll need to place WmcV740.dll in the same directory as wwanutil.exe for it to function, or it’ll bail out when it can’t load the module. Also, only one program can talk to the V740’s firmware communication port at a time, which means that while you are running wwanutil, you can’t run VZAccess (or any other program that tries to talk to the V740’s firmware communication port – if you try to start wwanutil while VZAccess is using the card, you’ll get error 65, and likewise, if you try to start VZAccess while wwanutil is running, VZAccess will complain that something else is using the device). You can still dial the connection manually via Windows DUN, however – the “AT” modem port is unaffected.

Of course, software considerations aside, you’ll also need a V740 (otherwise known as a Merlin X720) ExpressCard as well, with a corresponding service provider plan. (As far as I can tell, the Sprint and Verizon Novatel Rev.A ExpressCards are all rebranded Novatel Merlin X720’s and should be functionally identical, but as I am not a Sprint customer, I can’t test that.) Theoretically, the WmcV740 module supports other Novatel devices, but I haven’t tested that either (I suspect that the protocol used to talk to the firmware is actually a generic Qualcomm chipset diagnostics protocol that may function across other manufacturers – it sure seems to be very similar to the protocol that BitPim uses to talk to many Qualcomm phones, for instance – but the Wmc module will only detect Novatel devices). Also, given that the program is calling undocumented functions in the device’s firmware control interface, I’d recommend against trying it out on every single device you can get your hands on, just to be on the safe side. Although the module is theoretically smart enough to detect whether it’s really talking to a Novatel device of a sufficiently high firmware/protocol revision, or something else, I can’t help you if you somehow manage to brick your card with it (though I don’t see how you’d possibly do that with the program, just covering all the bases…). The usual disclaimers apply: no warranty provided (this program is provided “as-is”), and I can’t provide support for your device or add support for (insert X random other device here).

If you hit Alt-2 while the wwanutil console window is up, you’ll get some statistics akin to the field test mode available in VZAccess Manager, although I can’t guarantee that the FTM option in VZAccess was actually accurate (or tell you how to interpret many of the fields). Since the verbose display is based on the same information as the connection manager GUI, it is probably just as accurate (or inaccurate) as the normal FTM display, though perhaps in a more readable format. Alt-3 will also display a log of recent connection events (Alt-1 to return to the main screen), and you can use the Ctrl-D keystroke combination at any of the screens to manually force the device into dormant mode (though it may immediately pick back up into active mode if there is link activity, just as if you hit “end call” on a tethered handset and the link was still active).

With a workable solution for my original predicament found, this wraps up the V740 series (at least for now…). Hopefully, at some point, support for things like periodically auto-reacquiring EVDO might find its way into the stock connection manager software, but for now, this will have to do.