
The No-Execute hall of shame…

Friday, June 15th, 2007

One of the things that I do when I set up Windows on a new (modern) box that I am going to use for more than just temporary testing is to enable no-execute (“NX”) support by default. Preferably, I use NX in “always on” mode, but sometimes I have to settle for “opt-out” mode. It is possible to bypass NX in opt-out (or opt-in) mode with a “ret2libc”-style attack, which diminishes the security gain in an unfortunately non-trivial way in many cases. For those unclear on what “alwayson”, “optin”, “optout”, and “alwaysoff” mean in terms of Windows NX support, they describe how NX is applied across processes. Alwayson and alwaysoff are fairly straightforward; they mean that NX is unconditionally forced on (or off) for all processes. Opt-in mode means that only programs that mark themselves as NX-aware have NX enabled (or those that the administrator has configured in Control Panel), and opt-out mode means that all programs have NX applied unless the administrator has explicitly excluded them in Control Panel.

(Incidentally, the reason why the Windows implementation of NX was vulnerable to a ret2libc-style attack in the first place was to support extremely poorly written copy protection schemes, of all things. Specifically, there is code baked into the loader in NTDLL to detect SecuROM and SafeDisc modules being loaded on-the-fly, for purposes of automagically turning off NX for these security-challenged copy protection mechanisms. As a result, exploit code can effectively do the same thing that the user mode loader does in order to disable NX, before returning to the actual exploit code (see the Uninformed article for more details on how this works “under the hood”). I really just love how end user security is compromised for DRM/copy protection mechanisms as a result of that. Such is the price of backwards compatibility with shoddy code, and the myriad games using such poorly written copy protection systems…)

As a result of this weakness in “opt-out” mode, I would recommend forcing NX to always on, if possible (as previously mentioned). On downlevel systems, you do this by editing boot.ini to enable always on mode (/noexecute=alwayson). On systems that use BCD boot databases, e.g. Vista or later, you can use this command to achieve the same effect:

bcdedit /set {current} nx alwayson

However, I’ve found that you can’t always do that, especially on client systems, depending on what applications you run. There are still a lot of things out there which break with NX enabled, unfortunately, and “alwayson” mode means that you can’t exempt applications from NX. Unfortunately, even relatively new programs sometimes fall into this category. Here’s a list of some of the things I’ve run into that blow up with NX turned on by default; the “No-Execute Hall of Shame”, as I like to call it, since not supporting NX is definitely very lame from a security perspective – even more so, since it requires you to switch from “Alwayson” to “Optout” (so you can use the NX exclusion list in Control Panel), which is in and of itself a security risk even for unrelated programs that do have NX enabled by default. This is hardly an exhaustive list, just some things that I have had to work around in my own personal experience.

  1. Sun’s Java plug-in for Internet Explorer. (Note that to enable NX for IE if you are in “Optout” mode, you need to go grovel around in IE’s security options, as IE disables NX for itself by default if it can – it opts out, in other words). Even with the latest version of Java (“Java Platform 6 update 1”, which also seems to be known as 1.6.0), it’s “bombs away” for IE as soon as you try to go to a Java-enabled webpage if you have Sun’s Java plugin installed.

    Personally, I think that is pretty inexcusable; it’s not like Sun is new to JIT code generation or anything (or that Java is anything like the “new kid on the block”), and it’s been the recommended practice for a very long time to ensure that code you are going to execute is marked executable. It’s even been enforced by hardware for quite some time now on x86. Furthermore, IE (and Java) are absolutely “high priority” targets for exploits on end users (arguably, IE exploits on unpatched systems are one of the most common delivery mechanisms for malware), and preventing IE from running with NX is clearly not a good thing at all from a security standpoint. Here’s what you’ll see if you try to use Java for IE with NX enabled (which takes a bit of work, as previously noted, in the default configuration on client systems):

    (106c.1074): Access violation - code c0000005 (first chance)
    First chance exceptions are reported before any
    exception handling.
    This exception may be expected and handled.
    eax=6d4a4a79 ebx=00070c5c ecx=0358ea98
    edx=76f10f34 esi=0ab65b0c edi=04da3100
    eip=04da3100 esp=0358ead4 ebp=0358eb1c
    iopl=0         nv up ei pl nz na pe nc
    cs=001b  ss=0023  ds=0023  es=0023  fs=003b
    gs=0000             efl=00210206
    04da3100 c74424040c5bb60a mov dword ptr [esp+4],0AB65B0Ch
    0:005> k
    ChildEBP RetAddr  
    WARNING: Frame IP not in any known module. Following
    frames may be wrong.
    0358ead0 6d4a4abd 0x4da3100
    0358eb1c 76e31ae8 jpiexp+0x4abd
    0358eb94 76e31c03 USER32!UserCallWinProcCheckWow+0x14b
    [...]
    0:005> !vprot @eip
    BaseAddress:       04da3000
    AllocationBase:    04c90000
    AllocationProtect: 00000004  PAGE_READWRITE
    RegionSize:        000ed000
    State:             00001000  MEM_COMMIT
    Protect:           00000004  PAGE_READWRITE
    Type:              00020000  MEM_PRIVATE
    0:005> .exr -1
    ExceptionAddress: 04da3100
       ExceptionCode: c0000005 (Access violation)
      ExceptionFlags: 00000000
    NumberParameters: 2
       Parameter[0]: 00000008
       Parameter[1]: 04da3100
    Attempt to execute non-executable address 04da3100
    0:005> lmvm jpiexp
    start    end        module name
    [...]
        CompanyName:  JavaSoft / Sun Microsystems
        ProductName:  JavaSoft / Sun Microsystems
                      -- Java(TM) Plug-in
        InternalName: Java Plug-in für Internet Explorer

    (Yes, the internal name is in German. No, I didn’t install a German-localized build, either. Perhaps whoever makes the public Java for IE releases lives in Germany.)

  2. NVIDIA’s Vista 3D drivers. From looking around in a process doing Direct3D work in the debugger on an NVIDIA system, you can find sections of memory lying around that are PAGE_EXECUTE_READWRITE, or writable and executable, and appear to contain dynamically generated SSE code. Leaving memory around like this diminishes the value of NX, as it provides avenues of attack for exploits that need to store some code somewhere and then execute it. (Fortunately, technologies like ASLR may make it difficult for exploit code to land in such dynamically allocated executable and writable zones in one shot.) An example of this sort of code is as follows (observed in a World of Warcraft process):
    0:000> !vprot 15a21a42 
    BaseAddress:       0000000015a20000
    AllocationBase:    0000000015a20000
    AllocationProtect: 00000040  PAGE_EXECUTE_READWRITE
    RegionSize:        0000000000002000
    State:             00001000  MEM_COMMIT
    Protect:           00000040  PAGE_EXECUTE_READWRITE
    Type:              00020000  MEM_PRIVATE
    0:000> u 15a21a42
    15a21a42 8d6c9500       lea     ebp,[rbp+rdx*4]
    15a21a46 0f284d00       movaps  xmm1,xmmword ptr [rbp]
    15a21a4a 660f72f108     pslld   xmm1,8
    15a21a4f 660f72d118     psrld   xmm1,18h
    15a21a54 0f5bc9         cvtdq2ps xmm1,xmm1
    15a21a57 0f590dd0400b05 mulps   xmm1,xmmword ptr
               [nvd3dum!NvDiagUmdCommand+0xd6f30 (050b40d0)]
    15a21a5e 0f59c1         mulps   xmm0,xmm1
    15a21a61 0f58d0         addps   xmm2,xmm0
    0:000> lmvm nvd3dum
    start             end                 module name
    04de0000 0529f000   nvd3dum
        Loaded symbol image file: nvd3dum.dll
    [...]
        CompanyName:      NVIDIA Corporation
        ProductName:      NVIDIA Windows Vista WDDM driver
        InternalName:     NVD3DUM
        OriginalFilename: NVD3DUM.DLL
        ProductVersion:   7.15.11.5828
        FileVersion:      7.15.11.5828
        FileDescription:  NVIDIA Compatible Vista WDDM
              D3D Driver, Version 158.28 

    Ideally, this sort of JIT’d code would be reprotected to be executable but no longer writable after it is generated (see the sketch at the end of this post). While not as severe as leaving the entire process without NX, any “NX holes” like this are to be avoided when possible. (Simply slapping a “PAGE_EXECUTE_READWRITE” on all of your allocations and calling it done is not the proper solution either, and compromises security to a certain extent, albeit significantly less so than disabling NX entirely.)

  3. id Software’s Quake 3 crashes out immediately if you have NX enabled. I suppose it can be given a bit of slack given its age, but the SDK documentation has always said that you need to protect executable regions as executable.

Programs that have been “recently rescued” from the No-Execute hall of shame include:

  1. DosBox, the DOS emulator (great for running those old games on new 64-bit computers, or even on 32-bit computers, where DosBox has superior support for hardware, such as sound cards, compared to NTVDM). The current release version (0.70) of DosBox doesn’t work with NX if you have the dynamic code generation core enabled. However, after speaking to one of the developers, it turns out that a version that properly protects executable memory was already in the works (in fact, the DosBox team was kind enough to give me a prerelease build with the fix in it, in lieu of my trying to get a DosBox build environment working). Now I’m free to indulge in those oldie-but-goodie classic DOS titles like “Master of Magic” from time to time without having to mess around with excluding DosBox.exe from the NX policy.
  2. Blizzard Entertainment’s World of Warcraft used to crash immediately on logging on to the game world if you had NX enabled. It seems as if this has been fixed for the most recent patch level, though.
  3. Another of Blizzard’s games, Starcraft, will crash with NX enabled unless you’ve got a recent patch level installed. The reason is that Starcraft’s rendering engine JITs various rendering operations into native code on the fly for increased performance. Until relatively recently in Starcraft’s lifetime, none of this code generation was NX-aware, and it blithely emitted rendering code into non-executable memory which it then tried to run. In fact, Blizzard’s Warcraft II: Battle.net Edition has not been patched to repair this deficiency and is thus completely unusable with NX enabled, as a result of the same underlying problem.

If you’re writing a program that generates code on the fly, please use the proper protection attributes for it – PAGE_READWRITE during generation, and PAGE_EXECUTE_READ after the code is ready to use. Windows Server already ships with NX enabled by default (although in “opt-out” mode), and it’s only a matter of time before client SKUs of Windows do the same.
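To make that recommendation concrete, here is a minimal sketch (not production code; the stub bytes and routine are purely hypothetical) of the allocate-as-writable, generate, reprotect-as-executable pattern using VirtualAlloc and VirtualProtect:

#include <windows.h>
#include <string.h>

typedef int (*GENERATED_ROUTINE)(void);

int RunGeneratedCode(void)
{
    // Hypothetical "generated" x86 code: mov eax, 42 / ret
    static const unsigned char Stub[] =
        { 0xB8, 0x2A, 0x00, 0x00, 0x00, 0xC3 };
    DWORD OldProtect;
    int Result;

    // Generate the code into writable, non-executable memory.
    void *Code = VirtualAlloc(NULL, sizeof(Stub),
                              MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
    if (Code == NULL)
        return -1;

    memcpy(Code, Stub, sizeof(Stub));

    // Reprotect as executable (and no longer writable) before running it.
    if (!VirtualProtect(Code, sizeof(Stub), PAGE_EXECUTE_READ,
                        &OldProtect)) {
        VirtualFree(Code, 0, MEM_RELEASE);
        return -1;
    }

    // Make sure the processor doesn't see stale instruction bytes.
    FlushInstructionCache(GetCurrentProcess(), Code, sizeof(Stub));

    Result = ((GENERATED_ROUTINE)Code)();

    VirtualFree(Code, 0, MEM_RELEASE);
    return Result;
}

The point is simply that the memory is never writable and executable at the same time, which closes the kind of “NX hole” described above.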

Process-level security is not a particularly great way to enforce DRM when users own their own hardware.

Tuesday, May 8th, 2007

Recently, I discussed the basics of the new “process-level security” mechanism introduced with Windows Vista (integrity levels; otherwise known as “mandatory integrity control“, or MIC for short).

Although MIC, when combined with more conventional user-level access control, has the potential to improve security for users to an extent, it is ultimately not a mechanism to lock users out of their own computers.

As you might have guessed by this point, I am speaking of the rather less savory topic of DRM. MIC might appear to be attractive to developers that wish to deploy a DRM system, but it really doesn’t provide a particularly effective way to stop a computer owner (administrator) from, well, administering their system.

MIC (and process-level security), on the surface, may appear to be a good way to accomplish this goal. After all, the process-level security model does allow for securable objects (such as processes) to be guarded against other processes – even those running under the same user SID, which is typically the kind of restriction that a software-based DRM system will try to enforce (i.e. preventing you from debugging a program).

However, it is important to consider that the sort of restrictions imposed by process-level security mechanisms are designed to protect programs from other programs. They are not supposed to protect programs from the user that controls the computer on which they run (in other words, the computer administrator or whatever you wish to call it).

Windows Vista attempts to implement such a (DRM) protection scheme, loosely based on the principles of process-level security, in the form of something called “protected processes”.

If you look through the Vista SDK headers (specifically, winnt.h), you may come across a particularly telling comment that would seem to indicate that protected processes were originally intended to be implemented via the MIC scheme for process-level security in Vista:

#define SECURITY_MANDATORY_LABEL_AUTHORITY       {0,0,0,0,0,16}
#define SECURITY_MANDATORY_UNTRUSTED_RID         (0x00000000L)
#define SECURITY_MANDATORY_LOW_RID               (0x00001000L)
#define SECURITY_MANDATORY_MEDIUM_RID            (0x00002000L)
#define SECURITY_MANDATORY_HIGH_RID              (0x00003000L)
#define SECURITY_MANDATORY_SYSTEM_RID            (0x00004000L)
#define SECURITY_MANDATORY_PROTECTED_PROCESS_RID (0x00005000L)

//
// SECURITY_MANDATORY_MAXIMUM_USER_RID is the highest RID
// that can be set by a usermode caller.
//

#define SECURITY_MANDATORY_MAXIMUM_USER_RID \
   SECURITY_MANDATORY_SYSTEM_RID

As it turns out, protected processes (as they are called) are not actually implemented using the integrity level/MIC mechanism on Vista; instead, there is another, alternate mechanism that marks protected processes as “untouchable” by “normal” processes. (The lack of flexibility in the integrity level ACE system, as far as specifying which access rights are permitted, is the likely reason. If you read the linked article and the paper it includes, there is a new set of access rights defined specially for dealing with protected processes, which are deemed “safe”. These access rights are requestable for such processes, unlike the standard access rights, and there isn’t a good way to convey this with the set of “allow/deny read/write/execute” options available with an integrity level ACE on Vista.)

The end result is, however, for the most part the same; “protected processes” are essentially to high integrity (or lower) processes as high (or medium) integrity processes are to low integrity processes; that is, they cannot be adversely affected by a lesser-trusted process.
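To illustrate the distinction, consider the new PROCESS_QUERY_LIMITED_INFORMATION access right that Vista introduces; it is one of the rights deemed “safe” against protected processes, whereas the classic PROCESS_QUERY_INFORMATION right is refused. A sketch (assuming a Vista SDK, and a process ID that belongs to a protected process, e.g. audiodg.exe):

#include <windows.h>
#include <stdio.h>

void ShowProtectedProcessImageName(DWORD ProcessId)
{
    WCHAR Path[MAX_PATH];
    DWORD Length = MAX_PATH;
    HANDLE Process;

    // The full query right is denied for a protected process...
    Process = OpenProcess(PROCESS_QUERY_INFORMATION, FALSE, ProcessId);
    if (Process == NULL)
        wprintf(L"PROCESS_QUERY_INFORMATION failed: %lu\n",
                GetLastError());
    else
        CloseHandle(Process);

    // ...but the limited query right is requestable.
    Process = OpenProcess(PROCESS_QUERY_LIMITED_INFORMATION, FALSE,
                          ProcessId);
    if (Process == NULL)
        return;

    if (QueryFullProcessImageNameW(Process, 0, Path, &Length))
        wprintf(L"%lu: %s\n", ProcessId, Path);

    CloseHandle(Process);
}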

This is where the system begins to break down, though. Process integrity is an interesting way to attempt to curtail malware and exploits because the human at the computer (presumably) does not wish such activity to occur. On the other hand, DRM attempts to prevent the human at their computer from performing an action that they (ostensibly) do in fact wish to perform, with their own computer.

This is a fundamental distinction. The difference is that the malware or exploit code that process level security is designed to defend against doesn’t have the benefit of a human with physical (or administrative) access to the computer in question. That little detail turns out to make a world of difference, as we humans aren’t necessarily constrained by the security system like a program would be. For instance, if some evil exploit code running as a low integrity process on a computer wants to gain administrative access to the box, it just can’t do so (excepting the possibility of local privilege escalation exploits or trying to social-engineer the user into giving the program said access – for the moment, ignore those attack vectors, though they are certainly real ones that must be dealt with at some point).

However, if I am a human sitting at my computer, and I am logged on as a “plain user” and wish to perform an administrative task, I am not so constrained. Instead, I simply either log out and log back in as an administrative user (using my administrative account password), or type my password into an elevation prompt. Problem solved!

Now, of course, the protected process mechanism in Vista isn’t quite that dumb. It does try to block administrators from gaining access to protected processes; direct attempts will return STATUS_ACCESS_DENIED. However, again, humans can be a bit more clever here. For one, a user (and by user, I mean a person with full control over their computer) that is intent on bypassing the protected process mechanism could simply load a driver designed to subvert it.

The DRM system might then counter that attack by requiring kernel mode code to be signed, on the theory that for wide-scale violations of the DRM system in such a manner, a “cracker” would need to obtain a code-signing cert that would make them more easily identifiable and vulnerable to legal attack.

However, people are clever (and more specifically, people with physical / administrative access to a computer are not necessarily constrained by the basic “rules” of the operating system). One could imagine somebody doing something like patching out the driver signing checks on disk, or any number of other approaches. The theoretical counter to attacks like that would be some sort of hardware support to verify the boot process and ensure that only trusted, signed (and thus unmodified by a “cracker”) code can boot the system. Even that is not necessarily foolproof, though; what’s to say that nobody has compromised the task-offload engine on the system’s NIC to run custom code with full physical memory access, outside the confines of the operating system entirely? Free rein over something capable of performing DMA to physical memory means that kernel code and data can be freely rewritten.

Now, where am I going with all of this? I suppose that I am just frustrated that certain people seem to want to continue to invest significant resources into systems that try to wrest control of a computer from an end user – systems that are simply doomed to fail by the very nature of the diverse and uncontrolled systems upon which that code will run (and which sometimes compromise the security of customer systems in the process). I don’t think the people behind the protected processes system at Microsoft are stupid, not by any means. However, I can’t help but feel that they know they’re fighting a losing battle, and that their knowledge and expertise would be better spent on more productive things (like working to improve the next release of Windows, or what-have-you).

Now, a couple of parting shots in an effort to quell several potential misconceptions before they begin:

  • I am not advocating that people bypass DRM. This is probably less than legal in many places. I am, however, trying to make a case for the fact that trying to use security models originally designed to protect users from malware as a DRM mechanism is at best a bad idea.
  • I’m also not trying to downplay the negative impact of theft of copyrighted materials, or anything of that sort. As a programmer myself, I’m well aware that if nobody will buy your product because it’s pirated all over the world, then it’s hard to eke out a living. However, I do believe that it is a fallacy to say that it’s impossible to make money out of software or content in the Internet age without layer after layer of customer-unfriendly DRM.
  • I’m not trying to knock the rest of the improvements in Vista (or the start of process-level security being deployed to joe end user, even though it’s probably not yet perfect). There’s a lot of good work that’s been done with Vista, and despite the (ill-conceived, some might say) DRM mechanisms, there is real value that has been added with this release.
  • I’m also not trying to say that Microsoft is devoting so much of its time to DRM that it isn’t paying any attention to adding real value to its products. However, in my view, most of the time spent on DRM is time that could be better spent adding that “real value” instead of doing the dance of security by obscurity (as with today’s systems, that is really all you can do, when it comes down to it) with some enigmatic idea of a “cracker” out there intent on stealing every piece of software or content they get their hands on and redistributing it to every person in the world for free.
  • I’m also not trying to state that the kernel mode code signing requirements for x64 Vista are entirely motivated by DRM (or that all it’s good for is an attempt to enforce DRM), but I doubt that anyone could truthfully say that DRM played no part in the decision to require signed drivers on x64 Vista either. Regardless, there remain other reasons for ostensibly requiring signed code besides trying to block (or at least hold accountable) attempts to bypass the protected process system.

A brief discussion of Windows Vista’s IE Protected Mode (and user/process level security)

Wednesday, April 25th, 2007

I was discussing the recent QuickTime bug on Matasano Chargen, and the question of whether it would work in the presence of IE7 + Vista and protected mode came up. I figured a more in-depth explanation as to just what IE7’s Protected Mode actually does might be in order, hence this posting.

One of the new features introduced with Internet Explorer 7 on Windows Vista is something called “Protected Mode” (or “IE Protected Mode”). It’s an on-by-default security feature that is sold as something that greatly hardens Internet Explorer against meaningful exploitation, even if an exploitable hole in IE (or a component of IE, such as an ActiveX control) is found.

Let’s dig in a little bit deeper as to what IE Protected Mode is (and isn’t), and what it means for you.

First things first. Protected mode is not related in any way to the “enhanced security configuration” introduced in Windows Server 2003. The “enhanced security configuration” IE included with Srv03 is, at its core, just a set of more restrictive (i.e. locked down) default settings with regard to things like scripting, downloading files, and so forth. Protected mode does not rely on locking down security zone settings to the point where you cannot download files or run any scripts by default, and is completely unrelated to the IE hardening work done in the default Srv03 configuration. I’d imagine that protected mode will probably be included in Longhorn Server, but the underlying technologies are very different, and are designed to address different “market segments” (“enhanced security configuration” being just a set of more restrictive defaults, whereas protected mode is a fundamental rethink of how the browser interacts with the rest of the operating system).

Protected mode is a feature that is designed to make “surfing the web a safer experience” for end users. Unlike Srv03, where a locked down IE might fly because you are ostensibly not supposed to be doing lots of fancy web-browser-ish things from a server box, end users are clearly not going to take kindly to not being permitted to download files, run JavaScript, and so forth in the default configuration.

The way protected mode takes a stab at making things better for the end users of the world is to build upon the new “integrity level” security mechanism that has been introduced into the Windows NT security model starting with Vista, with the goal of making the web browser an “untrusted” process that cannot perform “dangerous” things.

To understand what this means, it’s necessary to know what these new-fangled “integrity levels” in Vista are all about. Integrity levels are assigned to a token representing a user, and tokens are assigned to a process (and can be impersonated by a thread, typically something done by something like an IPC server process that needs to perform something on behalf of a lesser-privileged caller). What’s meaningful about integrity levels is that they allow you to partition what we know of as a “user” into something with multiple different “trust levels” (low, medium, high, with several other infrequently-used levels), such that a thread or a process running at a certain integrity level (or “trust level”) cannot “interfere” with something running at a higher integrity level.

The way this is implemented is by an additional level of security check that is performed whenever an access rights check occurs. This additional check compares the integrity level of the caller (i.e. the thread or process token’s integrity level) with a new type of field in the security descriptor of the target object (called the “mandatory label”) that specifies what sorts of access a caller of a certain integrity level is allowed to request. The “mandatory label” allows an integrity level to be associated with an object for security checks, and allows three basic policies (lower integrity levels cannot request read access, lower integrity levels cannot request write access, lower integrity levels cannot request execute access) to be set, comparing the integrity level of a caller against the integrity level specified with an object’s security descriptor. (Only these three generic access rights may be “guarded” by the integrity level in this way; there is no granularity to allow object-specific access rights to be given specific minimum caller integrity levels.)

The default settings in most places do not allow write access to be granted to processes of a lower integrity level, and the default minimum integrity level is usually “medium”. The new label/integrity level access check is performed before conventional ACL-based checks.
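To give a feel for how this works in practice, here is a minimal sketch (assuming a Vista SDK; the file name parameter is just an example) that applies a low integrity, “no write up” mandatory label to a file via the SDDL string S:(ML;;NW;;;LW), after which lower integrity callers cannot open the file for write access:

#include <windows.h>
#include <sddl.h>
#include <aclapi.h>

BOOL SetLowIntegrityLabel(LPWSTR FileName)
{
    PSECURITY_DESCRIPTOR Sd = NULL;
    PACL Sacl = NULL;
    BOOL SaclPresent = FALSE, SaclDefaulted = FALSE;
    DWORD Error;

    // "ML" = mandatory label ACE, "NW" = no-write-up policy,
    // "LW" = low mandatory level.
    if (!ConvertStringSecurityDescriptorToSecurityDescriptorW(
            L"S:(ML;;NW;;;LW)", SDDL_REVISION_1, &Sd, NULL))
        return FALSE;

    if (!GetSecurityDescriptorSacl(Sd, &SaclPresent, &Sacl,
                                   &SaclDefaulted)) {
        LocalFree(Sd);
        return FALSE;
    }

    // The mandatory label lives in the SACL portion of the
    // security descriptor.
    Error = SetNamedSecurityInfoW(FileName, SE_FILE_OBJECT,
                                  LABEL_SECURITY_INFORMATION,
                                  NULL, NULL, NULL, Sacl);

    LocalFree(Sd);
    return Error == ERROR_SUCCESS;
}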

In this respect, integrity levels are an attempt to inject something of a sort of process-level security into the NT security model.

If you’re at all familiar with how NT security works, this may be a bit new to you. NT is based upon user-level security, where processes (and threads, in the case of impersonation) run under the context of a user, and derive their security rights (i.e. what securable objects they have access to – files, directories, registry keys, and so forth) and privileges (i.e. the ability to shut down the system, the ability to load a driver, the ability to bypass ACL checks for backup/restore, and so forth) from the user context they run under. The thinking behind this sort of model is that each distinct user on a system will run as, well, a different user. Processes from one user cannot interfere with processes (or files, directories, and so forth) running as a different user, without being granted access to do so (i.e. via an ACL, or by special, administrator-level privileges). The “operating system” (i.e. the services and programs that support the system) conceptually runs as yet another “user”, and is thus ostensibly protected from adverse modifications by malicious users on the system. Each user thus exists in a sort of sandbox, unable to interfere with any other user. Conversely, any process running as a particular user can do anything to any other process (or file or directory) owned by that same user; there is no protection within a user security context.

Obviously, this is a gross oversimplification of the NT security model, but it gets the point across (or so I hope!): the security system in NT revolves around the user as the means to control access in a meaningful fashion. This does make sense in environments like large corporate settings, where many users share the same computer (or where computers are centrally managed), such that users cannot interfere with each other, and ostensibly cannot attack their computers (i.e. the operating system) because they are running as “plain users” without administrator access and cannot perform “dangerous” tasks.

Unfortunately, in the era of the Internet, exploitable software bugs, and computers with end users that run code they do not entirely trust, this model isn’t quite as good as we would like. Because the user is the security boundary here, if an attacker can run code under your user account, they have full access to all of the processes, files, directories (and so forth) that are accessible to that user. And if that user account happened to be a computer administrator account, then things are even worse; now the attacker has free rein over the entire computer, and everything on it (including all other users present on the box).

Clearly, this isn’t such a great situation, especially given the reality that many users run untrusted code (or more generally, buggy or exploitable code) on a frequent basis. In this Internet-enabled age, user-level security as it has been traditionally implemented isn’t really enough.

There are still ways to make things “work” with user-level security; namely, to give each human several user accounts, specific to the task that they are doing. For example, I might have one user account that I use for browsing and games, and another user account that I use for accessing top secret corporate documents. If the user account that I use to browse the Internet with gets compromised somehow, such as by my running an exploitable program and getting “owned”, then my top secret corporate documents are still safe; the malicious code running under the Internet-browsing-and-games account doesn’t have access to do anything to my secret documents, since they are owned by a different account and the default ACL protects them from other users.

Of course, this is a tough pill to expect end users to swallow; having to switch user accounts as they switch between tasks of differing importance is at best inconvenient and at worst confusing and problematic (for example, if I want to download a file from the Internet for use with my top secret corporate documents, I have to go to (comparatively) a lot of trouble to give it to my other user, and doing so opens an implicit trust relationship between my secret-documents-user and my less-trusted-Internet-browsing user, that the program I just downloaded is 1) not inherently malicious, 2) not tampered with or compromised, and 3) not full of exploitable holes that would put my documents at risk anyway the moment my secret-documents-user runs it). Clearly, while you could theoretically still get by with user level access in today’s world, as a single user, doing so as it is implemented in Windows today is a major pain (and even with everyone’s best intentions, few people I have seen really follow through completely with the concept and do not share programs or files between their users in any way whatsoever).

(Note that I am not suggesting that things like running as nonadmin or breaking tasks up into different users are a lost cause, just that getting things truly right and secure is a much more difficult job than one might expect initially, so much so that most “joe users” will not stand a chance at doing it perfectly. I’m also not trying to knock user-level security as outright flawed, useless, or broken, but the fact remains that there are problems in today’s world that merit additional consideration.)

Whew, that’s a rather long segue into user-level security. Anyway, protected mode is Microsoft’s first real attempt to tackle this problem – the fact that user level security does not always provide fine enough granularity in the face of untrusted or buggy programs – in a consumer-level system, in such a way that is palatable to “joe users”. The way that it works is to leverage the integrity level mechanism to create an additional security barrier between the user’s web browser (i.e. Internet Explorer in protected mode) and the rest of the user’s files and programs. This is done by assigning the IE process a low integrity level. Following from what we know of integrity levels above, this means that the IE process will be denied access (by the security manager in the kernel) to do things like overwrite your documents, place malicious programs in your “startup” Start Menu directory, overwrite executables in your system directory (of course, if you were already running as a plain user, it wouldn’t be able to do this anyway…), and so forth. This is clearly a good thing. In case the implications haven’t fully sunk in yet:

If an attacker compromises a low integrity process, they should not be able to destroy your data or install a trojan (or other malicious code) on your system*.

(*: This is, of course, barring implementation errors, design oversights, and local privilege escalation holes. The latter may prove to be an especially important sticking point, as many companies (Microsoft included) have often “looked down” upon local privilege escalation bugs as relatively unimportant to fix in a timely fashion. Perhaps the introduction of process-level security control will help add impetus to shatter the idea that leaving local privilege escalation holes sitting around is okay.)
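For the curious, the basic primitive that protected mode builds upon – running a process at low integrity – can be sketched with the documented token APIs (assuming a Vista SDK; S-1-16-4096 is the low mandatory level SID, and the command line is an example only):

#include <windows.h>
#include <sddl.h>

BOOL CreateLowIntegrityProcess(LPWSTR CommandLine)
{
    HANDLE Token = NULL, NewToken = NULL;
    PSID IntegritySid = NULL;
    TOKEN_MANDATORY_LABEL Label = { 0 };
    STARTUPINFOW Si = { sizeof(Si) };
    PROCESS_INFORMATION Pi = { 0 };
    BOOL Ok = FALSE;

    if (!OpenProcessToken(GetCurrentProcess(),
                          TOKEN_DUPLICATE | TOKEN_ADJUST_DEFAULT |
                          TOKEN_QUERY | TOKEN_ASSIGN_PRIMARY, &Token))
        return FALSE;

    // Make a primary token copy that we can modify.
    if (!DuplicateTokenEx(Token, 0, NULL, SecurityImpersonation,
                          TokenPrimary, &NewToken))
        goto Cleanup;

    // Lower the token's integrity level to low (S-1-16-4096).
    if (!ConvertStringSidToSidW(L"S-1-16-4096", &IntegritySid))
        goto Cleanup;

    Label.Label.Attributes = SE_GROUP_INTEGRITY;
    Label.Label.Sid = IntegritySid;

    if (!SetTokenInformation(NewToken, TokenIntegrityLevel, &Label,
                             sizeof(Label) + GetLengthSid(IntegritySid)))
        goto Cleanup;

    // The child process inherits the low integrity level of the token.
    Ok = CreateProcessAsUserW(NewToken, NULL, CommandLine, NULL, NULL,
                              FALSE, 0, NULL, NULL, &Si, &Pi);
    if (Ok) {
        CloseHandle(Pi.hProcess);
        CloseHandle(Pi.hThread);
    }

Cleanup:
    if (IntegritySid) LocalFree(IntegritySid);
    if (NewToken) CloseHandle(NewToken);
    if (Token) CloseHandle(Token);
    return Ok;
}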

Now, this is a very important departure from where we have been traditionally with user level access control. Combining per process access control with per user access control allows us to do a lot more to protect users from malicious code and buggy software (in other words, protecting users from themselves), in a fashion that is much easier to deal with from a user perspective.

However, I think it would be premature to say that we’re “all the way there” yet. Protected mode and low integrity level processes are definitely a great step in the right direction, but there still remain issues to be solved. For example, as I alluded to previously, the default configuration allows medium integrity objects to still be opened for read access by low integrity callers. This means that, for example, if an attacker compromises an IE process running in protected mode, they still have a chance at doing some damage. For instance, even though an attacker might not be able to destroy your data, per se, he or she can still read it (and thus steal it). So, to continue with our previous example of someone who works with top secret corporate documents, an attacker might not be able to wipe out the company financial records, or joe user’s credit card numbers, but he or she could still steal them (and post them on the Internet for anyone to see, or what have you). In other words, an attacker who compromises a low integrity process can’t destroy all your data (as would be the case if there were no process-level security and we were dealing with just one user account), but he or she can still read it and steal it.

There are other things to watch out for, too, with protected mode. Don’t get into the habit of clicking “OK” on that “are you sure you want this program to run outside of IE Protected Mode” dialog box, or you’re setting yourself up to be burned by clever malware. And certainly never click the “don’t ask me again” check box on the consent dialog, or you’re just begging for some piece of malware to abuse your implicit consent without you even realizing that something’s gone wrong. (In case you’re wondering, the mechanism in IE that allows processes to elevate to medium integrity involves an appcompat hook on CreateProcess that asks a medium integrity process (ieuser.exe) to prompt the user for consent, with the medium integrity process creating the new process if the user agrees. So user interaction is still required there, though we know how much users love to click “OK” on all those pesky security warnings. Oh, and there is also some hardening that has been done in win32k.sys to prevent lower integrity processes from sending “dangerous” window messages to higher integrity processes (even WM_USER and friends are disabled by default across an integrity level boundary), so “shatter attacks” ought not to work against the consent dialog. Note that if you bypass the appcompat hook, the new process created is also low integrity, and won’t be able to write anywhere outside of the “low integrity” sandbox’s writable files and directories.)
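As an aside for developers, this window message hardening is also why a higher integrity process that genuinely needs to receive a custom message from a lower integrity sender must explicitly opt in. A minimal sketch (assuming a Vista SDK; the message value is an arbitrary example):

#include <windows.h>

void AllowLowIntegrityNotification(void)
{
    // Without this call, WM_APP + 1 sent from a lower integrity
    // process would be silently dropped at the integrity boundary.
    ChangeWindowMessageFilter(WM_APP + 1, MSGFLT_ADD);
}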

So, while running IE in protected mode does, in some respects, limit the damage that can be done if you get compromised, I would still recommend not running untrusted programs under the same user account as your important documents (if you really care that much). Perhaps in a future release, we’ll see a solution that addresses the idea of not allowing untrusted programs to read arbitrary user data as well (such should be possible with proper use of the new integrity level mechanisms, although I suspect the true difficulty shall be in getting third party applications to play nicely as we continue to try and place control of the user’s documents more firmly in the hands of the actual user instead of in any arbitrary application that runs on the box).

Thoughts on PatchGuard (otherwise known as Kernel Patch Protection)

Monday, January 29th, 2007

Recently, there has been a fair bit of press about PatchGuard. I’d like to clarify a couple of things (and clear up some common misconceptions that appear to be floating around out there).

First of all, these opinions are my own, based on the information that I have available to me at this time, and are not sponsored by either Microsoft or my employer. Furthermore, these views are based on PatchGuard as it is implemented today, and do not relate to any theoretical extensions to PatchGuard that may occur sometime in the future. That being said, here’s some of what I think about PatchGuard:

PatchGuard is (mostly) not a security system.

Although some people out there might try to tell you this, I don’t buy it. The thing about PatchGuard is that it protects the kernel (and a couple of other Microsoft-supplied core kernel libraries) from being patched. PatchGuard also protects a couple of kernel-related processor registers (MSRs) that are used in conjunction with functionality like making system calls. However, this doesn’t really equate to improving computer security directly. Some persons out there would like to claim that PatchGuard is the next great thing in the anti-malware/anti-rootkit war, but they’re pretty much just wrong, and here’s why:

  1. Malware doesn’t need to patch the kernel in the vast majority of cases. Virtually all of the “interesting” things out there that malware does on compromised computers (create botnets, blast out spam emails, and the like) don’t require anything that is even remotely related to what PatchGuard blocks in terms of kernel patch prevention. Even in the rootkit case, most of the hiding of nefarious happenings could be done with clever uses of documented APIs (such as creating threads in other processes, or filesystem filters) without even having to patch the kernel at all. I will admit that many rootkits out there now do simply patch the kernel, but I attribute this mostly to 1) a lack of knowledge about how the kernel works on the part of rootkit authors, and 2) the fact that it might be “easier” to simply patch the kernel and introduce race conditions that rarely crash rootkit’d computers than to do things the right way. Once things like system call hooks in rootkits start to run afoul of PatchGuard, though, rootkit authors have innumerable other choices that are completely unguarded by PatchGuard. As a result, it’s not really correct (in my opinion) to call PatchGuard an anti-malware technology.
  2. Malware authors are more agile than Microsoft. What I mean when I say “more agile” is that malware vendors can much more easily release new software versions without the “burdens” of regression testing, quality assurance, and the like. After all, if you’re already in the malicious software business, it probably doesn’t matter if 5% of your customers (sorry, victims) have systems that crash with your software installed because you didn’t fully test your releases and fix a corner case bug. On the other hand, Microsoft does have to worry about this sort of thing (and Microsoft’s install base for Windows is huge), which means that Microsoft needs to be very, very careful about releasing updates to something as “dangerous” as PatchGuard. I say “dangerous” in the sense that if a PatchGuard version is released with a bug, it could very well cause some number of Microsoft’s customers to bluescreen on boot, which would clearly not fly very well. Given that Microsoft can’t really keep up with malware authors (many of whom are dedicated to writing malicious code, with financial incentives to keep doing so) in the “cat and mouse” game with PatchGuard, it doesn’t make sense to try and use PatchGuard to stop malware.
  3. PatchGuard is targeted at vendors that are accountable to their customers. This point seems to be often overlooked, but PatchGuard works by making it painful for vendors to patch the kernel. This pain comes from the fact that ISVs who choose to bypass PatchGuard are at risk of causing customer computers to bluescreen on boot en masse the next time that Microsoft releases a PatchGuard update. For malware authors, this is really much less of an issue; one compromised computer is the same as another, and many flavors of malware out there will try to do things like break automatic updates anyway. Furthermore, it is generally easier for malware authors to push out new versions of software to victims (that might counter a PatchGuard update) than it is for most ISVs to deploy updates to their paying customers, who often tend to be stubborn on the issue of software updates, typically insisting on internal testing before mass deployment.
  4. By the time malware is in a position to have to deal with PatchGuard as a potential blocking point, the victim has already lost. In order for PatchGuard to matter to malware, said malware must be able to run kernel level code on the victim’s computer. At this point, guarding the computer is really somewhat of a moot point, as the victim’s passwords, personal information, saved credit card numbers, secret documents, and whatnot are already toast in that the malware already has complete access to anything on the system. Now, there are a couple of scenarios (like trojaning a library computer kiosk) where the information desired by an attacker is not yet available, and the attacker has to hide his or her malicious code until a user comes around to type it in on a keyboard, but for the typical home or business computer scenario, the game is over the moment malicious code gets kernel level access to the box in question.

Now, for a bit of clarification. I said that PatchGuard was mostly not a security system. There is one real security benefit that PatchGuard confers on customers, and that is the fact that PatchGuard keeps out ill-written software that does things like patch system calls and then fails to validate parameters correctly, resulting in new security issues being introduced. This is actually a real issue, believe it or not. Now, not everything that hooks the kernel introduces race conditions, security problems, or the like; it is possible (though difficult) to do so in a safe and correct manner in many circumstances, depending on what it is exactly that you’re trying to do. However, most of the software out there does tend to do things incorrectly, and in the process often inadvertently introduces security holes (typically local privilege escalation issues).

PatchGuard is not a DRM mechanism.

People who try to tell you that PatchGuard is DRM-related likely don’t really understand what it does. Even with PatchGuard installed, it’s still possible to extend the kernel with custom drivers. Additionally, the things that PatchGuard protects are only related to the new HD DRM mechanisms in Vista in a very loose sense. Many of the DRM provisions in Windows Vista are implemented in the form of drivers and not the kernel image itself, and are thus not protected by PatchGuard. If nothing else, third party drivers are responsible for the negotiation and lowest level I/O relating to most of the new HD DRM schemes, and present an attractive target for DRM-bypass attempts that PatchGuard has no jurisdiction over. Unless Microsoft supplies all multimedia-output-related drivers in the system, this is how it will have to stay for the foreseeable future, and it would be extremely difficult for Microsoft to protect just DRM-related drivers with PatchGuard in a non-trivially bypassable fashion.

Preventing poorly written code from hooking the kernel is good for Windows customers.

There is a whole lot of bad code out there that hooks things incorrectly, and typically introduces race conditions and crash conditions. I have a great deal of first hand experience with this, as our primary product here at Positive has run afoul of third party software that hooks things and causes hard to debug random breakage that we get the blame for from a customer support perspective. The most common offenders that cause us pain are third party hooks that inject themselves into networking-related code and cause issues by subtly breaking API semantics. (For instance, we’ve run into certain versions of McAfee’s LSPs where, if you call select in a certain way (yes, I know that select is hardly optimal on Windows), the LSP will close a garbage handle value, sometimes nuking something important in the process.) We’ve also run into a ton of cases where poorly behaved Internet Explorer add-ons will cause various corruption and other issues that we tend to get the blame for when our (rather complicated, code-wise, being a full-fledged VPN client) browser-based product is used in a partially corrupted iexplore process and eventually falls over and crashes. Another common trouble case relates to third party software that attempts to hook the CreateProcess* family of routines in order to propagate hooked code to child processes, and in the process manages to break legitimate usage of CreateProcess in certain obscure scenarios.

In our case, though, this is all just third-party “junkware” that breaks stuff in usermode. When you get to poorly written code that breaks the kernel, things get that much worse; instead of a process blowing up, now the entire system hangs, crashes (i.e. bluescreens), experiences subtle filesystem corruption, or has local guest-to-kernel privilege escalation vulnerabilities introduced. Not only are the consequences of failure that much worse with hooking in kernel mode, but you can guess who gets the blame when a Windows box bluescreens: Microsoft, even though that sleazy anti-virus (or whatnot) software you just installed was really to blame. Microsoft claims that a significant number of crashes on x86 Windows come from third parties hooking the kernel wrong, and based on my own analysis of certain anti-virus software (and my own experience with hooking things blowing up our code in usermode), I don’t have any basis for disagreeing with Microsoft’s claim. From this perspective, PatchGuard is a good thing for consumers, as it represents a non-trivial attempt to force the industry to clean house and fix code that is at best questionable.

PatchGuard makes it significantly more difficult for ISVs out there which provide value on top of Windows through the use of careful hooking (or other non-blessed means) of extending the kernel.

Despite the dangers involved in kernel hooking, it is in fact possible to do it right in many circumstances, and there really are products out there which require kernel-level alterations that simply can’t be done without hooking (or other techniques that are blocked by PatchGuard). Note that, for the most part, I consider AV software as not in this class of software, and if nothing else, Microsoft’s Live OneCare demonstrates that it is perfectly feasible to deploy an AV solution without kernel hacks. Nonetheless, there are a number of niche solutions that revolve around things like kernel-level patching, which are completely shut out by PatchGuard. This presents an unfortunate negative atmosphere to any ISV that falls into this zone of having deployed technology that is now blocked in principle by PatchGuard, just because XYZ large anti-virus vendor (the most common offenders, though there are certainly plenty of non-AV programs out there that are similarly “kernel-challenged”) couldn’t be bothered to clean up their code and do things the correct way.

From this perspective, PatchGuard is damaging to customers (and affected ISVs), as they are essentially forced to stay away from x64 (or take the risky road of trying to play cat-and-mouse with Microsoft and hack PatchGuard). This is unfortunate, and until recently, Microsoft presented an outward appearance of taking a hard-line, stonewall stance against anything that ran afoul of PatchGuard, regardless of whether the program in question really had a “legitimate” need to alter the kernel. Fortunately for ISVs and customers, Microsoft appears to have very recently warmed up (or at least very recently started saying the opposite thing in public) to the fact that completely shutting out a subset of ISVs and Windows customers isn’t such a great idea, and has reversed its previous statements regarding its willingness to cooperate with ISVs that run into problems with PatchGuard. I see this as a completely positive thing myself (keeping in mind that at work here, we don’t ship anything that conflicts with PatchGuard), as it signals that Microsoft is willing to work with ISVs that have “legitimate” needs that are being blocked by PatchGuard.

There is no giant conspiracy in Microsoft to shut out the ISVs of the world.

While I may not completely agree with the new reality that PatchGuard presents to ISVs, I have no illusions that Microsoft is trying to take over the world with PatchGuard. Like it or not, Microsoft is quite aware that the success of the Windows platform depends on the applications (and drivers) that run on top of it. As a result, it would be grossly stupid of Microsoft to try and leverage PatchGuard as a way to shut out other vendors entirely; customers don’t like changing vendors and ripping out all the experience and training that they have with an existing install base, and so you can bet that Microsoft trying to take over the software market “by force” with the use of PatchGuard wouldn’t go over well with Windows customers (or help Microsoft in its sales case for future Windows upgrades featuring PatchGuard), not to mention the legal minefield that Microsoft would be waltzing into should they attempt to do such a thing. Now, that being said, I do believe that somebody “dropped the ball”, so to speak, as far as cooperating with ISVs when PatchGuard was initially deployed. Things do appear to be improving from the perspective of Microsoft’s willingness to work with ISVs on PatchGuard, however, which is a great first step in the right direction (though it remains to be seen how well Microsoft’s current stance will work out with the industry).

PatchGuard does represent, on some level, Microsoft exerting control over what users do with their hardware.

This is the main aspect of PatchGuard that I am most uncomfortable with, as I am of the opinion that when I buy a computer and pay for software, I should absolutely be permitted to do what I want with it, even if that involves “dangerous” things like patching my own box’s kernel. In this regard, I think that Microsoft is attempting to play the “benevolent dictator” with respect to kernel software; drivers that are, on average, dangerous to the reliability of Windows computers are being blocked by Microsoft. Now, I do trust Microsoft and its code a whole lot more than most ISVs out there (I know first-hand how much more seriously Microsoft considers issues like reliability and security at the present day (I’m not talking about Windows 95 here…) than a whole lot of other software vendors out there). Still, I don’t particularly like the fact that PatchGuard is akin to Microsoft telling me that “sorry, that new x64 computer and Windows license you bought won’t let you do things that we have classified as dangerous to system stability”. For example, I can’t write a program on Windows x64 that patches things blocked by PatchGuard, even for research and education purposes. I can understand that Microsoft is finding themselves up against the wall here against an uncooperative industry that doesn’t want to clean up its act, but still, it’s my computer, so I had damn well better be able to use it how I like. (No, I don’t consider the requirement of having a kernel debugger attached at boot time an acceptable one, especially in automated scenarios.)

PatchGuard could make it more difficult to prove that a system is uncompromised, or to analyze a known compromised system for clues about an attack, if the malware involved is clever enough.

If you are trying to analyze a compromised (or suspected compromised) system for information about an attack, PatchGuard represents a dangerous unknown: a deliberately obfuscated chunk of code, with sophisticated anti-analysis/anti-debugging/anti-reverse-engineering measures, that ships with the operating system. This makes it very difficult to look at a system and determine if, say, it’s really PatchGuard that is running every so often, or perhaps some malware that has hijacked PatchGuard for nefarious purposes. Without digging through layer upon layer of obfuscation and anti-debugging code, definitively saying that PatchGuard on a system is uncompromised is just plain not doable. In this respect, PatchGuard’s obfuscation presents a potentially attractive place for malicious code to hide and avoid being picked up in the course of post-compromise forensics of a running system that has been successfully compromised.

PatchGuard will not be obfuscation-based forever.

Eventually, I would expect that it will be replaced by hardware-enforced (hypervisor-based) systems that utilize hardware-supported virtualization technology in new processors to provide a “ring -1” for code presently guarded by PatchGuard to execute in. This approach would move the burden of guarding kernel code to the processor itself, instead of the current “cat and mouse” game in software that exists with PatchGuard, since PatchGuard executes at the same privilege isolation level as the code that might try to subvert it. Note that, in a hypervisor based system, hardware drivers would ideally be unable to cause damage (in terms of things like memory corruption and the like) to the kernel itself, which might eventually allow the system to continue functioning even if a driver fails. Of course, if drivers rely on being able to rewrite the kernel, this goal is clearly unattainable, and PatchGuard helps to ensure that in the future, there won’t be a backwards compatibility nightmare caused by a plethora of third-party drivers that rely on being able to directly alter the behavior of the kernel. (I suspect that x64 will supplant x86 as the operating system execution environment in the not-too-distant future.) In usermode, x86 will likely live on for a very long time, but as x64 processors can execute x86 code at native speed, this is likely to not be an issue.

When PatchGuard is hypervisor-backed, it won’t be feasible to simply patch it out of existence, which means that ISVs will either have to comply with Microsoft’s requirements or find a way to evade PatchGuard entirely.

Overall, I think that there are both significant positive (and negative) aspects for PatchGuard. Whether it will turn out to be the best business decision (or the best experience for customers) remains to be seen; in an ideal world, though, only developers that really understand the full implications of what they are doing would patch the kernel (and only in safe ways), and things like PatchGuard would be unnecessary. I fear that this has become too much to expect for every programmer to do the right thing, though; one need only look at the myriad security vulnerabilities in software all over to see how little so many programmers care about correctness.

Vista ASLR is not on by default for image base addresses

Saturday, December 16th, 2006

This little tidbit seems to be missed in all of the press about Vista’s ASLR implementation: Vista ASLR (when speaking of randomizing image base addresses) does not apply to image bases by default. This is a sacrifice for application compatibility’s sake, in an effort to make fewer programs break “out of the box” on Vista. Most notably, this is the case even for images with base relocations.

Unfortunately, the mechanism to mark an executable image as “ASLR aware” (such that it can be freely rebased by Vista’s ASLR) is not at present documented. Furthermore, the linker version included with Visual Studio 2005 and the Windows Vista Platform SDK does not support the option necessary to mark an image as ASLR aware (though you could technically modify the image by hand with a hex editor or the like to enable it).

The WDK linker does support the new ASLR-enabling linker option, however (though it does not appear to be documented anywhere either). You can find references to this new linker option in makefile.new:

!if defined(NO_DYNAMICBASE)
DYNAMICBASE_FLAG=
!else
! if $(_NT_TOOLS_VERSION) >= 0x800
DYNAMICBASE_FLAG=/dynamicbase
! else
DYNAMICBASE_FLAG=
! endif
!endif

Passing /dynamicbase to the WDK version of link.exe (8.00.50727.215) or later will set the 0x40 DllCharacteristics value in the PE header of the output binary. This corresponds to a newly-defined constant which is at present only documented briefly in the WDK version of ntimage.h:

#define IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE 0x0040
     // DLL can move.

If this flag is set, then the base address of an image can be randomized by Vista’s ASLR; if the flag is clear, no randomization is applied to that image’s base address. (Note that heap and stack allocations are still randomized either way; it is only the image base address that is not.)
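
For the curious, it is easy to check whether a given binary is marked as ASLR-aware by inspecting its PE header directly. Here’s a minimal sketch of such a check (“aslrcheck” is just a name I made up, and error checking is largely omitted; a real tool would also want to validate the DOS and NT header magic values before trusting e_lfanew):

#include <windows.h>
#include <stdio.h>

int __cdecl main(int ac, char **av)
{
    HANDLE            File, Mapping;
    PVOID             View;
    PIMAGE_DOS_HEADER DosHeader;
    PIMAGE_NT_HEADERS NtHeaders;

    if (ac < 2)
    {
        printf("usage: aslrcheck <image>\n");
        return 1;
    }

    File = CreateFileA(av[1], GENERIC_READ, FILE_SHARE_READ, 0,
        OPEN_EXISTING, 0, 0);

    if (File == INVALID_HANDLE_VALUE)
        return 1;

    Mapping = CreateFileMappingA(File, 0, PAGE_READONLY, 0, 0, 0);
    View    = MapViewOfFile(Mapping, FILE_MAP_READ, 0, 0, 0);

    DosHeader = (PIMAGE_DOS_HEADER)View;
    NtHeaders = (PIMAGE_NT_HEADERS)((PUCHAR)View + DosHeader->e_lfanew);

    //
    // DllCharacteristics happens to live at the same offset in both the
    // 32-bit and 64-bit optional headers, so this test works for either
    // image type.  0x0040 == IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE.
    //

    if (NtHeaders->OptionalHeader.DllCharacteristics & 0x0040)
        printf("%s: marked /dynamicbase (eligible for ASLR)\n", av[1]);
    else
        printf("%s: NOT marked /dynamicbase\n", av[1]);

    UnmapViewOfFile(View);
    CloseHandle(Mapping);
    CloseHandle(File);

    return 0;
}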

Now, virtually all of the Microsoft PE images that ship with the operating system are built with /dynamicbase, so they will take full advantage of Vista’s ASLR with respect to image base randomization. However, third party (ISV)-built programs will not, by default, gain all the benefits of ASLR due to this application compatibility sacrifice. This is where the potential problem is, as effectively all existing third party PE images will need to be recompiled to enable ASLR on image base addresses. (Technically, you could use link /edit with the WDK linker to do this without a rebuild, or hex edit binaries, but this is not a real solution in my mind. In Microsoft’s defense, many third-party .exe files are often built without base relocations, which means that even if Microsoft had enabled ASLR by default, many third party programs would still not be getting the full benefit. This does not, however, mean that I fully agree with their decision…)

I can understand where Microsoft is coming from with an application compatibility perspective as far as ASLR’s impact on poorly written programs (of which there is an abundance in the Windows world), but it is a bit unfortunate that there is no real way to administratively enable ASLR globally, or at least administratively make it an opt-out instead of an opt-in setting.

So, if you are an ISV, here’s a heads up to be on the lookout for a link.exe version shipping with Visual Studio that supports /dynamicbase. When it becomes available, I would highly recommend enabling /dynamicbase for all of your projects (so long as you aren’t doing anything terribly stupid in your programs, enabling image base randomization should be fairly harmless in most cases). You should also link your .exe files with /FIXED:NO so that they contain a relocation section. This, when combined with /dynamicbase, will allow your .exe files to be randomized by ASLR (just the same as DLLs that have relocation information and are built with /dynamicbase).

Update: Visual Studio 2005 SP1 has shipped. This update to Visual Studio includes a newer version of the linker, which supports the /dynamicbase option described above. So, be sure to rebuild your programs with /dynamicbase and /fixed:no with VS 2005 SP1 in order to take full advantage of ASLR on Vista.
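
For instance, from a VS 2005 SP1 command prompt, rebuilding a hypothetical program (“myprogram” is a placeholder name) might look like so; if you build from within the IDE, the same switches can be supplied via the project’s linker options:

cl /c myprogram.cpp
link /dynamicbase /fixed:no /out:myprogram.exe myprogram.obj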

Annoyances with IE7

Sunday, October 22nd, 2006

Since installing IE7, I’ve run into a couple of annoyances.

The largest of these is that you can no longer use the trick of launching an instance of iexplore.exe under Run As, and then navigating to the Control Panel to get an administrator view of it while logged on as a limited user (on pre-Vista systems). Now, the admin IE instance will instead just tell the already-running explorer instance (which is running as your limited user account) to open a window at Control Panel. This is of course not what I want, which leaves me stuck remembering the names of the individual .cpl files and launching them individually under Run As. Unfortunately for me, this just made running as a limited user on Windows XP and Windows Server 2003 much more painful; not a good thing from the perspective of a browser that is supposed to make things more secure. (In case you were wondering, you can’t just launch an admin explorer.exe while you already have explorer running under your user account. If you try, the admin explorer instance will tell the already-running explorer instance to open a new window, and then exit.) Alternatively, I could configure explorer to use a separate process for every window, which does allow running explorer directly with Run As, but this has the unfortunate side effect of dramatically increasing memory usage if you have multiple explorer folder windows open.

The other things I have run into so far are site compatibility problems, like lists breaking for WordPress. I am not sure whether this particular problem is a WordPress one or an IE7 one, not having been particularly inclined to delve into HTML DOM debugging, but WordPress does appear to validate cleanly under the W3C XHTML validator. Some compatibility issues are to be expected, of course, but it’s a bit disappointing to see them so glaringly obvious without either WordPress or Microsoft having done something to fix (or even acknowledge) the problem by now. Sigh.

As for tabbed browsing, I’m not sure if I really like this much yet. Up till now, I’ve pretty much always used “old-fashioned”, windowed browsing. I’ll see if tabbed browsing grows on me, but I wish I didn’t have to sacrifice ease of running as non-admin for it…

(Update: a commenter, jpassing, suggested using “explorer.exe /separate” with Run As, which appears to work nicely as a replacement for starting iexplore.exe when IE7 is installed.)

You might be using unhandled exception filters without even knowing it.

Friday, August 18th, 2006

In a previous posting, I discussed some of the pitfalls of unhandled exception filters (and how they can become a security problem for your application). I mentioned some guidelines you can use to help work around these problems and minimize the risk, but, as I alluded to earlier, the problem is actually worse than it might appear on the surface.

The real gotcha about unhandled exception filters is that you have probably used them before in programs or DLLs and not even known that you were using them, which makes it very hard to not use them in dangerous situations. How can this be, you might ask? Well, it turns out that the Microsoft C runtime library uses an unhandled exception filter to catch unhandled C++ exceptions and call the terminate handler registered by set_terminate.
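
You can see this behavior with a trivial test program. Here’s a minimal sketch (compiled with cl /EHsc; the handler and its message are my own, and the exact path taken may vary between CRT versions):

#include <exception>
#include <cstdio>
#include <cstdlib>

void __cdecl my_terminate()
{
    std::printf("my_terminate: unhandled C++ exception\n");
    std::exit(1);
}

int __cdecl main()
{
    std::set_terminate(my_terminate);

    throw 42; // Never caught; winds up in the CRT's unhandled
              // exception filter, which then calls my_terminate.
}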

This unhandled exception filter is set up by the internal CRT function _cinit (via _initterm_e). If you have the CRT source handy, this lives in crt0dat.c. The call looks like:

/*
* do initializations
*/
initret = _initterm_e( __xi_a, __xi_z );

Here, “__xi_a” and “__xi_z” define the bounds of an array of function pointers to initializers called during the CRT’s initialization. This array includes a pointer to a function (_CxxSetUnhandledExceptionFilter) that sets up the unhandled exception filter for C++ exceptions. Unfortunately, source code for _CxxSetUnhandledExceptionFilter is not present, but you can find it by looking at the CRT in a disassembler.

push    offset CxxUnhandledExceptionFilter ; the CRT's C++ filter
call    SetUnhandledExceptionFilter
mov     lpTopLevelExceptionFilter, eax     ; save the previous filter
xor     eax, eax
retn

This is pretty standard; it is just saving away the old exception filter and registering its new exception filter. The unhandled exception filter itself checks for a C++ exception – if found, it calls terminate, otherwise it tries to verify that the previous exception filter points to executable code, and if so, it will call it.

push    esi
mov     esi, [esp+arg_0]            ; esi = PEXCEPTION_POINTERS
mov     eax, [esi]                  ; eax = ExceptionRecord
cmp     dword ptr [eax], 0E06D7363h ; C++ exception code ('msc' | 0xE0000000)?
jnz     short not_cpp_except
cmp     dword ptr [eax+10h], 3      ; NumberParameters == 3?
jnz     short not_cpp_except
mov     eax, [eax+14h]              ; ExceptionInformation[0]: EH magic number
cmp     eax, 19930520h
jz      short is_cpp_except
cmp     eax, 19930521h
jnz     short not_cpp_except

is_cpp_except:
call    terminate                   ; unhandled C++ exception

not_cpp_except:
mov     eax, lpTopLevelExceptionFilter ; previously saved filter
test    eax, eax
jz      short old_filter_unloaded
push    eax             ; lpfn
call    _ValidateExecute            ; does it point at executable code?
test    eax, eax
pop     ecx
jz      short old_filter_unloaded
push    esi
call    lpTopLevelExceptionFilter   ; chain to the previous filter
jmp     short done

old_filter_unloaded:
xor     eax, eax                    ; EXCEPTION_CONTINUE_SEARCH

done:
pop     esi
retn    4

The problem with the latter validation is that there is no way to tell whether the code is part of a legitimate DLL, or part of the heap or some other allocation that has moved in over where a DLL had previously been unloaded; this is where the security risk is introduced.

So, we have established that the CRT potentially does bad things by installing an unhandled exception filter – so what? Well, if you link to the DLL version of the CRT, you are probably fine. The CRT DLL is unlikely to be unloaded during the process lifetime and will only be initialized once.

The kicker is if you linked to the static (non-DLL) version of the CRT. This is where things start to get dicey. The dangerous combination here is that each image linked to the static version of the CRT has its own copy of _cinit, its own copy of _CxxSetUnhandledExceptionFilter, its own copy of _CxxUnhandledExceptionFilter, and so forth. What this boils down to is that every image linked to the static version of the Microsoft C runtime installs an unhandled exception filter. So, if you have a DLL (say, one that hosts an ActiveX object) which links to the static CRT (which is pretty attractive, as for plugin-type DLLs you don’t want to have to write a separate installer to ensure that end users have that cumbersome msvcr80.dll), then you’re in trouble. Since this is an especially common scenario (a plugin DLL linking to the static CRT), you have probably ended up using an unhandled exception filter without knowing it (and probably without realizing the implications of doing so) – simply by making an ActiveX control usable by Internet Explorer, for example. This really turns into a worst case scenario when it comes to DLLs that host ActiveX objects. These are DLLs that are frequently loaded and unloaded, are controllable by untrusted script, and are very likely to link to the static CRT to get out of the headache of having to manage installation of the DLL CRT version. If you put all of these things together and throw in any kind of crash bug, you’ve got a recipe for remote code execution. What is even worse is that this isn’t quick-fixable with a patch to the CRT, as the vulnerable CRT version is compiled into your binaries rather than living in its own hotfixable standalone DLL.

So, in order to be truly safe from the dangers of unhandled exception filters, you also need to rid your programs of the static CRT. Yes, it does make setup more of a pain, but the DLL CRT is superior in many ways (not to mention that it doesn’t suffer from this security problem!).
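
To illustrate the difference (using a hypothetical “plugin” module and the VS 2005-era compiler switches): /MT links the static CRT into the module itself, while /MD references the shared CRT DLL instead:

rem Static CRT: plugin.dll gets its own private copy of the CRT,
rem including its own unhandled exception filter registration.
cl /MT /LD plugin.cpp

rem DLL CRT: the CRT (and its filter) live in msvcr80.dll, which
rem stays loaded and initialized once for the life of the process.
cl /MD /LD plugin.cpp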

Beware of custom unhandled exception filters in DLLs

Wednesday, August 16th, 2006

Previously, I had discussed some techniques for debugging unhandled exception filters.  There are some more gotchas relating to unhandled exception filters than just debugging them, though.

The problem with unhandled exception filters is that they are broken by design.  The API (SetUnhandledExceptionFilter) used to install them allows you to build a chain of unhandled exception filters as multiple images within a process install their own filters.  While this may seem fine on the surface, it actually turns out to be a serious flaw: there is no support for removing these unhandled exception filters out of order.  If a DLL that registered a filter is unloaded, a filter registered after it is left holding a “previous filter” pointer into the now-unloaded DLL.

This turns out, at best, to cause your custom crash handling logic to randomly fail to operate, and at worst, to introduce serious security holes in your program.  You can read more about the security hole this introduces in the paper on Uninformed.org, but the basic idea is that if unhandled exception filters are unregistered out of order, you have a “dangling” function pointer that points to no-man’s-land.  If an attacker can fill your process address space with shellcode and then cause an exception (perhaps an otherwise “harmless” null pointer dereference that would merely crash your program), he or she can take control of your process and run arbitrary code.

Unfortunately, there isn’t a good way to fix this from an application perspective.  I would recommend simply never calling the previous unhandled exception filter, as there is no way to know whether it points to the real code that registered it or to malicious code that someone has allocated all over your process address space (called “heap spraying” in exploitation terminology).
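
In code, such a filter might be set up roughly like so (a sketch only; the printf is a stand-in for whatever crash reporting you actually do, and InstallCrashFilter is a name of my own invention):

#include <windows.h>
#include <stdio.h>

static LONG WINAPI MyUnhandledExceptionFilter(PEXCEPTION_POINTERS ExceptionInfo)
{
    //
    // Record crash information here (write a minidump, log, etc.).
    //

    printf("unhandled exception %08lx at %p\n",
        ExceptionInfo->ExceptionRecord->ExceptionCode,
        ExceptionInfo->ExceptionRecord->ExceptionAddress);

    //
    // Deliberately do NOT chain to the previous filter; it may point
    // into a DLL that has since been unloaded.
    //

    return EXCEPTION_EXECUTE_HANDLER; // terminate the process
}

void InstallCrashFilter(void)
{
    //
    // Intentionally discard the returned previous filter pointer.
    //

    SetUnhandledExceptionFilter(MyUnhandledExceptionFilter);
}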

You still have to deal with the fact that someone else might later install an unhandled exception filter ahead of yours, though, and then cause the unhandled exception filter chain to be broken upstream of you.  There is no real good solution for this; you might investigate patching SetUnhandledExceptionFilter or UnhandledExceptionFilter to always call you, but you can imagine what would happen if two modules tried to do that at the same time.

So, the moral of the story is as follows:

  1. Don’t trust unhandled exception filters, as the model is brittle and easily breaks in processes that load and unload DLLs frequently.
  2. If you must register an unhandled exception filter, do it in a DLL that is never unloaded (or even the main .exe) to prevent the unhandled exception filter from being used as an attack vector.
  3. Don’t try to call the previous unhandled exception filter pointer that you get back from SetUnhandledExceptionFilter, as this introduces a security risk.
  4. Don’t install an unhandled exception filter from within a plugin type DLL that is loaded in a third party application, and especially don’t install an unhandled exception filter in a plugin type DLL that gets unloaded on the fly.

Unfortunately, it turns out to be even harder than this to not get burned by unhandled exception filter chaining.  More on that in a future posting.