Understanding the kernel address space on 32-bit Windows Vista

[Warning: The dynamic kernel address space is subject to change in future releases. In fact, the set of possible address space region types on public 32-bit Win7 drops is different from the original release of the dynamic kernel address space logic in Windows Vista. You should not make hardcoded assumptions about this logic in production code, as the basic premises outlined in this post are subject to change without warning on future operating system revisions.]

(If you’re already familiar with the subject, you may skip the history lesson and background information and jump straight to the gory details.)

With 32-bit Windows Vista, significant changes were made to the way the kernel mode address space was laid out. In previous operating system releases, the kernel address space was divvied up into large, relatively fixed-size regions ahead of time. For example, the paged- and non-paged pools resided within fixed virtual address ranges that were, for the most part, calculated at boot time. This meant that the memory manager needed to make a decision up front about how much address space to dedicate to the paged pool versus the non-paged pool, or how much address space to devote to the system cache, and so forth. (The address space is not strictly completely fixed on a 32-bit legacy system. There are several registry options that can be used to manually trade off address space from one resource to another. However, these settings are only taken into account at memory manager initialization time during system boot.)

As a consequence of these mostly static (after boot time) calculations, the memory manager had limited flexibility to respond to differing memory usage workloads at runtime. In particular, because the memory manager needed to dedicate large address space regions up front to one of several different resource sets (i.e. paged pool, non-paged pool, system cache), situations could arise wherein one of these resources (such as the paged pool) was utilized to the point of exhausting the address range carved out for it, while a “peer” resource (such as the non-paged pool) was experiencing only comparatively light utilization. Because the memory manager had no way to “take back” address space from the non-paged pool and hand it off to the paged pool, allocations from the paged pool could fail while there was plenty of comparatively unused address space blocked off for the exclusive usage of the non-paged pool.

On 64-bit platforms, this is usually not a significant issue, as the amount of kernel address space available is orders of magnitude beyond the total amount of physical memory in even the largest systems (for today, anyway). However, consider that on 32-bit systems, the kernel only has 2GB (and sometimes only 1GB, if the system is booted with /3GB) of address space to play around with. In this case, the fact that the memory manager needs to carve off hundreds of megabytes of address space (out of only 2GB or even 1GB total) up front for pools and the system cache becomes a much more eye-catching issue. In fact, the scenario I described previously is not even that difficult to achieve in practice, as the sizes of the non-paged and paged pools are typically quite small compared to the amount of physical memory available to even baseline consumer desktop systems nowadays. Throw in a driver or workload that heavily utilizes paged- or non-paged pool for some reason, and this sort of “premature” pool exhaustion may occur much sooner than one might naively expect.

Now, while it’s possible to manually address this issue to a degree on 32-bit systems using the aforementioned registry knobs, determining the optimum values for these knobs for a particular workload is not necessarily an easy task. (Most sysadmins I know of tend to have their eyes glaze over when step one includes “Attach a kernel debugger to the computer.”)

To address this growing problem, the memory manager was restructured in the Windows Vista timeframe to no longer treat the kernel address space as a large (or, not so large, depending upon how one wishes to view it) slab to carve up into very large, fixed-size regions at boot time. Hence, the concept of the dynamic kernel address space was born. The basic idea is that instead of carving the relatively meagre kernel address space available on 32-bit systems up into large chunks at boot time, the memory manager “reserves its judgement” until there is need for additional address space for a particular resource (such as the paged- or non-paged pool), and only then hands out another chunk of address space to the component in question (such as the kernel pool allocator).

While this has been mentioned publicly for some time, to the best of my knowledge, nobody had really sat down and talked about how this feature really worked under the hood. This is relevant for a couple of reasons:

  1. The !address debugger extension doesn’t understand the dynamic kernel address space right now. This means that you can’t easily figure out where an address came from in the debugger on 32-bit Windows Vista systems (or later systems that might use a similar form of dynamic kernel address space).
  2. Understanding the basic concepts of how the feature works provides some insight into what it can (and cannot) do to help alleviate the kernel address space crunch.
  3. While the kernel address space layout has more or less been a relatively well understood concept for 32-bit systems to date, much of that knowledge doesn’t really directly translate to Windows Vista (and potentially later) systems.

Please note that future implementations may not necessarily function the same under the hood with respect to how the kernel address space operates.

Internally, the way the memory manager provides address space to resources such as the paged- or non-paged pools has been restructured such that each of the large address space resources acts as a client to a new kernel virtual address space allocator. Whereas these address space resources previously received their address ranges in the form of large, pre-calculated regions at boot time, each now calls upon the memory manager’s internal kernel address space allocator (MiObtainSystemVa) to request additional address space.

The kernel address space allocator can be thought of as maintaining a logical pool of unused virtual address space regions. It’s important to note that the address space allocator doesn’t actually allocate any backing store for the returned address space, nor any other management structures (such as PTEs); it simply reserves a chunk of address space exclusively for the use of the caller. This architecture is required because everything from driver image mapping to the paged- and non-paged pool backends has been converted to use the kernel address space allocator routines. Each of these components has a very different view of what it will actually want to use the address space for, but they all need an address space to work in.

Similarly, if a component has obtained an address space region that it no longer requires, then it may return it to the memory manager with a call to the internal kernel address space allocator routine MiReturnSystemVa.
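To make the obtain/return idea a bit more concrete, here is a minimal toy model in Python. It is only a sketch under stated assumptions, not how the kernel implements MiObtainSystemVa or MiReturnSystemVa: the class and method names, the tag values, the 2MB region size and the 0x80000000 base are all made up for illustration. The only point it tries to convey is that the allocator tracks who owns each region of address space and nothing else (no backing store, no PTEs).

```python
from enum import IntEnum

REGION_SIZE = 2 * 1024 * 1024   # assumed tracking granularity (illustrative)
RANGE_START = 0x80000000        # assumed start of the kernel address space
REGION_COUNT = (0x100000000 - RANGE_START) // REGION_SIZE


class VaType(IntEnum):
    """Hypothetical owner tags; NOT the real _MI_SYSTEM_VA_TYPE values."""
    UNUSED = 0
    PAGED_POOL = 1
    NONPAGED_POOL = 2
    DRIVER_IMAGES = 3


class ToyVaAllocator:
    """Tracks one owner tag per region; never touches backing store."""

    def __init__(self):
        self.tags = bytearray(REGION_COUNT)   # all zero, i.e. all UNUSED

    def obtain(self, va_type, regions=1):
        """Reserve `regions` contiguous regions; return the base or None."""
        run = 0
        for i in range(REGION_COUNT):
            run = run + 1 if self.tags[i] == VaType.UNUSED else 0
            if run == regions:
                first = i - regions + 1
                for j in range(first, i + 1):
                    self.tags[j] = va_type
                return RANGE_START + first * REGION_SIZE
        return None   # address space exhausted (or too fragmented)

    def release(self, base, regions=1):
        """Hand regions back so that any other client may obtain them."""
        first = (base - RANGE_START) // REGION_SIZE
        for j in range(first, first + regions):
            self.tags[j] = VaType.UNUSED

    def type_of(self, address):
        """Report which client currently owns the region holding `address`."""
        return VaType(self.tags[(address - RANGE_START) // REGION_SIZE])


va = ToyVaAllocator()
paged = va.obtain(VaType.PAGED_POOL, regions=2)
nonpaged = va.obtain(VaType.NONPAGED_POOL)
print(hex(paged), hex(nonpaged))
print(va.type_of(nonpaged + 0x1234).name)
va.release(paged, regions=2)
print(va.type_of(paged).name)
```

In this toy, a region returned by one client is immediately available to any other client, which is exactly the flexibility that the fixed boot-time layout lacked.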

To place things into perspective, this system is conceptually analogous to reserving a large address space region using VirtualAlloc with MEM_RESERVE in user mode. A MEM_RESERVE reservation doesn’t commit any backing store that allows data to reside at a particular place in the address space, but it does grant exclusive usage of an address range of a particular size, which can then be used in conjunction with whatever backing store the caller requires. Likewise, it is similarly up to the caller to decide how they wish to back the address space returned by MiObtainSystemVa.

The address space chunks dealt with by the kernel address space allocator, similarly to the user mode address space reservation system, don’t necessarily need to be highly granular. Instead, a granularity that is convenient for the memory manager is chosen. This is because the clients of the kernel address space allocator will then subdivide the address ranges they receive for their own usage. (The exact granularity of a kernel address space allocation is, of course, subject to change in future releases based upon the whims of what is preferable for the memory manager.)

For example, if a driver requests an allocation from the non-paged pool, and there isn’t enough address space assigned to the non-paged pool to handle the request, then the non-paged pool allocator backend will call MiObtainSystemVa to retrieve additional address space. If successful, it will conceptually add this address space to its free address space list, and then return a subdivided chunk of this address space (mated with physical memory backing store, as in this example, we are speaking of the non-paged pool) to the driver. The next request for memory from the non-paged pool might then come from the same address range previously obtained by the non-paged pool allocator backend. This behavior is, again, conceptually similar to how the user mode heap obtains large amounts of address space from the memory manager and then subdivides these address regions into smaller allocations for a caller of, say, operator new.
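The flow described above can be sketched with another small Python toy. Again, this is an illustration under assumptions rather than the real pool allocator: the class names, the 2MB chunk size, the four-chunk address space and the simple bump allocation strategy are all hypothetical. It shows a pool backend asking for another chunk only when its current address space runs out, and serving later requests from the chunk it already holds.

```python
CHUNK_SIZE = 2 * 1024 * 1024   # assumed size of one address space chunk


class ToyAddressSpace:
    """Hands out fixed-size chunks of a small, finite address range."""

    def __init__(self, base, chunk_count):
        self.free_chunks = [base + i * CHUNK_SIZE for i in range(chunk_count)]

    def obtain_chunk(self):
        return self.free_chunks.pop(0) if self.free_chunks else None


class ToyPool:
    """Bump-allocates from its current chunk; asks for more on demand."""

    def __init__(self, name, address_space):
        self.name = name
        self.address_space = address_space
        self.chunks = []       # chunks obtained so far
        self.next_free = 0     # next unused address in the current chunk
        self.chunk_end = 0     # end of the current chunk

    def allocate(self, size):
        if size <= 0 or size > CHUNK_SIZE:
            return None        # toy model: no multi-chunk allocations
        if self.next_free + size > self.chunk_end:
            chunk = self.address_space.obtain_chunk()
            if chunk is None:
                return None    # no address space left for anyone
            self.chunks.append(chunk)
            self.next_free, self.chunk_end = chunk, chunk + CHUNK_SIZE
        address = self.next_free
        self.next_free += size
        return address


space = ToyAddressSpace(base=0x80000000, chunk_count=4)
paged = ToyPool("paged", space)
nonpaged = ToyPool("nonpaged", space)

a = nonpaged.allocate(0x1000)   # first request pulls in a chunk
b = nonpaged.allocate(0x1000)   # served from the same chunk
for _ in range(3):              # a paged-pool-heavy workload takes the rest
    paged.allocate(CHUNK_SIZE)
print(hex(a), hex(b), len(nonpaged.chunks), len(paged.chunks))
```

Because nothing was carved up ahead of time, the paged pool in this toy ends up with three of the four chunks while the lightly used non-paged pool holds only one. With a fixed half-and-half split, the third paged pool request would have failed.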

All of this happens transparently to the existing public clients of the memory manager. For example, drivers don’t observe any particularly different behavior from ExAllocatePoolWithTag. However, because the address space isn’t pre-carved into large regions ahead of time, the memory manager no longer has its hands proverbially tied with respect to allowing one particular consumer of address space to obtain comparatively much more address space than usual (of course, at a cost to the available address space to other components). In effect, the memory manager is now much more able to self-tune for a wide variety of workloads without the user of the system needing to manually discover how much address space would be better allocated to the paged pool versus the system cache, for example.

In addition, the dynamic kernel address space infrastructure has had other benefits as well. As the address spans for the various major consumers of kernel address space are now demand-allocated, so too are PTE and other paging-related structures related to those address spans. This translates to reduced boot-time memory usage. Prior to the dynamic kernel address space’s introduction, the kernel would reduce the size of the various mostly-static address regions based on the amount of physical memory on the system for small systems. However, for large systems, and especially large 64-bit systems, paging-related structures potentially describing large address space regions were pre-allocated at boot time.

On small 64-bit systems, the demand-allocation of paging related structures also removes address space limits on the sizes of individual large address space regions that were previously present to avoid having to pre-allocate vast ranges of paging-related structures.

Presently, on 64-bit systems, many, though not all, of the major kernel address space consumers still have their own dedicated address regions assigned internally by the kernel address space allocator, although this could easily change as need be in a future release. However, demand-creation of paging-describing structures is still realized (with the reduction in boot-time memory requirements as described above).

I mentioned earlier that the !address extension doesn’t understand the dynamic kernel address space as of the time of this writing. If you need to determine where a particular address came from while debugging on a 32-bit system that features a dynamic kernel address space, however, you can do this manually by looking into the memory manager’s internal tracking structures to discern for which reason a particular chunk of address space was checked out. (This isn’t quite as granular as !address; for example, kernel stacks don’t have their own dedicated address range in Windows Vista. However, it will at least allow you to quickly tell if a particular piece of address space is part of the paged- or non-paged pool, whether it’s part of a driver image section, and so forth.)

In the debugger, you can issue the following (admittedly long and ugly) command to ask the memory manager what type of address a particular virtual address region is. Replace <<ADDRESS>> with the kernel address that you wish to inquire about:

?? (nt!_MI_SYSTEM_VA_TYPE) ( ((unsigned char *)(@@masm(nt!MiSystemVaType)))[ @@masm( ( <<ADDRESS>> - poi(nt!MmSystemRangeStart)) / (@$pagesize * @$pagesize / @@c++(sizeof(nt!_MMPTE))) ) ] )

Here are a couple of examples:

kd> ?? (nt!_MI_SYSTEM_VA_TYPE) ( ((unsigned char *)(@@masm(nt!MiSystemVaType)))
[ @@masm( ( 89445008 - poi(nt!MmSystemRangeStart)) / (@$pagesize * @$pagesize / @@c++(sizeof(nt!_MMPTE))) ) ] )
_MI_SYSTEM_VA_TYPE MiVaNonPagedPool (5)

kd> ?? (nt!_MI_SYSTEM_VA_TYPE) ( ((unsigned char *)(@@masm(nt!MiSystemVaType)))
[ @@masm( ( ndis - poi(nt!MmSystemRangeStart)) / (@$pagesize * @$pagesize / @@c++(sizeof(nt!_MMPTE))) ) ] )
_MI_SYSTEM_VA_TYPE MiVaDriverImages (12)

kd> ?? (nt!_MI_SYSTEM_VA_TYPE) ( ((unsigned char *)(@@masm(nt!MiSystemVaType)))
[ @@masm( ( win32k - poi(nt!MmSystemRangeStart)) / (@$pagesize * @$pagesize / @@c++(sizeof(nt!_MMPTE))) ) ] )
_MI_SYSTEM_VA_TYPE MiVaSessionGlobalSpace (11)
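
The index arithmetic inside that command is easier to see when written out. The Python sketch below redoes the calculation for the first example address (0x89445008, as debugger numbers are hexadecimal by default). The page size of 4096 bytes and the PTE sizes (4 bytes without PAE, 8 bytes with PAE) are standard x86 values; the 0x80000000 value for nt!MmSystemRangeStart is an assumption that only holds for a default 2GB/2GB split, and the function name is made up for illustration.

```python
PAGE_SIZE = 4096                  # x86 small page size (@$pagesize)
SYSTEM_RANGE_START = 0x80000000   # assumed nt!MmSystemRangeStart (no /3GB)


def va_type_index(address, pte_size, range_start=SYSTEM_RANGE_START):
    """Return (index, region_size) for the one-byte-per-region type array."""
    ptes_per_page = PAGE_SIZE // pte_size
    region_size = PAGE_SIZE * ptes_per_page   # span mapped by one page table
    return (address - range_start) // region_size, region_size


for pte_size in (4, 8):   # sizeof(nt!_MMPTE): 4 without PAE, 8 with PAE
    index, region = va_type_index(0x89445008, pte_size)
    print(f"sizeof(_MMPTE)={pte_size}: "
          f"region={region // (1024 * 1024)} MB, index={index}")
```

In other words, the array has one byte for each span of address space covered by a single page table (4MB without PAE, 2MB with PAE), and the command simply reads the byte for the span that contains the address in question.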

The above technique will not work on 64-bit systems that utilize a dynamic (demand-allocated) kernel address space, as the address space tracking is performed differently internally.

(Many thanks to Andrew Rogers and Landy Wang, who were gracious enough to spend some time divulging insights on the subject.)
