Thread Local Storage, part 5: Loader support for __declspec(thread) variables (process initialization time)

Last time, I described the mechanism by which the compiler and linker generate code to access a variable that has been instanced per-thread via the __declspec(thread) extended storage class. Although the compiler and linker have essentially “set the stage” with respect to implicit TLS at this point, the loader is the component that “fills in the dots” and supplies the necessary run-time infrastructure to allow everything to operate.

Specifically, the loader is responsible for managing the allocation of per-module TLS index values, the allocation and management of the memory for the ThreadLocalStoragePointer array referred to by the TEB of every thread. Additionally, the loader is also responsible for managing the memory for each module’s thread-instanced (that is, __declspec(thread)-decorated) variables.

The loader’s TLS-related allocation and management duties can conceptually be split up into four distinct areas (Note that this represents the Windows Server 2003 and earlier view of things; I will go over some of the changes that Windows Vista makes this this model in a future posting in the TLS series.):

  1. At process initialization time, allocate _tls_index values, determine the extent of memory required for each module’s TLS block, and call TLS and DLL initializers (in that order).
  2. At thread initialization time, allocate and initialize TLS memory blocks for each module utilizing TLS, allocate the ThreadLocalStoragePointer array for the current thread, and link the TLS memory blocks in to the ThreadLocalStoragePointer array. Additionally, TLS initializers and then DLL initializers (in that order) are invoked for the current thread.
  3. At thread deinitialization time, call TLS deinitializers and then DLL deinitializers (in that order), and release the current thread’s TLS memory blocks for each module using TLS, and release the ThreadLocalStoragePointer array.
  4. At process deinitialization time, call TLS and DLL initializers (in that order).

Of course, the loader performs a number of other tasks when these events occur; this is simply a list of those that have some bearing on TLS support.

Most of these operations are fairly straightforward, with the arguable exception of process initialization. Process initialization of TLS is primarily handled in two subroutines inside ntdll, LdrpInitializeTls and LdrpAllocateTls.

LdrpInitializeTls is invoked during process initialization after all DLLs have been loaded, but before any initializer (or TLS) routines have been called. It essentially walks the loaded module list and sums the length of TLS data for each module that contains a valid TLS directory. For each module that contains TLS, a data structure is allocated that contains the length of the module’s TLS data and the TLS index that has been assigned to that module. (The TlsIndex field in the LDR_DATA_TABLE_ENTRY structure appears to be unused except as a flag that the module has TLS (being always set to -1), at least as far back as Windows XP. It is worth mentioning that the WINE implementation of implicit TLS incorrectly uses TlsIndex as the real module TLS index, so it may be unreliable to assume that it is always -1 if you care about working on WINE.)

Modules that use implicit TLS and which are present at initialization time are additionally marked as pinned in memory for the lifetime of the process by LdrpInitializeProcess (the LoadCount of any such module is fixed to 0xFFFF). In practice, this is typically unlikely to matter, as for such modules to be present at process initialization time, they must also by definition static linked by either the main process image or a dependency of the main process image.

After LdrpInitializeTls has determined which modules use TLS in the current process and has assigned those modules TLS index values, LdrpAllocateTls is called to allocate and initialize module TLS values for the initial thread.

At this point, process initialization continues, eventually resulting in TLS initializers and then DLL initializers (DllMain) being called for loaded modules. (Note that the main process image can have one or more TLS callbacks, even though it cannot have a DLL initializer routine.)

One interesting fact about TLS initializers is that they are always called before DLL initializers for their corresponding DLL. (The process occurs in sequence, such that DLL A’s TLS and DLL initializers are called, then DLL B’s TLS and DLL initializers, and so forth.) This means that TLS initializers need to be careful about making, say, CRT calls (as the C runtime is initialized before the user’s DllMain routine is called, by the actual DLL initializer entrypoint, such that the CRT will not be initialized when a TLS initializer for the module is invoked). This can be dangerous, as global objects will not have been constructed yet; the module will be in a completely uninitialized state except that imports have been snapped.

Another point worth mentioning about the loader’s TLS support is that contrary to the Portable Executable specification, the SizeOfZeroFill member of the IMAGE_TLS_DIRECTORY structure is not used (or supported) by the linker or the loader. This means that in practice, all TLS template data is initialized, and the size of the memory block allocated for per-module implicit TLS does not include the SizeOfZeroFill member as the PE documentation (or certain other publications that appear to be based on said documentation) would seem to state. (It seems that the WINE folks happened to get it wrong as well, thanks to the implication in the PE specification that the field is actually used.)

Some programs abuse TLS callbacks for anti-debugging purposes (gaining code execution before the normal process entrypoint routine is executed by creating a TLS callback for the main process image), although this is, in practice, quite obvious as almost all PE images do not use TLS callbacks at all.

Up through Windows Server 2003, the above is really all the loader needs to do with respect to supporting __declspec(thread). While this approach would appear to work quite well, it turns out that there are, in fact, some problems with it (if you’ve been following along thus far, you can probably figure out what they are). More on some of the limitations of the Windows Server 2003 approach to implicit TLS next week.

Tags: , ,

3 Responses to “Thread Local Storage, part 5: Loader support for __declspec(thread) variables (process initialization time)”

  1. […] Last week, I described how the loader handles implicit TLS (as of Windows Server 2003). Although the loader’s support for implicit TLS works out well enough for what it was originally designed for, there are some cases where things do not turn out so happily. If you’ve been following along closely so far, you’ve probably already noticed some of the deficiencies relating to the design of implicit TLS. These defects in the design and implementation of TLS eventually spurred Microsoft to significantly revamp the loader’s implicit TLS support in Windows Vista. […]

  2. Joe says:

    SuspendThread(hThread);
    c.ContextFlags = CONTEXT_FULL | CONTEXT_CONTROL;
    GetThreadContext(hThread,&c);

    GetThreadSelectorEntry(hThread,c.SegFs,(PLDT_ENTRY)&SelEntry);
    teb_ptr = ( SelEntry.HighWord.Bits.BaseHi << 24) |
    (SelEntry.HighWord.Bits.BaseMid << 16) |
    SelEntry.BaseLow; // Fs pointst to TEB
    my_ptr = (char *)teb_ptr;
    my_ptr = my_ptr + 0x2c; // point to ThreadArray sotrage
    my_ptr1 = *my_ptr; // point to first slot

    When I ran the following code thru the VS Debugger my_ptr1 was 0

    Shouldn’t it have been initilized to the Value of ThreradLocalStorage Array

    I did this right after the CreateThread API hthread the return value from CreateThread return a value

  3. andrewl says:

    Fantastic work! Thanks.