Thread Local Storage, part 3: Compiler and linker support for implicit TLS

Last time, I discussed the mechanisms by which so-called explicit TLS operates (the TlsGetValue, TlsSetValue and other associated supporting routines).

Although explicit TLS is fairly heavily used, many of the more “interesting” aspects of how TLS works relate to the work that the loader does to support implicit TLS, that is, __declspec(thread) variables (in CL). Both TLS mechanisms are designed to provide a similar effect, namely the capability to store information on a per-thread basis, but their implementations differ in many respects.

When you declare a variable with the __declspec(thread) extended storage class, the compiler and linker cooperate to allocate storage for the variable in a special region in the executable image. By convention, all variables with the __declspec(thread) storage class are placed in the .tls section of a PE image, although this is not technically required (in fact, the thread local variables do not even really need to be in their own section, merely contiguous in memory, at least from the loader’s perspective). On disk, this region of memory contains the initializer data for all thread local variables in a particular image. However, this data is never actually modified and references to a particular thread local variable will never refer to an address within this section of the PE image; the data is merely a “template” to be used when allocating storage for thread local variables after a thread has been created.
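
As a rough mental model of the “template” idea (not the loader's actual code), the sketch below copies an initializer image into a fresh block for each thread and shows that writes to one copy leave the template and the other copies alone. The names TlsTemplate, AllocateThreadTlsBlock and TemplateStaysPristine are hypothetical, invented for illustration.

```cpp
#include <cassert>
#include <vector>

// Hypothetical model of an image's TLS template: the initializer bytes that
// the .tls section holds on disk. This is not a real loader structure.
struct TlsTemplate {
    const unsigned char* start;  // first byte of initializer data
    const unsigned char* end;    // one past the last byte
};

// Build a private copy of the template for one thread. The template itself
// is only read here, never written.
std::vector<unsigned char> AllocateThreadTlsBlock(const TlsTemplate& t) {
    assert(t.start <= t.end);
    return std::vector<unsigned char>(t.start, t.end);
}

// Shows that a write through one thread's block affects neither the
// template nor another thread's block.
bool TemplateStaysPristine() {
    static const unsigned char image[4] = {1, 2, 3, 4};
    TlsTemplate t = {image, image + sizeof(image)};

    std::vector<unsigned char> threadA = AllocateThreadTlsBlock(t);
    std::vector<unsigned char> threadB = AllocateThreadTlsBlock(t);
    threadA[0] = 42;  // "thread A" modifies only its own copy

    return image[0] == 1 && threadB[0] == 1 && threadA[0] == 42;
}
```

In a real process the copies live in per-thread memory managed by the loader; the vectors here only stand in for that storage.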

The compiler and linker also make use of several special variables in the context of implicit TLS support. Specifically, a variable by the name of _tls_used (of the type IMAGE_TLS_DIRECTORY) is created by a portion of the C runtime that is statically linked into every program to represent the TLS directory that will be used in the final image. (References to this variable should be extern "C" in C++ code for name decoration purposes, and storage for the variable need not be allocated, as the supporting CRT stub code already creates the variable.) The TLS directory is a structure, located through the PE header of an executable image, that describes to the loader how the image’s thread local variables are to be managed. The linker looks for a variable by the name of _tls_used and ensures that, in the on-disk image, it overlaps with the actual TLS directory in the final image.

The source code for the particular section of C runtime logic that declares _tls_used lives in the tlssup.c file (which comes with Visual Studio), making the variable pseudo-documented. The standard declaration for _tls_used is as follows:

_CRTALLOC(".rdata$T")
const IMAGE_TLS_DIRECTORY _tls_used =
{
 (ULONG)(ULONG_PTR) &_tls_start, // start of tls data
 (ULONG)(ULONG_PTR) &_tls_end,   // end of tls data
 (ULONG)(ULONG_PTR) &_tls_index, // address of tls_index
 (ULONG)(ULONG_PTR) (&__xl_a+1), // pointer to callbacks
 (ULONG) 0,                      // size of tls zero fill
 (ULONG) 0                       // characteristics
};
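
To make the six initializers easier to read, here is a sketch that mirrors the 32-bit directory layout with fixed-width fields. TlsDirectory32 and TlsTemplateSize are hypothetical names; the field names follow the IMAGE_TLS_DIRECTORY32 declaration in winnt.h, and real code should use that declaration rather than this mirror.

```cpp
#include <cassert>
#include <cstdint>

// Portable mirror of the 32-bit TLS directory, for illustration only.
// The comments show which initializer from tlssup.c fills each field.
struct TlsDirectory32 {
    std::uint32_t StartAddressOfRawData;  // &_tls_start
    std::uint32_t EndAddressOfRawData;    // &_tls_end
    std::uint32_t AddressOfIndex;         // &_tls_index
    std::uint32_t AddressOfCallBacks;     // &__xl_a + 1
    std::uint32_t SizeOfZeroFill;         // 0 in the CRT declaration
    std::uint32_t Characteristics;        // 0 in the CRT declaration
};

static_assert(sizeof(TlsDirectory32) == 24, "six 32-bit fields");

// Size in bytes of the initializer template described by the directory.
std::uint32_t TlsTemplateSize(const TlsDirectory32& dir) {
    assert(dir.StartAddressOfRawData <= dir.EndAddressOfRawData);
    return dir.EndAddressOfRawData - dir.StartAddressOfRawData;
}
```

The start and end fields bracket the template data, which is why the thread local variables only need to be contiguous rather than in a section of their own.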

The CRT code also provides a mechanism to allow a program to register a set of TLS callbacks, which are functions with a similar prototype to DllMain that are called when a thread starts or exits (cleanly) in the current process. (These callbacks can even be registered for a main process image, where there is no DllMain routine.) The callbacks are typed as PIMAGE_TLS_CALLBACK, and the TLS directory points to a null-terminated array of callbacks (called in sequence).
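
The “null-terminated array, called in sequence” contract can be sketched as follows. This is a portable model, not the loader's code: TlsCallback is a stand-in for PIMAGE_TLS_CALLBACK (the real typedef also carries the NTAPI calling convention), and InvokeTlsCallbacks, FirstCallback, SecondCallback and DemoCallbackOrder are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Stand-in for PIMAGE_TLS_CALLBACK: (image handle, reason code, reserved).
typedef void (*TlsCallback)(void* dllHandle, std::uint32_t reason, void* reserved);

// Call each callback in array order until the null terminator is reached.
// Returns the number of callbacks invoked.
int InvokeTlsCallbacks(const TlsCallback* callbacks, void* image, std::uint32_t reason) {
    int invoked = 0;
    if (callbacks == nullptr) {
        return 0;  // image has no callback array
    }
    for (; *callbacks != nullptr; ++callbacks) {
        (*callbacks)(image, reason, nullptr);
        ++invoked;
    }
    return invoked;
}

// Two demo callbacks that record the order in which they ran.
static std::string g_callOrder;
static void FirstCallback(void*, std::uint32_t, void*)  { g_callOrder += 'A'; }
static void SecondCallback(void*, std::uint32_t, void*) { g_callOrder += 'B'; }

std::string DemoCallbackOrder() {
    g_callOrder.clear();
    const TlsCallback table[] = {FirstCallback, SecondCallback, nullptr};
    int n = InvokeTlsCallbacks(table, nullptr, 2 /* DLL_THREAD_ATTACH */);
    assert(n == 2);
    return g_callOrder;
}
```

Because the walk stops at the first null entry, a stray null pointer ahead of a real callback hides everything after it.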

For a typical image, there will not exist any TLS callbacks (in practice, almost everything uses DllMain to perform per-thread initialization tasks). However, the support is retained and is fully functional. To use the support that the CRT provides for TLS callbacks, one needs to declare a variable that is stored in the specially named “.CRT$XLx” section, where x is a value between A and Z. For example, one might write the following code:

#pragma section(".CRT$XLY",long,read)

extern "C" __declspec(allocate(".CRT$XLY"))
  PIMAGE_TLS_CALLBACK _xl_y  = MyTlsCallback;
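
The declaration above assumes a MyTlsCallback function already exists. A minimal sketch of one is shown here; it only counts notifications by reason code. The TlsCallbackCounts structure, g_counts and SimulateTwoThreads are hypothetical, and the non-Windows stand-in definitions exist only so the sketch can be compiled and exercised away from Windows. The counters are plain integers on the assumption that these notifications arrive one at a time, as DllMain notifications do; the text above does not establish that.

```cpp
#include <cassert>
#ifdef _WIN32
#include <windows.h>
#else
// Stand-ins so the sketch compiles off Windows; values match winnt.h.
typedef void* PVOID;
typedef unsigned long DWORD;
#define NTAPI
#define DLL_PROCESS_DETACH 0
#define DLL_PROCESS_ATTACH 1
#define DLL_THREAD_ATTACH  2
#define DLL_THREAD_DETACH  3
#endif

// Hypothetical bookkeeping: how many notifications of each kind were seen.
struct TlsCallbackCounts {
    long processAttach;
    long threadAttach;
    long threadDetach;
    long processDetach;
};
static TlsCallbackCounts g_counts = {0, 0, 0, 0};

// Same prototype as PIMAGE_TLS_CALLBACK expects; Reason uses the DllMain codes.
void NTAPI MyTlsCallback(PVOID DllHandle, DWORD Reason, PVOID Reserved) {
    (void)DllHandle;
    (void)Reserved;
    switch (Reason) {
    case DLL_PROCESS_ATTACH: ++g_counts.processAttach; break;
    case DLL_THREAD_ATTACH:  ++g_counts.threadAttach;  break;
    case DLL_THREAD_DETACH:  ++g_counts.threadDetach;  break;
    case DLL_PROCESS_DETACH: ++g_counts.processDetach; break;
    default: break;
    }
}

// Exercise the callback directly, without involving the loader.
// Returns attaches minus detaches seen so far.
long SimulateTwoThreads() {
    MyTlsCallback(nullptr, DLL_THREAD_ATTACH, nullptr);
    MyTlsCallback(nullptr, DLL_THREAD_ATTACH, nullptr);
    MyTlsCallback(nullptr, DLL_THREAD_DETACH, nullptr);
    assert(g_counts.threadAttach >= 2);
    return g_counts.threadAttach - g_counts.threadDetach;
}
```

Keeping the body this simple is deliberate: in a main process image the callback may run very early, so it is safest to assume little about what else has been initialized.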

The strange business with the special section names is required because the in-memory ordering of the TLS callback pointers is significant. To understand what is happening with this peculiar looking declaration, it is first necessary to understand a bit about how the compiler and linker organize data in the final PE image that is produced.

Non-header data in a PE image is placed into one or more sections, which are regions of memory with a common set of attributes (such as page protection). The __declspec(allocate("section-name")) keyword (CL-specific) tells the compiler that a particular variable is to be placed in a specific section in the final executable. The toolchain additionally supports concatenating similarly named sections into one larger section. This support is activated by appending a $ character, followed by any other text, to a section name. At link time, the resulting section is merged into the section whose name is everything before the $ character.

The linker orders these individual sections alphabetically, by the text after the $ character, when concatenating them. This means that in memory (in the final executable image), a variable in the “.CRT$XLB” section will be after a variable in the “.CRT$XLA” section but before a variable in the “.CRT$XLZ” section. The C runtime uses this behavior to create a null-terminated array of function pointers to TLS callbacks (with the pointer stored in the “.CRT$XLZ” section being the null terminator). Thus, to ensure that the declared function pointer resides within the confines of the TLS callback array referenced by _tls_used, it is necessary to place it in a section of the form “.CRT$XLx”.
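
The ordering rule can be modeled in a few lines. The sketch below is a simulation of the behavior just described, not of any real linker: Contribution, LayOutSection and DemoCallbackArrayLayout are hypothetical names, and the symbol name __xl_z for the terminator is an assumption based on the CRT sources (only __xl_a appears in the declaration quoted earlier).

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// One piece of a grouped section: the name as written in the source file,
// plus the symbol that was placed there.
struct Contribution {
    std::string section;  // e.g. ".CRT$XLY"
    std::string symbol;   // e.g. "_xl_y"
};

// Text before '$': the name of the section in the final image.
static std::string OutputSection(const std::string& s) {
    return s.substr(0, s.find('$'));
}

// Text after '$': the key that decides ordering within the merged section.
static std::string GroupKey(const std::string& s) {
    std::string::size_type p = s.find('$');
    return p == std::string::npos ? std::string() : s.substr(p + 1);
}

// Symbols that end up in `output`, in their final in-memory order.
std::vector<std::string> LayOutSection(const std::vector<Contribution>& in,
                                       const std::string& output) {
    std::vector<Contribution> kept;
    for (const Contribution& c : in) {
        if (OutputSection(c.section) == output) kept.push_back(c);
    }
    std::stable_sort(kept.begin(), kept.end(),
                     [](const Contribution& a, const Contribution& b) {
                         return GroupKey(a.section) < GroupKey(b.section);
                     });
    std::vector<std::string> symbols;
    for (const Contribution& c : kept) symbols.push_back(c.symbol);
    return symbols;
}

std::vector<std::string> DemoCallbackArrayLayout() {
    // Deliberately listed out of order, as separate object files might be.
    std::vector<Contribution> objs = {
        {".CRT$XLZ", "__xl_z"},     // CRT: null terminator
        {".CRT$XLY", "_xl_y"},      // user callback pointer
        {".CRT$XLA", "__xl_a"},     // CRT: marker before the array
        {".rdata$T", "_tls_used"},  // different output section, filtered out
    };
    std::vector<std::string> layout = LayOutSection(objs, ".CRT");
    assert(layout.size() == 3);
    return layout;
}
```

With __xl_a first and the terminator last, the `&__xl_a+1` initializer in _tls_used points at the first user-supplied entry, which is why any suffix between XLA and XLZ lands inside the array.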

The creation of the TLS directory is, however, only one portion of how the compiler and linker work together to support __declspec(thread) variables. Next time, I’ll discuss just how the compiler and linker manage accesses to such variables.

Update: Phil mentions that this support for TLS callbacks does not work before the Visual Studio 2005 release. Be warned if you are still using an old compiler package.


7 Responses to “Thread Local Storage, part 3: Compiler and linker support for implicit TLS”

  1. […] Nynaeve Adventures in Windows debugging and reverse engineering. « Thread Local Storage, part 3: Compiler and linker support for implicit TLS […]

  2. Nate says:

    This is similar to a feature in GNU ld called “linker sets”. Basically you can allocate data sections in each .o that are then collected into an array at link time. FreeBSD uses this for the SYSINIT macros for sorting the order and then booting various subsystems.

  3. Phil says:

    You mention that TLS callbacks are fully functional. That’s really only true for VS 2005/Whidbey or later. Before that, an incremental build would put NULL gaps between the .CRT$XL* sections, so the loader callback logic would always see a NULL before the first real callback. That got fixed in preparation for some other __declspec(thread) work that never got turned on.

  4. Skywing says:

    Updated the post to document that. I never tried it prior to VS2005 myself; thanks :)

    I’d love to hear the story behind whatever was planned with extending __declspec(thread), unless you can’t talk about it of course…

  5. Chris says:

    Very nice write-up.

    I’ve implemented the TLS callback for my exe and have noticed that there are 2 threads for which THREAD_ATTACH is being called but not THREAD_DETACH.

    The first is the first thread created after PROCESS_ATTACH. This is not the main app thread, i don’t know what it does as i don’t create it.

    The second is created when i make the first OpenGL related call. It looks like it loads/unloads mcd32.dll and manages an invisible NVOpenGLPbuffer window, again that i don’t create.

    I can see when both threads are exiting in the debug output (i.e. the debugger sees them die, but i don’t get a THREAD_DETACH).
    They are exiting right after main() returns.

    After that i get PROCESS_DETACH.

    As they are being created/exiting in MS code i’d like to assume that they are exiting normally.

    Is there any other condition you know of that would prevent THREAD_DETACH from being sent e.g. another callback being called earlier that is preventing mine from being called.

  6. other chris says:

    i can’t get this working :( I have a static library and i want to free resources allocated with TlsAlloc (the library is single-threaded, but is used from multiple threads). I defined DllMain and added the following:

    #pragma section(".CRT$XLY",long,read)

    extern "C" __declspec(allocate(".CRT$XLY"))
    PIMAGE_TLS_CALLBACK _xl_y = (PIMAGE_TLS_CALLBACK)DllMain;

    but the DllMain function is never called. What am i missing? I use VS2005 Express. Thanks!

  7. no-name says:

    Does the linker discard the function pointer?
    It works with VC++ 2008 Express using this code:

    #ifdef _M_IX86
    #pragma comment (linker, "/INCLUDE:__tls_used")
    #pragma comment (linker, "/INCLUDE:__xl_b")
    #else
    #pragma comment (linker, "/INCLUDE:_tls_used")
    #pragma comment (linker, "/INCLUDE:_xl_b")
    #endif
    #ifdef _M_X64
    #pragma const_seg (".CRT$XLB")
    const
    #else
    #pragma data_seg (".CRT$XLB")
    #endif
    EXTERN_C PIMAGE_TLS_CALLBACK _xl_b = TLSCallbacks;
    #pragma data_seg ()
    #pragma const_seg ()