Thread Local Storage, part 8: Wrap-up

This is the final post in the Thread Local Storage series, which consists of the following articles:

  1. Thread Local Storage, part 1: Overview
  2. Thread Local Storage, part 2: Explicit TLS
  3. Thread Local Storage, part 3: Compiler and linker support for implicit TLS
  4. Thread Local Storage, part 4: Accessing __declspec(thread) data
  5. Thread Local Storage, part 5: Loader support for __declspec(thread) variables (process initialization time)
  6. Thread Local Storage, part 6: Design problems with the Windows Server 2003 (and earlier) approach to implicit TLS
  7. Thread Local Storage, part 7: Windows Vista support for __declspec(thread) in demand loaded DLLs
  8. Thread Local Storage, part 8: Wrap-up

By now, much of the inner workings of TLS (both implicit and explicit) on Windows should appear less mysterious, and a number of the seemingly arbitrary restrictions and limitations (the maximum number of explicit TLS slots on various operating systems, and the limitations on using __declspec(thread) in demand loaded DLLs) should make more sense. Although many of these things can (and should) be considered implementation details that are subject to change, knowing how things work “under the hood” often comes in useful. For example, with an understanding of why there’s a hard limit on the number of available explicit TLS slots, the importance of reusing one TLS slot for many variables (by placing them into a structure that is pointed to by the contents of a TLS slot) should become clear.
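As a rough illustration of the "one slot, many variables" approach, here is a minimal sketch. The structure, its fields, and the slot_* / state_* helper names are all hypothetical and exist only for this example. On Windows the helpers wrap TlsAlloc, TlsGetValue and TlsSetValue; the pthread fallback is there only so that the sketch also builds on other platforms.

```c
#include <assert.h>
#include <stdlib.h>

#ifdef _WIN32
#include <windows.h>
typedef DWORD tls_slot_t;
static int   slot_alloc(tls_slot_t *s) { *s = TlsAlloc(); return *s != TLS_OUT_OF_INDEXES; }
static void *slot_get(tls_slot_t s)    { return TlsGetValue(s); }
static int   slot_set(tls_slot_t s, void *p) { return TlsSetValue(s, p) != 0; }
#else
#include <pthread.h>
typedef pthread_key_t tls_slot_t;
/* The destructor (free) releases a thread's block when that thread exits. */
static int   slot_alloc(tls_slot_t *s) { return pthread_key_create(s, free) == 0; }
static void *slot_get(tls_slot_t s)    { return pthread_getspecific(s); }
static int   slot_set(tls_slot_t s, void *p) { return pthread_setspecific(s, p) == 0; }
#endif

/* Hypothetical per-thread state: every "variable" lives in one structure,
   so the whole component consumes a single explicit TLS slot. */
struct per_thread_state {
    int  last_error;
    int  call_count;
    char scratch[64];
};

static tls_slot_t g_slot;
static int        g_slot_ready;

/* Call once, before any thread touches the state. Returns nonzero on success. */
int state_init(void)
{
    if (!g_slot_ready)
        g_slot_ready = slot_alloc(&g_slot);
    return g_slot_ready;
}

/* Returns the calling thread's block, allocating it (zeroed) on first use.
   On Windows, the owner must free the block itself when the thread exits. */
struct per_thread_state *state_get(void)
{
    struct per_thread_state *st;

    assert(g_slot_ready);
    st = slot_get(g_slot);
    if (st == NULL) {
        st = calloc(1, sizeof *st);
        if (st != NULL && !slot_set(g_slot, st)) {
            free(st);
            st = NULL;
        }
    }
    return st;
}
```

A caller would run state_init() once and then use state_get()->call_count, state_get()->last_error and so on from any thread. Adding another per-thread variable then means adding a field to the structure, not allocating another slot.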

Many of the details of implicit TLS are actually rather set in stone at this point, because the compiler has been emitting code that directly accesses the ThreadLocalStoragePointer field in the TEB. Interestingly enough, this makes ThreadLocalStoragePointer a “guaranteed portable” part of the TEB, along with the NT_TIB header, despite the fact that the contents between the two are not defined to be portable (and certainly are not portable to, say, Windows 95).
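To make that dependency concrete, the following sketch models in portable C what a compiler-generated access to a __declspec(thread) variable amounts to. It is a simulation, not the real thing: mock_teb, module_tls_block and tls_variable_address are hypothetical names, and the mock structure contains only the one field of interest, not the actual TEB layout.

```c
#include <assert.h>
#include <stddef.h>

/* Mock of the single TEB field that compiled code depends on.
   In real code the compiler reads this field directly from the TEB
   (fs:[2Ch] on x86, gs:[58h] on x64), with the offset baked into
   the generated instructions. */
struct mock_teb {
    void **ThreadLocalStoragePointer; /* array indexed by module TLS index */
};

/* Hypothetical layout of one module's per-thread TLS block. */
struct module_tls_block {
    int counter;
    int flags;
};

/* Roughly the address computation behind a __declspec(thread) access:
   pick this module's block out of the thread's array, then add the
   variable's offset within that block. */
void *tls_variable_address(struct mock_teb *teb,
                           unsigned long tls_index,
                           size_t offset)
{
    char *block = teb->ThreadLocalStoragePointer[tls_index];

    assert(block != NULL);
    return block + offset;
}
```

Because the field's location is compiled into every image that uses implicit TLS, moving the field would break already-built binaries, which is why it is effectively frozen.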

Most of the inner workings of TLS are fairly straightforward, although there are some clever tricks employed to deal with scenarios such as TLS slots being released while threads are active. Many of the operational details of day-to-day TLS operation, such as how explicit TLS operates, are significantly different on Windows 95 and other operating systems of the 16-bit Windows lineage, so I would not recommend relying on the details of the TLS implementation for non-NT-based systems.

Incidentally, most of the operating system itself does not use TLS in the way that it is exposed to third party programs. Instead, many operating system components either have their own dedicated fields in the TEB, or for larger amounts of data that may not need to be allocated for every thread in the system, a pointer field that can be filled with a pointer to a memory block at runtime if desired. For instance, there’s a ReservedForNtRpc field, a number of fields set aside for OpenGL ICDs (so much for Microsoft not supporting OpenGL), a WinSockData field for ws2_32, and many other similar fields for various operating system components.

This doesn’t mean that these components are really getting preferential treatment, as for the most part, an access to such a field in the TEB is in practice not really slower than an access through the documented TLS APIs. The benefit from providing these components with their own dedicated storage in the TEB is that in many cases, these components are already going to be active. If said operating system components used conventional TLS, then this would significantly detract from the already limited number of TLS slots available for use by third party components.

Some components do actually use standard TLS, or at least the space allocated in the TEB for standard TLS slots (though in special circumstances and without going through the standard explicit TLS APIs). For example, the 64-bit portion of the Wow64 layer in a 32-bit process repurposes some of the 64-bit TLS slots (which would normally be completely unused in such a process) for its own internal usage, thereby avoiding the need for dedicated storage in the TEB. That, however, is a story for another day.


2 Responses to “Thread Local Storage, part 8: Wrap-up”

  1. […] Nynaeve Adventures in Windows debugging and reverse engineering. « Thread Local Storage, part 8: Wrap-up […]

  2. edgar says:

    Thanks for these great indepth articles !