{"id":181,"date":"2007-10-23T07:00:12","date_gmt":"2007-10-23T12:00:12","guid":{"rendered":"http:\/\/www.nynaeve.net\/?p=181"},"modified":"2019-12-13T17:45:34","modified_gmt":"2019-12-13T22:45:34","slug":"thread-local-storage-part-2-explicit-tls","status":"publish","type":"post","link":"http:\/\/www.nynaeve.net\/?p=181","title":{"rendered":"Thread Local Storage, part 2: Explicit TLS"},"content":{"rendered":"<p><a title=\"Thread Local Storage, part 1: Overview\" href=\"http:\/\/www.nynaeve.net\/?p=180\">Previously<\/a>, I outlined some of the general design principles behind both flavors of TLS in use on Windows.  Anyone can see the design and high-level interface to TLS by reading MSDN, though; the interesting parts relate to the implementation itself.<\/p>\n<p>The explicit TLS API is (by far) the simpler of the two classes of TLS in terms of the implementation, as it touches the fewest &#8220;moving parts&#8221;.  As I mentioned last time, there are really just four key functions in the explicit TLS API.  The most important two are <a title=\"TlsGetValue\" href=\"http:\/\/msdn2.microsoft.com\/en-us\/library\/ms686812.aspx\">TlsGetValue<\/a> and <a title=\"TlsSetValue\" href=\"http:\/\/msdn2.microsoft.com\/en-us\/library\/ms686818.aspx\">TlsSetValue<\/a>, which manage the actual setting and retrieving of per-thread pointers.<\/p>\n<p>These two functions are simple enough to annotate entirely.  Essentially, they are just &#8220;dumb accessors&#8221; into an array (two arrays in actuality, <em>TlsSlots<\/em> and <em>TlsExpansionSlots<\/em>) in the TEB, which is indexed by the <em>dwTlsIndex<\/em> argument to return (or set) the desired per-thread variable.  
The implementation of <em>TlsGetValue<\/em> on Vista (32-bit) is as follows (<em>TlsSetValue<\/em> is similar, except that it writes to the arrays instead of reading from them, and has support for demand-allocating the <em>TlsExpansionSlots<\/em> array; more on that later):<\/p>\n<pre>\r\nPVOID\r\nWINAPI\r\nTlsGetValue(\r\n\t__in DWORD dwTlsIndex\r\n\t)\r\n{\r\n   PTEB Teb = NtCurrentTeb(); \/\/ fs:[0x18]\r\n\r\n   \/\/ Reset the last error state.\r\n   Teb->LastErrorValue = 0;\r\n\r\n   \/\/ If the variable is in the main array, return it.\r\n   if (dwTlsIndex < 64)\r\n      return Teb->TlsSlots[ dwTlsIndex ];\r\n\r\n   \/\/ Valid indexes are 0 through 1087 (64 + 1024 slots).\r\n   if (dwTlsIndex >= 1088)\r\n   {\r\n      BaseSetLastNTError( STATUS_INVALID_PARAMETER );\r\n      return 0;\r\n   }\r\n\r\n   \/\/ Otherwise it's in the expansion array.\r\n   \/\/ If it's not allocated, we default to zero.\r\n   if (!Teb->TlsExpansionSlots)\r\n      return 0;\r\n\r\n   \/\/ Fetch the value from the expansion array.\r\n   return Teb->TlsExpansionSlots[ dwTlsIndex - 64 ];\r\n}\r\n<\/pre>\n<p>(The <a title=\"TlsGetValue disassembly\" href=\"http:\/\/www.nynaeve.net\/Code\/TlsGetValue.txt\">assembler version<\/a> (annotated) is also available.)<\/p>\n<p>The <em>TlsSlots<\/em> array in the TEB is a part of every thread, which gives each thread a guaranteed set of 64 thread local storage indexes.  Later on, Microsoft decided that 64 was not enough TLS slots to go around and added the <em>TlsExpansionSlots<\/em> array, for an additional 1024 TLS slots.  
The <em>TlsExpansionSlots<\/em> array is demand-allocated in <em>TlsAlloc<\/em> if the initial set of 64 slots is exhausted.<\/p>\n<p>(This is, by the way, the origin of the seemingly arbitrary 64 and 1088 TLS slot limits mentioned <a title=\"Thread Local Storage\" href=\"http:\/\/msdn2.microsoft.com\/en-us\/library\/ms686749.aspx\">by MSDN<\/a>, for those keeping score.)<\/p>\n<p><em>TlsAlloc<\/em> and <em>TlsFree<\/em> are, for all intents and purposes, implemented just as one would expect.  Both acquire a lock; <em>TlsAlloc<\/em> then searches for a free TLS slot, returning the index if one is found and otherwise indicating to the caller that there are no free slots.  If the first 64 slots are exhausted and the <em>TlsExpansionSlots<\/em> array has not been created, then <em>TlsAlloc<\/em> will allocate and zero space for 1024 more TLS slots (pointer-sized values), and then update the <em>TlsExpansionSlots<\/em> pointer to refer to the newly allocated storage.<\/p>\n<p>Internally, <em>TlsAlloc<\/em> and <em>TlsFree<\/em> utilize the <a title=\"RtlInitializeBitMap\" href=\"http:\/\/msdn2.microsoft.com\/en-us\/library\/ms802979.aspx\">Rtl bitmap<\/a> package to track usage of individual TLS slots; each bit in a bitmap describes whether a particular TLS slot is free or in use.  This allows for reasonably fast (and space-efficient) tracking of TLS slot usage for book-keeping purposes.<\/p>\n<p>If you have been following along so far, you may be wondering what happens when <em>TlsAlloc<\/em> must create the <em>TlsExpansionSlots<\/em> array after there is already more than one thread in the current process.  This might appear to be a problem at first glance, as <em>TlsAlloc<\/em> only creates the array for the current thread.  
One might be tempted to conclude that, given this behavior of <em>TlsAlloc<\/em>, explicit TLS doesn&#8217;t work reliably above 64 TLS slots if the extra slots are allocated after the second thread in the process is created.  This is in fact not the case: <em>TlsGetValue<\/em> and <em>TlsSetValue<\/em> perform some clever sleight of hand that compensates for the fact that <em>TlsAlloc<\/em> can only create the <em>TlsExpansionSlots<\/em> memory block for the current thread.<\/p>\n<p>Specifically, if <em>TlsGetValue<\/em> is called with an array index within the confines of the <em>TlsExpansionSlots<\/em> array, but the array has not been allocated for the current thread, then zero is returned.  (This is the default value for an uninitialized TLS slot, and is thus legal.)  Similarly, if <em>TlsSetValue<\/em> is called with an array index that falls within the <em>TlsExpansionSlots<\/em> array, and the array has not yet been created for the current thread, <em>TlsSetValue<\/em> allocates the memory block on demand and initializes the requested TLS slot.<\/p>\n<p>There also exists one final twist in <em>TlsFree<\/em> that is required to support releasing a TLS slot while there are multiple threads running.  The potential problem is that a thread releases a TLS slot, the slot is then reallocated, and the previous contents of the slot are still present on other threads running in the process.  <em>TlsFree<\/em> alleviates this problem by asking the kernel for help, in the form of the <em>ThreadZeroTlsCell<\/em> thread information class.  When the kernel sees an <em>NtSetInformationThread<\/em> call for <em>ThreadZeroTlsCell<\/em>, it enumerates all threads in the process and writes a pointer-sized zero value to each running thread&#8217;s instance of the requested TLS slot, thus flushing the old contents and resetting the slot to the unallocated default state.  
(It is not strictly necessary for this to have been done in kernel mode, although the designers chose to go this route.)<\/p>\n<p>When a thread exits normally, if the <em>TlsExpansionSlots<\/em> array has been allocated, it is freed back to the process heap.  (Of course, if a thread is terminated by <a title=\"TerminateThread\" href=\"http:\/\/msdn2.microsoft.com\/en-us\/library\/ms686717.aspx\">TerminateThread<\/a>, the <em>TlsExpansionSlots<\/em> array is leaked.  This is yet another reason among innumerable others why you should stay away from TerminateThread.)<\/p>\n<p>Next up: Examining implicit TLS support (<em>__declspec(thread)<\/em> variables).<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Previously, I outlined some of the general design principles behind both flavors of TLS in use on Windows. Anyone can see the design and high level interface to TLS by reading MSDN, though; the interesting parts relate to the implementation itself. The explicit TLS API is (by far) the simplest of the two classes of 
[&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[10,8,5],"tags":[17,13],"_links":{"self":[{"href":"http:\/\/www.nynaeve.net\/index.php?rest_route=\/wp\/v2\/posts\/181"}],"collection":[{"href":"http:\/\/www.nynaeve.net\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/www.nynaeve.net\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/www.nynaeve.net\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/www.nynaeve.net\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=181"}],"version-history":[{"count":1,"href":"http:\/\/www.nynaeve.net\/index.php?rest_route=\/wp\/v2\/posts\/181\/revisions"}],"predecessor-version":[{"id":539,"href":"http:\/\/www.nynaeve.net\/index.php?rest_route=\/wp\/v2\/posts\/181\/revisions\/539"}],"wp:attachment":[{"href":"http:\/\/www.nynaeve.net\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=181"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/www.nynaeve.net\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=181"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/www.nynaeve.net\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=181"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}