{"id":185,"date":"2007-10-25T07:00:47","date_gmt":"2007-10-25T12:00:47","guid":{"rendered":"http:\/\/www.nynaeve.net\/?p=185"},"modified":"2019-12-13T17:45:34","modified_gmt":"2019-12-13T22:45:34","slug":"thread-local-storage-part-4-accessing-__declspecthread-data","status":"publish","type":"post","link":"http:\/\/www.nynaeve.net\/?p=185","title":{"rendered":"Thread Local Storage, part 4: Accessing __declspec(thread) data"},"content":{"rendered":"<p><a title=\"Thread Local Storage, part 3: Compiler and linker support for implicit TLS\" href=\"http:\/\/www.nynaeve.net\/?p=183\">Yesterday<\/a>, I outlined how the compiler and linker cooperate to support TLS.  However, I didn&#8217;t mention just what exactly goes on under the hood when one declares a <em>__declspec(thread)<\/em> variable and accesses it.<\/p>\n<p>Before the inner workings of a <em>__declspec(thread)<\/em> variable access can be explained, however, it is necessary to discuss several more special variables in tlssup.c.  These special variables are referenced by <em>_tls_used<\/em> to create the TLS directory for the image.<\/p>\n<p>The first variable of interest is <em>_tls_index<\/em>, which is implicitly referenced by the compiler in the per-thread storage resolution mechanism any time a thread local variable is referenced (well, almost every time; there&#8217;s an exception to this, which I&#8217;ll mention later on).  <em>_tls_index<\/em> is also the only variable declared in tlssup.c that uses the default allocation storage class.  Internally, it represents the current module&#8217;s TLS index.  The per-module TLS index is, in principle, similar to a TLS index returned by <em>TlsAlloc<\/em>.  However, the two are not compatible, and there exists significantly more work behind the per-module TLS index and its supporting code.  
I&#8217;ll cover all of that later as well; for now, just bear with me.<\/p>\n<p>The definitions of <em>_tls_start<\/em> and <em>_tls_end<\/em> appear as follows in tlssup.c:<\/p>\n<pre>\r\n#pragma data_seg(\".tls\")\r\n\r\n#if defined (_M_IA64) || defined (_M_AMD64)\r\n_CRTALLOC(\".tls\")\r\n#endif\r\nchar _tls_start = 0;\r\n\r\n#pragma data_seg(\".tls$ZZZ\")\r\n\r\n#if defined (_M_IA64) || defined (_M_AMD64)\r\n_CRTALLOC(\".tls$ZZZ\")\r\n#endif\r\nchar _tls_end = 0;\r\n<\/pre>\n<p>This code creates the two variables and places them at the start and end of the &#8220;.tls&#8221; section.  The compiler and linker will automatically assume a default allocation section of &#8220;.tls&#8221; for all <em>__declspec(thread)<\/em> variables, such that they will be placed between <em>_tls_start<\/em> and <em>_tls_end<\/em> in the final image.  The two variables are used to tell the linker the bounds of the TLS storage template section, via the image&#8217;s TLS directory (<em>_tls_used<\/em>).<\/p>\n<p>Now that we know how <em>__declspec(thread)<\/em> works at the language level, it is necessary to understand the supporting code the compiler generates for an access to a <em>__declspec(thread)<\/em> variable.  This supporting code is, fortunately, fairly straightforward.  Consider the following test program:<\/p>\n<pre>\r\n__declspec(thread) int threadedint = 0;\r\n\r\nint __cdecl wmain(int ac,\r\n   wchar_t **av)\r\n{\r\n   threadedint = 42;\r\n\r\n   return 0;\r\n}\r\n<\/pre>\n<p>For x64, the compiler generated the following code:<\/p>\n<pre>\r\nmov\t ecx, DWORD PTR _tls_index\r\nmov\t rax, QWORD PTR gs:88\r\nmov\t edx, OFFSET FLAT:threadedint\r\nmov\t rax, QWORD PTR [rax+rcx*8]\r\nmov\t DWORD PTR [rdx+rax], 42\r\n<\/pre>\n<p>Recall that the <em>gs<\/em> segment register refers to the base address of the TEB on x64.  
88 (0x58) is the offset in the TEB for the <em>ThreadLocalStoragePointer<\/em> member on x64 (more on that later):<\/p>\n<pre>   +0x058 ThreadLocalStoragePointer : Ptr64 Void<\/pre>\n<p>If we examine the code after the linker has run, however, we&#8217;ll notice something strange:<\/p>\n<pre>\r\nmov     ecx, cs:_tls_index\r\nmov     rax, gs:58h\r\n<span style=\"color:#ff0000\">mov     edx, 4<\/span>\r\nmov     rax, [rax+rcx*8]\r\nmov     dword ptr [rdx+rax], 2Ah ; 42\r\nxor     eax, eax\r\n<\/pre>\n<p>If you haven&#8217;t noticed it already, the offset of the &#8220;threadedint&#8221; variable was resolved to a small value (4).  Recall that in the pre-link disassembly, the &#8220;mov edx, 4&#8221; instruction was &#8220;mov\t edx, OFFSET FLAT:threadedint&#8221;.<\/p>\n<p>Now, 4 isn&#8217;t a very flat address (one would expect an address within the confines of the executable image to be used).  What happened?<\/p>\n<p>Well, it turns out that the linker has some tricks up its sleeve that were put into play here.  The &#8220;offset&#8221; of a <em>__declspec(thread)<\/em> variable is assumed to be relative to the base of the &#8220;.tls&#8221; section by the linker when it is resolving address references.  If one examines the &#8220;.tls&#8221; section of the image, things begin to make a bit more sense:<\/p>\n<pre>\r\n0000000001007000 _tls segment para public 'DATA' use64\r\n0000000001007000      assume cs:_tls\r\n0000000001007000     ;org 1007000h\r\n0000000001007000 _tls_start        dd 0\r\n<span style=\"color:#ff0000\">0000000001007004 ; int threadedint\r\n0000000001007004 ?threadedint@@3HA dd 0<\/span>\r\n0000000001007008 _tls_end          dd 0\r\n<\/pre>\n<p>The offset of &#8220;threadedint&#8221; from the start of the &#8220;.tls&#8221; section is indeed 4 bytes.  
But all of this <em>still<\/em> doesn&#8217;t explain how the instructions the compiler generated access a variable that is instanced per thread.<\/p>\n<p>The &#8220;secret sauce&#8221; here lies in the following three instructions:<\/p>\n<pre>\r\nmov     ecx, cs:_tls_index\r\nmov     rax, gs:58h\r\nmov     rax, [rax+rcx*8]\r\n<\/pre>\n<p>These instructions fetch <em>ThreadLocalStoragePointer<\/em> out of the TEB and index it by <em>_tls_index<\/em>.  The resulting pointer is then indexed again with the offset of <em>threadedint<\/em> from the start of the &#8220;.tls&#8221; section to form a complete pointer to this thread&#8217;s instance of the <em>threadedint<\/em> variable.<\/p>\n<p>In C, the code that the compiler generated could be visualized as follows:<\/p>\n<pre>\r\n\/\/ This represents the \".tls\" section\r\ntypedef struct _MODULE_TLS_DATA\r\n{\r\n   int tls_start;\r\n   int threadedint;\r\n   int tls_end;\r\n} MODULE_TLS_DATA, * PMODULE_TLS_DATA;\r\n\r\nPTEB Teb;\r\nPMODULE_TLS_DATA TlsData;\r\n\r\nTeb     = NtCurrentTeb();\r\n\/\/ Treat the untyped pointer as an array of per-module TLS block pointers\r\nTlsData = ((PMODULE_TLS_DATA *)Teb->ThreadLocalStoragePointer)[ _tls_index ];\r\n\r\nTlsData->threadedint = 42;\r\n<\/pre>\n<p>This should look familiar if you&#8217;ve used explicit TLS before.  The typical paradigm for explicit TLS is to place a structure pointer in a TLS slot; to access your thread-local state, you retrieve the per-thread instance of the structure and then reference the appropriate variable through the structure pointer.  The difference here is that the compiler and linker (and loader, more on that later) cooperate to save you (the programmer) from having to do all of that explicitly; all you have to do is declare a <em>__declspec(thread)<\/em> variable, and all of this happens magically behind the scenes.<\/p>\n<p>There&#8217;s actually an additional curve that the compiler will sometimes throw with respect to how implicit TLS variables work from a code generation perspective.  
You may have noticed how I showed the x64 version of an access to a <em>__declspec(thread)<\/em> variable; this is because, by default, x86 builds of a .exe involve a special optimization (<a title=\"\/GA (Optimize for Windows Application)\" href=\"http:\/\/msdn2.microsoft.com\/en-us\/library\/yetwazx6(VS.80).aspx\">\/GA<\/a> (<a title=\"\/GA (Optimize for Windows Application)\" href=\"http:\/\/msdn2.microsoft.com\/en-us\/library\/yetwazx6(VS.80).aspx\">Optimize for Windows Application<\/a>, quite possibly the worst name for a compiler flag <em>ever<\/em>)) that eliminates the step of referencing the special <em>_tls_index<\/em> variable by assuming that it is zero.<\/p>\n<p>This optimization is only possible with a .exe that will run as the main process image.  The assumption works in this case because the loader assigns per-module TLS index values on a sequential basis (based on the loaded module list), and the main process image should be the second thing in the loaded module list, after NTDLL (which, now that this optimization is being used, can never have any <em>__declspec(thread)<\/em> variables, or it would get TLS index zero instead of the main process image).  It&#8217;s worth noting that in the (extremely rare) case that a .exe exports functions and is imported by another .exe, this optimization will cause random corruption if the imported .exe happens to use <em>__declspec(thread)<\/em>.<\/p>\n<p>For reference, with \/GA enabled, the x86 build of the above code results in the following instructions:<\/p>\n<pre>\r\nmov     eax, large fs:2Ch\r\nmov     ecx, [eax]\r\nmov     dword ptr [ecx+4], 2Ah ; 42\r\n<\/pre>\n<p>Remember that on x86, <em>fs<\/em> points to the base address of the TEB, and that <em>ThreadLocalStoragePointer<\/em> is at offset +0x2C from the base of the x86 TEB.<\/p>\n<p>Notice that there is no reference to <em>_tls_index<\/em>; the compiler assumes that it will take on the value zero.  
If one examines a .dll built with the x86 compiler, the \/GA optimization is always disabled, and <em>_tls_index<\/em> is used as expected.<\/p>\n<p>The magic behind <em>__declspec(thread)<\/em> extends beyond just the compiler and linker, however.  Something still has to set up the storage for each module&#8217;s per-thread state, and that something is the loader.  More on how the loader plays a part in this complex process next time.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Yesterday, I outlined how the compiler and linker cooperate to support TLS. However, I didn&#8217;t mention just what exactly goes on under the hood when one declares a __declspec(thread) variable and accesses it. Before the inner workings of a __declspec(thread) variable access can be explained, however, it is necessary to discuss several more special variables [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[4,5],"tags":[16,15,17,13],"_links":{"self":[{"href":"http:\/\/www.nynaeve.net\/index.php?rest_route=\/wp\/v2\/posts\/185"}],"collection":[{"href":"http:\/\/www.nynaeve.net\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/www.nynaeve.net\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/www.nynaeve.net\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/www.nynaeve.net\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=185"}],"version-history":[{"count":1,"href":"http:\/\/www.nynaeve.net\/index.php?rest_route=\/wp\/v2\/posts\/185\/revisions"}],"predecessor-version":[{"id":537,"href":"http:\/\/www.nynaeve.net\/index.php?rest_route=\/wp\/v2\/posts\/185\/revisions\/537"}],"wp:attachment":[{"href":"http:\/\/www.nynaeve.net\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=185"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/w
ww.nynaeve.net\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=185"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/www.nynaeve.net\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=185"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}