{"id":115,"date":"2007-02-12T17:55:09","date_gmt":"2007-02-12T22:55:09","guid":{"rendered":"http:\/\/www.nynaeve.net\/?p=115"},"modified":"2019-12-13T17:38:39","modified_gmt":"2019-12-13T22:38:39","slug":"compiler-tricks-in-x86-assembly-part-2","status":"publish","type":"post","link":"http:\/\/www.nynaeve.net\/?p=115","title":{"rendered":"Compiler tricks in x86 assembly, part 2"},"content":{"rendered":"<p>A common operation in many C programs is a division by a constant quantity.  This sort of operation occurs all over the place; in explicit mathematical calculations, and in other, more innocuous situations such as where the programmer does something like take a size in bytes and convert it into a count of structures.<\/p>\n<p>This fundamental operation (division by a constant) is also one that the compiler will tend to optimize in ways that may seem a bit strange until you know what is going on.<\/p>\n<p>Consider this simple C program:<\/p>\n<pre>\r\nint\r\n__cdecl\r\nwmain(\r\n\tint ac,\r\n\twchar_t** av)\r\n{\r\n\tunsigned long x = rand();\r\n\r\n\treturn x \/ 1518;\r\n}\r\n<\/pre>\n<p>In assembly, one would expect to see the following operations emitted by the compiler:<\/p>\n<ol>\n<li>A call to <em>rand<\/em>, to populate the &#8216;x&#8217; variable.<\/li>\n<li>A <em>div<\/em> instruction to perform the requested division of &#8216;x&#8217; by 1518.<\/li>\n<\/ol>\n<p>However, if optimizations are turned on and the compiler is not configured to prioritize small code over performance exclusively, then the result is not what we might expect:<\/p>\n<pre>\r\n; int __cdecl main(int argc,\r\n  const char **argv,\r\n  const char *envp)\r\n_main proc near\r\ncall    ds:__imp__rand\r\nmov     ecx, eax\r\nmov     eax, 596179C3h\r\nmul     ecx\r\nmov     eax, ecx\r\nsub     eax, edx\r\nshr     eax, 1\r\nadd     eax, edx\r\nshr     eax, 0Ah ; return value in eax\r\nretn\r\n<\/pre>\n<p>The first time you encounter something like this, you&#8217;ll probably be wondering something along the lines of &#8220;what was the compiler thinking when it did that, and how does that set of instructions end up dividing out &#8216;x&#8217; by 1518?&#8221;.<\/p>\n<p>What has happened here is that the compiler has used a trick sometimes known as <em>magic number division<\/em> to improve performance.  In x86 (and in most processor architectures, actually), it is typically much more expensive to perform a division than a multiplication (or most any other basic mathematical primative, for that matter).  While ordinarily, it is typically not possible to make division any faster than what the &#8216;div&#8217; instruction gives you performance-wise, under special circumstances it is possible to do a bit better.  Specifically, if the compiler knows that one of the operands to the division operation is a constant (specifically, the divisor), then it can engage some clever tricks in order to eliminate the expensive &#8216;div&#8217; instruction and turn it into a sequence of other instructions that, while initially longer and more complicated on the surface, are actually faster to execute.<\/p>\n<p>In this instance, the compiler has taken one &#8216;div&#8217; instruction and converted it into a multiply, subtract, add, and two right shift operations.  The basic idea behind how the code the compiler produced actually gives the correct result is that it typically involves a multiplication with a large constant, where the low 32-bits are then discarded and then remaining high 32-bits (recall that the &#8216;mul&#8217; instruction returns the result in edx:eax, so the high 32-bits of the result are stored in edx) have some transform applied, typically one that includes fast division by a power of 2 via a right shift.<\/p>\n<p>Sure enough, we can demonstrate that the above code works by manually comparing the result to a known value, and then manually performing each of the individual operations.  For example, assume that in the above program, the call to rand returns 10626 (otherwise known as 7 * 1518, for simplicity, although the optimized code above works equally well for non-multiples of 1518).<\/p>\n<p>Taking the above set of instructions, we perform the following operations:<\/p>\n<ol>\n<li>Multiply 10626 by the magic constant 0x596179C3.  This yields 0x00000E7E00001006.<\/li>\n<li>Discard the low 32-bits (eax), saving only the high 32-bits of the multiplication (edx; 0xE7E, or 3710 in decimal).<\/li>\n<li>Subtract 3710 from the original input value, 10626, yielding 6916.<\/li>\n<li>Divide 6916 by 2^1 (bit shift right by 1), yielding 3458.<\/li>\n<li>Add 3710 to 3458, yielding 7168.<\/li>\n<li>Divide 7168 by 2^10 (bit shift right by 10), yielding 7, our final answer.  (10626 \/ 1518 == 7)<\/li>\n<\/ol>\n<p>Depending on the values being divided, it is possible that there may only be a right shift after the large multiply.  For instance, if we change the above program to divide by 1517 instead of 1518, then we might see the following code generated instead:<\/p>\n<pre>\r\nmov     eax, 5666F0A5h\r\nmul     ecx\r\nshr     edx, 9\r\n<\/pre>\n<p>Coming up with the &#8220;magical divide&#8221; constants in the first place involves a bit of math; interested readers can read up about it <a title=\"Integer division by constants\" href=\"http:\/\/blogs.msdn.com\/devdev\/archive\/2005\/12\/12\/502980.aspx\">on DevDev<\/a>.<\/p>\n<p>If you&#8217;re reverse engineering something that has these characteristics (large multiply that discards the low 32-bits and performs some transforms on the high 32-bits), then I would recommend breaking up each instruction into the corresponding mathematical operation (be aware of tricks like shift right to divide by a power of 2, or shift left to multiply by a power of 2).  As a quick &#8220;litmus test&#8221;, you can always just pick an input and do the operations, and then simply compare the result to the input and see if it makes sense in the context of a division operation.<\/p>\n<p>More on other optimizer tricks in a future posting&#8230;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>A common operation in many C programs is a division by a constant quantity. This sort of operation occurs all over the place; in explicit mathematical calculations, and in other, more innocuous situations such as where the programmer does something like take a size in bytes and convert it into a count of structures. This [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[4,8],"tags":[],"_links":{"self":[{"href":"http:\/\/www.nynaeve.net\/index.php?rest_route=\/wp\/v2\/posts\/115"}],"collection":[{"href":"http:\/\/www.nynaeve.net\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/www.nynaeve.net\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/www.nynaeve.net\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/www.nynaeve.net\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=115"}],"version-history":[{"count":1,"href":"http:\/\/www.nynaeve.net\/index.php?rest_route=\/wp\/v2\/posts\/115\/revisions"}],"predecessor-version":[{"id":605,"href":"http:\/\/www.nynaeve.net\/index.php?rest_route=\/wp\/v2\/posts\/115\/revisions\/605"}],"wp:attachment":[{"href":"http:\/\/www.nynaeve.net\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=115"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/www.nynaeve.net\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=115"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/www.nynaeve.net\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=115"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}