NWScript JIT engine: Performance considerations

Last time, we learned how SAVE_STATEs are supported by the MSIL JIT backend. This time, we’ll touch on everybody’s favorite topic — performance.

After all, the whole point of the JIT project is to improve performance of scripts; there wouldn’t be much point in using it over the interpretive VM if it wasn’t faster.

So, just how much faster is the MSIL JIT backend than my reference interpretive NWScriptVM? Let’s find out (when using the “direct fast” action service call mechanism)…

The answer, as it so often turns out, is that it depends. Some workloads see dramatically better performance, while others run only marginally faster than they do under the interpretive VM.

Computational workloads

Computationally heavy NWScript programs are where the JIT system really excels. For example, consider the following script program fragment:

int g_randseed = 0;

// Simple multiply-and-add pseudo-random number generator, used purely to keep
// the loop in the entry point busy with integer math.
int rand()
{
  return g_randseed =
    (g_randseed * 214013 + 2531101) >> 16;
}

// StartingConditional is the entry point.
int StartingConditional(
  int i,
  object o,
  string s)
{
  for (i = 0; i < 1000000; i += 1)
    i += rand( ) ^ 0xabcdef / (rand( ) | 0x1); 

  return i;
}

Here, I compared 1000000 iterations of invoking this script's entry point, once via the JIT engine's C API, and once via the NWScriptVM's API.
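(The measurement itself is nothing fancy; structurally it amounts to a loop along the lines of the sketch below. ExecuteScriptOnce here is a hypothetical placeholder standing in for invoking the script's entry point through whichever API is under test, not an actual function exported by either component.)

#include <chrono>
#include <cstdio>

// Hypothetical placeholder: stands in for invoking the script's entry point
// through the API under test (the JIT engine's C API or the NWScriptVM's API);
// not a real function exported by either component.
static int ExecuteScriptOnce()
{
    return 0;
}

int main()
{
    const int Iterations = 1000000;

    auto Start = std::chrono::steady_clock::now();

    // Invoke the entry point repeatedly and measure total wall-clock time.
    for (int i = 0; i < Iterations; i += 1)
        ExecuteScriptOnce();

    std::chrono::duration< double > Elapsed =
        std::chrono::steady_clock::now() - Start;

    printf( "%d iterations in %.2f seconds\n", Iterations, Elapsed.count() );

    return 0;
}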

When using the interpretive VM, this test took a whopping five-plus minutes to complete on my test system; ouch! Using the MSIL JIT on .NET 4.0, on the same system, yields an execution time on the order of just fourteen seconds; that works out to execution roughly 21 times faster than the interpretive VM.

Action service-bound workloads (non-string-based)

While that is an impressive-looking result, most scripts are not exclusively computationally-bound; rather, they make heavy use of action service handlers exported by the script host. For example, consider a second test program, structured along these lines:

 vector v;
 v = Vector( 1.0, 2.0, 3.0 );
 v = Vector( 1.0, 2.0, 3.0 );
 v = Vector( 1.0, 2.0, 3.0 );
 v = Vector( 1.0, 2.0, 3.0 );
 v = Vector( 1.0, 2.0, 3.0 );
 v = Vector( 1.0, 2.0, 3.0 );
 v = Vector( 1.0, 2.0, 3.0 );
 v = Vector( 1.0, 2.0, 3.0 );
 v = Vector( 1.0, 2.0, 3.0 );
 v = Vector( 1.0, 2.0, 3.0 );
 v = Vector( 1.0, 2.0, 3.0 );

In this context, Vector is an action service handler. With the interpretive VM in use, 1000000 iterations of this program consume on the order of thirty seconds.

By comparison, the MSIL JIT backend clocks in at approximately ten seconds. That's still a significant improvement, but not quite as earth-shattering as the 21-times speedup above. The smaller gain stems from the fact that most of the work is performed by the script host rather than the JIT'd code; in effect, the only saving is the elimination of make-work overhead associated with the stack-based VM execution environment, not any boost to raw computational performance.
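To make that concrete, here is a deliberately simplified sketch (not the engine's actual code) of what a single Vector( 1.0, 2.0, 3.0 ) call amounts to under each execution model. The handler's own work is identical in both cases; only the instruction decoding and stack shuffling go away under the JIT:

#include <cstdio>
#include <vector>

// Deliberately simplified illustration of per-call overhead; this is not the
// engine's actual code, just the general shape of the work involved.

static std::vector< float > VmStack;   // simulated VM stack

// Stand-in for the Vector action service handler exported by the script host.
static void Host_Vector( float x, float y, float z )
{
    printf( "Vector( %g, %g, %g )\n", x, y, z );
}

// Interpretive VM: each Vector( 1.0, 2.0, 3.0 ) call means decoding several
// constant-push instructions plus an action-call instruction, shuttling every
// operand through the simulated stack along the way.
static void Interpreted_VectorCall()
{
    VmStack.push_back( 1.0f );   // decode constant, push x
    VmStack.push_back( 2.0f );   // decode constant, push y
    VmStack.push_back( 3.0f );   // decode constant, push z

    // Decode the action-call instruction, look up the handler, and pass the
    // stacked arguments along.
    size_t Top = VmStack.size();
    Host_Vector( VmStack[ Top - 3 ], VmStack[ Top - 2 ], VmStack[ Top - 1 ] );
    VmStack.resize( Top - 3 );   // pop the arguments (return value elided)
}

// JIT'd code: the decode loop and stack shuffling disappear; what remains is
// essentially just the call into the script host, which costs the same either
// way. Hence the smaller relative gain for action-bound workloads.
static void Jitted_VectorCall()
{
    Host_Vector( 1.0f, 2.0f, 3.0f );
}

int main()
{
    Interpreted_VectorCall();
    Jitted_VectorCall();
    return 0;
}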

Action service-bound workloads (string-based with one argument)

It is possible to construct a "worst case" script program that receives almost no benefit from the JIT system. This can be done by writing a script program that spends almost all of its time passing strings to action service handlers, and receiving strings back from action service handlers.

Consider a program along these lines:

 StringToInt( IntToString( i ) + s );
 StringToInt( IntToString( i ) + s );
 StringToInt( IntToString( i ) + s );
 StringToInt( IntToString( i ) + s );
 StringToInt( IntToString( i ) + s );
 StringToInt( IntToString( i ) + s );
 StringToInt( IntToString( i ) + s );
 StringToInt( IntToString( i ) + s );
 StringToInt( IntToString( i ) + s );
 StringToInt( IntToString( i ) + s );
 StringToInt( IntToString( i ) + s );
 StringToInt( IntToString( i ) + s );
 StringToInt( IntToString( i ) + s );
 StringToInt( IntToString( i ) + s );
 StringToInt( IntToString( i ) + s );
 StringToInt( IntToString( i ) + s );
 StringToInt( IntToString( i ) + s );
 StringToInt( IntToString( i ) + s );
 StringToInt( IntToString( i ) + s );
 StringToInt( IntToString( i ) + s );

When executed with the interpretive script VM, this program took approximately 70 seconds to complete the 1000000 iterations that I've been using as a benchmark. The MSIL JIT backend actually clocks in a smidgen slower, at roughly 75-76 seconds on average (on my test machine).

Why is the JIT'd code (ever) slower than the interpretive VM? Well, this turns out to relate to the fact that I used System.String to represent a string in the JIT engine. While convenient, this does have some drawbacks, because a conversion is required in order to map between the std::string objects used by action service handlers (and the VM stack object) and the System.String objects used by the JIT'd code.

If a script program spends most of its time interfacing exclusively with action service calls that take and return strings, performance suffers due to the marshalling conversions involved.
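To put that cost in concrete terms, every string-typed parameter or return value involved in an action service call has to be copied across the native/managed boundary. The following C++/CLI sketch shows the general shape of that round trip, using msclr's marshal_as helpers purely for illustration; the JIT engine's actual conversion routines are its own code, but the per-call work is of this nature:

#include <string>
#include <msclr/marshal_cppstd.h>

using namespace msclr::interop;

// Illustration only; the JIT engine's conversion path is its own code, but
// every string crossing the native/managed boundary implies work of this shape.

// Native side: the action service handler traffics in std::string.
static std::string NativeGetSubString( const std::string & s, int start, int count )
{
    return s.substr( start, count );
}

// Managed side: the JIT'd code holds strings as System::String^.
static System::String ^ JitCallGetSubString( System::String ^ s, int start, int count )
{
    // Managed -> native copy (allocation plus character conversion).
    std::string NativeArg = marshal_as< std::string >( s );

    std::string NativeResult = NativeGetSubString( NativeArg, start, count );

    // Native -> managed copy on the way back out.
    return marshal_as< System::String ^ >( NativeResult );
}

int main()
{
    System::String ^ Result = JitCallGetSubString( "Hello", 1, 3 );
    System::Console::WriteLine( Result );
    return 0;
}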

Action service-bound workloads (string-based with more than one argument)

Not all action service calls related to strings are created equal, however. The more parameters passed to the action service call, the better the JIT'd code does in comparison to the script VM. The StringToInt / IntToString conversion case is an extreme example; even a minor change to use GetSubString calls shows a significant change in results, for example:

 s = GetSubString( s, 1, 1 );
 s = GetSubString( s, 1, 1 );
 s = GetSubString( s, 1, 1 );
 s = GetSubString( s, 1, 1 );
 s = GetSubString( s, 1, 1 );
 s = GetSubString( s, 1, 1 );
 s = GetSubString( s, 1, 1 );
 s = GetSubString( s, 1, 1 );
 s = GetSubString( s, 1, 1 );
 s = GetSubString( s, 1, 1 );

In this test, the interpretive VM clocks in at approximately 30 seconds, whereas the JIT'd code finishes in about half the time, at around 15.5 seconds on average.

Performance conclusions

While the actual performance characteristics will vary significantly depending on the workload, most scripts will see a noticeable performance increase.

Except for worst-case scenarios involving single-string action service handlers, it's reasonable to expect that most scripts which are exclusively action service handler-bound will still run roughly twice as fast under the JIT as under the interpretive VM.

Furthermore, any non-trivial instructions in a script that are not action service calls will tend to tip the scales heavily in favor of the JIT engine; for general-purpose data processing (including flow-control logic such as if statements and loops), the interpretive VM simply can't keep up with native code execution.

Now, it's important to note that in the case of NWN1 and NWN2, not all performance problems are caused by scripts; naturally, replacing the script VM with a JIT system will do nothing to alleviate those issues. However, for modules that are heavy on script execution, the JIT system offers significant benefits (and equally importantly, creates significant headroom to enable even more complex scripting without compromising server performance).
