NWScript JIT engine: Parameterized script entry point invocation and subroutine structural analysis

Previously, I discussed how subroutine calls and extension (action service handler) calls are made in the NWScript instruction set. This time, we’ll dive in deeper to a feature that NWN2 added to NWScript: Parameterized entry point invocation, or the capability to pass parameters to a script’s main subroutine.

Originally, in the NWN1 days, NWScript did not support providing arguments to a script’s entry point symbol. Instead, when data needed to be made available to a script (say, a script that would be fired when a particular event occured), a special purpose action service handler would be created that would return the parameter to the script program. For example, a script might be called when a player attempts to open a door, and the script might need to deal with both the player object and the door being opened. To handle this case, the designers of NWN1 would create special action service handlers that would return these sorts of event parameters (see an example).

This approach has some advantages and disadvantages. On the upside, it’s very easy to expose new parameter information to scripts called for specific events. On the downside, however, this model is not very intuitive to script authors; the result is that NWN1 (and NWN2 in many cases due to backwards compatibility with NWN1) has a number of very specialized ‘get event information’ action service handlers that can be called meaningfully only at specific points in time. Figuring out which event handler is associated with which dedicated action service handler has been the source of innumerable bugs in NWN scripts.

In NWN2, the game designers added the capability to pass parameters directly to the entry point of a NWScript program, like any other NWScript subroutine, as a mechanism for passing information to script events. However, they had several constraints in doing so:

It should be easy to add new parameters without breaking already compiled scripts.
Significant changes to the compiler and execution environment should be avoided to the degree possible.

To that end, the NWN2 designers came up with a solution that required no changes to the NWScript compiler toolchain: They kept the existing calling convention for script subroutines and simply used it for the entry point symbol.

While this is an easy solution from the compiler point of view, it creates some unfortunate complications for the execution environment. Specifically, the entry point symbol is not actually the first bit of code executed by a NWScript program. Instead, there are two compiler-emitted stubs that are produced before the entry point:

First, a #loader routine runs. The #loader routine’s purpose is to allocate storage for the entry point symbol’s return value, if the entry point had one, and then call it. (Remember that the caller must allocate storage for the return value of a script subroutine.)
Second, if the script uses global variables, a #globals routine is called by #loader instead of the entry point. The #globals routine establishes a local variable frame containing all global variables in the script, then assigns the auxiliary SP pointer (BP) to its current SP value, before finally invoking the real entry point symbol.

If you’re paying attention, you may begin to see the problem that will arise here. Because the entry point symbol isn’t the first symbol called when the script is invoked, the execution environment can’t simply put the parameters on the stack before calling into the script the first time.

Why is this the case, you might ask? There are several issues here:

First, if the script returned a value, the return value will be pushed by #loader before the user’s entry point is called. This means that the return value of the entry point would appear to be a parameter to the entry point (recall that the return value should is the last item on the stack when calling a subroutine, and it is allocated by the caller of a subroutine).

Secondly, if the script program used global variables, were the execution environment to push parameters before first entering the script, the parameters would take the place of several global variables, and some of the global variables would turn into parameters to the script entry point.

In either case, we clearly have what might technically be described as a train wreck; the script program would behave wildly incorrectly if #loader and #globals were not modified to be aware of parameters passed to the entry point symbol.

While this could have been done, the designers of NWN2 elected not to do so. (Even if these two compiler-generated stubs were modified, the calling convention for a particular (event handler) script’s entry point would become fixed for all time — it would not be possible to add another parameter to the entry point symbol without breaking already compiled scripts.)

Instead, a series of modifications were made to the execution environment itself to cope with the fact that the compiler toolchain wasn’t updated, and the fact that the script might be designed to be called with a different count of parameters.

Specifically, when a parameterized script is going to be invoked, the execution environment must perform the following steps:

The actual number of parameters that the entry point symbol is determined, and the number and types of parameters passed to the script are adjusted accordingly. In the case of the stock NWN2 implementation, this is accomplished by requiring the source text for a parameterized script to be available; the source text compiler is then used to discover the parameter counts and types to the entry point symbol.
The execution environment arranges for parameters to be pushed after the entry point symbol is first branched to by watching for either #loader or #globals (which have several uniquely recognizable characteristics) to complete.

The first requirement, in particular, is a highly undesirable and onerous one (a compiled instruction set language should not require the execution environment to possess the source text).

For my implementation of a NWScript VM (and later, the JIT engine), I solved the first problem with a different approach (the second problem continues to be addressed by watching for #loader or #globals to finish execution within my VM):

Instead of relying on embedding a source compiler in my VM, I tagged the parameters to the entry point symbol as having an indetermine type on the VM stack. NWN2 defines conversion rules from strings to script parameters, as in some cases parameterized scripts are invoked from a list of strings. Finally, once an indetermine-typed parameter is referenced on the stack with, it is simply changed into the type that it was referenced as (since all stack references must be strongly typed, there is no ambiguity other than for a bulk copy operation, which doesn’t require a specific known type).

This doesn’t solve the problem of determining the count of parameters to the script, though. For that, my implementation performs a basic structural analysis phase of the script program instruction set. (The structural analysis phase — a subset of the NWScript Analyzer component — will also become critical to the JIT engine itself, as we’ll see later.)

The structural analysis operates as a static analysis step, wherein the flow of each call instruction in a script program is followed. This results in the discovery of several useful pieces of information:

All subroutines in the compiled script program are discovered by walking through the instruction stream.
The count of parameters and return values to each subroutine in the script program are discovered (i.e. the stack displacement of each subroutine across a call site).

The core of the structural analysis phase relies on the following properties of the NWScript instruction set:

All references to the stack in the NWScript instruction set must specify a displacement that is a compile-time constant. In particular, this means that constructs such as the creation of a variable amount of locals within a subroutine cannot be directly encoded into the instruction set[1].
All subroutines within the NWScript program follow the standard calling convention (except for #loader and #globals, which can be identified by virtue of always being the first subroutine invoked, or being the only subroutine to establish a global variables frame with the special purpose SAVEBP instruction). Notably, this means that if a subroutine is called with P parameter cells on the stack and R return cells on the stack, that on return from the subroutine, there will have been a write to a stack location R cells down from the top of the stack, and P cells will have been removed from the top of stack relative to subroutine entry.
No subroutine in the script program is unconditionally recursive without any other exit path (such a subroutine would make little sense in practice, so this is seen as a reasonable limitation). This is important, because analyzing the structural (stack displacment) properties of a subroutine requires tracing through to the end of the subroutine in at least one flow path successfully.
The structural analyzer must be provided with data on the stack displacement properties of all action service handlers. This is considered an acceptable limitation as the VM and the execution environment must be used in conjuction with action service handler implementations anyway. In this particular case, a program to designed to perform a very basic parse of the prototypes of the action handlers exposed by NWN and NWN2 was used to generate the data tables for those games.

Given these properties, it becomes possible to construct an algorithm to discover the structural properties (that is, the location, parameter count, and return count) of each subroutine in the script program. The algorithm can be described roughly as follows (excluding #loader and #globals which can be special cased safely):

At the start, set virtual SP=0.
Walk one instruction at a time forward via disassembly.
If the instruction references the stack…
- If the current instruction references the stack at a greater displacement than the previous greatest displacement within this subroutine, record the new displacement.
- If the current instruction pointer modifies the stack, update the current virtual SP appropriately.
- If the current instruction is a branch instruction, and we have not yet visited that branch, mark the branch as visited and queue a traversal of the branch (start at step #2 with the virtual SP at the time when the branch instruction was encountered). If the branch was unconditional and we had already visited the branch, switch to the next analysis queue entry.
If the current instruction is an action service handler call (ACTION), update the virtual SP based on the action service handler data table.
If the current instruction is a subroutine call…
- If the subroutine has not been analyzed yet, then queue a new analysis run (at step #1) for the new subroutine, queue a new analysis run for the current call site address and mark it as blocking on the new subroutine (taking the current virtual SP), and then switch to processing the next unanalyzed branch that is not blocking on a subroutine. If there are no unanalyzed branches, we have detected a case of unconditional recursion and analysis fails.
- If the subroutine has been analyzed already, then adjust the virtual SP accordingly.
When a RET instruction is encountered, record the largest SP offset encountered and record the displacement of the virtual SP at the return point from 0. If virtual SP is 0, then there were no parameters. If virtual SP is -1 cell, then there was 1 cell of parameters, etc. Additionally, if the largest SP offset was greater than the displacement of the virtual SP from 0, then the remainder is the count of return cells for the subroutine. Finally, once all the above tasks are completed when the RET instruction is observed, switch to the next analysis queue entry, or if we have none left, analysis is complete.

This algorithm, repeated iteratively, discovers the structure of the script program. With the structure of the program discovered, the VM execution environment can fulfill the task of invoking parameterized scripts without requiring a source text compiler on-board with it (a notable improvement over the game’s stock execution environment).

Next time, we’ll look more at the overall design of the NWScript analyzer and the IR that it produces. (If you are wanting to sneak an early look, Justin has a post up describing some more details about the analyzer.)

[1] It is possible to construct a program that accesses SP from a variable offset through creative use of the SAVEBP/RESTOREBP instructions in a hand-assembled script due to a quirk of the stock game VM, which saves the previous BP on the stack as an integer-type value. No such specimens are known to exist in the wild, and none of the known NWScript compiler implementations emit code to do so.

Tags: NWN2

This entry was posted on Monday, August 9th, 2010 at 10:33 pm and is filed under Programming. You can follow any responses to this entry through the RSS 2.0 feed. Both comments and pings are currently closed.

One Response to “NWScript JIT engine: Parameterized script entry point invocation and subroutine structural analysis”

NWScript JIT engine: Intermediate Representation (IR) design « Nynaeve says:

August 11, 2010 at 7:03 am

[…] Nynaeve Adventures in Windows debugging and reverse engineering. « NWScript JIT engine: Parameterized script entry point invocation and subroutine structural analysis […]