April 2022

Microvium Closure Variable Indexing

TL;DR: In many situations, a program in Microvium bytecode can access closure variables with just a single-byte bytecode instruction. The instruction contains a 4-bit index that either indexes into the current lexical scope or recursively overflows to the next-outer lexical scope, cascading up the lexical scope chain until the variable is found. In this post, I discuss some of the journey and the details of how this works.


What is a closure variable?

A closure is a function that accesses variables outside its local lexical environment. See the MDN article on this for a better explanation.

What I mean by a “closure variable” in this post is a local variable that is accessed by a closure [1]. For example, in the following code, callback is a closure because it accesses x, which is outside its own local variables, and correspondingly x is a “closure variable” by this definition because it’s accessed by callback:

function foo() {
  let x = 10;
  setTimeout(callback, 1000);
  function callback() { 
    console.log(x);
  }
}

In Microvium (and other JavaScript engines), closure variables are treated differently from normal local variables because they need to outlive the stack frame in which they are declared. The variable x here needs to survive beyond the return of foo. This is done by allocating the slots for these variables on the heap instead of the stack.

If a variable is demoted to the heap, all access to that variable is via the heap, even if it’s being accessed locally. For example:

function foo() {
  let x = 10;
  console.log(x);  // <--- this is also accessing `x` on the heap
  setTimeout(callback, 1000);
  function callback() { 
    console.log(x);
  }
}

There is a static analysis pass in Microvium that decides whether a variable is a closure variable or not. In principle, it’s safe to allocate all variables on the heap rather than doing this analysis, but the heap is expensive, so as an optimization some variables can be promoted to the stack if they are not used by closures.

For the sake of the rest of this blog post, I will pretend that all variables are closure variables, even if I do not show the closure that accesses them. Or equivalently, I will pretend that there is no optimization analysis to determine which variables can be promoted to the stack.

Background: how it might have worked

Before I get to describing how it does work today, let me describe how I previously implemented it before having my eureka moment. You can skip this section if you want to get straight to the answer instead of going through the journey as I did.

The state of the virtual machine needs to keep track of the current lexical scope. So let’s add a new machine register called scope which points to the current lexical scope.
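In C terms, this might look something like the following sketch (illustrative only; the real engine's register set and pointer handling are more involved):

typedef struct Registers {
  uint16_t* scope;  // the current lexical scope: a small array of slots on the heap
  // ... program counter, stack pointer, and the other machine registers
} Registers;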

A goal with Microvium is to keep the engine implementation small. So rather than introducing a new allocation type for environment records and new bytecode instructions to access them, maybe we can reuse an existing type and existing instruction.

A natural solution that might come to mind is to use an Object, where the property keys of the object correspond to the variable names. Microvium has existing bytecode instructions ObjectGet and ObjectSet to get and set properties on an object.

However, Objects are very expensive at runtime. Each property is stored in memory as a key-value pair taking 4 bytes, and property lookup is a linear-time search through the properties to find the one with the right key.
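Concretely, the cost is roughly this (an illustrative sketch, not the engine's actual object layout):

typedef struct Property {
  uint16_t key;    // e.g. the variable name, as an interned value
  uint16_t value;
} Property;        // 4 bytes per variable, and lookup is a linear scan over these pairs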

Since we can statically determine the number of variables, a better choice of container for our variables would be a fixed-length array, where we statically compute an index for each variable rather than using its name. In Microvium, a fixed-length array is quite efficient: it has constant-time random access to any slot by its index, and each slot consumes only 2 bytes.

So let’s think about compiling the following JavaScript to IL:

let x, y, z;
x = 10;
y = 20;
z = 30;

We will have a fixed-length array with 3 slots for these 3 variables, and our new scope register will point to this fixed-length array to say that this is the current scope. Whenever we need to access one of these variables, we will read and write to the array.
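Something like this, with the scope register pointing at the new array:

scope register → [ allocation header | slot 0: x | slot 1: y | slot 2: z ]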

I’m showing the allocation header here for completeness. These arrays are heap-allocated, so they each require this implicit memory slot for information about the size and type of the allocation.

So, the following is the IL sequence that might be produced for the above JavaScript:

// let x, y, z;
ArrayNew(3)         // Allocate 3 slots on the heap
StoreReg('scope')   // Save the array in the "scope" register

// x = 10;
LoadReg('scope')    // Fetch the current scope
Literal(10)
ArraySet(0)         // Set the first slot in the array

// y = 20;
LoadReg('scope')    // Fetch the current scope
Literal(20)
ArraySet(1)         // Set the second slot in the array

// z = 30;
LoadReg('scope')    // Fetch the current scope
Literal(30)
ArraySet(2)         // Set the third slot in the array

Side note: Microvium IL is based on a stack machine (see Wikipedia). The instruction Literal(10) pushes the value 10 onto the top of the stack. The instruction ArraySet(0) pops the literal value and the array reference off the stack (the reference having been pushed earlier by LoadReg('scope')) and then assigns the value to the 0th slot of the array.
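For example, the stack evolves roughly like this over the first assignment:

LoadReg('scope')    // stack: [scope]
Literal(10)         // stack: [scope, 10]
ArraySet(0)         // stack: []   (and slot 0 of the scope array now holds 10)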

Nested scopes

What if there are nested lexical scopes, as in the following JavaScript code:

function foo() {
  let x;
  function bar() {
    let y;
    function baz() {
      let z;
      x = 10;
      y = 20;
      z = 30;
    }    
  }
}

In the above code, z is accessed in the same scope in which it is declared, as before. But x and y are accessed from parent (outer) scopes relative to the expression that accesses them. So we need a way for the IL to access parent scopes.

Remember that the inner scopes can be instantiated multiple times for each single instantiation of an outer scope. For example, if bar() is called twice within foo, then there will be 2 instances of variable y for every one instance of variable x. So we can’t just put x, y, and z in the same array. We need each lexical scope to be in its own array, and we need a way to reference an outer scope from an inner scope.

It might seem like we need to abandon the fixed-length array as the underlying storage for these variables, since arrays don't naturally form chains. But if we think about it for a moment, we can reserve one of the slots in each fixed-length array as a pointer to the parent fixed-length array. Let's mentally reserve the first slot in each array as the pointer to its parent scope.

In the above JS, we have 3 distinct lexical scopes, and so we may end up with a scope chain as follows:
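scope register → [ header | parent: ref to bar's scope | z ]   (baz's scope)
                 [ header | parent: ref to foo's scope | y ]   (bar's scope)
                 [ header | parent: (none)             | x ]   (foo's scope)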

Now, the IL to read variable x may look as follows:

LoadReg('scope')  // Get the current scope (the one containing z)
ArrayGet(0)       // Read the parent scope (the one containing y)
ArrayGet(0)       // Read the parent scope (the one containing x)
ArrayGet(1)       // Read variable x

Each of these instructions encodes to 1 byte, so this is a 4-byte sequence in total.

I thought this was a pretty good solution. It didn’t add any extra instructions to the IL instruction set, and so didn’t make the engine any bigger. An important goal in Microvium is to keep the engine small.

The final solution

In the end, I decided that closures were too important to have variable access cost this many instructions. Rather than emitting multiple IL instructions just to read or write a single variable, it seemed quite logical to bake this behavior into the engine itself and add two new instructions to the IL instruction set: LoadScoped and StoreScoped. I wanted to keep these instructions both compact and efficient, and that's where the design challenge gets interesting.

Consider the following JavaScript with a few more variables to make the pattern clear:

function foo() {
  let a;
  let b;
  function bar() {
    let c;
    let d;
    function baz() {
      let e;
      let f;
      a = 10;
      b = 20;
      c = 30;
      d = 40;
      e = 50;
      f = 60;
    }
  }
}

The assignment statements in this example, such as a = 10, are accessing the variables in one of 3 different lexical scopes. How can we design the instruction format for LoadScoped (and StoreScoped) so that the bytecode can specify which scope is being accessed as well as which variable in that scope?

The eureka moment for me was when I realized that I could use a single mapped index that specifies both the scope and the variable within that scope. I’ll represent this mapping in the following diagram, with the index on the left and the variable it maps to on the right (the allocation headers are omitted here for clarity).
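Roughly, for the example above (with slot 0 of each scope holding the parent-scope reference, as before):

index 0 → parent reference (baz's scope)
index 1 → e
index 2 → f
index 3 → parent reference (bar's scope)
index 4 → c
index 5 → d
index 6 → parent reference (foo's scope)
index 7 → a
index 8 → b

(Indexes that land on a parent-reference slot, like 0, 3, and 6 here, are never actually emitted for variable access.)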

The scopes in this design form a waterfall. Small indexes are accessing the inner-most lexical scope. Larger indexes “overflow” into the next outer closure scope, repeating until the variable is found.

So, the IL LoadScoped(1) would load variable e, and LoadScoped(8) would load variable b, for example. We can determine the indexes through static analysis, numbering the variables in the closest scope first and then the ones in the next closest, etc. [2]

Implementation in C

My original concern with adding new bytecode instructions was that it would make the engine much bigger and more complicated. But this design can be implemented efficiently in the engine, adding only a little bit more complexity. See the following C code for finding the variable at a given index [3]:

uint16_t* arr = registers->scope;
do {
  // The length of the array (in slots) is in its header word
  uint16_t len = arr[-1] & 0xFFF;

  // Is the variable in this scope?
  if (index < len) return arr[index];

  // Otherwise, cascade/overflow to the outer scope. Slot 0 holds the
  // reference to the parent scope (a decoded pointer in the real engine).
  arr = (uint16_t*)(uintptr_t)arr[0];
  index -= len;
} while (1);
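Tracing LoadScoped(8) from the earlier example (assuming each scope here has three slots: the parent reference plus two variables): the innermost scope has length 3, so 8 overflows to 5; the middle scope also has length 3, so 5 overflows to 2; and slot 2 of the outermost scope is variable b.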

Actual instruction format

For the curious, the actual bytecode instruction format for LoadScoped in its simplest form is just 8 bits, where the upper nibble is the opcode (“LoadScoped”) and the lower nibble is the 4-bit index, allowing up to 15 closure variables to be addressed.

StoreScoped is the same, but with a different opcode.
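As a rough sketch of how such an instruction might be decoded (illustrative names, not the engine's actual code):

uint8_t instr  = *programCounter++;  // fetch one bytecode byte
uint8_t opcode = instr >> 4;         // upper nibble: the operation (e.g. LoadScoped)
uint8_t index  = instr & 0x0F;       // lower nibble: the 4-bit variable index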

For the even-more-curious, if your code has more variables than will fit in the 4-bit index, there are 16-bit and 24-bit [4] instruction variants as well, as described in the Microvium technical documentation.

Conclusion

I’m very satisfied with the final design so far. It brings closures up as first-class citizens in Microvium by making closure variable access almost as space-efficient as local variable access, in terms of both bytecode space and memory usage, while also having only a small CPU cost. I think this is important because closures are very common in real-world JavaScript code, especially when programming in a functional style, and also because closures may in the future form the foundation for the implementation of other features such as generators and async/await.

This has been a good reminder to me that it’s worth thinking deeply about designs and considering different options, rather than jumping straight in and implementing whatever comes to mind first. In this case, the final design was both simpler to implement and more efficient.

There’s a lot I haven’t talked about in this post, such as how the array is created in the first place and how the scope register changes as control moves between different lexical scopes. But that can be for another time.


  1. If you have a better name for a “closure variable”, please let me know 

  2. Note that one complexity with this design is that the index for a single variable is different depending on which scope the code is accessing it from. The index is not an absolute property of the variable itself. 

  3. This is not the actual code. The actual code needs to deal with error checking and multiple address spaces, for example. 

  4. Why would anyone ever need to address more than 255 closure-scoped variables? I heard a story that the C# compiler assumed that nobody would ever need more than 65536 variables in a single function, but that assumption was violated by a code generator that was generating massive functions. So I’m cautious about saying “nobody will ever need more than 255 variables” when it’s only 2 lines of code to support it. Maybe I’ll change my mind in the future.