Microvium Async – Part 2
Callback-based async-await

TL;DR: A suspended async function in Microvium can take as little as 6 bytes of RAM by using continuation-passing style (CPS) instead of promises, using dramatically less memory than other engines.

This is part 2 of a 4-part series on async-await in the Microvium JavaScript engine.

In the previous post, I introduced the concept of async-await for embedded systems and mentioned how I measured a minimal async function in Node.js to take about 420 bytes of RAM in its suspended state (that’s a lot!). In this post, I’ll cover some of the design decisions that lead to async functions taking as little as 6 bytes of RAM in Microvium, making them much more practical to use within the tiny memory constraints that the Microvium engine is targeting.

Closures in Microvium

Microvium async-await is built on top of closures, so let’s first recap how closures work in Microvium.

Let’s say we have the following JavaScript code:

function makeCounter() {
  let x = 0;

  function incCounter() {
    return ++x;
  }

  return incCounter;
}

const myCounter = makeCounter();
myCounter(); // returns 1
myCounter(); // returns 2
myCounter(); // returns 3

Here, the myCounter function is a closure, meaning that it’s a heap-allocated function value that embeds within it the state of the variable x. This example looks like this in memory:

When you call the myCounter closure, the engine executes the function bytecode that the closure points to, but the bytecode also has the ability to read and write the variables in the closure such as x.

The key takeaway to remember is this: a closure in Microvium is a heap-allocated type which is callable and embeds within it the state of the captured local variables as well as a pointer to some bytecode to run. When a closure is called, the engine in some sense “mounts” the closure itself as the current scope, so that bytecode instructions can read and write to the slots of the closure, including the variables and even the bytecode pointer itself (which is always the first slot of the closure).
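To make the “mounting” idea concrete, here is a toy model in plain JavaScript. It is a sketch under stated assumptions, not Microvium’s implementation. A closure is modeled as an array of slots. Slot 0 stands in for the bytecode pointer, here a JavaScript function that receives the closure itself as its scope. The names `makeCounterModel`, `incCounterCode` and `callClosure` are hypothetical.

```javascript
// Toy model of a closure as an array of slots (illustration only):
//   slot 0: the code to run (stands in for the bytecode pointer)
//   slot 1: the captured variable x

// "Bytecode" for incCounter: reads and writes slot 1 (x) of whichever
// closure is currently mounted as its scope.
function incCounterCode(scope) {
  scope[1] = scope[1] + 1; // ++x
  return scope[1];
}

// makeCounter() allocates a closure: [code pointer, x = 0]
function makeCounterModel() {
  return [incCounterCode, 0];
}

// Calling a closure: run the code in slot 0 with the closure itself
// mounted as the current scope.
function callClosure(closure, ...args) {
  return closure[0](closure, ...args);
}

const myCounterModel = makeCounterModel();
callClosure(myCounterModel);              // returns 1
callClosure(myCounterModel);              // returns 2
console.log(callClosure(myCounterModel)); // 3
```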

For a more detailed explanation of closures in Microvium, see “Microvium closures are tiny”, “Microvium closures can use less RAM than in C”, and “Microvium closure variable indexing”.

Async-await on closures

In the previous post, we saw that a suspended async function in Microvium can be resumed simply by calling its continuation. Such a continuation is a function that you can call with arguments (isSuccess, result) that will resume the corresponding suspended async function with the result argument being used as the value of the currently-blocked await expression (or the exception to throw). As pointed out in the previous post, you can get fairly direct access to these continuations through the Microvium C API¹, so that a C host² function can call the async continuation when it finishes its asynchronous work.

Under the hood, a continuation is actually just represented in memory as a normal closure, but the bytecode pointer of the closure doesn’t point to the start of the function but somewhere in the middle. So, when called, the closure won’t execute from the beginning of the function but just continue where it left off. The continuation closure contains within it all the local state of the async function, so it also doubles as a kind of heap-allocated “stack frame”.
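As a rough illustration, the following sketch models in plain JavaScript a continuation whose first slot points into the middle of a function. It also models how the (isSuccess, result) arguments are consumed. The function `example`, the awaited `something()`, and the helpers `exampleStart`, `exampleAfterAwait` and `callContinuation` are hypothetical. Microvium works at the bytecode level, not with JavaScript functions like these.

```javascript
// Toy model (not Microvium's real layout) of the continuation for:
//
//   async function example() {
//     const a = 10;
//     const r = await something();
//     return a + r;
//   }
//
// The continuation's first slot (resume) points to the code after the await,
// and the continuation also stores the local variable a.

// Calling a continuation jumps to its resume point, not the function start.
function callContinuation(cont, isSuccess, result) {
  return cont.resume(cont, isSuccess, result);
}

// The part of example() that runs after the await.
function exampleAfterAwait(cont, isSuccess, result) {
  if (!isSuccess) {
    // The awaited operation failed. With no try/catch in example(), the
    // exception propagates to the caller's continuation.
    cont.callback(false, result);
    return;
  }
  const r = result;                // value of the await expression
  cont.callback(true, cont.a + r); // return a + r
}

// The part before the await: allocate the continuation (which doubles as a
// heap-allocated stack frame) and hand it out to the async operation.
function exampleStart(callback) {
  return { resume: exampleAfterAwait, callback, a: 10 };
}

let finalResult;
const cont = exampleStart((isSuccess, value) => { finalResult = value; });
callContinuation(cont, true, 32); // the awaited operation completes with 32
console.log(finalResult);         // 42
```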

Let’s take a look at an example. Let’s say we have three async functions, foo, bar, and baz, each calling the next and awaiting it:

async function foo() {
  await bar();
  console.log('after bar');
}

async function bar() {
  await baz();
  console.log('after baz');
}

async function baz() {
  await qux();
  console.log('after qux');
}

When all 3 async functions are suspended (waiting for qux to finish), the heap memory structure in the Microvium VM looks like this, with 3 closures (the arrows here represent pointers):

Each continuation closure has a resume slot that points to the await point where control is currently suspended (in the bytecode). By definition, the first slot of a closure is the code that gets invoked when the closure is called.

Each continuation also has a callback slot that points to the caller’s continuation, which it will invoke when the async operation is done. For example, when bar is finished, it will automatically call the foo continuation.

It’s worth emphasizing that the foo continuation in the above diagram is not the same as the function foo. Each time foo is called, the engine allocates a new foo continuation, which doubles as both its continuation closure and a kind of heap-allocated stack frame since it embeds all the local variables of foo.
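The chain of continuations in the diagram can be simulated with plain objects. In this sketch, each continuation has only the two slots described, resume and callback, and qux’s pending work is a variable holding baz’s continuation. All names (`fooCps`, `bazAfterQux`, `pendingQux`, and so on) are hypothetical. Error handling is left out to keep the shape visible.

```javascript
// Toy CPS model of foo -> bar -> baz -> qux. Each continuation holds the two
// slots from the diagram: resume (where to continue) and callback (the
// caller's continuation). Error handling is omitted for brevity.
const log = [];

function callContinuation(cont, isSuccess, result) {
  return cont.resume(cont, isSuccess, result);
}

// Resume points: the code after each function's await, ending in a "return"
// that invokes the caller's continuation.
function fooAfterBar(cont) {
  log.push('after bar');
  callContinuation(cont.callback, true, undefined);
}
function barAfterBaz(cont) {
  log.push('after baz');
  callContinuation(cont.callback, true, undefined);
}
function bazAfterQux(cont) {
  log.push('after qux');
  callContinuation(cont.callback, true, undefined);
}

// Each call allocates a fresh continuation and passes it to its callee.
let pendingQux = null; // stands in for qux's outstanding async work
function quxCps(callback) { pendingQux = callback; }
function bazCps(callback) { quxCps({ resume: bazAfterQux, callback }); }
function barCps(callback) { bazCps({ resume: barAfterBaz, callback }); }
function fooCps(callback) { barCps({ resume: fooAfterBar, callback }); }

// The top-level caller's continuation just records that foo finished.
const done = { resume: () => log.push('foo done'), callback: null };
fooCps(done);

// All three are now suspended: pendingQux -> baz's -> bar's -> foo's continuation.
callContinuation(pendingQux, true, undefined); // qux completes
console.log(log.join(', ')); // after qux, after baz, after bar, foo done
```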

Each of the continuations in this example is 6 bytes. A memory slot in Microvium is 2 bytes, so the resume and callback slots take 4 bytes, and every heap allocation also has a 2-byte allocation header (not shown in the diagram). An additional 2 bytes would be required for every local variable. For example, let’s say that bar has a local variable x:

Here, I haven’t shown the implementation of qux, but it could be another async function or it could be implemented in the host as a C function. In the latter case, the C qux function would call mvm_asyncStart, which would essentially return the baz continuation as a JavaScript function. I showed an example like this in the previous post and go into more detail in the next post.
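The sizes above follow from simple arithmetic. This helper just encodes the figures stated in the text: 2-byte slots, a 2-byte allocation header, two fixed slots, and one slot per local variable. It deliberately ignores temporaries and other state, and the function name `continuationBytes` is hypothetical.

```javascript
// Back-of-the-envelope continuation size using the figures from the text.
// Temporaries and other per-await state are ignored in this sketch.
const SLOT_BYTES = 2;
const HEADER_BYTES = 2;
const FIXED_SLOTS = 2; // resume + callback

function continuationBytes(localVariableCount) {
  return HEADER_BYTES + SLOT_BYTES * (FIXED_SLOTS + localVariableCount);
}

console.log(continuationBytes(0)); // 6: foo, bar and baz as written
console.log(continuationBytes(1)); // 8: bar with a local variable x
```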

Continuation-passing style (CPS)

This form of asynchrony, where callbacks are passed to a callee to invoke when it completes, is called continuation-passing style (CPS). See Wikipedia. Even though the source code doesn’t explicitly pass any callbacks, the engine is implicitly creating these callbacks and passing them behind the scenes using a hidden VM register. When a host function calls mvm_asyncStart, it’s essentially requesting explicit access to the automatically-generated continuation of the caller. This will be discussed in more detail in the next post.
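For comparison, here is what the implicit callbacks would look like if written by hand. This sketch is explicit CPS for baz from the example, using the (isSuccess, result) callback signature described earlier. `bazCps`, `quxCps` and the `pending` array (a stand-in for real asynchronous I/O) are hypothetical names. Microvium generates the equivalent continuations itself and passes them through a hidden register rather than an extra parameter.

```javascript
// Hand-written CPS equivalent of:
//   async function baz() { await qux(); console.log('after qux'); }
// Every async operation takes one extra argument: a callback with the
// (isSuccess, result) signature.

const pending = []; // callbacks waiting for quxCps to finish

function quxCps(callback) {
  pending.push(callback); // start the async work; it finishes later
}

function bazCps(callback) {
  quxCps((isSuccess, result) => {
    if (!isSuccess) {
      callback(false, result); // the await threw: propagate to the caller
      return;
    }
    console.log('after qux');
    callback(true, undefined); // baz returns undefined
  });
}

bazCps((isSuccess) => console.log('baz finished:', isSuccess));
pending.shift()(true, undefined); // simulate qux completing successfully
// prints "after qux", then "baz finished: true"
```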

The resume point is mutable

In a normal closure that represents a local JavaScript function like an arrow function, the bytecode pointer slot (the first slot) will never change. But in an async continuation, the bytecode pointer (resume point) is considered to be mutable. It points to wherever the async function is currently awaiting, so that the same continuation can be reused to continue the async function at different places throughout its execution.
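The following sketch illustrates a mutable resume point with a hypothetical async function containing two awaits. A single continuation object is allocated once. Its resume slot is reassigned at each await, and the first awaited value is stored in the continuation so that it survives the second await. The function names are made up for illustration and do not reflect Microvium’s bytecode.

```javascript
// Toy model for:
//   async function twoSteps() {
//     const a = await first();
//     const b = await second();
//     return a + b;
//   }
// One continuation object is reused; only its resume slot moves.

function callContinuation(cont, isSuccess, result) {
  return cont.resume(cont, isSuccess, result);
}

let awaiting = null; // the continuation currently handed to an async operation

function twoStepsStart(callback) {
  const cont = { resume: twoStepsAfterFirst, callback, a: undefined };
  awaiting = cont; // "await first()": hand the continuation to first()
}

function twoStepsAfterFirst(cont, isSuccess, result) {
  if (!isSuccess) return cont.callback(false, result);
  cont.a = result;                   // const a = await first();
  cont.resume = twoStepsAfterSecond; // move the resume point forward
  awaiting = cont;                   // "await second()": same object again
}

function twoStepsAfterSecond(cont, isSuccess, result) {
  if (!isSuccess) return cont.callback(false, result);
  const b = result;                  // const b = await second();
  cont.callback(true, cont.a + b);   // return a + b;
}

let answer;
twoStepsStart((isSuccess, value) => { answer = value; });
const firstCont = awaiting;
callContinuation(awaiting, true, 40); // first() completes
console.log(awaiting === firstCont);  // true: one allocation for the whole call
callContinuation(awaiting, true, 2);  // second() completes
console.log(answer);                  // 42
```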

For interest, if you’re following the analogy between async continuations and traditional stack frames, you can think of the resume slot as being analogous to the program counter that is pushed onto the stack to remember where to continue executing when the frame is reactivated. The callback slot in this case is analogous to the pushed stack base pointer register (like EBP in x86), because it saves information about the location of the caller’s frame.

It’s not that simple

In this post, I’ve tried to express the design as simply as possible by eliding some of the things that make async-await complicated to implement. Here are some things I’ll just mention to give you a sense of it, but won’t go into detail:

  • Calculating the continuation size – a lot of static analysis goes into calculating the size of each async function closure, so that it’s big enough to include all the required state at any await point.
  • Exception handling – exceptions need to be propagated up the async call chain, and user-defined catch blocks need to be properly suspended and restored over await points.
  • Temporaries – in an expression like await foo() + await bar(), the engine needs to keep the result of the first await while it is waiting for the second await, so that it can add them together. This state also needs to be kept in the continuation.
  • Nested closures – if you have nested arrow functions inside an async function, they need to be able to access the variables in the continuation closure. When combined with async temporaries and Microvium’s waterfall closure indexing, the static analysis for this turns out to be quite a complicated problem.
  • The job queue – in JavaScript, async functions are resumed on the promise job queue, so there is some work behind the scenes to schedule the continuation on the job queue when you call it, rather than executing it directly (the topic of a later post).
  • Multiple entry points – the bytecode of async functions in Microvium is the first structure to allow pointers into the middle of the structure instead of just the beginning like most allocations, which comes with its own issues since now the same allocation in bytecode memory can have multiple distinct pointers referencing it. Pointer alignment constraints also get in the way here.
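The job-queue point can be sketched with a small synchronous queue. Real JavaScript engines use the promise job queue defined by the spec, and the post does not show how Microvium wires this up. `jobQueue`, `scheduleOnJobQueue` and `drainJobQueue` are therefore hypothetical stand-ins. The example shows only that calling the wrapped continuation enqueues the resumption rather than running it inline.

```javascript
// Toy job queue: calling the wrapped continuation schedules it instead of
// running the async function's remaining code immediately.
const jobQueue = [];

function scheduleOnJobQueue(continuation) {
  return (isSuccess, result) => {
    jobQueue.push(() => continuation(isSuccess, result));
  };
}

function drainJobQueue() {
  while (jobQueue.length > 0) {
    const job = jobQueue.shift();
    job();
  }
}

const events = [];
const wrappedContinuation = scheduleOnJobQueue(
  (isSuccess, value) => events.push(`resumed with ${value}`),
);

wrappedContinuation(true, 7);    // only enqueues; nothing resumes yet
events.push('caller continues'); // the caller keeps running first
drainJobQueue();                 // later, the queued resumption runs
console.log(events.join(' | ')); // caller continues | resumed with 7
```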

But, isn’t JavaScript Promise-based?

If you’re already familiar with JavaScript’s async-await, you might be surprised by the fact that I haven’t mentioned Promises at all in this post so far! Async functions in JavaScript are defined to return Promise objects, so how can I show a memory footprint for 3 awaiting async functions that doesn’t include a Promise of any sort?

Microvium indeed doesn’t create any promises for the examples shown in this post. In these examples, the promises are immediately awaited or discarded, so they are not observable to the user code, and Microvium can elide them without breaking ECMA-262 spec compliance.

One might think of this as optimizing a special case, but there’s a better way of looking at it. In Microvium, CPS is the core of async-await: every time you have an async function, you’re creating one of these continuations. When promises are created, they’re merely wrappers or mediators for the CPS continuations. In the next post, I’ll go into more detail about how promises work.
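To give a rough feel for how a promise can wrap or mediate a CPS continuation, here is one standard way to adapt between the two in plain JavaScript. This is a generic sketch. It is not Microvium’s mechanism, which the next post covers, and the helper names `promiseFromCps` and `continuationFromPromise` are hypothetical.

```javascript
// Wrap a CPS-style operation (one that accepts an (isSuccess, result)
// callback) so that its outcome is exposed as a Promise.
function promiseFromCps(startCps) {
  return new Promise((resolve, reject) => {
    startCps((isSuccess, result) => (isSuccess ? resolve(result) : reject(result)));
  });
}

// The opposite direction: when a promise settles, invoke a CPS continuation
// with the matching (isSuccess, result) arguments.
function continuationFromPromise(promise, continuation) {
  promise.then(
    (value) => continuation(true, value),
    (error) => continuation(false, error),
  );
}

promiseFromCps((callback) => callback(true, 42)).then((value) => console.log(value)); // 42
continuationFromPromise(Promise.resolve('ready'), (isSuccess, result) =>
  console.log(isSuccess, result)); // true ready
```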

As far as I’m aware, this approach is completely novel and sets Microvium apart from all other JavaScript engines today. It allows Microvium async functions to be more than an order of magnitude more memory-efficient than most other JavaScript engines.

More importantly, this level of memory consumption is acceptable to me when it comes to small embedded devices. If you only have 16 kB of RAM, using 100 bytes or more for a single suspended async function is ridiculous, so as a firmware engineer I would almost never use the feature and instead might spend the extra time hand-crafting state machines. Microvium’s memory efficiency tips the scales completely, making async-await preferable to many other forms of expressing the same logic in either JavaScript or C, provided you’re not dealing with a CPU-bound problem.

P.S. I don’t see a fundamental reason why this approach can’t be used in any JavaScript engine. If you’re reading this and you’re on the dev team of another JS engine and you want to know more about what I’ve done here, feel free to reach out. And if you use this idea, please credit me for the idea or inspiration.

  1. I use the term “fairly direct” because C actually has access to a wrapper function that provides some safety and guarantees, such as enforcing that the continuation cannot be invoked multiple times. There’ll be more detail on this in the next post. 

  2. The term “host” refers to the outer C program, such as the firmware, and the term “guest” refers to the inner JavaScript program. 
