Microvium Async – Part 4
The job queue

TL;DR: the job queue in Microvium is used just for executing async continuations. It uses only 2 bytes of idle memory and doesn’t require allocating any threads or special support from the host of any kind.


This is the last of a 4-part series on async-await in the Microvium JavaScript engine.

In the previous posts, I talked about how async-await is useful in embedded systems, how it uses continuation-passing style (CPS) at its core to achieve a tiny memory footprint, and how it creates promises when needed to interface between its native CPS protocol and JavaScript’s standard promise protocol. In this post, I’ll talk about the design choices behind the job queue in Microvium.

What is a job queue?

JavaScript is fundamentally single-threaded[1], but job queues allow some level of multi-tasking without multithreading, by allowing work to be broken up into small chunks that get executed sequentially.

If you’re an embedded programmer, you may be familiar with the so-called “super-loop” architecture, where a single main function has an infinite loop and will check various things to see what work needs to be done on each cycle of the loop. Often these checks are hard-coded, but an alternative design that’s quite scalable is to have a queue of work that needs to be done, and the main loop just pulls work off the queue and performs it.
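As a rough illustration of the queue-based super-loop idea (a generic sketch, not Microvium code; `workQueue`, `enqueue`, and `runPendingWork` are hypothetical names), the main loop only needs to pull jobs off a queue and run them:

```javascript
// Hypothetical work-queue super-loop (all names are illustrative only).
const workQueue = [];
const log = [];

function enqueue(job) {
  workQueue.push(job);
}

// One pass of the main loop: pull pending work off the queue and run it.
// Firmware would wrap this in an infinite loop; it is bounded here so the
// example terminates.
function runPendingWork() {
  while (workQueue.length > 0) {
    const job = workQueue.shift(); // oldest job first (FIFO)
    job();
  }
}

enqueue(() => log.push('read sensor'));
enqueue(() => {
  log.push('update display');
  enqueue(() => log.push('send report')); // a job may schedule more work
});

runPendingWork();
console.log(log.join(', '));
```

Each job is a small chunk of work, and a job that needs to continue later simply enqueues another job.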

In Microvium, the job queue is used solely to execute async continuations (which includes all promise subscribers). If foo is awaiting bar, then when bar completes, a job will be scheduled to continue foo where it left off[2].

This behavior is required for Microvium to be ECMAScript spec compliant, but it also has the advantage of using less peak stack memory, since a cascade of continuations will each be executed in a fresh job when the call stack is empty, rather than executing on top of each other like a cascade of traditional callbacks.
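The stack-depth difference can be sketched by counting how deeply calls nest in each style. This is a toy model under my own assumptions (`tracked`, `cascadeInline`, and `cascadeQueued` are hypothetical helpers), not a measurement of the real engine:

```javascript
let depth = 0;
let maxDepth = 0;

// Run `fn` while tracking how deeply calls are nested.
function tracked(fn) {
  depth++;
  maxDepth = Math.max(maxDepth, depth);
  try {
    fn();
  } finally {
    depth--;
  }
}

// Style 1: each step calls the next one directly (traditional callbacks).
function cascadeInline(steps) {
  if (steps > 0) tracked(() => cascadeInline(steps - 1));
}

// Style 2: each step enqueues the next one as a job on a queue.
function cascadeQueued(steps) {
  const queue = [];
  const step = (n) => () => {
    if (n > 0) queue.push(step(n - 1));
  };
  queue.push(step(steps));
  while (queue.length > 0) tracked(queue.shift());
}

function measure(run) {
  depth = 0;
  maxDepth = 0;
  run();
  return maxDepth;
}

const inlineDepth = measure(() => cascadeInline(5)); // nests 5 deep
const queuedDepth = measure(() => cascadeQueued(5)); // never nests beyond 1
console.log(inlineDepth, queuedDepth);
```

In the queued style, every step starts from an empty stack, so the peak depth stays constant no matter how long the cascade is.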

The Microvium job queue is invisible to the host

An important factor in the design of Microvium is to make it as easy as possible to integrate into host firmware. One thing I was worried about with the introduction of a job queue is that it would complicate the C integration API, because it could require some way to pump the queue (to execute enqueued jobs). Either this might appear like an explicit API call to Microvium, like mvm_processJobs(), or it might require additional hooks in the porting layer to allow Microvium to hook into the operating system to set up a job thread, or hook into a global super-loop, etc.

None of these options are acceptable to me. I want the “getting started” process to be as seamless as possible for newcomers, and forcing newcomers to do this integration work is just too complicated in my opinion. It would require conveying new concepts like “what is a job queue?” and “when should or shouldn’t I pump the job queue?”, just to get started with “hello world”.

The solution I went with in the end is that Microvium automatically pumps the job queue at the end of mvm_call. That is, when the host (C code) calls a JavaScript function (using mvm_call), if the JavaScript function creates any new jobs, those jobs will be executed before control returns to the host. The VM waits until the guest (JavaScript code) call stack is empty and then, before returning to the host, pumps the job queue until all jobs have been completed.

This approach makes it similar to explicit callback-based asynchrony, where callbacks would be executed inline and therefore before control returns to the host. The difference is that the continuations are executed a little later, once the guest call stack is empty, but still before returning to the host. The amount of time that mvm_call blocks the host for doesn’t change between using async-await and using explicit callbacks.

To put it another way, the program may be sitting idle, waiting for something to happen, such as a timer, I/O event, or user interaction. When something happens, the host might mvm_call the guest to let it know about the event. The way that Microvium is designed, the guest will fully process the event and reach its next equilibrium state before mvm_call returns to the host.
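The pumping behavior can be modeled in a few lines. This is a sketch under stated assumptions: `vmCall` stands in for mvm_call, `enqueueJob` and `callDepth` are hypothetical, and exceptions are not handled. It is not the real C implementation.

```javascript
// Hypothetical model of a VM that pumps its job queue at the end of each
// host-to-guest call. None of these names are the real Microvium API.
const jobQueue = [];
let callDepth = 0;
const trace = [];

function enqueueJob(job) {
  jobQueue.push(job);
}

function vmCall(guestFunction, ...args) {
  callDepth++;
  const result = guestFunction(...args);
  // Only the outermost call pumps, so jobs run once the guest stack is empty
  // and nested (reentrant) calls leave the queue for the outer call.
  if (callDepth === 1) {
    while (jobQueue.length > 0) jobQueue.shift()();
  }
  callDepth--;
  return result;
}

// Guest code: handles an event and schedules a continuation as a job.
function onButtonPress() {
  trace.push('guest: handling event');
  enqueueJob(() => trace.push('job: continuation runs'));
  trace.push('guest: returning');
}

vmCall(onButtonPress);
trace.push('host: call returned');
console.log(trace.join('\n'));
```

By the time the host regains control, the queue is empty, so the host never needs to know the queue exists.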

Job queue structure

How does the job queue look under the hood?

I’m very hesitant to add things to Microvium that use more memory permanently. So the job queue is implemented as a single VM register, which is just a 2-byte slot. It has multiple states:

  1. Empty: there are no jobs.
  2. Single job
  3. Multiple jobs

Single job

When there’s only a single job enqueued, the jobs register simply points to the job. A job is any function, but in particular it’s almost always going to be a closure since a closure is a function type that can encapsulate some dynamic state (async continuations being a special case of closures). The job queue mechanism doesn’t care what the state is, but since this is a job queue for async continuations, the job closure is almost always going to capture the following state:

  1. The continuation to execute.
  2. Whether the awaited operation passed or failed (isSuccess).
  3. The result or error value (result).

This is all the state required for the job to invoke the continuation with (isSuccess, result), as required by the CPS protocol I defined in the previous post.
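In JavaScript terms, such a job could be pictured as a closure over those three values. The helper name `makeJob` is my own, and this is only a sketch of the idea rather than the engine’s internal representation:

```javascript
// Hypothetical helper: wrap a continuation and its outcome into a job.
function makeJob(continuation, isSuccess, result) {
  // The returned closure captures the three pieces of state listed above.
  return () => continuation(isSuccess, result);
}

const received = [];

// Stand-in for the continuation of a suspended async function.
const fooContinuation = (isSuccess, result) => {
  received.push([isSuccess, result]);
};

const job = makeJob(fooContinuation, true, 42);

// Nothing happens until the job queue gets around to running the job.
job();
console.log(JSON.stringify(received));
```

The job queue itself only ever calls `job()` with no arguments, which is why it does not need to know anything about the captured state.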

Multiple jobs

The vast majority of the time, there will only be zero or one jobs in the job queue. This is because it’s relatively rare for multiple async functions to be awaiting the same promise. But Microvium does need to cater for the case of multiple jobs. When this happens, the jobs register won’t point to a single job, but to a linked list of jobs.

The job pointed to by the register is the first job to execute, but there is an interesting design choice here where the prev pointer of the first node points to the last node, making the linked list into a cycle.

This is simply to facilitate appending to the job queue in O(1) time when adding new jobs, because we can access the last node directly through the first node.

Normally you can append to a doubly-linked list in O(1) time by maintaining a separate pointer to the last node, but I didn’t want to permanently dedicate a whole VM register for the relatively-rare and short-lived case where multiple jobs are waiting. Representing the jobs in a cycle is a neat way to get the best of both.

Note that jobs are removed from the cycle before they’re executed, so the cycle does not imply that jobs are executed multiple times. It’s merely an optimization that allows Microvium to access both the beginning and end of the queue via a single register in O(1) time.

A special case is where the cycle consists of only one job. That is, when next and prev pointers of the first (and only) job point to the job itself. This case occurs when there were multiple jobs queued, but all except one have already been executed. The logic for this case surprisingly “just works”, meaning that you can easily insert a new job between the last and the first nodes even when the last and the first nodes are actually the same node.
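The three register states and the cycle trick can be sketched as follows. The node layout (`job`, `next`, `prev`) and the function names are assumptions made for illustration; only the overall shape (one register, cyclic doubly-linked list) comes from the description above.

```javascript
// Hypothetical model of the jobs register. It is either undefined (empty),
// a bare function (single job), or the first node of a cyclic
// doubly-linked list (multiple jobs).
let jobsRegister = undefined;

const isNode = (x) => typeof x === 'object';

function enqueueJob(job) {
  if (jobsRegister === undefined) {
    jobsRegister = job; // empty -> single job
    return;
  }
  if (!isNode(jobsRegister)) {
    // single job -> cycle of one node
    const only = { job: jobsRegister, next: null, prev: null };
    only.next = only;
    only.prev = only;
    jobsRegister = only;
  }
  const first = jobsRegister;
  const last = first.prev; // O(1) access to the tail via the head
  const node = { job, next: first, prev: last };
  last.next = node;
  first.prev = node;
}

function dequeueJob() {
  const first = jobsRegister;
  if (first === undefined) return undefined;
  if (!isNode(first)) { // single job -> empty
    jobsRegister = undefined;
    return first;
  }
  if (first.next === first) { // cycle of one -> empty
    jobsRegister = undefined;
  } else { // unlink the head and keep the cycle closed
    first.prev.next = first.next;
    first.next.prev = first.prev;
    jobsRegister = first.next;
  }
  return first.job;
}

const order = [];
enqueueJob(() => order.push('a'));
enqueueJob(() => order.push('b'));
dequeueJob()(); // runs a; a cycle of one (b) remains
enqueueJob(() => order.push('c')); // insert into the cycle of one
let job;
while ((job = dequeueJob()) !== undefined) job();
console.log(order.join(', '));
```

Note that the insertion code has no special branch for the cycle of one: `first` and `last` are simply the same node.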

A quick note about GC overhead. The diagram above with 12 jobs in the queue has 24 memory allocations. All 12 jobs are guaranteed to be executed before control returns to the host[3], so they can be considered to be short-lived (allocated and freed within a single call to the VM). In a language like C, doing a malloc for every job would be a lot of overhead, especially when jobs are as small as “run the next piece of this async function”. But in Microvium, this overhead isn’t too bad because of the characteristics of the Microvium garbage-collected heap:

  1. Allocation headers are small — only 2 bytes per allocation.
  2. Unlike with malloc, there is no memory fragmentation because the heap is compacted.
  3. There is no overhead for collecting garbage memory[4]. Rather, Microvium incurs collection overhead on everything that isn’t garbage. Even hundreds of tiny temporary allocations can be swept away in one fell swoop.
  4. Allocation time is O(1) because the Microvium heap is compacted. Creating new allocations is roughly analogous to just bumping a “free pointer” forward by N bytes[5].
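To make the “free pointer” analogy concrete, here is a minimal bump-allocator sketch. The 2-byte header comes from the list above; the growth policy and every name (`makeHeap`, `allocate`) are assumptions for illustration and do not describe how Microvium actually expands its heap.

```javascript
// Hypothetical bump allocator over a compacted heap.
const HEADER_BYTES = 2;

function makeHeap(capacity) {
  return { capacity, free: 0 };
}

function allocate(heap, size) {
  const total = HEADER_BYTES + size;
  // The extra check that makes this slightly slower than a bare bump.
  if (heap.free + total > heap.capacity) {
    heap.capacity = heap.free + total; // stand-in for expanding the heap
  }
  const address = heap.free;
  heap.free += total; // bump the free pointer
  return address;
}

const heap = makeHeap(16);
const a = allocate(heap, 4); // header + 4 bytes
const b = allocate(heap, 4);
const c = allocate(heap, 6); // does not fit in 16 bytes, so the heap expands
console.log(a, b, c, heap.capacity);
```

Because a compacted heap has no holes, there is no free list to search; each allocation starts exactly where the previous one ended.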

Conclusion

The job queue is one of the less complicated parts of the implementation of async/await, but still important. In the end, I’m quite happy with how the design turned out, requiring only 2 bytes of memory (the register itself) when the feature isn’t being used, being lean in the hot path where only one job is enqueued and executed at a time, but still allowing an unlimited number of jobs while keeping O(1) insertion and removal. And this was achieved without impacting the interface between the host and the VM at all, so users of Microvium don’t have any additional integration work.


  1. There are some multi-threaded capabilities in modern JavaScript, but it’s still single threaded at heart and Microvium doesn’t support any multithreading primitives. 

  2. In the previous posts, I may have used the shorthand of saying that the bar “calls” foo’s continuation. But now you know that it’s more accurate to say that bar *schedules* foo’s continuation to be called on the job queue. 

  3. In the case of reentrant calls, all the jobs will be executed before control returns to the host from the outer-most mvm_call.

  4. Garbage memory does create memory pressure which indirectly has overhead by increasing the chance of a collection cycle. But during a collection cycle, the GC moves all living allocations to a new region and then frees the old region in one go, so the individual dead allocations do not contribute at all to the amount of time it takes to perform a GC collection. 

  5. It’s not quite as efficient as just bumping a free pointer, since it needs to check if the heap actually has enough space, and potentially expand the heap if there isn’t. 

Microvium async – Part 3
Making promises

TL;DR: Microvium’s async/await uses continuation-passing style (CPS) at its core for efficiency, but automatically creates promises as well when required. It does so by defining a handshake protocol between the caller and callee to establish when promises are required. Promises in Microvium are also pretty compact, but not as compact as raw CPS.


This is part 3 of a 4-part series on async-await in the Microvium JavaScript engine.

In the previous posts, I talked about why async-await is useful, and how a suspended function in Microvium can be as small as 6 bytes by using continuation-passing style (CPS) rather than promises at its core. In this post, I’ll talk more about promises and how they interact with Microvium’s CPS core.

What is a Promise?

Async functions in JavaScript return promises, which are objects that represent the future return value. This post will mostly assume that you’re already familiar with the concept of a Promise in JavaScript. Take a look at MDN for a more detailed explanation.

Microvium doesn’t support the full ECMAScript spec of Promise objects. Like everything in Microvium, it supports a useful subset of the spec:

  • Promises are objects which are instances of the Promise class and inherit from Promise.prototype.
  • You can manually construct promises with new Promise(...).
  • You can await promises.
  • Async functions return promises, if you observe the result (more on that later).
  • Async host[1] functions will also automatically return promises (more on that later).

Notably, Microvium promises don’t have a then or catch method, but you could implement these yourself in user code by adding a function to Promise.prototype which is an async function that awaits the given promise and calls the handlers. Microvium also doesn’t have built-in functions like Promise.all, but these can similarly be implemented in user code if you need them. The general philosophy of Microvium has been to keep it small by omitting things that can be added in user code, since that gives the user control over the space trade-off.
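A user-land version might look something like the sketch below. It is untested on Microvium and makes some assumptions: the method is named `thenDo` (hypothetical) so that the example also runs on engines where `then` is already built in, and `all` is a simplified stand-in for Promise.all that awaits the promises one at a time.

```javascript
// Hypothetical user-land substitute for then/catch: an async function on
// Promise.prototype that awaits the promise and calls the handlers.
Promise.prototype.thenDo = async function (onResolved, onRejected) {
  let value;
  try {
    value = await this;
  } catch (error) {
    if (onRejected) return onRejected(error);
    throw error; // no handler: propagate the rejection
  }
  return onResolved ? onResolved(value) : value;
};

// Hypothetical user-land substitute for Promise.all.
async function all(promises) {
  const results = [];
  for (let i = 0; i < promises.length; i++) {
    results.push(await promises[i]);
  }
  return results;
}

Promise.resolve(20)
  .thenDo((value) => value + 1)
  .thenDo((value) => console.log(value)); // logs 21
```

Since `thenDo` is itself an async function, it returns a promise, which is what makes the chaining in the usage example work.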

It’s interesting to note then that Microvium doesn’t support thenables. Firstly, promises do not have a then method out of the box. Secondly, you cannot await something that isn’t a promise (e.g. a different object which happens to have a then method).

Memory structure

The memory structure of a promise object is as follows, with 4 slots:

The next and __proto__ slots are common to all objects, and I discuss these more in Microvium has Classes.

The status slot is an enumeration indicating whether the promise is pending, resolved, or rejected.

To squeeze the size down as small as possible, the out slot is overloaded and can be any of the following:

  • A single subscriber (the slot points to a closure)
  • A list of subscribers (the slot points to an array)
  • No subscribers (the slot is empty)
  • The resolved value (if the promise is resolved)
  • The error value (if the promise is rejected)

With this design, a promise requires exactly 10 bytes of memory (4 slots plus the allocation header), which isn’t too bad. To put this in context by comparison, a single slot (e.g. a single variable) in the XS JavaScript engine is already 16 bytes.

An interesting thing to note is that there is no separate resolve and reject list of subscribers, and instead just one list of subscribers. My early designs of promises had separate resolve and reject subscribers, because this seemed natural given that JavaScript naturally has separate then and catch handlers. But after several iterations, I realized that it’s significantly more memory efficient to combine these. So now, a subscriber is defined as a function which is called with arguments (isSuccess, result). You may notice this is exactly the same signature as a CPS continuation function as I defined in the previous post, meaning a continuation can be directly subscribed to a promise.
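The overloaded out slot and the single subscriber list can be modeled like this. The slot names follow the text, but the functions (`makePromise`, `subscribe`, `settle`) are hypothetical, and the sketch calls subscribers directly where the real engine would enqueue them as jobs:

```javascript
// Hypothetical model of the promise layout described above.
const PENDING = 0;
const RESOLVED = 1;
const REJECTED = 2;

function makePromise() {
  return { status: PENDING, out: undefined }; // out: no subscribers yet
}

function subscribe(promise, subscriber) {
  if (promise.status !== PENDING) {
    // Already settled: `out` holds the value or error.
    subscriber(promise.status === RESOLVED, promise.out);
  } else if (promise.out === undefined) {
    promise.out = subscriber; // single subscriber
  } else if (typeof promise.out === 'function') {
    promise.out = [promise.out, subscriber]; // upgrade to a list
  } else {
    promise.out.push(subscriber);
  }
}

function settle(promise, isSuccess, result) {
  if (promise.status !== PENDING) return; // settle at most once
  const subscribers = promise.out;
  promise.status = isSuccess ? RESOLVED : REJECTED;
  promise.out = result; // the slot now holds the outcome
  if (typeof subscribers === 'function') subscribers(isSuccess, result);
  else if (subscribers) subscribers.forEach((s) => s(isSuccess, result));
}

const seen = [];
const p = makePromise();
subscribe(p, (ok, v) => seen.push(['first', ok, v]));
subscribe(p, (ok, v) => seen.push(['second', ok, v]));
settle(p, true, 42);
console.log(JSON.stringify(seen));
```

Because each subscriber takes (isSuccess, result), one slot is enough for both the success and the failure path.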

Await-calling

So, we’ve discussed how Microvium’s async-await uses CPS under the hood (the previous post) and how promises look, but there’s a missing piece of the puzzle: how do promises interact with CPS? Before getting into the details, I need to lay some groundwork.

The most common way that you use an async function in JavaScript is to call it from another async function and await the result. For example:

const x = await myAsyncFunction(someArgs);

This syntactic form, where the result of a function call is immediately awaited, is what I call an await-call, and I’ll use this terminology in the rest of the post. Await-calling is the most efficient way of calling an async function in Microvium because the resulting Promise is not observable to the user code and is completely elided in favor of using the CPS protocol entirely.

CPS Protocol

As covered in the previous post, Microvium uses CPS under the hood as the foundation for async-await (see Wikipedia’s Continuation-passing style). I’ve created what I call the “Microvium CPS protocol” as a handshake between a caller and callee to try to negotiate the passing of a continuation callback. The handshake works as follows.

An async caller’s side of the handshake:

  1. If a caller await-calls a callee, it will pass the caller’s continuation to the callee in a special callback VM register, to say “hey, I support CPS, so please call this callback when you’re done, if you support CPS as well”.
  2. When control returns back to the caller, the returned value is either a Promise or an elided promise[2]. The latter is a special sentinel value representing the absence of a promise. If it’s a promise, the caller will subscribe its continuation to the promise. If the promise is elided, it signals that the callee accepted the callback already and will call it when it’s finished, so there’s nothing else to do.

An async callee’s side of the handshake:

  1. An async callee is either called with a CPS callback or it isn’t (depending on how it was called). If there is a callback, the callee will remember it and invoke it later when it’s finished the async operation. The synchronous return value to the caller will be an elided promise to say “thanks for calling me; just to let you know that I also support CPS so I’ll call your callback when I’m done”.
  2. If no callback was passed, the engine synthesizes a Promise which it returns to the caller. When the async callee finishes running, it will invoke the promise’s subscribers.

These callbacks are defined such that you call them with (isSuccess, result) when the callee has finished the async operation. For example, callback(true, 42) to resolve to 42, or callback(false, new Error(...)) to reject with an error.

If both the caller and callee support CPS, this handshake completely elides the construction of any promises. This is the case covered in the previous post.
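Both sides of the handshake can be modeled in plain JavaScript. Everything here is a simplification under my own assumptions: `callbackRegister`, `ELIDED`, `awaitCall`, `plainCall`, and this `asyncStart` are illustrative names, the promise is reduced to a subscriber list, and callbacks are invoked directly rather than through the job queue.

```javascript
// Hypothetical model of the CPS handshake.
const ELIDED = Symbol('elided promise');
let callbackRegister = undefined;
let promisesCreated = 0;

// Callee side: accept the caller's callback, or synthesize a promise.
function asyncStart() {
  const callback = callbackRegister;
  callbackRegister = undefined;
  if (callback) return { callback, returnValue: ELIDED };
  promisesCreated++;
  const promise = { subscribers: [] };
  const settle = (isSuccess, result) =>
    promise.subscribers.forEach((s) => s(isSuccess, result));
  return { callback: settle, returnValue: promise };
}

// Caller side, await-call: offer the continuation, then check the reply.
function awaitCall(callee, continuation) {
  callbackRegister = continuation;
  const returned = callee();
  callbackRegister = undefined;
  if (returned !== ELIDED) returned.subscribers.push(continuation);
}

// Caller side, ordinary call: no callback is offered.
function plainCall(callee) {
  callbackRegister = undefined;
  return callee();
}

// An "async" callee that completes when `finish` is invoked later.
let finish;
function bar() {
  const { callback, returnValue } = asyncStart();
  finish = () => callback(true, 42);
  return returnValue;
}

const results = [];
awaitCall(bar, (ok, value) => results.push(['await-call', ok, value]));
finish(); // no promise was needed for this call

const promise = plainCall(bar); // no callback offered, so a promise is made
promise.subscribers.push((ok, value) => results.push(['promise', ok, value]));
finish();
console.log(JSON.stringify(results), promisesCreated);
```

Only the second call allocates a promise; in the await-call case the callee talks to the caller’s continuation directly.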

But this post is about promises! So let’s work through some of the cases where the promises aren’t elided.

Observing the result of an async function

Let’s take the code example from the previous post but say that instead of foo directly awaiting the call to bar, it stores the result in a promise, and then awaits the promise, as follows:

async function foo() {
  const promise = bar();
  await promise;
}

async function bar() {
  let x = 42;
  await baz();
}

Note: Like last time, the variable x here isn’t used but is just to show where variables would go in memory.

The key thing here is that we’re intentionally breaking CPS by making the promise observable, so we can see what happens.

The memory structure while bar is awaiting will look like this:

The memory structure looks quite similar to that shown in the previous post, but now with a promise sitting between foo continuation and bar continuation. Foo’s continuation is subscribed to the promise (foo will continue when the promise settles), and bar’s “callback” is the promise. A promise is not a callable object, so the term “callback” is not quite correct here, but when bar completes, the engine will call all of the subscribers of the promise. (Or more accurately, it will enqueue all of the subscribers in the job queue, which is the topic of the next post.)

This structure comes about because when bar is called, it will notice that it wasn’t provided with a callback (because the call was not an await-call) and so it will create the promise. The await promise statement also isn’t an await-call (it’s not a call at all), but since the awaitee is a promise, foo will subscribe its continuation to that promise.

The end result here is that we’ve introduced another 10 bytes of memory overhead and inefficiency by making the promise observable, but we’ve gained some level of flexibility because we can pass the promise around and potentially have multiple subscribers.

A host async function

We can gain some more insight into what’s happening here if we consider the case where bar is actually a host function implemented in C rather than JavaScript. I gave an example of this in the first post in this series. Since you’ve made it this far in the series, let’s also make this example a little more complete, using an mvm_Handle to correctly anchor the global callback variable to the GC.

mvm_Handle globalCallback;

mvm_TeError bar(
  mvm_VM* vm,
  mvm_HostFunctionID hostFunctionID,
  mvm_Value* pResult, // Synchronous return value
  mvm_Value* pArgs,
  uint8_t argCount
) {
  // Get a callback to call when the async operation is complete
  mvm_Value callback = mvm_asyncStart(vm, pResult);

  // Let's save the callback for later
  mvm_handleSet(&globalCallback, callback);

  /* ... */
  return MVM_E_SUCCESS;
}

An async host function is just a normal host function but which calls mvm_asyncStart. The function mvm_asyncStart encapsulates all the logic required for the callee side of the CPS handshake:

  1. If the caller await-called bar, it will have passed a callback, which mvm_asyncStart will return as the callback variable. In this case, it will set *pResult to be an elided promise, so that the caller knows we accepted the callback.
  2. Otherwise, mvm_asyncStart will set *pResult to be a new Promise, and will return a callback closure which settles that Promise (resolves or rejects it).

In this case, foo didn’t await-call bar, so this promise will be created and the returned callback will be a closure that encapsulates the logic to resolve or reject the promise:

I think there’s a beautiful elegance here in that mvm_asyncStart accepts as an argument a writeable reference to the synchronous return value (as a pointer) and returns essentially a writable reference to the asynchronous return value (as a callback).

One of the big design goals of the Microvium C API is to be easy to use, and I think this design of mvm_asyncStart really achieves this. In other JavaScript engines, an async host function would typically have to explicitly create a promise object and return it, and then later explicitly resolve or reject the promise, which is not easy. Microvium not only makes it easy, but also allows the engine to elide the promise altogether.

Side note: if foo did directly await-call bar, the promise would not be created, but the aforementioned callback closure would still exist as a safety and convenience layer between the host function and foo’s continuation. It serves as a convenient, unified way for the host to tell the engine when the async operation is complete, and it encapsulates the complexity of scheduling the continuation on the job queue, as well as providing a safety layer in case the host accidentally calls the callback multiple times or with the wrong arguments.
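The role of that wrapper closure can be pictured with a small JavaScript model. How Microvium actually handles repeat calls or bad arguments is not specified here, so the policy below (ignore repeats, coerce isSuccess to a boolean) is purely an assumption, as are the names `makeGuardedCallback` and `enqueueJob`:

```javascript
// Hypothetical guard between the host and a continuation: it forwards only
// the first call and schedules the continuation instead of calling it.
function makeGuardedCallback(continuation, enqueueJob) {
  let called = false;
  return (isSuccess, result) => {
    if (called) return; // ignore accidental repeat calls
    called = true;
    enqueueJob(() => continuation(Boolean(isSuccess), result));
  };
}

const queue = [];
const received = [];
const callback = makeGuardedCallback(
  (ok, value) => received.push([ok, value]),
  (job) => queue.push(job)
);

callback(1, 'done'); // sloppy argument: 1 instead of true
callback(false, 'again'); // second call is ignored
queue.forEach((job) => job());
console.log(JSON.stringify(received));
```

The host only ever sees one simple function to call, regardless of whether a promise or a raw continuation sits behind it.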

Using the Promise constructor

The last, and least memory-efficient way to use async-await in Microvium, is to manually create promises using the Promise constructor, as in the following example:

async function foo() {
  const promise = bar();
  await promise;
}

function bar() {
  return new Promise((resolve, reject) => {
    let x = 42;
    // ...
  });
}

Syntactically, this example looks pretty simple. But there are a lot of implicit objects created here:

  • The Promise object itself.
  • The resolve closure to resolve the promise.
  • The reject closure to reject the promise.
  • The executor closure (the arrow function passed to the Promise constructor in the above code) which captures both the resolve and reject closures.

So, while this looks syntactically like the lowest-level use of promises, it’s actually the most complicated behind the scenes. The suspended form of the “async” operation bar here has ballooned from the 8 bytes shown in the previous post to now 32 bytes!

Conclusion

Microvium async at its core uses a CPS protocol to maximize memory efficiency, requiring as little as 6 bytes for a suspended async function (and 2 additional bytes per variable), but at the boundaries between pure CPS and promise-based async, the introduction of promises and closures as protocol adapters brings additional overhead, with the worst case being where you create a promise manually.

The CPS handshake allows Microvium to dynamically decide when promises are required. The careful design of mvm_asyncStart allows even native host functions to participate in this handshake without having to worry about the details. This is important because async JS code needs to await something, and at some point the stack of awaits will ideally bottom-out at a natively-async host API. Microvium’s unique design allows the whole await stack to be defined in terms of pure CPS at least some of the time, without a single promise — turtles all the way down.

Even in the worst case, async/await in Microvium is still much more memory-efficient than other JavaScript engines. Engines like Elk, mJS, and Espruino don’t support async-await at all, and I’m not aware of any engine that even comes close to the size of Microvium which supports async-await. I haven’t measured the size of async-await and promises in XS, but bear in mind that a single slot in XS is already 16 bytes, and even a single closure in XS may take as much as 112 bytes. In V8 on x64, I measure a suspended async function to take about 420 bytes.

Of course, be aware that Microvium doesn’t support the full spec, so it’s not an apples-to-apples comparison. But however you look at it, Microvium’s design of async-await makes it feasible to use it on a completely new class of embedded devices where it was not possible before.


  1. The term “host” refers to the outer C program, such as the firmware, and the term “guest” refers to the inner JavaScript program.

  2. Microvium doesn’t support using await on a value that isn’t a promise or elided promise. It will just produce an error code in that case. This is part of the general philosophy of Microvium to support a useful subset of the spec. Here, awaiting a non-promise is likely a mistake.

Microvium Async – Part 2
Callback-based async-await

TL;DR: A suspended async function in Microvium can take as little as 6 bytes of RAM by using continuation-passing style (CPS) instead of promises, using dramatically less memory than other engines.


This is part 2 of a 4-part series on async-await in the Microvium JavaScript engine.

In the previous post, I introduced the concept of async-await for embedded systems and mentioned how I measured a minimal async function in Node.js to take about 420 bytes of RAM in its suspended state (that’s a lot!). In this post, I’ll cover some of the design decisions that led to async functions taking as little as 6 bytes of RAM in Microvium, making them much more practical to use within the tiny memory constraints that the Microvium engine is targeting.

Closures in Microvium

Microvium async-await is built on top of closures, so let’s first recap how closures work in Microvium.

Let’s say we have the following JavaScript code:

function makeCounter() {
  let x = 0;

  function incCounter() {
    return ++x;
  }

  return incCounter;
}

const myCounter = makeCounter();
myCounter(); // returns 1
myCounter(); // returns 2
myCounter(); // returns 3

Here, the myCounter function is a closure, meaning that it’s a heap-allocated function value that embeds within it the state of the variable x. This example looks like this in memory:

When you call the myCounter closure, the engine executes the function bytecode that the closure points to, but the bytecode also has the ability to read and write the variables in the closure such as x.

The key takeaway to remember is this: a closure in Microvium is a heap-allocated type which is callable and embeds within it the state of the captured local variables as well as a pointer to some bytecode to run. When a closure is called, the engine in some sense “mounts” the closure itself as the current scope, so that bytecode instructions can read and write to the slots of the closure, including the variables and even the bytecode pointer itself (which is always the first slot of the closure).

For a more detailed explanation of closures in Microvium, see Microvium closures are tiny, Microvium closures can use less RAM than in C, and Microvium closure variable indexing.

Async-await on closures

In the previous post, we saw that a suspended async function in Microvium can be resumed simply by calling its continuation. Such a continuation is a function that you can call with arguments (isSuccess, result) that will resume the corresponding suspended async function with the result argument being used as the value of the currently-blocked await expression (or the exception to throw). As pointed out in the previous post, you can get fairly direct access to these continuations through the Microvium C API[1], so that a C host[2] function can call the async continuation when it finishes its asynchronous work.

Under the hood, a continuation is actually just represented in memory as a normal closure, but the bytecode pointer of the closure doesn’t point to the start of the function but somewhere in the middle. So, when called, the closure won’t execute from the beginning of the function but just continue where it left off. The continuation closure contains within it all the local state of the async function, so it also doubles as a kind of heap-allocated “stack frame”.

Let’s take a look at an example. Let’s say we have three async functions, foo, bar, and baz, each calling the next and awaiting it:

async function foo() {
  await bar();
  console.log('after bar');
}

async function bar() {
  await baz();
  console.log('after baz');
}

async function baz() {
  await qux();
  console.log('after qux');
}

When all 3 async functions are suspended (waiting for qux to finish), the heap memory structure in the Microvium VM looks like this, with 3 closures (the arrows here represent pointers):

Each continuation closure has a resume slot that points to the await point where control is currently suspended (in the bytecode). By definition, the first slot of a closure is the code that gets invoked when the closure is called.

Each continuation also has a callback slot that points to the caller’s continuation, which it will invoke when the async operation is done. For example, when bar is finished, it will automatically call foo continuation.

It’s worth emphasizing that foo continuation in the above diagram is not the same as the function foo. Each time foo is called, the engine allocates a new foo continuation, which doubles as both its continuation closure and a kind of heap-allocated stack frame since it embeds all the local variables of foo.

Each of the continuations here is 6 bytes in this example, because a memory slot in Microvium is 2 bytes and every heap allocation also has a 2-byte allocation header (header not shown in the diagram). An additional 2 bytes would be required for every local variable. For example, let’s say that bar has a local variable x:

Here, I haven’t shown the implementation of qux, but it could be another async function or it could be implemented in the host as a C function. In the latter case, the C qux function would call mvm_asyncStart which would essentially return the baz continuation as a JavaScript function. I showed an example like this in the previous post and go into more detail in the next post.
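The size arithmetic for the continuations above can be written out as a tiny helper. The function name is hypothetical, and the formula only counts the resume slot, the callback slot, and local variables; it ignores extra state such as temporaries that real continuations may also need.

```javascript
// Sizes taken from the text: 2-byte slots and a 2-byte allocation header.
const SLOT_BYTES = 2;
const HEADER_BYTES = 2;

// A continuation has a resume slot, a callback slot, and one slot per local.
function continuationBytes(localCount) {
  return HEADER_BYTES + SLOT_BYTES * (2 + localCount);
}

console.log(continuationBytes(0)); // 6: foo, bar or baz with no locals
console.log(continuationBytes(1)); // 8: bar with the local variable x
```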

Continuation-passing style (CPS)

This form of asynchrony, where callbacks are passed to a callee to invoke when it completes, is called continuation-passing style (CPS). See Wikipedia. Even though the source code doesn’t explicitly pass any callbacks, the engine is implicitly creating these callbacks and passing them behind the scenes using a hidden VM register. When a host function calls mvm_asyncStart, it’s essentially requesting explicit access to the automatically-generated continuation of the caller. This will be discussed in more detail in the next post.

The resume point is mutable

In a normal closure that represents a local JavaScript function like an arrow function, the bytecode pointer slot (the first slot) will never change. But in an async continuation, the bytecode pointer (resume point) is considered to be mutable. It points to wherever the async function is currently awaiting, so that the same continuation can be reused to continue the async function at different places throughout its execution.

For interest, if you’re following the analogy between async continuations and traditional stack frames, you can think of the resume slot as being analogous to the program counter that is pushed to stack to remember where to continue executing when the frame is reactivated. The callback slot in this case is analogous to the pushed stack base pointer register (like EBP in x86), because it saves information about the location of the caller’s frame.
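A hand-written JavaScript analogue may help picture the mutable resume point. This is only a model under my own assumptions: the real engine stores a bytecode address in the first slot of the closure, whereas here a plain object (`frame`) holds a function reference, and `startBar` is a hypothetical stand-in for an async function with two await points.

```javascript
// Hypothetical hand-written equivalent of an async function that awaits
// step1, then step2, and returns the sum of the two results.
function startBar(step1, step2, callback) {
  // `frame` stands in for the continuation closure: `resume` is the mutable
  // first slot, `callback` is the caller's continuation, `a` is a local.
  const frame = { resume: afterStep1, callback, a: undefined };

  // Calling the continuation dispatches to wherever `resume` points.
  const continuation = (isSuccess, result) => frame.resume(isSuccess, result);

  function afterStep1(isSuccess, result) {
    if (!isSuccess) return frame.callback(false, result);
    frame.a = result;
    frame.resume = afterStep2; // same continuation, new resume point
    step2(continuation);
  }

  function afterStep2(isSuccess, result) {
    if (!isSuccess) return frame.callback(false, result);
    frame.callback(true, frame.a + result);
  }

  step1(continuation);
}

// Stand-in async operations that just remember the continuation they get.
const pending = [];
const suspend = (continuation) => pending.push(continuation);

let outcome;
startBar(suspend, suspend, (ok, value) => { outcome = [ok, value]; });
const first = pending.shift();
first(true, 40); // resumes after the first await
const second = pending.shift();
second(true, 2); // resumes after the second await
console.log(first === second, JSON.stringify(outcome));
```

The same continuation is handed out at both await points; only the resume slot changes in between, which is why no new allocation is needed per await.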

It’s not that simple

In this post, I’ve tried to express the design as simply as possible by eliding some of the things that make async-await complicated to implement. Here are some things I’ll just mention to give you a sense of it, but won’t go into detail:

  • Calculating the continuation size – a lot of static analysis goes into calculating the size of each async function closure, so that it’s big enough to include all the required state at any await point.
  • Exception handling – exceptions need to be propagated up the async call chain, and user-defined catch blocks need to be properly suspended and restored over await points.
  • Temporaries – in an expression like await foo() + await bar(), the engine needs to keep the result of the first await while it is waiting for the second await, so that it can add them together. This state also needs to be kept in the continuation.
  • Nested closures – if you have nested arrow functions inside an async function, they need to be able to access the variables in the continuation closure. When combined with async temporaries and Microvium’s waterfall closure indexing, the static analysis for this turns out to be quite a complicated problem.
  • The job queue – in JavaScript, async functions are resumed on the promise job queue, so there is some work behind the scenes to schedule the continuation on the job queue when you call it, rather than executing it directly (the topic of a later post).
  • Multiple entry points – the bytecode of async functions in Microvium is the first structure to allow pointers into the middle of the structure instead of just the beginning like most allocations, which comes with its own issues since now the same allocation in bytecode memory can have multiple distinct pointers referencing it. Pointer alignment constraints also get in the way here.

But, isn’t JavaScript Promise-based?

If you’re already familiar with JavaScript’s async-await, you might be surprised by the fact that I haven’t mentioned Promises at all in this post so far! Async functions in JavaScript are defined to return Promise objects, so how can I show a memory footprint for 3 awaiting async functions that doesn’t include a Promise of any sort?

Microvium indeed doesn’t create any promises for the examples shown in this post. In these examples, the promises are immediately awaited or discarded and so not observable to the user code, so Microvium can elide them without breaking ECMA-262 spec compliance.

One might think of this as optimizing a special case, but there’s a better way of looking at it. In Microvium, CPS is the core of async-await: every time you have an async function, you’re creating one of these continuations. When promises are created, they’re merely wrappers or mediators for the CPS continuations. In the next post, I’ll go into more detail about how promises work.

As far as I’m aware, this approach is completely novel and sets Microvium apart from all other JavaScript engines today. This approach allows Microvium async functions to be more than an order of magnitude more memory-efficient than most other JavaScript engines today.

More importantly, this level of memory consumption is acceptable to me when it comes to small embedded devices. If you only have 16 kB of RAM, using 100 bytes or more for a single suspended async function is ridiculous, so as a firmware engineer I would almost never use the feature and instead might spend the extra time hand-crafting state machines. Microvium’s memory efficiency tips the scales completely, making async-await preferable to many other forms of expressing the same logic in either JavaScript or C, provided you’re not dealing with a CPU-bound problem.

P.S. I don’t see a fundamental reason why this approach can’t be used in any JavaScript engine. If you’re reading this and you’re on the dev team of another JS engine and you want to know more about what I’ve done here, feel free to reach out. And if you use this idea, please credit me for the idea or inspiration.


  1. I use the term “fairly direct” because C actually has access to a wrapper function that provides some safety and guarantees, such as enforcing that the continuation cannot be invoked multiple times. There’ll be more detail on this in the next post. 

  2. The term “host” refers to the outer C program, such as the firmware, and the term “guest” refers to the inner JavaScript program. 

Microvium Async – Part 1
Introduction to async-await for embedded systems

TL;DR: The async/await feature of JavaScript is an alternative to multithreading, state machines, or callback-based code for managing long-running operations. Microvium’s design of async-await makes it one of the most memory-efficient and ergonomic ways of writing such code for small devices.


This is the first of a series of posts on the design of async-await in the Microvium JavaScript engine:

The target audience of this post is primarily embedded programmers who may or may not be very familiar with async/await and might be interested in how it could help with embedded software development. In this post, I’ll talk about what it is, how to use it in Microvium, and how it compares to some of the other techniques that solve a similar problem. In later posts, I’ll dive deeper into some of the inner workings.

What is Async/Await?

In the world of programming, there are concepts that fundamentally shift how we write and reason about code. Async/await is one of these game-changing concepts. Primarily seen in languages like JavaScript and C#, the async/await pattern has revolutionized how we deal with asynchronous operations and manage flow control.

Async/await in JavaScript allows individual functions to be suspended and resumed, similar to suspending and resuming a thread but much safer, more ergonomic and memory-efficient.

When you declare a JavaScript function as async, its variables and state will be stored on the heap instead of the stack, so that the function can be suspended mid-execution without blocking other functions on the stack. It gets suspended at await points in the code.

Here’s a hypothetical example async function that describes the sequence of operations one might use to connect to an HTTP server and send a message1:

async function sendMessageToServer(message) {
  console.log('Turning on the modem...');
  await powerOnModem();

  console.log('Connecting to server...');
  const connection = await connectToServer('http://my-server.com');

  console.log('Sending message...');
  await connection.sendMessage(message);

  console.log('Message sent');
  await connection.disconnect();
  await powerOffModem();
}

At each await, the function sendMessageToServer is suspended. The engine transforms the remainder of the function into a callback that can be later invoked to resume execution, such as when powerOnModem completes. This callback is known as a continuation because when called, it will continue the async function where it left off.

Awaiting a C Function

If you’re a C programmer, the concept of async/await may be easiest to understand by looking at how you might implement powerOnModem in C, leveraging the builtin Microvium C function mvm_asyncStart which returns the aforementioned callback:

mvm_Value callback; // Global variable to hold the callback

mvm_TeError powerOnModem(mvm_VM* vm, ...) {
  callback = mvm_asyncStart(vm, ...);
  // ...
}

// Later, invoke the callback when the modem is powered on.
// This will continue `sendMessageToServer`.
mvm_call(vm, callback, ...);

Side note: this is not a complete example. It doesn’t show the glue code to make powerOnModem accessible to the JS code or how to prevent the callback from being freed by the garbage collector. Refer to the Microvium documentation for a more detailed example.

When Microvium encounters the statement await powerOnModem(), it suspends sendMessageToServer on the heap as a continuation callback. The call to mvm_asyncStart in powerOnModem returns this continuation callback so that the C code can later call it2.

The arguments that the callback expects are (isSuccess, result), where isSuccess must be true or false depending on whether the asynchronous operation logically succeeded or failed, and the result must be whatever you want the result of the await to be. In our example, the result of await powerOnModem() isn’t used, so it doesn’t matter. But if you were implementing connectToServer, you would want the result to be the connection, whatever that might be, since we can see that the async function uses the result as the connection:

const connection = await connectToServer('http://my-server.com');
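The (isSuccess, result) convention can be modeled in plain JavaScript as follows. This is a sketch under assumptions: makeContinuation, onValue and onError are hypothetical names, and the failure branch assumes that a failed operation surfaces as an exception at the await, as a rejected promise would in standard JavaScript.

```javascript
// Illustrative only: models how a continuation might treat (isSuccess, result).
function makeContinuation(onValue, onError) {
  return function callback(isSuccess, result) {
    if (isSuccess) onValue(result); // the await evaluates to `result`
    else onError(result);           // assumed: the await throws `result`
  };
}

const log = [];
const callback = makeContinuation(
  (connection) => log.push(`connected: ${connection.id}`),
  (error) => log.push(`failed: ${error}`)
);

callback(true, { id: 7 }); // what a successful connectToServer would do
console.log(log[0]); // connected: 7
```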

Awaiting a JavaScript Function

If instead of C, you wanted to implement powerOnModem in JavaScript, you don’t need to call mvm_asyncStart — you instead just declare the function as async, similar to before:

async function powerOnModem() {
  modemPowerPin.setHigh();
  await modemPowerOnDetected();
}

By declaring powerOnModem to be async, Microvium automatically calls the continuation callback of the caller (sendMessageToServer in this case) when powerOnModem finishes executing completely. So the caller resumes automatically when the callee finishes.

In a sense, the async-await feature in Microvium transforms your async functions to callback-based functions under the hood. The return from one async function automatically calls the callback of the calling async function.

Await Anywhere

You can use await expressions pretty much anywhere in an async function. Examples include loops, conditionals, function call arguments — await can appear anywhere in an async function where an expression can appear, allowing the function to pause at that point and transforming the remainder of the function into a continuation.
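For instance, a retry loop can await inside both a loop and a condition. The sketch below is standard JavaScript with a made-up tryConnect helper that pretends the third attempt succeeds; it is meant to show the shape of such code rather than a real modem driver.

```javascript
// Hypothetical helper: stands in for any awaitable operation.
let calls = 0;
async function tryConnect() {
  calls++;
  return calls >= 3; // pretend the third attempt succeeds
}

async function connectWithRetry(maxAttempts) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) { // await inside a loop
    if (await tryConnect()) {                                // await inside a condition
      return attempt;
    }
  }
  throw new Error('could not connect');
}

const attemptPromise = connectWithRetry(5);
attemptPromise.then((attempt) => console.log('connected on attempt', attempt));
```

The loop counter and the branch taken are simply part of the suspended function's state, so there is no explicit state variable to maintain.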

This makes async-await a really convenient alternative to the complicated state machines you might otherwise use for this kind of logic in C. Managing counters, conditional paths, and nesting in a state machine can become a nightmare, but async-await can make these things trivial.

What if you don’t await?

So far, we’ve seen examples where an async caller is suspended while waiting for an async callee. But that’s not much different to the behavior you get with normal function calls: when you call a function normally, the caller is suspended on the stack until the callee completes. What makes async functions different?

The difference becomes most apparent when you consider the alternative: not awaiting. For example, let’s say we call sendMessageToServer without awaiting:

function processMessage(message) {
  sendMessageToServer(message);
  saveMessageToFlash(message);
}

In this hypothetical example, we’re not awaiting sendMessageToServer, so control will move on to saveMessageToFlash before sendMessageToServer has run to completion.

There’s no multithreading here. sendMessageToServer does not continue in the background on a thread. Rather, the call to sendMessageToServer will just return early, when sendMessageToServer reaches its first await point, which in this case is when it’s waiting for the modem to power on. So in this example, once the program is waiting for the modem to power on, it will start to save the message to flash.
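The ordering described here can be observed with stand-in implementations that record events. The function bodies below are placeholders invented for this sketch; only the order of the recorded events matters.

```javascript
// Stand-in implementations (hypothetical) that record the order of events.
const events = [];

async function powerOnModem() { /* would complete later in a real system */ }

async function sendMessageToServer(message) {
  events.push('send: turning on the modem');
  await powerOnModem(); // suspends here and returns to the caller
  events.push('send: modem is on');
}

function saveMessageToFlash(message) {
  events.push('save: writing to flash');
}

function processMessage(message) {
  sendMessageToServer(message); // not awaited
  saveMessageToFlash(message);
}

processMessage('hello');
console.log(events.join('\n'));
// send: turning on the modem
// save: writing to flash
```

The 'send: modem is on' event is only recorded later, once the current call stack has unwound and the continuation runs.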

This allows you to do event-driven multi-tasking without multithreading and without manually creating state machines or callbacks. Unlike with multithreading, you don’t need to worry about locking and synchronization, which makes the code much simpler and safer.

Memory efficiency

Microvium async functions are very lightweight compared to most other ways of achieving the same objective. Async functions like those shown in this post take just 6 bytes of RAM while they’re awaiting something3, plus an additional 2 bytes for each local variable.

6 bytes is tiny! If you wanted to use multithreading for this instead, the stack you dedicate to it might be hundreds of bytes. For example, on a Cortex-M MCU, this article says that a context switch uses at least 17 to 51 words (68 to 204 bytes) on the stack.
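The arithmetic behind this comparison is simple enough to write down. The figures below are taken from the text above, assuming a 4-byte word on a 32-bit Cortex-M; the function and constant names are just for illustration.

```javascript
// 6 bytes (including the 2-byte allocation header) plus 2 bytes per local variable.
function suspendedAsyncBytes(localCount) {
  return 6 + 2 * localCount;
}

const WORD_BYTES = 4; // assuming a 32-bit Cortex-M word
const contextSwitchBytes = { min: 17 * WORD_BYTES, max: 51 * WORD_BYTES };

console.log(suspendedAsyncBytes(0)); // 6
console.log(suspendedAsyncBytes(3)); // 12
console.log(contextSwitchBytes);     // { min: 68, max: 204 }
```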

In an RTOS environment, such stacks are typically permanently allocated, whether the corresponding task is busy doing work or not. For example, a task for managing the modem might have a dedicated stack which is nearly empty most of the time while the modem is disconnected or idle, but requires enough space to handle bursts of activity for connecting and sending messages.

Microvium async-await gives the syntactic convenience of a dedicated thread while only using the minimum amount of memory required at any one time. Async frames that are no longer in use are freed by the garbage collector so the memory can be reused by other parts of the program.

I’ll note that the 6-byte figure in Microvium is not a characteristic of JavaScript but of the particular implementation and tradeoffs that Microvium makes. For comparison, async-await in node.js uses about 420 bytes per suspended async function, as measured on my desktop machine4. In Moddable’s XS JavaScript engine, which is designed for resource constrained environments like embedded devices, I measure a suspended async function to take about 950 bytes!

Microvium closures are also tiny, so you’ll get similar memory efficiency if you implement your asynchronous code in Microvium using callbacks, but async/await gives you much better syntactic convenience.

It would even be hard to beat this level of memory efficiency using hand-crafted state machines in C, not to mention being substantially more complicated to implement.

Conclusion

In the world of embedded systems, managing asynchronous operations can be a complex and often cumbersome task. The introduction of the async/await pattern to the Microvium JavaScript engine not only simplifies these asynchronous operations but also brings efficiency and elegance to the code. By allowing functions to be suspended and resumed at specific points, asynchronous code becomes more legible and maintainable.

Microvium’s implementation of async/await takes memory efficiency to a new level, allowing such code to be targeted to much smaller devices than was possible before. The next smallest JavaScript engine that supports async-await is an order of magnitude larger than Microvium5. Even Espruino, which is quite large and well used

Stay tuned for the next episode where I pop the hood and show you how Microvium achieves this 6-byte target size.


  1. All the functions in these examples are made up. Microvium as an engine doesn’t give you any of these functions out-of-the-box. 

  2. This is a simplification. The mvm_asyncStart function actually returns a wrapper for either the continuation or a promise value, and the wrapper provides extra safety as well as executing the continuation or promise subscribers from the job queue rather than synchronously. mvm_asyncStart also manages the wrapping of the async C function in a promise if required. More on that in later posts. 

  3. That’s 6 bytes including a 2-byte allocation header. 

  4. That’s measured on a 64-bit Windows machine. 

  5. I’m referring to QuickJS and XS, which both require at least 10x the flash and RAM space of Microvium. 

Microvium on WASM!

TL;DR: The Microvium runtime, @microvium/runtime on npm, allows you to restore and run Microvium snapshots in a JavaScript host environment such as node.js or the browser, using a WASM build of the Microvium native engine.


Why?

It might seem strange to run a JavaScript engine on a JavaScript engine — why not just run the code directly on the host1 engine?

The answer is that now you can run the same identical Microvium snapshot on a microcontroller, on a server, or in the browser. If your snapshotted app contains business logic, you can now run the business logic in multiple places and have it behave identically.

Among other things, this can allow you to create a unified user experience for your IoT users:

  • Consistent UI workflows whether the user is using the web interface or the device display.
  • Shared validation and business logic.
  • Previewing device behavior on the server before downloading it to the device.

You can also use it for testing and validation of your device logic in a simulated environment before putting the Microvium app onto a real device.

Sandboxing

Apps run on Microvium are completely sandboxed to the host — there is no interaction with the host at all except through the imports and exports that you give it. This allows you to safely run the app in any environment without worrying that it may misbehave.

The library has been designed so that the host can manipulate objects inside the VM, but it’s impossible for the VM to manipulate objects inside the host — anything passed from the host to the VM is passed by copy rather than by reference2.

Gas Counter

Microvium now has a gas counter. For example, the following line of code in the host tells the VM to stop after executing 2000 VM instructions:

vm.stopAfterNInstructions(2000);

Use this feature to catch misbehaving scripts that enter an infinite loop or hog the CPU.
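To illustrate the general idea of a gas counter, here is a toy interpreter loop with an instruction budget. It is a standalone simulation, not Microvium's implementation or API; runWithGas and the program representation are invented for this sketch.

```javascript
// Toy interpreter: each "instruction" is a function that returns the next pc.
function runWithGas(program, gasLimit) {
  let pc = 0;       // index of the next instruction
  let executed = 0; // instructions executed so far
  while (pc < program.length) {
    if (executed >= gasLimit) {
      return { status: 'out-of-gas', executed };
    }
    pc = program[pc](pc);
    executed++;
  }
  return { status: 'done', executed };
}

const infiniteLoop = [(pc) => 0]; // jumps back to itself forever
const shortProgram = [(pc) => pc + 1, (pc) => pc + 1];

console.log(runWithGas(infiniteLoop, 2000)); // { status: 'out-of-gas', executed: 2000 }
console.log(runWithGas(shortProgram, 2000)); // { status: 'done', executed: 2 }
```

Counting instructions rather than wall-clock time makes the cutoff deterministic, so the same script stops at the same point on every run.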

Persistence

The library offers both a way to restore a snapshot to a running VM and to capture a snapshot of a running VM. Among other uses, you could use this for persistence of the VM state in a database, where a VM is restored from the database when there is a relevant event for it to process.

This is particularly easy to do because snapshots in Microvium can be quite small — the hello-world snapshot shown later in this post is only 124 bytes.

Lightweight and Fast

This runtime library is designed to be incredibly lightweight. The full Microvium compiler is hundreds of MB, but this runtime library has a minified size of only 64kB and uses no external npm dependencies.

The startup time for the library itself is about 10-30ms in node.js and it can restore or capture a snapshot of the VM in under 1ms.

The running RAM usage is 256kB per VM, so you can run thousands of them on a typical desktop-class machine without breaking a sweat.

What about the overhead of running on WASM compared to running natively?

Actually, the WASM library runs about 40x faster than the native execution of Microvium on a desktop-class machine with the default configuration. The reason is that Microvium is a 16-bit virtual machine, using 16-bit pointers internally, but a modern machine is 32- or 64-bit, so there needs to be a translation layer between the two. In the default configuration of the Microvium native engine that comes bundled with the compiler, this translation is done in software for maximum portability, which is slow. But in the WASM implementation, a lot of effort was spent to get Microvium to use the first 64kB of the WASM address space as if it was running on a 16-bit device natively, and let the host WASM engine do the translation (which is done in hardware).
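The difference between the two addressing schemes can be sketched with an ArrayBuffer standing in for host memory. This is only an illustration of the concept: vmBase and both read functions are assumptions, and the sketch does not reproduce the performance difference, which comes from the native engine and the WASM runtime.

```javascript
// An ArrayBuffer playing the role of the host's address space.
const hostMemory = new DataView(new ArrayBuffer(0x30000));

// Software translation: the VM's 64kB lives at some arbitrary base address,
// so every 16-bit pointer must be offset before use.
const vmBase = 0x12340;
function readU16Translated(ptr16) {
  return hostMemory.getUint16(vmBase + (ptr16 & 0xffff), true);
}

// WASM-style layout: the VM occupies addresses 0 to 0xffff, so a 16-bit
// pointer is already the address.
function readU16Direct(ptr16) {
  return hostMemory.getUint16(ptr16 & 0xffff, true);
}

hostMemory.setUint16(vmBase + 0x0100, 0xbeef, true);
hostMemory.setUint16(0x0100, 0xcafe, true);
console.log(readU16Translated(0x0100).toString(16)); // beef
console.log(readU16Direct(0x0100).toString(16));     // cafe
```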

Side note: you can get the native implementation to run much faster than the default by deciding a fixed location in memory to reserve for the 64kB of Microvium memory, which is the recommended way to run on 32-bit microcontrollers such as ARM. I haven’t tested the WASM speed against this alternative configuration.

Microvium itself is optimized to be small compared to other engines, and you can take advantage of that here as well. Although your VM can’t exceed 64kB of space, you can do a lot more in 64kB in Microvium than you can in other JavaScript engines. Consider, for example, how closures in Microvium can be 10-15x smaller than in some other engines.

Easy to use

I put a lot of effort into making this library really easy to use. Let’s say that you have the following hello-world program you want to run in Microvium:

// my-app.js 
const print = vmImport(1);
vmExport(1, main);

function main() {
  print("Hello, World!")
}

You can compile it with the Microvium compiler using the CLI command:

microvium my-app.js --output-bytes

The output-bytes flag here tells the compiler to output the snapshot as a comma-separated list of bytes on the stdout, like 0x04,0x1c,0x03,0x00,0x48,…..

Now you can use the new @microvium/runtime library to run the snapshot in node.js or the browser, with just a few lines of code:

import Microvium from '@microvium/runtime';

const snapshot = [0x04,0x1c,0x03,0x00,0x48,…];

const imports = { [1]: console.log };
const vm = Microvium.restore(snapshot, imports);
const { [1]: main } = vm.exports;

main();

The imports in the above snippet give the VM access to host functionality, each identified by a numeric ID of your choosing. In this case it exposes the console.log host function which the app imports as print.

The exports is the opposite and gives the host access to things which the app exported with vmExport, also each associated with a numeric ID.

The library handles the translation of the VM data types to the host and vice versa, in a mostly-seamless way that makes the WASM library much easier to use than the raw microvium.h interface for a C or C++ host.

The Journey

(Skip this section if you don’t care about the detail of how this library was put together).

I thought this library would be a quick weekend project. After all, the Microvium C interface is relatively small and easy to use, and this is just a wrapper around a WASM build of the C library, right?

But it turned into weeks or maybe months of work in the end. Sure, in a weekend I could probably have compiled microvium.c using Emscripten and done a few wrapper functions for the API. But that’s not up to my standard of usability and performance.

Emscripten is the standard, “easy” way to compile C/C++ to WASM. But it’s heavy and comes with lots of extra stuff that I didn’t need. For example, it comes with its own C runtime library (clib). Microvium compiles to just 10s of kB, but a pre-built runtime library can be MBs. Emscripten also comes with its bootstrapping code for running things at runtime. It’s not a lean solution by any means.

But the main reason I didn’t go with Emscripten is that it gives you no control over the address space of the final output. I was fairly convinced that I would get significant performance improvements if I could run Microvium as a 16-bit engine using exactly the first 64kB of WASM memory (addresses 0 to 0xffff), but there was no way to do this using Emscripten.

Emscripten uses the Clang C/C++ compiler under the hood, so I went with that. Standard Clang allows you to provide a linker script that configures the memory address space for the final result, so that should have made it easy to get Microvium into the first 64kB of address space.

But, this is where the problems started!

While the normal Clang compiler (alternative to GCC) comes with this capability, it seems that the WASM-targeting Clang compiler doesn’t. So I had to hack a solution:

  • By being very careful with the order of linking and declarations, I could allocate a 64kB variable that happens to be compiled to the beginning of the output address space (this could break in future versions of Clang, and then I’ll need to re-visit the solution).
  • But! Clang doesn’t start the output addresses at address zero by default. But eventually I figured out the compiler options to do that.
  • Ideally you should configure this giant 64kB variable as requiring no initialization, so it doesn’t take up space in the compiled WASM binary.
  • But! The clang no-initialization section (.bss) is put after the initialized section (.data) and because there’s no way to give it linker information, you have no control over this. So you have to declare the massive variable as requiring initialization.
  • But! This results in a massive compiled WASM file, with 64kB of zeros in it. So naturally, I need a post-processing step to remove this from the WASM binary so as not to bloat the library.
  • But! To do this requires some WASM tools which don’t come with Clang. So we need to install them.
  • But! They don’t install on Windows, so I need to install them in WSL and use that instead.
  • But! I have other scripts which can only run in Windows. So I have an awful mix of Linux and Windows shell scripts that I need to run in the pipeline.
  • Also, the tools for pulling apart the WASM binary and putting it back together don’t preserve debug symbols! So to do any debugging, I need to use the unprocessed WASM binary.

Clang doesn’t come with a clib, and I looked into finding a lean clib to use, but in the end I decided to write my own. Not the whole thing — only the subset of clib components required for Microvium, which isn’t too much.

This was definitely the right choice. Some of the clib functions could be implemented using WASM intrinsics, and some as wrappers around host JavaScript operations. Writing the whole lot myself was a way to make sure that each one is implemented in the most sensible and efficient way for the WASM build.

Then came the actual interface between the VM and the host. Using a JavaScript host language opens up the door to a much more seamless way of interacting with the VM, with automatic translation between the VM data types and the host data types (which are both representations of JavaScript values). The library uses JavaScript Proxies to allow the host to have references to objects, arrays, functions, and classes inside the VM. It uses reference-counted handles, along with WeakMap and FinalizationRegistry, to manage references across the boundary, tracking them when memory in the VM moves, and freeing them when no longer needed.

There are a surprising number of details to consider. For example, the proxy for a Microvium array needs to support iterators so that the host can use for … of loops on it. Similarly, you need proper enumeration of object keys so that JSON.stringify works (and also object spread {...obj}).
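To give a feel for these details, here is a minimal sketch of proxies that support iteration, JSON.stringify and object spread. The fakeVmArray and fakeVmObject accessors are hypothetical stand-ins for values living inside the VM; they are not the @microvium/runtime internals.

```javascript
// Hypothetical accessors standing in for values that live inside the VM.
const fakeVmArray = { getLength: () => 3, getItem: (i) => [10, 20, 30][i] };
const fakeVmData = { a: 1, b: 2 };
const fakeVmObject = { keys: () => Object.keys(fakeVmData), getProp: (k) => fakeVmData[k] };

function wrapVmArray(vmArray) {
  // The target is a real array so that Array.isArray and JSON.stringify treat it as one.
  return new Proxy([], {
    get(target, key) {
      if (key === 'length') return vmArray.getLength();
      if (key === Symbol.iterator) {
        return function* () {
          for (let i = 0; i < vmArray.getLength(); i++) yield vmArray.getItem(i);
        };
      }
      if (typeof key === 'string' && /^\d+$/.test(key)) return vmArray.getItem(Number(key));
      return Reflect.get(target, key);
    },
  });
}

function wrapVmObject(vmObject) {
  return new Proxy({}, {
    get: (target, key) => vmObject.getProp(key),
    ownKeys: () => vmObject.keys(), // needed for key enumeration
    getOwnPropertyDescriptor: (target, key) =>
      vmObject.keys().includes(key)
        ? { value: vmObject.getProp(key), enumerable: true, configurable: true, writable: true }
        : undefined,
  });
}

const arr = wrapVmArray(fakeVmArray);
const obj = wrapVmObject(fakeVmObject);
for (const item of arr) console.log(item); // 10, then 20, then 30
console.log(JSON.stringify(arr));          // [10,20,30]
console.log(JSON.stringify(obj));          // {"a":1,"b":2}
console.log({ ...obj });                   // { a: 1, b: 2 }
```

Without the ownKeys and getOwnPropertyDescriptor traps, JSON.stringify and spread would see the empty target object and produce {}.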

Uint8Array isn’t implemented as a proxy because it would be horribly inefficient to access individual bytes through a proxy layer, but at the same time I didn’t want to pass Uint8Array by-copy because of the inefficiency in the case of large blocks of data where you only need to change a small amount (e.g. implementing binary queuing and buffering using a shared, mutable Uint8Array). So I implemented a custom wrapper class.

A lot of this work required changes to the Microvium engine itself, to allow the WASM wrapper to gain access to internals in such a way as to implement these features efficiently. The C API of Microvium doesn’t support property get/set on objects, it doesn’t support passing this in function calls, it doesn’t support calling new on classes, or enumerating object keys. These things are all more important when the host has first-class support for leveraging these features, as in the case of a JavaScript host.

The way I did this in the end was to mark some Microvium internal functions as MVM_HIDDEN. By default, these are compiled using static linkage, meaning they’re completely invisible to the host, but a host may choose to redefine this macro in the port file to make these select functions visible in the linker. But I still chose to keep them out of microvium.h since the C interface is designed to be as simple as possible for C users.

Usability design is very important to me, and another hurdle I had to jump over was how to give users a seamless experience when using WASM. The issue is that, at least today, bundlers do not support WASM. The library needs to load and compile the WASM, but where does it get the WASM binary from, if it’s not baked into the library JavaScript? Does a user need to host the .wasm file at a URL and then give that URL to the library?

That would be a terrible user experience. In the end, I did bake the WASM library directly into the JavaScript source code: I have a pre-build script that converts the .wasm file directly to a base64 string and plugs it straight into the library code. The library (a single, bundled, index.js file) is then distributed to users with this WASM binary baked into it, and it decodes and compiles it at startup. Luckily I spent all the effort earlier to make the binary small, so there’s not much overhead in representing it as base64 or decoding it at startup (the wasm file is 36kB and whole library is 64kB including the base64-encoded wasm).
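The decode-and-compile step can be sketched with standard APIs. The base64 string below is the smallest valid WASM module (just the magic number and version), used here as a placeholder for the real engine binary; decodeBase64 is a helper written for this example, not the library's code.

```javascript
// Placeholder: the 8-byte empty WASM module, base64-encoded.
const wasmBase64 = 'AGFzbQEAAAA=';

function decodeBase64(base64) {
  const binary = atob(base64); // available in browsers and Node.js 16+
  return Uint8Array.from(binary, (ch) => ch.charCodeAt(0));
}

const wasmBytes = decodeBase64(wasmBase64);
const wasmModule = new WebAssembly.Module(wasmBytes); // compile at startup
const wasmInstance = new WebAssembly.Instance(wasmModule, {});

console.log(wasmBytes.length);                // 8
console.log(WebAssembly.validate(wasmBytes)); // true
```

Browsers may restrict synchronous compilation of larger modules on the main thread, in which case the asynchronous WebAssembly.instantiate is the usual alternative.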

Conclusion

In the end, I’m happy with the way the library turned out. My favorite kind of complexity is the kind that’s hidden behind a simple and intuitive interface, which I think this library does successfully. The user doesn’t need to think about handle management, memory allocation, or type translation — everything more or less “just works” the way you would intuitively expect it to.

This library opens up so many opportunities for new ways of working with Microvium, whether you just want to use it to execute some sandboxed logic, use it for testing Microvium apps, or build a distributed IoT application with a cloud-side component.


  1. Note: the term host here refers to the program running the Microvium engine 

  2. This was actually done because Microvium doesn’t have a proxy type, so it can’t maintain references to host objects. But the copy semantics also has security benefits, since it’s impossible for Microvium code to mutate host objects or use shared host objects as a secret communications channel between VMs. 

Microvium closures are tiny!

TL;DR: Microvium closures are now as small as 6 bytes on the heap, down from 12 bytes previously and compared to over 100 B in some other engines.


Microvium is all about squeezing things into tiny spaces, and the latest of these is closures — the memory structure used to carry the state of nested functions. If you’re new to closures, have a look at my previous post which explains closures from a C programmer’s perspective.

One of my goals with Microvium has been to make it feasible and realistic to run JavaScript on really tiny devices. More than half of the microcontrollers selling on Digikey have less than 16kB of RAM and 128kB of flash. When you’re working with devices this small, you count every byte you use.

People typically program these kinds of devices in C. However, as I’ve shown before, there can be some real, practical benefits to using a language like JavaScript in these situations. For example, JavaScript-style single-threading can be more memory-efficient than multithreading, and using a garbage collector like Microvium’s can avoid heap fragmentation overhead and help to reduce the memory leaks associated with manually freeing memory. And of course, your code may be less complicated when working in a higher-level language, which itself can mean less development time and fewer bugs.

When you’re working with such small memory spaces, what kind of size would you consider to be acceptable for a closure function, like the myCounter in the following example?

function makeCounter() {
  let x = 0;

  function incCounter() {
    return ++x;
  }

  return incCounter;
}

const myCounter = makeCounter();

As I demonstrated in my previous post, if you code up something like this in C, it might take 20 bytes of memory1, ignoring fragmentation overhead. So what kind of size would be acceptable to get the convenience of the JavaScript syntax? Maybe 30 bytes? 50 bytes?

Other JavaScript engines that I’ve measured take over 100 bytes for a closure like this! (See here for my measurement methodology and feel free to correct it). That’s not a fault of the engine, it’s a product of the JavaScript spec which requires that all functions are objects. In the spec, function declarations (functions declared using function rather than arrow syntax) also have a fresh prototype object in case you use the function with new.

That’s a heavy price to pay! Maybe that’s ok on a desktop-class machine. But when you’re working on a device with tens of kilobytes of memory, that’s just not an affordable feature anymore. So, in Microvium, closures are stripped down to their bare minimum. Functions in Microvium are not objects — they cannot have properties2. And in return for this trade-off, Microvium closures can be really tiny: the closure in the above example is just 8 bytes! (Including a 2-byte reference and 2-byte allocation header).

Another reason why this is so small in Microvium is its 16-bit slot size. It stores all pointers as 16-bit, even on a 32-bit machine. Numbers, such as x in the example, start out as 16-bit but grow as needed. This is great for things like counters which are likely to be small most of the time but which are able to count up to 2^53 without overflowing. This is in contrast to C where you typically need to use memory upfront for the largest-possible value that a variable might have.
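As a quick check of the upper limit mentioned above (2^53, the point where JavaScript numbers stop counting exactly), here is a small sketch that runs in any standard JavaScript engine. It only illustrates the language-level limit; it says nothing about how Microvium stores the number internally.

```javascript
// The largest integer that a JavaScript number can count up to exactly.
const limit = 2 ** 53;

// Just below the limit, adding 1 still produces the next integer.
const belowLimit = limit - 1; // same value as Number.MAX_SAFE_INTEGER
const stillExact = belowLimit + 1 === limit;

// At the limit, adding 1 no longer changes the value (precision runs out).
const stuckAtLimit = limit + 1 === limit;

console.log(stillExact, stuckAtLimit); // true true
```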

In general, closures like this in Microvium take 4 + 2n bytes of memory on the heap, where n is the number of variables. This is down from 10 + 2n bytes of memory in the previous design which is a nice improvement.
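To make the arithmetic concrete, here is a tiny sketch that plugs the counter example into the two formulas above. The function names are made up for illustration, and the 2-byte reference is the one mentioned earlier for the 8-byte total.

```javascript
// Heap bytes for a closure with n variables, per the 4 + 2n figure above.
function closureHeapBytes(n) {
  return 4 + 2 * n;
}

// Heap bytes under the previous design, per the 10 + 2n figure above.
function previousClosureHeapBytes(n) {
  return 10 + 2 * n;
}

// The counter example has one variable (x). The 8-byte figure quoted
// earlier also counts the 2-byte reference to the closure.
const REFERENCE_BYTES = 2;
const counterTotal = closureHeapBytes(1) + REFERENCE_BYTES;

console.log(counterTotal);                                      // 8
console.log(previousClosureHeapBytes(1) - closureHeapBytes(1)); // 6
```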

There are two further things to mention that affect the size of closures here which may not be obvious. One is that the variable x in the example is inlined into the closure itself. It’s generally the case in Microvium that the first nested function in a scope is actually unified with the scope itself. Further nested functions will be separate allocations.
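Here is a variation of the counter with two nested functions in the same scope. The comments describe the allocation behavior as I've explained it above; the code itself is ordinary JavaScript, and getCounter is a name I've added just for this illustration.

```javascript
function makeCounter() {
  let x = 0;

  // First nested function in the scope: per the description above, its
  // closure and the scope holding `x` can be one and the same allocation.
  function incCounter() {
    return ++x;
  }

  // A further nested function in the same scope: per the description
  // above, this one would be a separate allocation sharing the same `x`.
  function getCounter() {
    return x;
  }

  return { incCounter, getCounter };
}

const counter = makeCounter();
counter.incCounter();
counter.incCounter();
console.log(counter.getCounter()); // 2
```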

The other thing here that brings down the size is the fact that there is no parent pointer to link the scope chain. You can see more details about parent pointers in Microvium closure variable indexing. The new optimization is that the parent pointer is only included if needed. Static analysis determines when the scope chain is traversed from a child to a parent, and the parent pointer is included only in those cases. Considering that the global and module-level scope is not implemented as a closure scope, there are not many closures that actually need to make use of this parent pointer, so this will be a fairly common efficiency gain.
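For contrast with the counter, here is a sketch of the kind of code where a child scope does reach into its parent scope. The names are made up for illustration, and whether the static analysis treats this exact case as needing a parent pointer is my assumption based on the description above.

```javascript
function makeAccumulator() {
  let total = 0; // lives in the outer closure scope

  return function add(step) {
    let calls = 0; // lives in the inner closure scope, along with `step`

    // This innermost function uses `calls` and `step` from the inner scope
    // and `total` from the outer scope, so the inner scope has to be able
    // to reach its parent.
    return function apply() {
      calls++;
      total += step;
      return { calls, total };
    };
  };
}

const acc = makeAccumulator();
const addFive = acc(5);
addFive();
console.log(addFive()); // { calls: 2, total: 10 }
```

The makeCounter example never needs this, because incCounter only touches the scope it is unified with.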


  1. Assuming here a 32-bit device and FreeRTOS heap. The size I’ve quoted here includes an 8-byte allocation, with 8-byte allocation header, and a 4-byte pointer to the allocation 

  2. Assigning properties on a closure is a runtime error. 

Microvium closures can use less RAM than C


TL;DR: Closures are a good way to represent callbacks for non-blocking code. Although C doesn’t support closures, you can achieve something similar using structs on the heap, but you can get even better memory efficiency by using JavaScript running on Microvium.


I’ve recently made some improvements to the memory usage of closures in Microvium, which I’ll talk about in more detail in an upcoming post. But before I get there, I want to first talk about how a similar problem can be solved in C, as a point of comparison.

In this post, we’ll look at a scenario where closures are useful, starting with the most basic solution to the problem in C and then progressively improving the design to address its weaknesses until we end up with something closure-like, which I will then contrast with the same solution in JavaScript. This post is aimed at people who are more familiar with C than with JavaScript, and are not necessarily familiar with closures.

Preserving state during long-lived operations

Let’s start by imagining that we have a C program that needs to perform some slow, I/O-bound operation, such as sending data to a server. For simplicity, let’s assume that we just need to send a single integer, and let’s assume that someone has given us a library that does exactly this:

// Send an int to the server, and block until we get confirmation of receipt
void sendDataToServerAndWaitForResponse(int payload);

And let’s say that we have a requirement to log once the payload was sent:

void sendToServerAndLog(int num) {
  sendDataToServerAndWaitForResponse(num);
  printf("This num was successfully received by the server: %i\n", num);
}

So here I’ve constructed an example where the value of num, which is available before we contact the server, is also used after we get a response from the server. I’ve simplified the example by ignoring failure cases. This is a special case of a common situation where some program state needs to be preserved across long-lived operations, such as a state machine or retry counter.

A problem with the above solution is that it blocks the thread. In terms of memory usage, we can think of it as requiring the memory of a whole call stack, which may occupy anything from hundreds of bytes to megabytes, depending on the system we’re running. And more importantly, we can’t use the thread for anything else while it’s blocked.

Using a callback

To avoid blocking the thread, we could instead consider using a callback function. Imagine that our hypothetical library supported this by taking a callback argument, rather than blocking until completion:

typedef void Callback();
void sendDataToServer(int payload, Callback* callback);

Now we can write our application code like the following, where we’ve split the behavior of sendToServerAndLog into two parts — one part before sending the data and one that’s the continuation after we’ve received a response:

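int save_num;

// Forward declaration so the callback can be referenced before its definition
void continue_sendToServerAndLog();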

void sendToServerAndLog(int num) {
  save_num = num;
  sendDataToServer(num, &continue_sendToServerAndLog);
}

void continue_sendToServerAndLog() {
  int num = save_num;
  printf("This num was successfully received by the server: %i\n", num);
}

In order to have the state of num available in the callback, we needed to persist it somewhere, so we need the save_num variable. This solution is much more memory efficient than the previous one: if we’re running on a 32-bit platform, our app code occupies only the 4 bytes of save_num. The library code now also needs to preserve the callback pointer across the call, which uses an additional 4 bytes of memory. So this solution takes a total of 8 bytes of memory during the server operation, compared to using a whole thread and call stack of memory before. But there are still two issues with this design:

  1. The save_num variable persists forever, even when we’re not sending data to the server. If the hypothetical library uses the same pattern to store the callback, then together the full 8 bytes are consuming memory forever.
  2. We can’t have multiple calls to the server running in parallel here. If we call sendToServerAndLog a second time before the first response is received, the value of save_num is corrupted.
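The second issue is easy to reproduce. The following sketch models the same shared-variable pattern in JavaScript so that it can be run anywhere. The sendDataToServer here is a hypothetical stand-in that just queues the callback, and deliverAllResponses is a made-up helper that simulates the responses arriving later.

```javascript
// Hypothetical stand-in for the library: it queues the callback instead of
// calling it immediately, like a response that arrives later.
const pendingResponses = [];
function sendDataToServer(payload, callback) {
  pendingResponses.push(callback);
}
function deliverAllResponses() {
  while (pendingResponses.length > 0) pendingResponses.shift()();
}

// Same shape as the C code: one shared variable holds the saved state.
let saveNum;
const logged = [];

function sendToServerAndLog(num) {
  saveNum = num;
  sendDataToServer(num, continueSendToServerAndLog);
}

function continueSendToServerAndLog() {
  logged.push(saveNum);
}

sendToServerAndLog(1);
sendToServerAndLog(2); // overwrites saveNum before the first response
deliverAllResponses();

console.log(logged); // [ 2, 2 ] - the value 1 was lost
```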

Adding context

A common pattern to get around the above problem is to have the library code accept an additional parameter that it passes back to our callback, which here we might name “context” because it represents the contextual information of the particular caller:

typedef void Callback(void* context);
// Whatever `context` is provided will be passed to the `Callback` untouched.
void sendDataToServer(int payload, Callback* callback, void* context);

Now we can use it like this:

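// Forward declaration so the callback can be referenced before its definition
void continue_sendToServerAndLog(void* context_);

void sendToServerAndLog(int num) {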
  int* context = malloc(sizeof(int));
  *context = num;
  sendDataToServer(num, &continue_sendToServerAndLog, context);
}

void continue_sendToServerAndLog(void* context_) {
  int* context = (int*)context_;
  int num = *context;
  printf("This num was successfully received by the server: %i\n", num);
  free(context);
}

Now, if we call sendToServerAndLog multiple times, each call creates a distinct context that is completely independent of any other call.

Side note: Why make the context a void* instead of just an int, since we only need an int here? The reason is that we’re imagining here that sendDataToServer is part of a reusable library. Even if it’s a first-party library, it’s generally better practice to write it in such a way that it’s decoupled from the particular way you use the library. Making it a void* allows the user of the library to decide the size and shape of the preserved state, rather than coupling it to the library itself.

This is the same reason why we use a function pointer at all, rather than just hardcoding the library to invoke our particular callback directly: the library should not be coupled to the particular function that executes after it. And as the program evolves, we might land up calling it from multiple places that each require a different callback.

Applying the same pattern to sendToServerAndLog

We decided that a good interface for the library function sendDataToServer is one that takes a callback and a context, because the operation takes a long time, and a caller may want to continue doing something else after the operation completes. But similarly, sendToServerAndLog is also an operation that takes a long time, and its caller may also want to do something when the operation completes.

If we’re working with highly coupled code, then maybe we already know whether or not the caller of sendToServerAndLog needs to do anything else afterward, and exactly what it needs to do. But if we want sendToServerAndLog to be reusable code that is decoupled from its caller, then we should probably have it accept its own callback and context from its caller. If we do this, then we need to persist the caller’s callback and context until the whole operation completes, so let’s upgrade our context to a struct that includes these fields:

// Context for sendToServerAndLog
typedef struct sendToServerAndLog_Context {
  int num;
  Callback* caller_callback;
  void* caller_context;
} sendToServerAndLog_Context;

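// Forward declaration so the callback can be referenced before its definition
void continue_sendToServerAndLog(void* context_);

void sendToServerAndLog(int num, Callback* caller_callback, void* caller_context) {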
  sendToServerAndLog_Context* context = malloc(sizeof *context);
  context->num = num;
  context->caller_callback = caller_callback;
  context->caller_context = caller_context;
  sendDataToServer(num, &continue_sendToServerAndLog, context);
}

void continue_sendToServerAndLog(void* context_) {
  sendToServerAndLog_Context* context = context_;
  int num = context->num;
  printf("This num was successfully received by the server: %i\n", num);
  context->caller_callback(context->caller_context);
  free(context);
}

Embedding the function pointer

Stylistically, it’s interesting to consider one further modification to this example. You may or may not agree with the following design change, but it leads nicely into the topic of closures in Microvium, which I’ll get to in a moment.

Rather than having the function save and copy around 2 pieces of state on behalf of the caller (the caller_callback and caller_context), we can combine these into one: we can just reference the caller_context and require that the first field in the caller_context is the callback function pointer, as in the following code. Also, rather than calling this a context, let’s now call it a closure, since it captures both a function pointer and some general state. The relationship between this and real closures will become more clear later.

// Forward declaration so that ClosureFunc can refer to Closure
typedef struct Closure Closure;
typedef void ClosureFunc(Closure* closure);

// A general closure is expected to take this shape
struct Closure {
  // Must be the first field
  ClosureFunc* invoke;

  /* ...other fields may follow...*/
};

// Here we have a specific closure shape for our function
typedef struct sendToServerAndLog_Closure {
  // Must be the first field. If you're using C++, you might instead 
  // inherit `sendToServerAndLog_Closure` from `Closure`
  ClosureFunc* invoke;

  // Other fields:
  int num;
  // Note now that we don’t need to store a separate caller_context
  // since both the context and the function pointer are combined
  // into the single closure struct.
  Closure* caller_callback;
} sendToServerAndLog_Closure;

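// Forward declaration so the callback can be referenced before its definition
void continue_sendToServerAndLog(Closure* closure_);

void sendToServerAndLog(int num, Closure* caller_callback) {
  sendToServerAndLog_Closure* closure = malloc(sizeof *closure);
  closure->invoke = continue_sendToServerAndLog;
  closure->num = num;
  closure->caller_callback = caller_callback;
  // The specific closure is passed on as a general Closure
  sendDataToServer(num, (Closure*)closure);
}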

void continue_sendToServerAndLog(Closure* closure_) {
  sendToServerAndLog_Closure* closure = (sendToServerAndLog_Closure*)closure_;

  int num = closure->num;
  printf("This num was successfully received by the server: %i\n", num);

  Closure* caller_callback = closure->caller_callback;
  caller_callback->invoke(caller_callback);

  free(closure);
}

This final design is quite clean:

  • The memory for the transient state (e.g. num) is only allocated while the long-running operation is active. Remember that short-lived memory is cheaper memory.
  • It doesn’t block the thread, so it’s easier to parallelize multiple operations if needed, without each operation consuming a whole thread of memory.
  • Each layer is decoupled: sendDataToServer doesn’t need to know who its caller is, and similarly sendToServerAndLog doesn’t need to know who its caller is.
  • The callback is neatly encapsulated into a single pointer value that can be passed around as a first-class value. If you’re familiar with C++, this is a similar benefit to using the std::function<> type.

But there are some disadvantages to this design:

  • Although it represents the same behavior as the first design (the synchronous code), the code is now a whole lot more complicated.
  • Here we’ve only shown one closure signature. But what if we needed a return value to be passed to the closure? In general, each different return type is going to need its own type definitions for ClosureFunc and Closure, which will add up to a lot of boilerplate.
  • The memory efficiency is not great because it uses malloc and free.
    • On my Windows machine with Visual C++, I measure malloc to have an overhead cost of 40 bytes per allocation (compiling for x86).
    • In FreeRTOS, each allocation has an overhead of 8 bytes on a 32-bit platform. With this figure, the closure in the example takes 20 bytes of heap space.
    • The heap in C/C++ can get fragmented, which costs additional memory.
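The 20-byte FreeRTOS figure in the list above can be reconstructed as follows. This is just the arithmetic, assuming a 32-bit platform where pointers and int are 4 bytes each and the struct needs no padding.

```javascript
// Assumed sizes on a 32-bit platform.
const POINTER_BYTES = 4;
const INT_BYTES = 4;
const FREERTOS_HEADER_BYTES = 8; // per-allocation overhead quoted above

// Fields of sendToServerAndLog_Closure: invoke, num, caller_callback.
const structBytes = POINTER_BYTES + INT_BYTES + POINTER_BYTES;
const totalHeapBytes = structBytes + FREERTOS_HEADER_BYTES;

console.log(structBytes, totalHeapBytes); // 12 20
```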

Using Microvium Instead

We can write this same example in JavaScript using nested functions, as follows:

function sendToServerAndLog(num, caller_callback) {
  sendDataToServer(num, continue_sendToServerAndLog);

  function continue_sendToServerAndLog() {
    console.log(`This num was successfully received by the server: ${num}`);
    caller_callback();
  }
}

The nested function continue_sendToServerAndLog has access to variables in the outer function (in this case the parameters num and caller_callback). Here I tried to keep the function names consistent with the C example, but in practice, it may be more convenient to do the same thing using arrow function syntax, as follows:

function sendToServerAndLog(num, caller_callback) {
  sendDataToServer(num, () => {
    console.log(`This num was successfully received by the server: ${num}`);
    caller_callback();
  });
}

Either way, the values num and caller_callback are automatically captured into a closure on the JavaScript heap, making them available to the nested function automatically.
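To see the capturing in action, here is a runnable sketch with two operations in flight at the same time. The sendDataToServer here is a hypothetical stand-in that just queues the callback, and the messages array takes the place of console.log so that the result is easy to inspect.

```javascript
// Hypothetical stand-in for the library function: it queues the callback
// rather than calling it immediately.
const pending = [];
function sendDataToServer(payload, callback) {
  pending.push(callback);
}

const messages = [];
function sendToServerAndLog(num, caller_callback) {
  sendDataToServer(num, () => {
    messages.push(`This num was successfully received by the server: ${num}`);
    caller_callback();
  });
}

// Start two operations before either has completed.
let completed = 0;
sendToServerAndLog(1, () => completed++);
sendToServerAndLog(2, () => completed++);

// Simulate both responses arriving.
while (pending.length > 0) pending.shift()();

console.log(messages.length, completed); // 2 2
```

Each call gets its own captured num, so the first message reports 1 and the second reports 2, which is the independence that the C version needed malloc and a context struct to achieve.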

If you’re using the Microvium JavaScript engine, this created closure has a very similar structure in memory to the final example we did in C — it’s a single structure with a function pointer and two other variables. You may see now why I called the struct in the earlier C example a “closure”. The C code is a more explicit way of representing the same runtime structure, with similar benefits from a decoupling and modularity perspective, although clearly the JavaScript is more syntactically simple.

This closure heap allocation in Microvium will have the following characteristics:

  • If the num is an integer in the range1 -8192 to 8191, the closure occupies 8 bytes of memory, including a 2-byte allocation header, compared to the 20 bytes consumed by the C example on a FreeRTOS heap.
  • There is no fragmentation overhead, since the Microvium heap is compacting.
  • Allocation of the closure generally happens in constant time. Since the Microvium heap is contiguous, creating new allocations is similar to just bumping a free pointer forward.
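If the idea of bumping a free pointer is unfamiliar, the following is a minimal sketch of it. BumpHeap is a made-up model for illustration, not Microvium's actual allocator, and it leaves out garbage collection entirely.

```javascript
// Hypothetical model of a contiguous heap with a "free pointer".
class BumpHeap {
  constructor(sizeBytes) {
    this.memory = new Uint8Array(sizeBytes);
    this.free = 0; // offset of the first unused byte
  }

  // Reserve `size` bytes and return the offset of the new allocation.
  // The cost does not depend on how many allocations already exist.
  allocate(size) {
    if (this.free + size > this.memory.length) {
      throw new Error('Out of memory (a real engine might collect garbage here)');
    }
    const offset = this.free;
    this.free += size;
    return offset;
  }
}

const heap = new BumpHeap(64);
const first = heap.allocate(8); // e.g. the 8-byte closure from the example
const second = heap.allocate(8);
console.log(first, second, heap.free); // 0 8 16
```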

Conclusion

We’ve walked through an example that’s representative of a common situation in program development, especially when networking is involved: network requests take time, and we can either block the current thread or we need another way to remember what we were doing so we can get back to it when the request completes. When writing your code in a modular and decoupled way, it’s better not to assume anything about the caller of your long-running operation, so it’s better not to block the thread or hard-code anything about which callback to run or what state to hold onto.

In this case, Microvium actually offers you a way to make your code more memory efficient than the equivalent C code, while also making it easier to follow, and preserving the nice decoupling characteristics. Depending on your situation, this might make Microvium a good choice for orchestrating this kind of high-level program flow, especially when long-running tasks are involved and when you need to keep track of state across those tasks.


  1. I’d say this is another advantage of using Microvium: numbers automatically grow in size as-needed. Integers in the range -8192 to 8191 use 2 bytes of memory 

FFI with no glue code!


TL;DR: Microvium’s snapshotting paradigm allows a library to generate the FFI glue code, so you don’t have to.


How bad can it be?

Foreign function interfaces (FFIs) are notoriously difficult in JavaScript. If you take a look at the Node-API documentation for Node.js, you’ll see how confusing it can be. Take a brief look at the following “simple” example of a C++ function that adds 2 JavaScript numbers together. But don’t get bogged down in the details here, because the whole point of this post is to say that you don’t need to write code like this!

// addon.cc
#include <node.h>

namespace demo {

using v8::Exception;
using v8::FunctionCallbackInfo;
using v8::Isolate;
using v8::Local;
using v8::Number;
using v8::Object;
using v8::String;
using v8::Value;

// This is the implementation of the "add" method
// Input arguments are passed using the
// const FunctionCallbackInfo<Value>& args struct
void Add(const FunctionCallbackInfo<Value>& args) {
  Isolate* isolate = args.GetIsolate();

  // Check the number of arguments passed.
  if (args.Length() < 2) {
    // Throw an Error that is passed back to JavaScript
    isolate->ThrowException(Exception::TypeError(
        String::NewFromUtf8(isolate,
                            "Wrong number of arguments").ToLocalChecked()));
    return;
  }

  // Check the argument types
  if (!args[0]->IsNumber() || !args[1]->IsNumber()) {
    isolate->ThrowException(Exception::TypeError(
        String::NewFromUtf8(isolate,
                            "Wrong arguments").ToLocalChecked()));
    return;
  }

  // Perform the operation
  double value =
      args[0].As<Number>()->Value() + args[1].As<Number>()->Value();
  Local<Number> num = Number::New(isolate, value);

  // Set the return value (using the passed in
  // FunctionCallbackInfo<Value>&)
  args.GetReturnValue().Set(num);
}

void Init(Local<Object> exports) {
  NODE_SET_METHOD(exports, "add", Add);
}

NODE_MODULE(NODE_GYP_MODULE_NAME, Init)

}  // namespace demo

Yikes! That’s a long function to add two numbers! And that’s even before we start talking about the complexity of garbage collection, handles, and scopes.

The above code is mostly so-called “glue code“, because most of it exists solely to interface between JavaScript and C++, rather than adding any functionality of its own.

Microvium’s approach is fundamentally different

For those who are new to this blog, Microvium is a JavaScript engine I’ve created for running a subset of JavaScript optimized for microcontrollers. But the concepts I’m going to describe here are much broader than microcontrollers.

I’ll explain the Microvium approach by going through an example.

In Microvium, the first thing you probably want to do is have your C++ host (e.g. firmware) call a JavaScript function, so I’ll cover that example first:

// main.js

import { generate, exportToC } from './lib/ffi.js'

// Export function to be callable from C
exportToC('void', 'myFunctionToCallFromC', [], () => {
  // ... function code here ...
});

generate();

Here I’m using a library called ffi.js, which I’ll explain later. It exposes an exportToC function which has the following signature:

function exportToC(returnType: Typename, funcName: string, params: Array<[paramType: Typename, paramName: string]>);

Combined with the function generate, the function exportToC automatically generates the required glue code for the exported function.

How is this possible?

Well, remember that in Microvium, the top-level module code runs at compile time, not runtime, and then we deploy a snapshot of the VM state rather than deploying the source or a bundle. And by default, the compile-time code also has access to Node.js modules1 such as fs, so it can access the file system. The generate function in the example uses fs to code-generate C++ files with the glue code for each function that you set up using exportToC.
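To give a feel for how a library can do this, here is a heavily simplified sketch of the idea: calls made while the top-level code runs are recorded, and a later step turns the records into C++ text. This is not the actual ffi.js code. The registry, generateHeaderText, and the addOne export are all hypothetical, and the sketch returns a string rather than writing files with fs.

```javascript
// Hypothetical registry, filled in while the top-level code runs.
const exportedFunctions = [];

// Record an export. The argument order is assumed to mirror the usage above.
function exportToC(returnType, funcName, params, func) {
  exportedFunctions.push({ returnType, funcName, params, func });
}

// Turn the recorded exports into C++ declarations (as text only).
function generateHeaderText() {
  return exportedFunctions
    .map(({ returnType, funcName, params }) => {
      const paramList = params.map(([type, name]) => `${type} ${name}`).join(', ');
      return `${returnType} ${funcName}(${paramList});`;
    })
    .join('\n');
}

exportToC('void', 'myFunctionToCallFromC', [], () => {});
exportToC('int', 'addOne', [['int', 'x']], (x) => x + 1);

console.log(generateHeaderText());
// void myFunctionToCallFromC();
// int addOne(int x);
```

The real library has more to do, such as generating the code that converts values between JavaScript and C++, but the principle of collecting declarations at compile time is the same.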

So, let’s compile this JavaScript example in a terminal. For simplicity, I’m using --output-bytes here so I can get the literal snapshot bytes to paste into the C++ code later.

$ microvium main.js --output-bytes
Output generated: main.mvm-bc
154 bytes
{0x06,0x1c,0x06,0x00,0x9a,0x00,0xf2,0x75,0x03,0x00,0x00,0x00,0x1c,0x00,0x1c,0x00,0x24,0x00,0x24,0x00,0x2a,0x00,0x2c,0x00,0x80,0x00,0x8a,0x00,0xff,0xff,0x49,0x00,0xfe,0xff,0x7d,0x00,0x89,0x00,0x85,0x00,0x01,0x00,0x31,0x00,0x00,0x00,0x05,0x40,0x70,0x75,0x73,0x68,0x00,0x00,0x0d,0x50,0x04,0x31,0x30,0x30,0x88,0x1d,0x00,0x6b,0x12,0x6f,0x67,0x01,0x60,0x00,0x2f,0x50,0x05,0x88,0x19,0x00,0x89,0x00,0x00,0x88,0x1d,0x00,0x6b,0xa0,0x88,0x19,0x00,0x06,0xa0,0x10,0x12,0xe0,0x70,0x04,0x67,0x67,0x01,0x60,0x89,0x00,0x00,0x10,0x12,0x6b,0x11,0x78,0x01,0xa0,0x67,0x10,0x10,0x07,0x6c,0x10,0xa2,0x67,0x67,0x76,0xe2,0x00,0x00,0x00,0x03,0x50,0x01,0x01,0x60,0x00,0x0c,0x00,0x19,0x00,0x02,0x00,0x19,0x00,0x01,0x00,0x08,0xc0,0x05,0x00,0x05,0x00,0x31,0x00,0x39,0x00,0x04,0xd0,0x05,0x00,0x03,0x00}

A side effect of running this command is that it runs the top-level code in main.js (and transitively all the dependencies), which in turn generates the files App_ffi.hpp and App_ffi.cpp for us, which contain the glue code for this example.

So now that we have the generated glue code and the snapshot bytes, we can use this in a minimal C++ project2:

#include "App_ffi.hpp"

const uint8_t snapshot[] = {0x06,0x1c,0x06,0x00,0x9a,0x00,0xf2,0x75,0x03,0x00,0x00,0x00,0x1c,0x00,0x1c,0x00,0x24,0x00,0x24,0x00,0x2a,0x00,0x2c,0x00,0x80,0x00,0x8a,0x00,0xff,0xff,0x49,0x00,0xfe,0xff,0x7d,0x00,0x89,0x00,0x85,0x00,0x01,0x00,0x31,0x00,0x00,0x00,0x05,0x40,0x70,0x75,0x73,0x68,0x00,0x00,0x0d,0x50,0x04,0x31,0x30,0x30,0x88,0x1d,0x00,0x6b,0x12,0x6f,0x67,0x01,0x60,0x00,0x2f,0x50,0x05,0x88,0x19,0x00,0x89,0x00,0x00,0x88,0x1d,0x00,0x6b,0xa0,0x88,0x19,0x00,0x06,0xa0,0x10,0x12,0xe0,0x70,0x04,0x67,0x67,0x01,0x60,0x89,0x00,0x00,0x10,0x12,0x6b,0x11,0x78,0x01,0xa0,0x67,0x10,0x10,0x07,0x6c,0x10,0xa2,0x67,0x67,0x76,0xe2,0x00,0x00,0x00,0x03,0x50,0x01,0x01,0x60,0x00,0x0c,0x00,0x19,0x00,0x02,0x00,0x19,0x00,0x01,0x00,0x08,0xc0,0x05,0x00,0x05,0x00,0x31,0x00,0x39,0x00,0x04,0xd0,0x05,0x00,0x03,0x00};

int main() {
  // Load the JavaScript app from the snapshot
  App* app = new App(snapshot, sizeof snapshot);
  
  // Run the myFunctionToCallFromC function
  app->myFunctionToCallFromC();
}

How easy is that! Two lines of C++ to spin up the runtime engine and call a JavaScript function!

Let’s extend this example to have it call from JavaScript back to C++. Let’s say that we want to add two numbers together (like the earlier node.js monstrosity), and print the result:

import { generate, exportToC, importFromC } from './lib/ffi.js'

// Access the C++ function named "add" with parameters (int x, int y)
const add = importFromC('int', 'add', [['int', 'x'], ['int', 'y']]);

// Access the C++ function named "print" with parameter (std::string msg)
const print = importFromC('void', 'print', [['string', 'msg']]);

exportToC('void', 'myFunctionToCallFromC', [], () => {
  const x = add(1, 2);
  print(`The sum is ${x}`);
});

generate();

If we peek inside the generated “App_ffi.hpp” for this one, we’ll see it now has these lines as well:

// ...
extern int32_t add(App* app, int32_t x, int32_t y); // Must be implemented elsewhere
extern void print(App* app, std::string msg); // Must be implemented elsewhere
// ...

So, it’s automatically generated the function signatures of the imported functions, and all the glue code required to give the JavaScript code the ability to call these functions.

Now, let’s provide the implementation of these add and print functions in C++:

#include <iostream>
#include "App_ffi.hpp"

using namespace std;
using namespace mvm;

const uint8_t snapshot[] = {0x06,0x1c,0x06,0x00,0xd2,0x00,0xb7,0x71,0x03,0x00,0x00,0x00,0x1c,0x00,0x20,0x00,0x28,0x00,0x28,0x00,0x2e,0x00,0x32,0x00,0xb4,0x00,0xc2,0x00,0xff,0xff,0xfe,0xff,0xff,0xff,0x65,0x00,0xfe,0xff,0x99,0x00,0xc1,0x00,0xbd,0x00,0x01,0x00,0x3d,0x00,0x35,0x00,0x05,0x40,0x70,0x75,0x73,0x68,0x00,0x00,0x0c,0x40,0x54,0x68,0x65,0x20,0x73,0x75,0x6d,0x20,0x69,0x73,0x20,0x00,0x00,0x00,0x02,0x60,0x00,0x00,0x02,0x60,0x01,0x00,0x0d,0x50,0x04,0x31,0x30,0x30,0x88,0x1d,0x00,0x6b,0x12,0x6f,0x67,0x01,0x60,0x00,0x2f,0x50,0x05,0x88,0x19,0x00,0x89,0x00,0x00,0x88,0x1d,0x00,0x6b,0xa0,0x88,0x19,0x00,0x06,0xa0,0x10,0x12,0xe0,0x70,0x04,0x67,0x67,0x01,0x60,0x89,0x00,0x00,0x10,0x12,0x6b,0x11,0x78,0x01,0xa0,0x67,0x10,0x10,0x07,0x6c,0x10,0xa2,0x67,0x67,0x76,0xe2,0x00,0x00,0x00,0x1c,0x50,0x05,0x88,0x19,0x00,0x89,0x01,0x00,0x01,0x07,0x08,0x78,0x03,0xa0,0x89,0x02,0x00,0x01,0x88,0x3d,0x00,0x13,0x6c,0x78,0x02,0x67,0x67,0x01,0x60,0x0c,0x00,0x4d,0x00,0x51,0x00,0x19,0x00,0x02,0x00,0x19,0x00,0x01,0x00,0x08,0xc0,0x05,0x00,0x05,0x00,0x35,0x00,0x55,0x00,0x04,0xd0,0x05,0x00,0x03,0x00};

int main() {
  App* app = new App(snapshot, sizeof snapshot);
  app->myFunctionToCallFromC();
}

int32_t add(App* app, int32_t x, int32_t y) {
  return x + y;
}

void print(App* app, string msg) {
  cout << msg << endl;
}

That’s all! The glue code generated in App_ffi handles the conversions between JavaScript values and C++ values, such as converting the JavaScript string to a std::string for the print function.

What about dynamic types?

What if we don’t have a specific type we want to pass between JavaScript and C++? The FFI library provides a solution for this as well: Any.

Let’s say we want to make the add function polymorphic, so it can add either strings or integers. To do this, we just switch out int with any:

const add = importFromC('any', 'add', [['any', 'x'], ['any', 'y']]);

Then on the C++ side, we can write the add function like this (either adding integers or concatenating strings):

Any add(App* app, Any x, Any y) {
  if (x.type() == VM_T_NUMBER) {
    // Add the two arguments as integers
    return app->newInt32(x.toInt32() + y.toInt32());
  } else {
    // Otherwise, concatenate the two arguments as strings
    return app->newString(x.toString() + y.toString());
  }
}

The Any type is actually a reference type: it’s a garbage-collection-safe reference to a value in the JavaScript VM. It can also be used to safely interact with objects and arrays in JavaScript.

It’s about the concept, not the library

As of this writing, the FFI library used here (ffi.js) is not included with Microvium. It’s an early-stage concept library, which you can find here. There’s still more thought and functionality that needs to go into it before I’m ready to call it the “standard way” of interacting with Microvium and releasing it alongside the Microvium engine.

But I think the cool part here is not the FFI library itself, but the fact that the snapshotting paradigm facilitates libraries like this. The behavior doesn’t need to be baked into the engine — if you don’t like the way my FFI library does things, you can write your own3! The possibilities are endless. Do you want your library to also generate the makefile? You can! Do you want it to generate main.cpp? You can! Do you want it to work with C instead of C++? You can! Or rather… given a large enough community of users, you hope that someone else has done it already and shared their solution on npm or somewhere.

The concept runs deeper than just a typical code generator. Of course, anyone can write a code generator for node.js that generates the glue code for you, but it’s not easy in node.js to create a library that allows you to write code like this:

const add = importFromC('int', 'add', [['int', 'x'], ['int', 'y']]);
const print = importFromC('void', 'print', [['string', 'msg']]);

exportToC('void', 'myFunctionToCallFromC', [], () => {
  const x = add(1, 2);
  print(`The sum is ${x}`);
});

Why? Because this example combines runtime and compile-time code in the same place. Functions like exportToC create a bridge between C++ and JavaScript, and encapsulate the details of that bridge. We don’t care how the library works, as long as it adheres to the interface contract — the contract on both sides of the bridge — the contract in both JavaScript and C++.

The entities created when we call importFromC or exportToC span both the JavaScript and C++ domain and encapsulate the wiring required to connect the two sides.

It is the snapshotting paradigm of Microvium that enables a library that performs this kind of encapsulation and abstraction of a communication link. And interfacing between JavaScript and C++ is only the beginning of what you can do with this! There are some other things on the horizon that take this to the next level.

P.S. If you’ve made it this far, you might be interested in following me on Twitter…


  1. The Microvium compiler runs on Node.js, and exposes the Node.js API via a proxy layer 

  2. The FFI library I’ve made here uses C++ rather than C so that it can use RAII for automatically dealing with garbage collector handles. 

  3. Or rather, raise a ticket on GitHub so we can improve the same library. 

Microvium has Classes!


TL;DR: Microvium now has support for classes. Here are some details for the curious.


I’ve been putting off adding support for classes in Microvium because you can do a lot without them, and other features have been more important in my opinion. I’m also personally more of a fan of functional-style programming rather than object-orientated programming, so I seldom write classes myself.

But I kept running into little things that are difficult to do without support for classes. With the recent addition of try-catch, for example, you could now throw, but you couldn’t throw new Error(...) because Error is a class. Another example is the ECMA-419 API for embedded systems, which is an API standard for things like GPIO, but which relies heavily on support for classes.

Microvium now has support for classes, which I think makes it the only JavaScript engine under 100kB of flash that has all the features required to implement ECMA-419 (and Microvium is currently under 10kB).

Microvium functions are not objects

Classes in JavaScript are more or less just syntactic sugar for old-style prototype-based code. For example, the following are more or less equivalent in normal JavaScript (but the prototype style doesn’t work in Microvium):

// ES6 class style
class Foo {
  bar() { console.log('Method bar called') } 
}

// Old prototype-style
function Foo() {}
Foo.prototype.bar = function() { console.log('Method bar called') }

Both styles are setting up the Foo.prototype object, but with the class syntax, it is implicit. Then if you evaluate new Foo(), the engine will create a new object whose __proto__ is Foo.prototype.
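You can check this equivalence in any standard JavaScript engine (such as Node.js) with a sketch like the following. I've given the two versions different names here (FooClass and FooFunc) so that they can live side by side, and the methods return the message instead of logging it.

```javascript
// ES6 class style
class FooClass {
  bar() { return 'Method bar called'; }
}

// Old prototype style (works in standard JavaScript, not in Microvium)
function FooFunc() {}
FooFunc.prototype.bar = function () { return 'Method bar called'; };

const a = new FooClass();
const b = new FooFunc();

// In both cases the new object's prototype is the constructor's
// `prototype` object, and `bar` is found through it.
console.log(Object.getPrototypeOf(a) === FooClass.prototype); // true
console.log(Object.getPrototypeOf(b) === FooFunc.prototype);  // true
console.log(a.bar() === b.bar());                             // true
```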

This is all possible because functions in JavaScript are also objects, and can hold properties like any other object.

But Microvium functions do not support properties, so the function Foo cannot have a prototype property (or any other static class properties like Foo.myStaticProperty).

I made the choice to omit this feature from the engine because this pattern is relatively uncommon outside of classes. It would be a massive waste of RAM to have every function also be an object. A closure function in Microvium currently uses only 6 bytes of RAM, and most functions will use no RAM at all if the static analysis determines them not to be closures (e.g. all top-level functions). Whereas a pair of normal objects like { prototype: {} } is already 16 bytes of RAM (6 bytes for each object plus 4 bytes for each property). It would be madness to require that every function declaration uses 16 bytes of RAM just in case someone wants to use it as a constructor.
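The 16-byte figure is just the sum of the sizes quoted above: { prototype: {} } is two objects with one property between them. As a small worked example (the function name is made up for illustration):

```javascript
// Sizes quoted above for Microvium objects.
const OBJECT_BYTES = 6;   // per object
const PROPERTY_BYTES = 4; // per property

function objectGraphBytes(objectCount, propertyCount) {
  return objectCount * OBJECT_BYTES + propertyCount * PROPERTY_BYTES;
}

// { prototype: {} }: an outer object, an inner object, and one property.
console.log(objectGraphBytes(2, 1)); // 16
```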

So how do classes work in Microvium?

The way it works is quite simple: when you use the class syntax, Microvium creates a function that supports properties (i.e. “a class”), but when you use the normal function syntax, it will keep its old behavior.

A class in Microvium is a distinct type. Although typeof MyClass will return "function", as per the JS spec, if you use the C function mvm_typeOf to probe the type, it will tell you VM_T_CLASS not VM_T_FUNCTION.

Internally, a class is a tuple of a props object and constructor function, as shown in the following diagram. When you access the class like an object (e.g. MyClass.myStaticProperty or MyClass.prototype), it delegates the property-access operation to the props object.

Memory layout for a minimal empty class and an instantiation of the class

When you construct an instance of the class using new MyClass(), it creates a new object whose prototype (__proto__) is props.prototype, and then calls the constructor function with the new object as the this parameter.
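
To make the delegation concrete, here is a rough model of the tuple written in ordinary JavaScript. This is only a conceptual sketch: the engine does this internally in C, and the names makeClass, getStatic, and construct are hypothetical helpers invented for the illustration. It also ignores details such as a constructor that returns its own object.

```javascript
// Conceptual model only; these helper names are hypothetical.
function makeClass(constructorFunc) {
  // The tuple: a props object plus a constructor function
  return { props: { prototype: {} }, constructorFunc };
}

// MyClass.someProperty delegates to the props object
function getStatic(cls, key) {
  return cls.props[key];
}

// new MyClass(...args)
function construct(cls, ...args) {
  // New object whose __proto__ is props.prototype
  const instance = Object.create(cls.props.prototype);
  // Call the constructor with the new object as `this`
  cls.constructorFunc.apply(instance, args);
  return instance;
}

const Point = makeClass(function (x, y) { this.x = x; this.y = y; });
getStatic(Point, 'prototype').sum = function () { return this.x + this.y; };

const p = construct(Point, 3, 4);
console.log(p.sum()); // 7
console.log(Object.getPrototypeOf(p) === Point.props.prototype); // true
```

Because construct only calls constructorFunc through the normal calling mechanism, it makes no difference in this model whether that function is a plain function or a closure.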

The constructor function itself could just be a bytecode function in ROM, but it can, in general, be any function because it delegates the call to the normal calling machinery of Microvium. For example, if you have a class declaration nested inside a function then the constructor will be a closure, with access to the locals of the surrounding function. Or the constructor can be an imported host function. But the simplest case is a class declared at the top level of the code, in which case the constructor part of the tuple just points directly to a function in the bytecode ROM image.

The key-value pair in the above diagram (where the key is "prototype") is how Microvium implements object properties. Each property is a 4-byte pair containing a 2-byte property key and a 2-byte value. Here I’m only assuming one property (MyClass.prototype) but other properties would follow contiguously if there were more.

The next field is unimportant for this example. When you add properties to an object, they’re actually added using a linked list rather than resizing the object structure (for CPU efficiency) but then the garbage collector compacts these linked lists into a contiguous form (for memory efficiency).
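
The following is a simplified model of that idea in plain JavaScript, assuming each allocation holds some key-value pairs plus a next link. All of the names are hypothetical, and the real collector moves memory around rather than merging in place, so treat this as a sketch of the shape of the data and nothing more.

```javascript
// Conceptual model only; all names here are hypothetical.
function makeObject() {
  return { pairs: [], next: null };
}

function setProperty(obj, key, value) {
  let node = obj;
  for (;;) {
    const pair = node.pairs.find(p => p[0] === key);
    if (pair) { pair[1] = value; return; } // existing key: update in place
    if (!node.next) break;
    node = node.next;
  }
  // New key: link a new cell onto the end instead of resizing the object
  node.next = { pairs: [[key, value]], next: null };
}

function getProperty(obj, key) {
  for (let node = obj; node; node = node.next) {
    const pair = node.pairs.find(p => p[0] === key);
    if (pair) return pair[1];
  }
  return undefined;
}

// Roughly what a compacting collector achieves: one contiguous block
function compact(obj) {
  for (let node = obj.next; node; node = node.next) {
    obj.pairs.push(...node.pairs);
  }
  obj.next = null;
}

const o = makeObject();
setProperty(o, 'x', 1);
setProperty(o, 'y', 2);
console.log(o.next !== null); // true: properties live in linked cells
compact(o);
console.log(o.pairs.length, o.next); // 2 null
console.log(getProperty(o, 'y')); // 2
```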

Classes and objects are expensive

Classes (and objects) are one of the more expensive features of Microvium. The above minimal/empty class is already 22 bytes of RAM. While each instance of the class is only 6 bytes of RAM, every property on an instance is an additional 4 bytes. So an object with just 2 properties is already 14 bytes.

Property lookup is also quite expensive:

  • It is linear-time because it has to search through the property list.
  • Many different things support property lookup, such as arrays, classes, buffers, and of course objects. The lookup code needs to figure out what it is in order to know how to find its properties.
  • Properties use string keys. For hard-coded literal strings, like x.y or x['y'], these strings don’t incur RAM overhead but they do add to the bytecode size.
  • For computed string properties like x['a' + 'b'], there is additional overhead to perform string interning — string interning is the process of consolidating different strings with the same content so that they also have the same reference identity, which makes property lookups more efficient.
  • String interning can potentially trigger a garbage collection cycle because it’s growing the intern table. Apart from the overhead of the collection itself, just the possibility of a garbage collection means that the implementation of property lookup needs to deal with the fact that all the memory may be shuffled around during the property access (e.g. all the temporaries need to be properly GC-reachable), which itself adds overhead.
  • A property lookup instruction in bytecode involves multiple steps consuming at least 5 bytes of bytecode:
    1. Loading the object from which to get the property (a 1-byte instruction, at least)
    2. Loading the string that represents the property (typically a 3-byte instruction because it embeds a reference to the string)
    3. An instruction to actually trigger the property loading (a 1-byte instruction)

Closures, on the other hand, are asymptotically more efficient. A closure that closes over 2 variables is 14 bytes — the same size as an object with 2 properties. But:

  • Each additional variable closed over is another 2 bytes, rather than the 4 bytes of a property.
  • Closure variable access is typically O(1) because closures are random-access structures and the indexes are computed at compile time by static analysis (see my post on closure variable indexing).
  • Up to 15 different closure variables can be accessed by a single 1-byte instruction, compared to a minimum of 5 bytes for the instruction sequence for object property access.
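
Using only the byte counts quoted above, the difference in growth can be worked out with a couple of one-liners. The function names are made up for this sketch, and the closure formula is only meant for 2 or more variables, since that is the data point given above.

```javascript
// Byte counts taken from the figures quoted in this post.
const objectBytes = n => 6 + 4 * n;         // 6-byte object + 4 bytes per property
const closureBytes = n => 14 + 2 * (n - 2); // 14 bytes at 2 variables, +2 per extra (n >= 2)

for (const n of [2, 4, 8]) {
  console.log(n, objectBytes(n), closureBytes(n));
}
// 2 14 14
// 4 22 18
// 8 38 26
```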

Can objects be more efficient?

The experimental static analysis technique I designed for Microvium Boost a while back computes all the information that would be required to convert an object to a fixed-length array, in cases where it’s possible to infer the closed set of property keys that are ever used to access an object. My hope is that in the future I could implement this optimization (transforming objects and classes into fixed-length arrays) which could have some significant performance benefits:

  • An object with 2 properties would be 6 bytes instead of 14.
  • Each additional object property would take another 2 bytes.
  • Property lookup would be O(1).
  • Up to 15 different properties could be accessed using a single-byte instruction.
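
As a hand-written illustration of what such a transformation might produce (this is not an existing Microvium feature, and the constants X and Y are hypothetical stand-ins for indexes that the compiler would assign):

```javascript
// Hypothetical illustration; not an existing Microvium feature.

// Before: keys are stored with the object and searched for at runtime.
const before = { x: 10, y: 20 };

// After: if the analysis proves that `x` and `y` are the only keys ever
// used, each key can be given a fixed slot index at compile time.
const X = 0, Y = 1;
const after = [10, 20];

console.log(before.x + before.y); // 30
console.log(after[X] + after[Y]); // 30
```

The array form no longer stores the key next to each value, which is where the saving of 2 bytes per property would come from.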

Conclusion

As it stands, classes are not a feature that I would want to use very often since they are so expensive. But having classes opens the door to different kinds of libraries and APIs, and in particular, is a stepping-stone towards implementing other standards-compliant features in Microvium.

Inside Microvium Closures

TL;DR: Support for closures in Microvium sets it apart from other JS engines of a similar size. Closures simplify state machines and enable functional-style code. Closures in snapshots are a new way of sharing compile-time state with the runtime program. This post goes through some examples and design details.

What is a closure?

MDN has already done a great job of explaining what a closure is, so I’m just going to borrow their explanation:

A closure is the combination of a function bundled together (enclosed) with references to its surrounding state (the lexical environment). In other words, a closure gives you access to an outer function’s scope from an inner function. In JavaScript, closures are created every time a function is created, at function creation time.

Here is a simple example:

function makeCounter() {
  let x = 0;
  function incCounter() {
    x++;
    return x;
  }
  return incCounter;
}

const myCounter1 = makeCounter();
const myCounter2 = makeCounter();

console.log(myCounter1()); // 1
console.log(myCounter1()); // 2
console.log(myCounter1()); // 3

// myCounter2 is an independent counter
console.log(myCounter2()); // 1
console.log(myCounter2()); // 2
console.log(myCounter2()); // 3

In the above example, the function named incCounter is a closure because it closes over variable x in its outer lexical scope. A closure is just a function, nested in another function, which accesses variables in the outer function.

Microvium also supports the arrow function syntax, so the following example has the same output but is more concise:

const makeCounter = x => () => ++x;
const myCounter = makeCounter(0);
console.log(myCounter()); // 1
console.log(myCounter()); // 2
console.log(myCounter()); // 3

For more detail, take a look at the above-mentioned MDN article. The rest of this post will assume that you know what a closure is.

Why are closures useful?

Closures in snapshots

Let’s say that we want a script that exports two functions: one that prints “hello” and the other that prints “world”. Without closures, we could implement this in Microvium as follows:

vmExport(0, printHello);
vmExport(1, printWorld);

function printHello() {
  console.log('hello');
}

function printWorld() {
  console.log('world');
}

(Recall that vmExport is a function that the script calls to export a function to the host).

In this example, printHello and printWorld are each functions that take no arguments and will print the corresponding string to the console¹.

With the introduction of closures, we could factor out the commonality between printHello and printWorld and just have a printX that can print either one:

const printHello = makePrinter('hello');
const printWorld = makePrinter('world');
vmExport(0, printHello);
vmExport(1, printWorld);

function makePrinter(thingToPrint) {
  return printX;
  function printX() {
    console.log(thingToPrint);
  }
}

This refactors the code so that console.log only appears once but is shared by both printHello and printWorld. For the simple case of console.log this doesn’t add much benefit, but you can imagine cases where printX is a lot more complicated and so this refactoring may be beneficial.

Another thing to note in this example is that makePrinter is called at compile time since it’s in the top-level code. The resulting closures printHello and printWorld, instantiated at compile time, are carried to runtime via the snapshot, along with their state (the value of thingToPrint). The closures feature here plays nicely with snapshotting as a new way to share compile-time state with the runtime program.

Depending on the style of code you’re familiar with, we can also write the same example more concisely as:

const makePrinter = s => () => console.log(s);
vmExport(0, makePrinter('hello'));
vmExport(1, makePrinter('world'));

If you’re not comfortable with the idea of functions returning other functions, here’s another variant of the example that does the same thing:

function exportPrinter(id, textToPrint) {
  vmExport(id, () => console.log(textToPrint));  
}
exportPrinter(0, 'hello');
exportPrinter(1, 'world');

We could also get the list of things to export from a dynamic source, such as an array (or even something read from a file at compile time using fs.readFileSync):

const printers = [
  { id: 0, textToPrint: 'hello' },
  { id: 1, textToPrint: 'world' },
];
// OR:
// printers = JSON.parse(fs.readFileSync('printers.json', 'utf8'));

for (let i = 0; i < printers.length; i++) {
  const id = printers[i].id;
  const textToPrint = printers[i].textToPrint; 
  vmExport(id, () => console.log(textToPrint));  
}

Side note: the above example also demonstrates the usefulness of vmExport being a normal function, rather than some special syntax. Think about your favorite language or engine and how you would implement the above in that language. You can’t define an extern void foo() {} inside a for-loop in C, or public static void foo() {} inside a for-loop in C#. The only solution in these environments to the objective of exporting a programmatically-defined set of functions would be to use a code generator and the result would be much more complicated.

Closures for state machines

Implementing a finite state machine with closures is much easier and gives better performance. Consider the following two-state state machine:

The above state machine has 2 states, which I’ve called stateA and stateB. When event 1 is received while in stateA, the machine will transition to stateB, but if any other event (e.g. event 2) is received while in stateA, the machine will not transition to stateB. See Wikipedia for a more detailed description of FSMs.

The ability to use closures allows us to implement this state machine using a function for each state. In the following example, stateA is a function that receives events by its event parameter. We “make” stateA only when we need it, by calling enterStateA(). The example includes an eventCount as part of state A to show how states can have their own variables that are persisted across multiple events.

function enterStateA() {
  console.log('Transitioned to State A!');
  let eventCount = 0; // Some internal state only used by stateA

  // A function that handles events while we're in stateA
  function stateA(event) {
    if (event === 1) {
      currentState = enterStateB();
    } else {
      eventCount++;
      console.log(`Received ${eventCount} events while in state A`);
    }
  }

  return stateA;
}

function enterStateB() {
  console.log('Transitioned to State B!');
  return event => {
    if (event === 2) {
      currentState = enterStateA();
    }
  }
}

// We'll start in stateA
let currentState = enterStateA();

// Every time an event is received, we send it to the current state
const processEvent = event => currentState(event);

// Allow the host firmware to send events to the state machine
vmExport(0, processEvent);

// Some example events:
processEvent(5); // Received 1 events while in state A
processEvent(5); // Received 2 events while in state A
processEvent(5); // Received 3 events while in state A
processEvent(1); // Transitioned to State B!
processEvent(1); // (no output: state B ignores event 1)
processEvent(2); // Transitioned to State A!
processEvent(2); // Received 1 events while in state A

In the above example, state A has the eventCount counter which is part of its closure. When the system transitions to state B, the counter can be garbage collected. This might not be very useful when only considering a single counter variable, but the pattern generalizes nicely to systems that have more expensive states that may hold buffers and other resources.

Once you understand closures and higher-order functions, this is a very natural way to represent state machines.

Closures under the hood

Let’s go back to the simple “counter” example for illustration:

function makeCounter() {
  let x = 0;
  function incCounter() {
    return ++x;
  }
  return incCounter;
}

const myCounter = makeCounter();
console.log(myCounter()); // 1
console.log(myCounter()); // 2
console.log(myCounter()); // 3

The runtime memory layout for this example is as follows:

The global variable myCounter is a 2-byte slot, as are all variables in Microvium. The slot contains a pointer to the closure, which is an immutable tuple containing a reference to the function code (incCounter, in this example) and the enclosing lexical environment which in Microvium is called the scope.

The closure and scope are allocated on the garbage-collected heap — if and when the closure is no longer reachable, it will be freed.

When the closure myCounter is called, the VM sees that the callee is a closure and sets a special scope register in the VM to the closure’s scope before running the target bytecode. The bytecode can then interact with the scope variables through special machine instructions that leverage the scope register.
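
The following sketch models this mechanism in ordinary JavaScript. It is only an illustration of the idea: the real VM is written in C and works with bytecode, and the names scopeRegister, callClosure, and incCounterCode are hypothetical. Restoring the previous register value after the call is also an assumption made to keep the model well-behaved.

```javascript
// Conceptual model only; all names here are hypothetical.
let scopeRegister = null;

// Stand-in for the bytecode of incCounter: `x` is slot 0 of the current scope
function incCounterCode() {
  scopeRegister[0] = scopeRegister[0] + 1;
  return scopeRegister[0];
}

// makeCounter allocates a scope with one slot (x = 0) and an immutable
// closure tuple pairing the function code with that scope.
function makeCounter() {
  const scope = [0];
  return Object.freeze({ target: incCounterCode, scope });
}

// Calling a closure: load its scope into the register, then run the code
function callClosure(closure) {
  const saved = scopeRegister;
  scopeRegister = closure.scope;
  try {
    return closure.target();
  } finally {
    scopeRegister = saved; // assumption: put back the caller's scope
  }
}

const myCounter1 = makeCounter();
const myCounter2 = makeCounter();
console.log(callClosure(myCounter1)); // 1
console.log(callClosure(myCounter1)); // 2
console.log(callClosure(myCounter2)); // 1
```

Note that the code never refers to x by name. It only uses slot 0 of whatever scope is currently loaded, which is why the same function code can serve any number of independent counters.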

For a look into how variables are accessed by closure code, see my post on Closure Variable Indexing.

More efficient than objects

Closures are much more efficient than objects in Microvium:

  • Every closure variable is one word (2 bytes) while an object property is 2 words (4 bytes) because a property is stored as a key-value pair. Variables aren’t stored with their name because the static analysis can determine an index for them.
  • Closure variables can be accessed with single-byte instructions (a 4-bit opcode and 4-bit literal variable index) whereas typical object properties take at least 5 bytes of bytecode instructions to access. This is in part because the “current” closure scope is implicit, while there is no such thing as the “current” object (the object needs to be specified explicitly), and also because object property access requires specifying the property key which is a reference to a string.
  • All the object key strings take up memory in the bytecode image.
  • Most closure variables are accessed in O(1) time, so adding more variables does not slow down access, whereas object property lookup is O(n) in the number of properties.

Conclusion

Closures are useful and form the backbone of a lot of JavaScript programming. I’ve talked before about how closures are useful for callback-style asynchronous code, and in this post, I’ve also shown how they are useful for modeling state machines and how they make certain kinds of refactoring possible.

Other tiny JavaScript engines such as Elk and mJS don’t support closures, so this feature sets Microvium apart from the crowd².

The closures feature has been the single most complicated and time-consuming feature to implement in Microvium because the static analysis requires multiple passes and completely changed how the compiler front-end works. It’s really not as simple as one would first imagine. Consider, as just one example, that the let i binding in for (let i = 0 ...) has multiple instances of i relative to the containing function, and that on each iteration the previous value is copied to the new instance.
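
The per-iteration binding behavior is easy to observe from script code. The sketch below is standard JavaScript behavior, shown here as it would run in an engine such as Node.js; the var loop is included only for contrast.

```javascript
const withLet = [];
for (let i = 0; i < 3; i++) {
  withLet.push(() => i); // each iteration closes over its own `i`
}
console.log(withLet.map(f => f())); // [ 0, 1, 2 ]

const withVar = [];
for (var j = 0; j < 3; j++) {
  withVar.push(() => j); // every closure shares the single `j`
}
console.log(withVar.map(f => f())); // [ 3, 3, 3 ]
```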

But after a long journey, I can say that I’m happy with the result.


  1. This example assumes that you’ve added `console.log` to the global scope.

  2. Larger engines such as Espruino and XS support closures along with many other features, but come at a serious size premium.