TL;DR: the job queue in Microvium is used just for executing async continuations. It uses only 2 bytes of idle memory and doesn't require any threads or any special support from the host.
- Part 1: Introduction to async/await for embedded systems
- Part 2: Callback-based async-await
- Part 3: Making Promises
- Part 4: The Job Queue (this post)
What is a job queue?
If you're an embedded programmer, you may be familiar with the so-called "super-loop" architecture, where a single `main` function runs an infinite loop that checks various things to see what work needs to be done on each cycle of the loop. Often these checks are hard-coded, but an alternative design that's quite scalable is to have a queue of work that needs to be done, where the main loop just pulls work off the queue and performs it.
In Microvium, the job queue is used solely to execute async continuations (which includes all promise subscribers). If `foo` is awaiting `bar`, then when `bar` completes, a job will be scheduled to continue `foo` where it left off [2].
This behavior is required for Microvium to be ECMAScript spec compliant, but it also has the advantage of using less peak stack memory, since a cascade of continuations will each be executed in a fresh job when the call stack is empty, rather than executing on top of each other like a cascade of traditional callbacks.
The Microvium job queue is invisible to the host
An important factor in the design of Microvium is to make it as easy as possible to integrate into host firmware. One thing I was worried about with the introduction of a job queue is that it would complicate the C integration API, because it could require some way to pump the queue (to execute enqueued jobs). Either this might appear as an explicit API call to Microvium, like `mvm_processJobs()`, or it might require additional hooks in the porting layer to allow Microvium to hook into the operating system to set up a job thread, or hook into a global super-loop, etc.
None of these options are acceptable to me. I want the “getting started” process to be as seamless as possible for newcomers, and forcing newcomers to do this integration work is just too complicated in my opinion. It would require conveying new concepts like “what is a job queue?” and “when should or shouldn’t I pump the job queue?”, just to get started with “hello world”.
The solution I went with in the end is that Microvium automatically pumps the job queue at the end of `mvm_call`.
This approach makes it similar to explicit callback-based asynchrony, where callbacks are executed inline and therefore before control returns to the host. The difference is that the continuations are executed a little later, though still before returning to the host. The amount of time that `mvm_call` blocks the host doesn't change between using async/await and using explicit callbacks.
To put it another way, the program may be sitting idle, waiting for something to happen, such as a timer, I/O event, or user interaction. When something happens, the host might `mvm_call` the guest to let it know about the event. The way that Microvium is designed, the guest will fully process the event and reach its next equilibrium state before `mvm_call` returns to the host.
Job queue structure
How does the job queue look under the hood?
I'm very hesitant to add things to Microvium that permanently use more memory. So the job queue is implemented as a single VM register, which is just a 2-byte slot. It has three states:
- Empty: there are no jobs.
- Single job
- Multiple jobs
When there's only a single job enqueued, the `jobs` register simply points to the job. A job is any function, but in particular it's almost always going to be a closure, since a closure is a function type that can encapsulate some dynamic state (async continuations being a special case of closures). The job queue mechanism doesn't care what the state is, but since this is a job queue for async continuations, the job closure is almost always going to capture the following state:
- The continuation to execute.
- Whether the awaited operation passed or failed (`isSuccess`).
- The result or error value (`result`).
This is all the state required for the job to invoke the continuation with `(isSuccess, result)`, as required by the CPS protocol I defined in the previous post.
The vast majority of the time, there will only be zero or one jobs in the job queue. This is because it's relatively rare for multiple async functions to be awaiting the same promise. But Microvium does need to cater for the case of multiple jobs. When this happens, the `jobs` register won't point to a single job, but to a linked list of jobs.
The job pointed to by the register is the first job to execute, but there is an interesting design choice here: the `prev` pointer of the first node points to the last node, making the linked list into a cycle. This is simply to facilitate appending to the job queue in O(1) time when adding new jobs, because we can access the last node directly through the first node.
Normally you can append to a doubly-linked list in O(1) time by maintaining a separate pointer to the last node, but I didn't want to permanently dedicate a whole VM register to the relatively rare and short-lived case where multiple jobs are waiting. Representing the jobs as a cycle is a neat way to get the best of both.
Note that jobs are removed from the cycle before they're executed, so the cycle does not imply that jobs are executed multiple times. It's merely an optimization that allows Microvium to access both the beginning and end of the queue via a single register in O(1) time.
A special case is where the cycle consists of only one job. That is, when the `prev` pointer of the first (and only) job points to the job itself. This case occurs when there were multiple jobs queued, but all except one have already been executed. The logic for this case surprisingly "just works", meaning that you can easily insert a new job between the `last` and `first` nodes even when the `last` and `first` nodes are actually the same node.
A quick note about GC overhead. The diagram above with 12 jobs in the queue has 24 memory allocations. All 12 jobs are guaranteed to be executed before control returns to the host [3], so they can be considered short-lived (allocated and freed within a single call to the VM). In a language like C, doing a `malloc` for every job would be a lot of overhead, especially when jobs are as small as "run the next piece of this async function". But in Microvium, this overhead isn't too bad because of the characteristics of the Microvium garbage-collected heap:
- Allocation headers are small — only 2 bytes per allocation.
- Unlike with `malloc`, there is no memory fragmentation because the heap is compacted.
- There is no overhead for collecting garbage memory [4]. Rather, Microvium incurs collection overhead on everything that isn't garbage. Even hundreds of tiny temporary allocations can be swept away in one fell swoop.
- Allocation time is O(1) because the Microvium heap is compacted. Creating a new allocation is roughly analogous to just bumping a "free pointer" forward by N bytes [5].
The job queue is one of the less complicated parts of the implementation of async/await, but still important. In the end, I'm quite happy with how the design turned out: it requires only 2 bytes of memory (the register itself) when the feature isn't being used, stays lean in the hot path where only one job is enqueued and executed at a time, and still allows an unlimited number of jobs while keeping O(1) insertion and removal. And this was achieved without impacting the interface between the host and the VM at all, so users of Microvium don't have any additional integration work.
[2] In the previous posts, I may have used the shorthand of saying that `bar` "calls" `foo`'s continuation. But now you know that it's more accurate to say that `bar` *schedules* `foo`'s continuation to be called on the job queue.
[3] In the case of reentrant calls, all jobs will be executed before control returns to the host from the outermost `mvm_call`.
[4] Garbage memory does create memory pressure, which indirectly has overhead by increasing the chance of a collection cycle. But during a collection cycle, the GC moves all living allocations to a new region and then frees the old region in one go, so the individual dead allocations do not contribute at all to the amount of time it takes to perform a GC collection.
[5] It's not quite as efficient as just bumping a free pointer, since it needs to check whether the heap actually has enough space, and potentially expand the heap if there isn't.