Microvium on WASM!
TL;DR: The Microvium runtime, @microvium/runtime on npm, allows you to restore and run Microvium snapshots in a JavaScript host environment such as node.js or the browser, using a WASM build of the Microvium native engine.
Why?
It might seem strange to run a JavaScript engine on a JavaScript engine. Why not just run the code directly on the host¹ engine?
The answer is that now you can run the same Microvium snapshot on a microcontroller, on a server, or in the browser. If your snapshotted app contains business logic, you can run that logic in multiple places and have it behave identically.
Among other things, this can allow you to create a unified user experience for your IoT users:
- Consistent UI workflows whether the user is using the web interface or the device display.
- Shared validation and business logic.
- Previewing device behavior on the server before downloading it to the device.
You can also use it for testing and validation of your device logic in a simulated environment before putting the Microvium app onto a real device.
Sandboxing
Apps running on Microvium are completely sandboxed from the host: there is no interaction with the host at all except through the imports and exports that you give it. This allows you to safely run the app in any environment without worrying that it may misbehave.
The library has been designed so that the host can manipulate objects inside the VM, but it’s impossible for the VM to manipulate objects inside the host: anything passed from the host to the VM is passed by copy rather than by reference².
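To make the copy semantics concrete, here is a minimal JavaScript sketch of pass-by-copy at a boundary. The function copyIntoVm is hypothetical and is not part of @microvium/runtime; it only shows why a value handed across by copy can’t be used to reach back into the host.

```javascript
// Hypothetical illustration of pass-by-copy at a host/VM boundary.
// Plain data is deep-copied, so the receiving side gets a fresh structure
// and nothing it does can reach back into the host's original object.
function copyIntoVm(value) {
  if (typeof value === 'function') {
    throw new TypeError('functions are not copied in this sketch');
  }
  if (value === null || typeof value !== 'object') {
    return value; // primitives are immutable, so returning them is a copy
  }
  if (Array.isArray(value)) {
    return value.map(copyIntoVm);
  }
  const copy = {};
  for (const [key, item] of Object.entries(value)) {
    copy[key] = copyIntoVm(item);
  }
  return copy;
}

const hostConfig = { limits: { maxTemp: 80 }, tags: ['a'] };
const vmSide = copyIntoVm(hostConfig);
vmSide.limits.maxTemp = 999; // the "VM" mutates its own copy...
vmSide.tags.push('b');
console.log(hostConfig.limits.maxTemp, hostConfig.tags.length); // 80 1
```

The host object is untouched because the VM side only ever held its own copy.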
Gas Counter
Microvium now has a gas counter. For example, the following line of code in the host tells the VM to stop after executing 2000 VM instructions:
vm.stopAfterNInstructions(2000);
Use this feature to catch misbehaving scripts that enter an infinite loop or hog the CPU.
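The idea behind a gas counter can be sketched with a toy interpreter. This is not Microvium’s implementation: run, OutOfGasError and the two-opcode instruction set are made up for illustration, and the only point is that every executed instruction spends one unit of a fixed budget.

```javascript
// Hypothetical sketch of an instruction budget ("gas") in an interpreter loop.
class OutOfGasError extends Error {}

function run(program, gasLimit) {
  let gas = gasLimit;
  let pc = 0;  // program counter
  let acc = 0; // single accumulator register
  while (pc < program.length) {
    // Spend one unit of gas per instruction; stop once the budget is used up.
    if (gas-- <= 0) throw new OutOfGasError(`stopped after ${gasLimit} instructions`);
    const [op, arg] = program[pc];
    switch (op) {
      case 'add': acc += arg; pc++; break;
      case 'jump': pc = arg; break; // unconditional jump
      default: throw new Error(`unknown op ${op}`);
    }
  }
  return acc;
}

// Terminates normally: three instructions, well within the budget.
console.log(run([['add', 1], ['add', 2], ['add', 3]], 2000)); // 6

// An infinite loop is cut off instead of hanging the host.
try {
  run([['add', 1], ['jump', 0]], 2000);
} catch (e) {
  console.log(e instanceof OutOfGasError); // true
}
```

Counting instructions rather than using a wall-clock timeout makes the cutoff deterministic: the same script with the same budget always stops at the same point.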
Persistence
The library offers a way both to restore a snapshot into a running VM and to capture a snapshot of a running VM. Among other uses, you could use this to persist VM state in a database, restoring a VM from the database whenever there is a relevant event for it to process.
This is particularly easy to do because snapshots in Microvium can be quite small — the hello-world snapshot shown later in this post is only 124 bytes.
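One way the restore/capture cycle could look in a Node.js host is sketched below. SnapshotStore and processEvent are hypothetical; a Map stands in for the database, and restoreVm, captureVm and vm.handleEvent are placeholders for the runtime’s actual restore and capture calls and for a function your app exports.

```javascript
// Hypothetical sketch (Node.js): persist VM snapshots as base64 text, as you
// might in a database column. A Map stands in for the database table.
class SnapshotStore {
  constructor() {
    this.rows = new Map(); // vmId -> base64-encoded snapshot
  }
  save(vmId, snapshotBytes) {
    this.rows.set(vmId, Buffer.from(snapshotBytes).toString('base64'));
  }
  load(vmId) {
    const text = this.rows.get(vmId);
    return text === undefined ? undefined : new Uint8Array(Buffer.from(text, 'base64'));
  }
}

// Restore, process one event, capture, and write the new state back.
// restoreVm/captureVm are placeholders for the runtime's restore/capture calls.
function processEvent(store, vmId, event, restoreVm, captureVm) {
  const vm = restoreVm(store.load(vmId));
  vm.handleEvent(event);
  store.save(vmId, captureVm(vm));
}
```

Between events, the only state kept for each VM is its snapshot, so idle VMs cost nothing but storage.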
Lightweight and Fast
This runtime library is designed to be incredibly lightweight. The full Microvium compiler is hundreds of MB, but this runtime library has a minified size of only 64kB and uses no external npm dependencies.
The startup time for the library itself is about 10-30ms in node.js and it can restore or capture a snapshot of the VM in under 1ms.
The running RAM usage is 256kB per VM, so you can run thousands of them on a typical desktop-class machine without breaking a sweat.
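As a rough check of the “thousands” claim, the arithmetic below divides host RAM by the 256kB per-VM figure. It assumes 256kB means 256 KiB and ignores any other per-process overhead, so the result is an upper bound; the 8 GiB machine is just an example.

```javascript
// Back-of-envelope: how many 256kB VMs fit in a given amount of RAM?
const KiB = 1024;
const GiB = 1024 * 1024 * KiB;
const perVmBytes = 256 * KiB;

function maxVms(ramBytes) {
  return Math.floor(ramBytes / perVmBytes);
}

console.log(maxVms(8 * GiB)); // 32768
console.log((1000 * perVmBytes) / (1024 * 1024)); // 250 (MiB needed for 1000 VMs)
```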
What about the overhead of running on WASM compared to running natively?
Actually, the WASM library runs about 40x faster than the native execution of Microvium on a desktop-class machine with the default configuration. The reason is that Microvium is a 16-bit virtual machine that uses 16-bit pointers internally, while a modern machine is 32- or 64-bit, so there needs to be a translation layer between the two. In the default configuration of the Microvium native engine that comes bundled with the compiler, this translation is done in software for maximum portability, which is slow. In the WASM implementation, by contrast, a lot of effort went into getting Microvium to use the first 64kB of the WASM address space as if it were running natively on a 16-bit device, leaving the translation to the host WASM engine (which does it in hardware).
Side note: you can get the native implementation to run much faster than the default by reserving a fixed location in memory for the 64kB of Microvium memory, which is the recommended way to run on 32-bit microcontrollers such as ARM. I haven’t tested the WASM speed against this alternative configuration.
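The addressing arithmetic behind these two fast configurations can be illustrated with a DataView standing in for memory. This is only an illustration, not Microvium code: readU16Direct and readU16FixedBase are hypothetical names, and BASE is an arbitrary example address.

```javascript
// Illustration only: a DataView over 128kB stands in for host memory
// (or WASM linear memory).
const memory = new DataView(new ArrayBuffer(0x20000));

// WASM build: the VM's 64kB occupies addresses 0 to 0xffff, so a 16-bit VM
// pointer is already a valid address and needs no translation at all.
function readU16Direct(vmPtr) {
  return memory.getUint16(vmPtr, true); // little-endian, like WASM
}

// Fixed-base native configuration: the 64kB region sits at a known address,
// so translating a VM pointer is a single addition.
const BASE = 0x10000; // arbitrary example base address
function readU16FixedBase(vmPtr) {
  return memory.getUint16(BASE + vmPtr, true);
}
```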
Microvium itself is optimized to be small compared to other engines, and you can take advantage of that here as well. Although your VM can’t exceed 64kB of space, you can do a lot more in 64kB in Microvium than you can in other JavaScript engines. For example, closures in Microvium can be 10-15x smaller than in some other engines.
Easy to use
I put a lot of effort into making this library really easy to use. Let’s say that you have the following hello-world program you want to run in Microvium:
// my-app.js
// Import host function #1 as `print`, and export `main` to the host as #1
const print = vmImport(1);
vmExport(1, main);

function main() {
  print("Hello, World!");
}
You can compile it with the Microvium compiler using the CLI command:
microvium my-app.js --output-bytes
The --output-bytes flag here tells the compiler to output the snapshot as a comma-separated list of bytes on stdout, like 0x04,0x1c,0x03,0x00,0x48,….
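If you capture that output to a string (for example by redirecting stdout to a file), a small helper can turn it into bytes. parseOutputBytes is a hypothetical name, not part of the Microvium tooling, and it assumes the output is exactly the comma-separated hex list described above.

```javascript
// Hypothetical helper: turn the comma-separated text printed by
// `microvium my-app.js --output-bytes` into a Uint8Array.
function parseOutputBytes(text) {
  const parts = text.split(',').map((s) => s.trim()).filter((s) => s.length > 0);
  const bytes = parts.map((s) => {
    const n = Number(s); // Number() understands the 0x prefix
    if (!Number.isInteger(n) || n < 0 || n > 0xff) {
      throw new RangeError(`not a byte: ${s}`);
    }
    return n;
  });
  return Uint8Array.from(bytes);
}

console.log(parseOutputBytes('0x04,0x1c,0x03,0x00,0x48')); // Uint8Array(5) [ 4, 28, 3, 0, 72 ]
```

If you need a plain array, as in the restore example that follows, use Array.from on the result.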
Now you can use the new @microvium/runtime library to run the snapshot in node.js or the browser, with just a few lines of code:
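import Microvium from '@microvium/runtime';

// Snapshot bytes from `microvium my-app.js --output-bytes` (truncated here)
const snapshot = [0x04,0x1c,0x03,0x00,0x48,…];

// Host functions the VM can import, keyed by numeric ID
const imports = {
  [1]: console.log
};

// Restore the snapshot into a running VM, then call the function it exported as #1
const vm = Microvium.restore(snapshot, imports);
const { [1]: main } = vm.exports;
main();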
The imports in the above snippet give the VM access to host functionality, each identified by a numeric ID of your choosing. In this case it exposes the console.log host function, which the app imports as print.
The exports object is the opposite: it gives the host access to the things the app exported with vmExport, each also associated with a numeric ID.
The library handles the translation of the VM data types to the host and vice versa, in a mostly seamless way that makes the WASM library much easier to use than the raw microvium.h interface for a C or C++ host.
The Journey
(Skip this section if you don’t care about the detail of how this library was put together.)
I thought this library would be a quick weekend project. After all, the Microvium C interface is relatively small and easy to use, and this is just a wrapper around a WASM build of the C library, right?
But it turned into weeks or maybe months of work in the end. Sure, in a weekend I could probably have compiled microvium.c using Emscripten and written a few wrapper functions for the API. But that’s not up to my standard of usability and performance.
Emscripten is the standard, “easy” way to compile C/C++ to WASM. But it’s heavy and comes with lots of extra stuff that I didn’t need. For example, it comes with its own C runtime library (clib). Microvium compiles to just tens of kB, but a pre-built runtime library can be MBs. Emscripten also comes with its own bootstrapping code that runs at startup. It’s not a lean solution by any means.
But the main reason I didn’t go with Emscripten is that it gives you no control over the address space of the final output. I was fairly convinced that I would get significant performance improvements if I could run Microvium as a 16-bit engine using exactly the first 64kB of WASM memory (addresses 0 to 0xffff), but there was no way to do this using Emscripten.
Emscripten uses the Clang C/C++ compiler under the hood, so I went with that. Standard Clang allows you to provide a linker script that configures the memory address space for the final result, so that should have made it easy to get Microvium into the first 64kB of address space.
But, this is where the problems started!
While the normal Clang compiler (an alternative to GCC) comes with this capability, it seems that the WASM-targeting Clang compiler doesn’t. So I had to hack a solution:
- By being very careful with the order of linking and declarations, I could allocate a 64kB variable that happens to be compiled to the beginning of the output address space (this could break in future versions of Clang, and then I’ll need to revisit the solution).
- But! Clang doesn’t start the output addresses at address zero by default. Eventually, though, I figured out the compiler options to do that.
- Ideally you should configure this giant 64kB variable as requiring no initialization, so it doesn’t take up space in the compiled WASM binary.
- But! The Clang no-initialization section (.bss) is put after the initialized section (.data), and because there’s no way to give it linker information, you have no control over this. So you have to declare the massive variable as requiring initialization.
- But! This results in a massive compiled WASM file, with 64kB of zeros in it. So naturally, I need a post-processing step to remove this from the WASM binary so as not to bloat the library.
- But! To do this requires some WASM tools which don’t come with Clang. So I need to install them.
- But! They don’t install on Windows, so I need to install them in WSL and use that instead.
- But! I have other scripts which can only run in Windows. So I have an awful mix of Linux and Windows shell scripts that I need to run in the pipeline.
- Also, the tools for pulling apart the WASM binary and putting it back together don’t preserve debug symbols! So to do any debugging, I need to use the unprocessed WASM binary.
Clang doesn’t come with a clib, and I looked into finding a lean clib to use, but in the end I decided to write my own. Not the whole thing, only the subset of clib components required for Microvium, which isn’t too much.
This was definitely the right choice. Some of the clib functions could be implemented using WASM intrinsics, and some as wrappers around host JavaScript operations. Writing the whole lot myself was a way to make sure that each one is implemented in the most sensible and efficient way for the WASM build.
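As an illustration of the second category, here is how a host JavaScript function can be exposed to a WASM module as an import that writes its result into linear memory. The import name fmtDouble, its signature, and makeEnvImports are all made up for this sketch; the post doesn’t list which clib functions are implemented this way.

```javascript
// Hypothetical clib-style helper implemented in host JavaScript. It formats
// a number with JavaScript's own number-to-string conversion and writes the
// ASCII result into WASM linear memory at outPtr.
function makeEnvImports(memory) {
  const encoder = new TextEncoder();
  return {
    fmtDouble(value, outPtr, maxLen) {
      const bytes = encoder.encode(String(value));
      const n = Math.min(bytes.length, maxLen);
      // Re-create the view on each call: memory.buffer is replaced if memory grows.
      new Uint8Array(memory.buffer, outPtr, n).set(bytes.subarray(0, n));
      return n; // number of bytes written (no NUL terminator)
    },
  };
}

// A module would receive this as { env: makeEnvImports(memory) } when instantiated.
const memory = new WebAssembly.Memory({ initial: 1 }); // one 64kB page
const env = makeEnvImports(memory);
const written = env.fmtDouble(3.5, 100, 16);
console.log(new TextDecoder().decode(new Uint8Array(memory.buffer, 100, written))); // 3.5
```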
Then came the actual interface between the VM and the host. Using a JavaScript host language opens up the door to a much more seamless way of interacting with the VM, with automatic translation between the VM data types and the host data types (which are both representations of JavaScript values). The library uses JavaScript Proxies to allow the host to have references to objects, arrays, functions, and classes inside the VM. It uses reference-counted handles, along with WeakMap and FinalizationRegistry, to manage references across the boundary, tracking them when memory in the VM moves and freeing them when they’re no longer needed.
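A stripped-down version of this bookkeeping might look like the following. It is a hypothetical sketch, not the library’s code: HandleTable and freeInVm are invented names, a frozen empty object stands in for the Proxy wrappers, and it assumes the VM hands out stable handle IDs rather than raw addresses, so values can move without invalidating the host’s references.

```javascript
// Hypothetical sketch of cross-boundary handle tracking.
// freeInVm stands in for a call that tells the VM a handle is no longer used.
class HandleTable {
  constructor(freeInVm) {
    this.freeInVm = freeInVm;
    this.refCounts = new Map();    // handleId -> number of live host wrappers
    this.handleOf = new WeakMap(); // wrapper -> handleId (doesn't keep wrapper alive)
    // When the host garbage-collects a wrapper, drop the reference it held.
    this.registry = new FinalizationRegistry((handleId) => this.release(handleId));
  }

  // Create a host-side wrapper for a VM value identified by a stable handle ID.
  wrap(handleId) {
    const wrapper = Object.freeze({}); // a Proxy in a real binding
    this.refCounts.set(handleId, (this.refCounts.get(handleId) ?? 0) + 1);
    this.handleOf.set(wrapper, handleId);
    this.registry.register(wrapper, handleId);
    return wrapper;
  }

  handleFor(wrapper) {
    return this.handleOf.get(wrapper);
  }

  // Normally invoked by the FinalizationRegistry callback.
  release(handleId) {
    const count = this.refCounts.get(handleId);
    if (count === undefined) return; // already freed
    if (count > 1) {
      this.refCounts.set(handleId, count - 1);
    } else {
      this.refCounts.delete(handleId);
      this.freeInVm(handleId); // last host reference gone
    }
  }
}
```

FinalizationRegistry callbacks run at the garbage collector’s discretion and may never run at all, so this approach suits freeing resources lazily, not work that must happen at a specific time.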
There are a surprising number of details to consider. For example, the proxy for a Microvium array needs to support iterators so that the host can use for...of loops on it. Similarly, you need proper enumeration of object keys so that JSON.stringify works (and also object spread {...obj}).
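The proxy traps involved can be sketched in a few lines. Here a Map stands in for an object living in the VM and a pair of accessor functions stands in for a VM array; makeObjectProxy, makeArrayProxy and the size/read interface are hypothetical. The ownKeys and getOwnPropertyDescriptor traps are what make Object.keys, JSON.stringify and object spread see the VM-side properties, and returning a generator for Symbol.iterator is what makes for...of work.

```javascript
// Illustration only: each trap would call into the VM in a real binding.
function makeObjectProxy(vmObject) {
  return new Proxy({}, {
    get: (_t, key) => (typeof key === 'string' ? vmObject.get(key) : undefined),
    set: (_t, key, value) => { vmObject.set(String(key), value); return true; },
    has: (_t, key) => vmObject.has(String(key)),
    // These two traps let Object.keys, JSON.stringify and {...obj} enumerate.
    ownKeys: () => [...vmObject.keys()],
    getOwnPropertyDescriptor: (_t, key) =>
      vmObject.has(key)
        ? { value: vmObject.get(key), writable: true, enumerable: true, configurable: true }
        : undefined,
  });
}

// Array-like proxy: length, index access and for...of over VM-side elements.
// vmArray is an assumed accessor object: { size(): number, read(i): any }.
function makeArrayProxy(vmArray) {
  return new Proxy({}, {
    get(_t, key) {
      if (key === 'length') return vmArray.size();
      if (key === Symbol.iterator) {
        return function* () {
          for (let i = 0; i < vmArray.size(); i++) yield vmArray.read(i);
        };
      }
      if (typeof key === 'string' && /^(0|[1-9]\d*)$/.test(key)) {
        return vmArray.read(Number(key));
      }
      return undefined;
    },
  });
}

const obj = makeObjectProxy(new Map([['a', 1], ['b', 'x']]));
console.log(JSON.stringify(obj)); // {"a":1,"b":"x"}
```

Because the array sketch uses a plain object as its proxy target, Array.isArray returns false for it and JSON.stringify would serialize it as an object; handling those cases would need an array target, which brings extra proxy invariants around the non-configurable length property.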
Uint8Array isn’t implemented as a proxy because it would be horribly inefficient to access individual bytes through a proxy layer. At the same time, I didn’t want to pass Uint8Array by copy because of the inefficiency with large blocks of data where you only need to change a small amount (e.g. implementing binary queuing and buffering using a shared, mutable Uint8Array). So I implemented a custom wrapper class.
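A wrapper along these lines avoids both per-byte proxy traps and whole-buffer copies. VmBytes is a hypothetical class, not the library’s actual wrapper; getBuffer and getAddress are assumed callbacks that return the current backing memory and the buffer’s current address, which matters because WASM memory.buffer is replaced when memory grows and the VM may move the bytes.

```javascript
// Hypothetical wrapper for a byte buffer that lives inside VM memory.
class VmBytes {
  constructor(getBuffer, getAddress, length) {
    this.getBuffer = getBuffer;   // returns the current backing ArrayBuffer
    this.getAddress = getAddress; // returns the bytes' current address
    this.length = length;
  }

  // A fresh view over the bytes where they currently are; nothing is copied.
  #view() {
    return new Uint8Array(this.getBuffer(), this.getAddress(), this.length);
  }

  #check(index) {
    if (!Number.isInteger(index) || index < 0 || index >= this.length) {
      throw new RangeError(`index ${index} out of range`);
    }
  }

  get(index) {
    this.#check(index);
    return this.#view()[index];
  }

  set(index, value) {
    this.#check(index);
    this.#view()[index] = value;
  }

  // Write a small range in place instead of copying the whole buffer.
  write(offset, bytes) {
    this.#view().set(bytes, offset);
  }

  // Copy out a range when the host needs its own snapshot of the data.
  read(start = 0, end = this.length) {
    return this.#view().slice(start, end);
  }
}
```

Each access builds a fresh Uint8Array view, which is cheap because views don’t copy data, and ensures the host never reads through a stale address.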
A lot of this work required changes to the Microvium engine itself, to give the WASM wrapper access to internals so that it could implement these features efficiently. The C API of Microvium doesn’t support property get/set on objects, passing this in function calls, calling new on classes, or enumerating object keys. These things all matter more when the host has first-class support for them, as a JavaScript host does.
The way I did this in the end was to mark some Microvium internal functions as MVM_HIDDEN. By default, these are compiled with static linkage, meaning they’re completely invisible to the host, but a host may choose to redefine this macro in the port file to make these selected functions visible to the linker. I still chose to keep them out of microvium.h, though, since the C interface is designed to be as simple as possible for C users.
Usability design is very important to me, and another hurdle I had to jump over was how to give users a seamless experience when using WASM. The issue is that, at least today, bundlers do not support WASM. The library needs to load and compile the WASM, but where does it get the WASM binary from, if it’s not baked into the library JavaScript? Does a user need to host the .wasm file at a URL and then give that URL to the library?
That would be a terrible user experience. In the end, I baked the WASM library directly into the JavaScript source code: a pre-build script converts the .wasm file to a base64 string and plugs it straight into the library code. The library (a single, bundled index.js file) is then distributed to users with this WASM binary baked into it, and it decodes and compiles the binary at startup. Luckily, I had already spent the effort to make the binary small, so there’s not much overhead in representing it as base64 or decoding it at startup (the .wasm file is 36kB and the whole library is 64kB, including the base64-encoded WASM).
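The build-time and startup halves of this approach can be sketched as follows. The function names are hypothetical and this is not the library’s actual pre-build script; the sketch uses Node’s Buffer at build time and falls back to atob in the browser at startup.

```javascript
// Pre-build step (Node.js): turn the .wasm bytes into JavaScript source that
// carries the binary as a base64 string literal.
function makeEmbeddedWasmSource(wasmBytes) {
  const base64 = Buffer.from(wasmBytes).toString('base64');
  return `export const wasmBase64 = ${JSON.stringify(base64)};\n`;
}

// Startup: decode the base64 string back into bytes (Node.js or browser).
function decodeBase64(base64) {
  if (typeof Buffer !== 'undefined') {
    return new Uint8Array(Buffer.from(base64, 'base64'));
  }
  const binary = atob(base64);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) bytes[i] = binary.charCodeAt(i);
  return bytes;
}

// Compile the decoded bytes; the resulting module can then be instantiated.
function compileEmbeddedWasm(base64) {
  return WebAssembly.compile(decodeBase64(base64));
}
```

Base64 encodes every 3 bytes as 4 characters, so a 36kB binary becomes roughly 48kB of text, which fits with the 64kB total quoted above. For example, the 8-byte header of an empty WASM module (00 61 73 6d 01 00 00 00) encodes to AGFzbQEAAAA=.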
Conclusion
In the end, I’m happy with the way the library turned out. My favorite kind of complexity is the kind that’s hidden behind a simple and intuitive interface, which I think this library does successfully. The user doesn’t need to think about handle management, memory allocation, or type translation — everything more or less “just works” the way you would intuitively expect it to.
This library opens up so many opportunities for new ways of working with Microvium, whether you just want to use it to execute some sandboxed logic, use it for testing Microvium apps, or build a distributed IoT application with a cloud-side component.
¹ Note: the term host here refers to the program running the Microvium engine.
² This was actually done because Microvium doesn’t have a proxy type, so it can’t maintain references to host objects. But the copy semantics also have security benefits, since it’s impossible for Microvium code to mutate host objects or use shared host objects as a secret communications channel between VMs.