TL;DR: Microvium closures are now as small as 6 bytes on the heap, down from 12 bytes previously and compared to over 100 B in some other engines.
Microvium is all about squeezing things into tiny spaces, and the latest of these is closures — the memory structure used to carry the state of nested functions. If you’re new to closures, have a look at my previous post which explains closures from a C programmer’s perspective.
One of my goals with Microvium has been to make it feasible and realistic to run JavaScript on really tiny devices. More than half of the microcontrollers selling on Digikey have less than 16kB of RAM and 128kB of flash. When you’re working with devices this small, you count every byte you use.
People typically program these kinds of devices in C. However, as I’ve shown before, there can be some real, practical benefits to using a language like JavaScript in these situations. For example, JavaScript-style single-threading can be more memory-efficient than multithreading, and using a garbage collector like Microvium’s can avoid heap fragmentation overhead and help to reduce the memory leaks associated with manually freeing memory. And of course, your code may be less complicated when working in a higher-level language, which itself can mean less development time and fewer bugs.
When you’re working with such small memory spaces, what kind of size would you consider to be acceptable for a closure function, like the myCounter in the following example?
function makeCounter() {
  let x = 0;
  function incCounter() {
    return ++x;
  }
  return incCounter;
}
const myCounter = makeCounter();
As I demonstrated in my previous post, if you code up something like this in C, it might take 20 bytes of memory[1], ignoring fragmentation overhead. So what kind of size would be acceptable to get the convenience of the JavaScript syntax? Maybe 30 bytes? 50 bytes?
Other JavaScript engines that I’ve measured take over 100 bytes for a closure like this! (See here for my measurement methodology and feel free to correct it). That’s not a fault of the engine, it’s a product of the JavaScript spec which requires that all functions are objects. For function declarations (functions declared using function rather than arrow syntax), not only is the function an object, but it also has a fresh prototype object in case you use the function with new.
That’s a heavy price to pay! Maybe that’s ok on a desktop-class machine. But when you’re working on a device with tens of kilobytes of memory, that’s just not an affordable feature anymore. So, in Microvium, closures are stripped down to their bare minimum. Functions in Microvium are not objects — they cannot have properties. And in return for this trade-off, Microvium closures can be really tiny: the closure in the above example is just 8 bytes! (Including a 2-byte reference and 2-byte allocation header).
Another reason why this is so small in Microvium is its 16-bit slot size. It stores all pointers as 16-bit, even on a 32-bit machine. Numbers, such as x in the example, start out as 16-bit but grow as needed. This is great for things like counters which are likely to be small most of the time but which are able to count up to 2^53 without overflowing. This is in contrast to C where you typically need to use memory upfront for the largest-possible value that a variable might have.
In general, closures like this in Microvium take 4 + 2n bytes of memory on the heap, where n is the number of variables. This is down from 10 + 2n bytes of memory in the previous design which is a nice improvement.
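As a quick sanity check of these two formulas, here is a small sketch in C. The helper names are made up for illustration and are not part of Microvium.

```c
#include <stddef.h>

/* Heap bytes for a closure with n captured variables in the current
   design, using the 4 + 2n formula quoted above.
   Hypothetical helper, not a Microvium API. */
size_t closure_heap_bytes(size_t n) {
    return 4 + 2 * n;
}

/* The same calculation for the previous design (10 + 2n). */
size_t old_closure_heap_bytes(size_t n) {
    return 10 + 2 * n;
}
```

With n = 1 (the counter example) this gives 6 bytes now versus 12 bytes before, which matches the figures in the TL;DR. With n = 2 it gives 8 bytes.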
[1] Assuming here a 32-bit device and FreeRTOS heap. The size I’ve quoted here includes an 8-byte allocation, with an 8-byte allocation header, and a 4-byte pointer to the allocation.
TL;DR: Closures are a good way to represent callbacks for non-blocking code. Although C doesn’t support closures, you can achieve something similar using structs on the heap, but you can get even better memory efficiency by using JavaScript running on Microvium.
I’ve recently made some improvements to the memory usage of closures in Microvium, which I’ll talk about in more detail in an upcoming post. But before I get there, I want to first talk about how a similar problem can be solved in C, as a point of comparison.
In this post, we’ll look at a scenario where closures are useful, starting with the most basic solution to the problem in C and then progressively improving the design to address its weaknesses until we end up with something closure-like, which I will then contrast with the same solution in JavaScript. This post is aimed at people who are more familiar with C than with JavaScript, and are not necessarily familiar with closures.
Preserving state during long-lived operations
Let’s start by imagining that we have a C program that needs to perform some slow, I/O-bound operation, such as sending data to a server. For simplicity, let’s assume that we just need to send a single integer, and let’s assume that someone has given us a library that does exactly this:
// Send an int to the server, and block until we get confirmation of receipt
void sendDataToServerAndWaitForResponse(int payload);
And let’s say that we have a requirement to log once the payload was sent:
#include <stdio.h>
void sendToServerAndLog(int num) {
  sendDataToServerAndWaitForResponse(num);
  printf("This num was successfully received by the server: %i\n", num);
}
So here I’ve constructed an example where the value of num, which is available before we contact the server, is also used after we get a response from the server. I’ve simplified the example by ignoring failure cases. This is a special case of a common situation where some program state needs to be preserved across long-lived operations, such as a state machine or retry counter.
A problem with the above solution is that it blocks the thread. In terms of memory usage, we can think of it as requiring the memory of a whole call stack, which may occupy anything from hundreds of bytes to megabytes, depending on the system we’re running. And more importantly, we can’t use the thread for anything else while it’s blocked.
Using a callback
To avoid blocking the thread, we could instead consider using a callback function. Imagine that our hypothetical library supported this by taking a callback argument, rather than blocking until completion:
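A minimal sketch of what such a declaration could look like, assuming a parameterless callback type named Callback. The stub body and the on_response callback are purely illustrative: a real library would return immediately and invoke the callback later, once the response arrives.

```c
/* Callback type: a function taking no arguments and returning nothing. */
typedef void Callback(void);

/* Start sending `payload` and return without blocking. `callback` is
   invoked once the server confirms receipt. */
void sendDataToServer(int payload, Callback* callback);

/* Stand-in implementation for experimentation only (an assumption):
   it pretends the server replied instantly. */
void sendDataToServer(int payload, Callback* callback) {
    (void)payload;
    callback();
}

/* Example callback that just counts how many responses it has seen. */
static int responses_seen = 0;
static void on_response(void) {
    responses_seen++;
}
```

Calling sendDataToServer(42, &on_response) with this stub runs on_response once before returning.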
Now we can write our application code like the following, where we’ve split the behavior of sendToServerAndLog into two parts — one part before sending the data and one that’s the continuation after we’ve received a response:
int save_num;
void continue_sendToServerAndLog(void); // Forward declaration
void sendToServerAndLog(int num) {
  save_num = num;
  sendDataToServer(num, &continue_sendToServerAndLog);
}
void continue_sendToServerAndLog(void) {
  int num = save_num;
  printf("This num was successfully received by the server: %i\n", num);
}
In order to have the state of num available in the callback, we needed to persist it somewhere, so we need the save_num variable. This solution is much more memory efficient than the previous one: if we’re running on a 32-bit platform, our app code occupies only the 4 bytes of save_num. The library code now also needs to preserve the callback pointer across the call, which uses an additional 4 bytes of memory. So this solution takes a total of 8 bytes of memory during the server operation, compared to using a whole thread and call stack of memory before. But there are still two issues with this design:
The save_num variable persists forever, even when we’re not sending data to the server. If the hypothetical library uses the same pattern to store the callback, then together the full 8 bytes are consuming memory forever.
We can’t have multiple calls to the server running in parallel here. If we call sendToServerAndLog a second time before the first response is received, the value of save_num is corrupted.
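The second issue is easy to reproduce. The sketch below assumes a stand-in library that queues callbacks, plus a hypothetical deliverAllResponses() that simulates the replies arriving later. Instead of printing, the continuation records the number it would have logged.

```c
typedef void Callback(void);

/* Hypothetical stand-in for the library: remembers pending callbacks. */
enum { MAX_PENDING = 4 };
static Callback* pending[MAX_PENDING];
static int pending_count = 0;

void sendDataToServer(int payload, Callback* callback) {
    (void)payload;
    if (pending_count < MAX_PENDING) {
        pending[pending_count++] = callback;
    }
}

/* Simulates the server responses arriving some time later. */
void deliverAllResponses(void) {
    for (int i = 0; i < pending_count; i++) {
        pending[i]();
    }
    pending_count = 0;
}

static int save_num;
static int logged[MAX_PENDING]; /* what each continuation "printed" */
static int logged_count = 0;

static void continue_sendToServerAndLog(void) {
    if (logged_count < MAX_PENDING) {
        logged[logged_count++] = save_num;
    }
}

void sendToServerAndLog(int num) {
    save_num = num; /* overwrites the value saved by any earlier call */
    sendDataToServer(num, &continue_sendToServerAndLog);
}
```

If sendToServerAndLog(1) and sendToServerAndLog(2) are both called before deliverAllResponses(), both continuations record 2: the first call's value has been overwritten.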
Adding context
A common pattern to get around the above problem is to have the library code accept an additional parameter that it passes back to our callback, which here we might name “context” because it represents the contextual information of the particular caller:
typedef void Callback(void* context);
// Whatever `context` is provided will be passed to the `Callback` untouched.
void sendDataToServer(int payload, Callback* callback, void* context);
Now we can use it like this:
void continue_sendToServerAndLog(void* context_); // Forward declaration
void sendToServerAndLog(int num) {
  int* context = malloc(sizeof(int));
  *context = num;
  sendDataToServer(num, &continue_sendToServerAndLog, context);
}
void continue_sendToServerAndLog(void* context_) {
  int* context = (int*)context_;
  int num = *context;
  printf("This num was successfully received by the server: %i\n", num);
  free(context);
}
Now, if we call sendToServerAndLog multiple times, each call creates a distinct context that is completely independent of any other call.
Side note: Why make the context a void* instead of just an int, since we only need an int here? The reason is that we’re imagining here that sendDataToServer is part of a reusable library. Even if it’s a first-party library, it’s generally better practice to write it in such a way that it’s decoupled from the particular way you use the library. Making it a void* allows the user of the library to decide the size and shape of the preserved state, rather than coupling it to the library itself.
This is the same reason why we use a function pointer at all, rather than just hardcoding the library to invoke our particular callback directly: the library should not be coupled to the particular function that executes after it. And as the program evolves, we might land up calling it from multiple places that each require a different callback.
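To see that each call now keeps its own state, here is a small self-contained sketch. As before, the queueing library and deliverAllResponses() are hypothetical stand-ins, and the continuation records the number rather than printing it.

```c
#include <stdlib.h>

typedef void Callback(void* context);

/* Hypothetical stand-in for the library: queues callbacks together with
   their contexts. A full queue silently drops the request in this sketch. */
enum { MAX_PENDING = 4 };
static Callback* pending_cb[MAX_PENDING];
static void* pending_ctx[MAX_PENDING];
static int pending_count = 0;

void sendDataToServer(int payload, Callback* callback, void* context) {
    (void)payload;
    if (pending_count < MAX_PENDING) {
        pending_cb[pending_count] = callback;
        pending_ctx[pending_count] = context;
        pending_count++;
    }
}

/* Simulates the server responses arriving some time later. */
void deliverAllResponses(void) {
    for (int i = 0; i < pending_count; i++) {
        pending_cb[i](pending_ctx[i]);
    }
    pending_count = 0;
}

static int logged[MAX_PENDING]; /* what each continuation "printed" */
static int logged_count = 0;

static void continue_sendToServerAndLog(void* context_) {
    int* context = context_;
    if (logged_count < MAX_PENDING) {
        logged[logged_count++] = *context;
    }
    free(context); /* the context lives only as long as the operation */
}

void sendToServerAndLog(int num) {
    int* context = malloc(sizeof *context);
    if (context == NULL) return;
    *context = num;
    sendDataToServer(num, &continue_sendToServerAndLog, context);
}
```

With two calls in flight, sendToServerAndLog(1) followed by sendToServerAndLog(2), the continuations now record 1 and 2 respectively, because each one reads its own heap-allocated context.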
Applying the same pattern to sendToServerAndLog
We decided that a good interface for the library function sendDataToServer is one that takes a callback and a context, because the operation takes a long time, and a caller may want to continue doing something else after the operation completes. But similarly, sendToServerAndLog is also an operation that takes a long time, and its caller may also want to do something when the operation completes.
If we’re working with highly coupled code, then maybe we already know whether or not the caller of sendToServerAndLog needs to do anything else afterward, and exactly what it needs to do. But if we want sendToServerAndLog to be a reusable code that is decoupled from its caller, then we should probably have it accept its own callback and context from its caller. If we do this, then we need to persist the caller’s callback and context until the whole operation completes, so let’s upgrade our context to a struct that includes these fields:
// Context for sendToServerAndLog
typedef struct sendToServerAndLog_Context {
  int num;
  Callback* caller_callback;
  void* caller_context;
} sendToServerAndLog_Context;
void continue_sendToServerAndLog(void* context_); // Forward declaration
void sendToServerAndLog(int num, Callback* caller_callback, void* caller_context) {
  sendToServerAndLog_Context* context = malloc(sizeof *context);
  context->num = num;
  context->caller_callback = caller_callback;
  context->caller_context = caller_context;
  sendDataToServer(num, &continue_sendToServerAndLog, context);
}
void continue_sendToServerAndLog(void* context_) {
  sendToServerAndLog_Context* context = context_;
  int num = context->num;
  printf("This num was successfully received by the server: %i\n", num);
  context->caller_callback(context->caller_context);
  free(context);
}
Embedding the function pointer
Stylistically, it’s interesting to consider one further modification to this example. You may or may not agree with the following design change, but it leads nicely into the topic of closures in Microvium, which I’ll get to in a moment.
Rather than having the function save and copy around 2 pieces of state on behalf of the caller (the caller_callback and caller_context), we can combine these into one: we can just reference the caller_context and require that the first field in the caller_context is the callback function pointer, as in the following code. Also, rather than calling this a context, let’s now call it a closure, since it captures both a function pointer and some general state. The relationship between this and real closures will become clearer later.
typedef struct Closure Closure; // Forward declaration, needed by ClosureFunc
typedef void ClosureFunc(Closure* closure);
// A general closure is expected to take this shape
struct Closure {
  // Must be the first field
  ClosureFunc* invoke;
  /* ...other fields may follow...*/
};
// Here we have a specific closure shape for our function
typedef struct sendToServerAndLog_Closure {
  // Must be the first field. If you're using C++, you might instead
  // inherit `sendToServerAndLog_Closure` from `Closure`
  ClosureFunc* invoke;
  // Other fields:
  int num;
  // Note now that we don't need to store a separate caller_context
  // since both the context and the function pointer are combined
  // into the single closure struct.
  Closure* caller_callback;
} sendToServerAndLog_Closure;
void continue_sendToServerAndLog(Closure* closure_); // Forward declaration
void sendToServerAndLog(int num, Closure* caller_callback) {
  sendToServerAndLog_Closure* closure = malloc(sizeof *closure);
  closure->invoke = continue_sendToServerAndLog;
  closure->num = num;
  closure->caller_callback = caller_callback;
  // The cast is valid because `invoke` is the first field of both structs
  sendDataToServer(num, (Closure*)closure);
}
void continue_sendToServerAndLog(Closure* closure_) {
  sendToServerAndLog_Closure* closure = (sendToServerAndLog_Closure*)closure_;
  int num = closure->num;
  printf("This num was successfully received by the server: %i\n", num);
  Closure* caller_callback = closure->caller_callback;
  caller_callback->invoke(caller_callback);
  free(closure);
}
This final design is quite clean:
The memory for the transient state (e.g. num) is only allocated while the long-running operation is active. Remember that short-lived memory is cheaper memory.
It doesn’t block the thread, so it’s easier to parallelize multiple operations if needed, without each operation consuming a whole thread of memory.
Each layer is decoupled: sendDataToServer doesn’t need to know who its caller is, and similarly sendToServerAndLog doesn’t need to know who its caller is.
The callback is neatly encapsulated into a single pointer value that can be passed around as a first-class value. If you’re familiar with C++, this is a similar benefit to using the std::function<> type.
But there are some disadvantages to this design:
Although it represents the same behavior as the first design (the synchronous code), the code is now a whole lot more complicated.
Here we’ve only shown one closure signature. But what if we needed a return value to be passed to the closure? In general, each different return type is going to need its own type definitions for ClosureFunc and Closure, which will add up to a lot of boilerplate.
The memory efficiency is not great because it uses malloc and free.
On my Windows machine with Visual C++, I measure malloc to have an overhead cost of 40 bytes per allocation (compiling for x86).
In FreeRTOS, each allocation has an overhead of 8 bytes on a 32-bit platform. With this figure, the closure in the example takes 20 bytes of heap space.
The heap in C/C++ can get fragmented, which costs additional memory.
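The 20-byte figure for FreeRTOS can be reproduced with a short sketch. It assumes a 32-bit target, so fixed-width 4-byte integers stand in for the two pointers. Closure32, FREERTOS_OVERHEAD and closure_heap_cost are illustrative names, not part of any library.

```c
#include <stddef.h>
#include <stdint.h>

/* Models sendToServerAndLog_Closure as laid out on a 32-bit target,
   where each pointer occupies 4 bytes. */
typedef struct Closure32 {
    uint32_t invoke;          /* function pointer */
    int32_t num;              /* captured state */
    uint32_t caller_callback; /* pointer to the caller's closure */
} Closure32;

/* Per-allocation overhead quoted above for FreeRTOS on 32-bit. */
enum { FREERTOS_OVERHEAD = 8 };

/* Total heap cost of one closure: payload plus allocator overhead. */
size_t closure_heap_cost(void) {
    return sizeof(Closure32) + FREERTOS_OVERHEAD;
}
```

The struct is 12 bytes (three 4-byte fields), so the total is 12 + 8 = 20 bytes, before counting any fragmentation.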
Using Microvium Instead
We can write this same example in JavaScript using nested functions, as follows:
function sendToServerAndLog(num, caller_callback) {
  sendDataToServer(num, continue_sendToServerAndLog);
  function continue_sendToServerAndLog() {
    console.log(`This num was successfully received by the server: ${num}`);
    caller_callback();
  }
}
The nested function continue_sendToServerAndLog has access to variables in the outer function (in this case the parameters num and caller_callback). Here I tried to keep the function names consistent with the C example, but in practice, it may be more convenient to do the same thing using arrow function syntax, as follows:
function sendToServerAndLog(num, caller_callback) {
  sendDataToServer(num, () => {
    console.log(`This num was successfully received by the server: ${num}`);
    caller_callback();
  });
}
Either way, the values num and caller_callback are automatically captured into a closure on the JavaScript heap, making them available to the nested function automatically.
If you’re using the Microvium JavaScript engine, this created closure has a very similar structure in memory to the final example we did in C — it’s a single structure with a function pointer and two other variables. You may see now why I called the struct in the earlier C example a “closure”. The C code is a more explicit way of representing the same runtime structure, with similar benefits from a decoupling and modularity perspective, although clearly the JavaScript is more syntactically simple.
This closure heap allocation in Microvium will have the following characteristics:
If the num is an integer in the range[1] -8192 to 8191, the closure occupies 8 bytes of memory, including a 2-byte allocation header, compared to the 20 bytes consumed by the C example on a FreeRTOS heap.
There is no fragmentation overhead, since the Microvium heap is compacting.
Allocation of the closure generally happens in constant time. Since the Microvium heap is contiguous, creating new allocations is similar to just bumping a free pointer forward.
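To illustrate why allocation in a contiguous heap takes constant time, here is a minimal bump allocator in C. It is a generic sketch of the technique, not Microvium's actual allocator: the heap size, the names, and the 2-byte alignment are all assumptions.

```c
#include <stddef.h>
#include <stdint.h>

#define HEAP_SIZE 256 /* arbitrary size for the sketch; must be even */

static uint8_t heap[HEAP_SIZE];
static size_t heap_used = 0; /* offset of the first free byte, always even */

/* Constant-time allocation: hand out the next `size` bytes, or NULL if
   they do not fit. There is no free list to search. */
void* bump_alloc(size_t size) {
    if (size > HEAP_SIZE - heap_used) {
        return NULL;
    }
    void* result = &heap[heap_used];
    /* Round up to a multiple of 2 so the next allocation stays aligned.
       The remaining space is always even, so the rounded size still fits. */
    heap_used += (size + 1u) & ~(size_t)1u;
    return result;
}
```

Two consecutive 8-byte allocations land exactly 8 bytes apart, with no per-allocation search. A scheme like this only keeps working if the free space stays in one piece, which is what a compacting collector provides.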
Conclusion
We’ve walked through an example that’s representative of a common situation in program development, especially when networking is involved: network requests take time, and we can either block the current thread or we need another way to remember what we were doing so we can get back to it when the request completes. When writing your code in a modular and decoupled way, it’s better not to assume anything about the caller of your long-running operation, so it’s better not to block the thread or hard-code anything about which callback to run or what state to hold onto.
In this case, Microvium actually offers you a way to make your code more memory efficient than the equivalent C code, while also making it easier to follow, and preserving the nice decoupling characteristics. Depending on your situation, this might make Microvium a good choice for orchestrating this kind of high-level program flow, especially when long-running tasks are involved and when you need to keep track of state across those tasks.
[1] I’d say this is another advantage of using Microvium: numbers automatically grow in size as needed. Integers in the range -8192 to 8191 use 2 bytes of memory.
TL;DR: Microvium’s snapshotting paradigm allows a library to generate the FFI glue code, so you don’t have to.
How bad can it be?
Foreign function interfaces (FFIs) are notoriously difficult in JavaScript. If you take a look at the Node-API documentation for Node.js, you’ll see how confusing it can be. Take a brief look at the following “simple” example of a C++ function that adds 2 JavaScript numbers together. But don’t get bogged down in the details here, because the whole point of this post is to say that you don’t need to write code like this!
// addon.cc
#include <node.h>
namespace demo {
using v8::Exception;
using v8::FunctionCallbackInfo;
using v8::Isolate;
using v8::Local;
using v8::Number;
using v8::Object;
using v8::String;
using v8::Value;
// This is the implementation of the "add" method
// Input arguments are passed using the
// const FunctionCallbackInfo<Value>& args struct
void Add(const FunctionCallbackInfo<Value>& args) {
  Isolate* isolate = args.GetIsolate();
  // Check the number of arguments passed.
  if (args.Length() < 2) {
    // Throw an Error that is passed back to JavaScript
    isolate->ThrowException(Exception::TypeError(
        String::NewFromUtf8(isolate,
                            "Wrong number of arguments").ToLocalChecked()));
    return;
  }
  // Check the argument types
  if (!args[0]->IsNumber() || !args[1]->IsNumber()) {
    isolate->ThrowException(Exception::TypeError(
        String::NewFromUtf8(isolate,
                            "Wrong arguments").ToLocalChecked()));
    return;
  }
  // Perform the operation
  double value =
      args[0].As<Number>()->Value() + args[1].As<Number>()->Value();
  Local<Number> num = Number::New(isolate, value);
  // Set the return value (using the passed in
  // FunctionCallbackInfo<Value>&)
  args.GetReturnValue().Set(num);
}
void Init(Local<Object> exports) {
  NODE_SET_METHOD(exports, "add", Add);
}
NODE_MODULE(NODE_GYP_MODULE_NAME, Init)
} // namespace demo
The above code is mostly so-called “glue code”, because most of it exists solely to interface between JavaScript and C++, rather than adding any functionality of its own.
Microvium’s approach is fundamentally different
For those who are new to this blog, Microvium is a JavaScript engine I’ve created for running a subset of JavaScript optimized for microcontrollers. But the concepts I’m going to describe here are much broader than microcontrollers.
I’ll explain the Microvium approach by going through an example.
In Microvium, the first thing you probably want to do is have your C++ host (e.g. firmware) call a JavaScript function, so I’ll cover that example first:
// main.js
import { generate, exportToC } from './lib/ffi.js'
// Export function to be callable from C
exportToC('void', 'myFunctionToCallFromC', [], () => {
  // ... function code here ...
});
generate();
Here I’m using a library called ffi.js, which I’ll explain later. It exposes an exportToC function which has the following signature:
function exportToC(returnType: Typename, funcName: string, params: Array<[paramType: Typename, paramName: string]>);
Combined with the function generate, the function exportToC automatically generates the required glue code for the exported function.
How is this possible?
Well, remember that in Microvium, the top-level module code runs at compile time, not runtime, and then we deploy a snapshot of the VM state rather than deploying the source or a bundle. And by default, the compile-time code also has access to Node.js modules such as fs, so it can access the file system. The generate function in the example uses fs to code-generate C++ files with the glue code for each function that you set up using exportToC.
So, let’s compile this JavaScript example in a terminal. For simplicity, I’m using --output-bytes here so I can get the literal snapshot bytes to paste into the C++ code later.
A side effect of running this command is that it runs the top-level code in main.js (and transitively all the dependencies), which in turn generates the files App_ffi.hpp and App_ffi.cpp for us, which contain the glue code for this example.
So now that we have the generated glue code and the snapshot bytes, we can use this in a minimal C++ project:
#include "App_ffi.hpp"
const uint8_t snapshot[] = {0x06,0x1c,0x06,0x00,0x9a,0x00,0xf2,0x75,0x03,0x00,0x00,0x00,0x1c,0x00,0x1c,0x00,0x24,0x00,0x24,0x00,0x2a,0x00,0x2c,0x00,0x80,0x00,0x8a,0x00,0xff,0xff,0x49,0x00,0xfe,0xff,0x7d,0x00,0x89,0x00,0x85,0x00,0x01,0x00,0x31,0x00,0x00,0x00,0x05,0x40,0x70,0x75,0x73,0x68,0x00,0x00,0x0d,0x50,0x04,0x31,0x30,0x30,0x88,0x1d,0x00,0x6b,0x12,0x6f,0x67,0x01,0x60,0x00,0x2f,0x50,0x05,0x88,0x19,0x00,0x89,0x00,0x00,0x88,0x1d,0x00,0x6b,0xa0,0x88,0x19,0x00,0x06,0xa0,0x10,0x12,0xe0,0x70,0x04,0x67,0x67,0x01,0x60,0x89,0x00,0x00,0x10,0x12,0x6b,0x11,0x78,0x01,0xa0,0x67,0x10,0x10,0x07,0x6c,0x10,0xa2,0x67,0x67,0x76,0xe2,0x00,0x00,0x00,0x03,0x50,0x01,0x01,0x60,0x00,0x0c,0x00,0x19,0x00,0x02,0x00,0x19,0x00,0x01,0x00,0x08,0xc0,0x05,0x00,0x05,0x00,0x31,0x00,0x39,0x00,0x04,0xd0,0x05,0x00,0x03,0x00};
int main() {
  // Load the JavaScript app from the snapshot
  App* app = new App(snapshot, sizeof snapshot);
  // Run the myFunctionToCallFromC function
  app->myFunctionToCallFromC();
}
How easy is that! Two lines of C++ to spin up the runtime engine and call a JavaScript function!
Let’s extend this example to have it call from JavaScript back to C++. Let’s say that we want to add two numbers together (like the earlier Node.js monstrosity), and print the result:
import { generate, exportToC, importFromC } from './lib/ffi.js'
// Access the C++ function named "add" with parameters (int x, int y)
const add = importFromC('int', 'add', [['int', 'x'], ['int', 'y']]);
// Access the C++ function named "print" with parameter (std::string msg)
const print = importFromC('void', 'print', [['string', 'msg']]);
exportToC('void', 'myFunctionToCallFromC', [], () => {
  const x = add(1, 2);
  print(`The sum is ${x}`);
});
generate();
If we peek inside the generated “App_ffi.hpp” for this one, we’ll see it now has these lines as well:
// ...
extern int32_t add(App* app, int32_t x, int32_t y); // Must be implemented elsewhere
extern void print(App* app, std::string msg); // Must be implemented elsewhere
// ...
So, it’s automatically generated the function signatures of the imported functions, and all the glue code required to give the JavaScript code the ability to call these functions.
Now, let’s provide the implementation of these add and print functions in C++:
That’s all! The glue code generated in App_ffi handles the conversions between JavaScript values and C++ values, such as converting the JavaScript string to a std::string for the print.
What about dynamic types?
What if we don’t have a specific type we want to pass between JavaScript and C++? The FFI library provides a solution for this as well: Any.
Let’s say we want to make the add function polymorphic, so it can add either strings or integers. To do this, we just switch out int with any:
Then on the C++ side, we can write the add function like this (either adding integers or concatenating strings):
Any add(App* app, Any x, Any y) {
  if (x.type() == VM_T_NUMBER) {
    return app->newInt32(x.toInt32() + y.toInt32());
  } else {
    return app->newString(x.toString() + y.toString());
  }
}
The Any type is actually a reference type: it’s a garbage-collection-safe reference to a value in the JavaScript VM. It can also be used to safely interact with objects and arrays in JavaScript.
It’s about the concept, not the library
As of this writing, the FFI library used here (ffi.js) is not included with Microvium. It’s an early-stage concept library, which you can find here. There’s still more thought and functionality that needs to go into it before I’m ready to call it the “standard way” of interacting with Microvium and releasing it alongside the Microvium engine.
But I think the cool part here is not the FFI library itself, but the fact that the snapshotting paradigm facilitates libraries like this. The behavior doesn’t need to be baked into the engine: if you don’t like the way my FFI library does things, you can write your own! The possibilities are endless. Do you want your library to also generate the makefile? You can! Do you want it to generate main.cpp? You can! Do you want it to work with C instead of C++? You can! Or rather… given a large enough community of users, you hope that someone else has done it already and shared their solution on npm or somewhere.
The concept runs deeper than just a typical code generator. Of course, anyone can write a code generator for Node.js that generates the glue code for you, but it’s not easy in Node.js to create a library that allows you to write code like this:
Why? Because this example combines runtime and compile-time code in the same place. Functions like exportToC create a bridge between C++ and JavaScript, and encapsulate the details of that bridge. We don’t care how the library works, as long as it adheres to the interface contract — the contract on both sides of the bridge — the contract in both JavaScript and C++.
The entities created when we call importFromC or exportToC span both the JavaScript and C++ domain and encapsulate the wiring required to connect the two sides.
It is the snapshotting paradigm of Microvium that enables a library that performs this kind of encapsulation and abstraction of a communication link. And interfacing between JavaScript and C++ is only the beginning of what you can do with this! There are some other things on the horizon that take this to the next level.
P.S. If you’ve made it this far, you might be interested in following me on Twitter…