Microvium closures can use less RAM than C

TL;DR: Closures are a good way to represent callbacks for non-blocking code. Although C doesn’t support closures, you can achieve something similar using structs on the heap, but you can get even better memory efficiency by using JavaScript running on Microvium.


I’ve recently made some improvements to the memory usage of closures in Microvium, which I’ll talk about in more detail in an upcoming post. But before I get there, I want to first talk about how a similar problem can be solved in C, as a point of comparison.

In this post, we’ll look at a scenario where closures are useful, starting with the most basic solution to the problem in C and then progressively improving the design to address its weaknesses until we end up with something closure-like, which I will then contrast with the same solution in JavaScript. This post is aimed at people who are more familiar with C than with JavaScript, and are not necessarily familiar with closures.

Preserving state during long-lived operations

Let’s start by imagining that we have a C program that needs to perform some slow, I/O-bound operation, such as sending data to a server. For simplicity, let’s assume that we just need to send a single integer, and let’s assume that someone has given us a library that does exactly this:

// Send an int to the server, and block until we get confirmation of receipt
void sendDataToServerAndWaitForResponse(int payload);

And let’s say that we have a requirement to log once the payload was sent:

void sendToServerAndLog(int num) {
  sendDataToServerAndWaitForResponse(num);
  printf("This num was successfully received by the server: %i\n", num);
}

So here I’ve constructed an example where the value of num, which is available before we contact the server, is also used after we get a response from the server. I’ve simplified the example by ignoring failure cases. This is a special case of a common situation where some program state needs to be preserved across long-lived operations, such as a state machine or retry counter.

A problem with the above solution is that it blocks the thread. In terms of memory usage, we can think of it as requiring the memory of a whole call stack, which may occupy anything from hundreds of bytes to megabytes, depending on the system we’re running on. And more importantly, we can’t use the thread for anything else while it’s blocked.

Using a callback

To avoid blocking the thread, we could instead consider using a callback function. Imagine that our hypothetical library supported this by taking a callback argument, rather than blocking until completion:

typedef void Callback(void);
void sendDataToServer(int payload, Callback* callback);

Now we can write our application code like the following, where we’ve split the behavior of sendToServerAndLog into two parts — one part before sending the data and one that’s the continuation after we’ve received a response:

int save_num;

void continue_sendToServerAndLog(void);

void sendToServerAndLog(int num) {
  save_num = num;
  sendDataToServer(num, &continue_sendToServerAndLog);
}

void continue_sendToServerAndLog(void) {
  int num = save_num;
  printf("This num was successfully received by the server: %i\n", num);
}

In order to have the state of num available in the callback, we need to persist it somewhere, hence the save_num variable. This solution is much more memory efficient than the previous one: if we’re running on a 32-bit platform, our app code occupies only the 4 bytes of save_num. The library code now also needs to preserve the callback pointer across the call, which uses an additional 4 bytes of memory. So this solution takes a total of 8 bytes of memory during the server operation, compared to a whole thread and call stack of memory before. But there are still two issues with this design:

  1. The save_num variable persists forever, even when we’re not sending data to the server. If the hypothetical library uses the same pattern to store the callback, then together the full 8 bytes are consuming memory forever.
  2. We can’t have multiple calls to the server running in parallel here. If we call sendToServerAndLog a second time before the first response is received, the value of save_num is corrupted.
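To make the second issue concrete, here is a hypothetical call sequence (the example function is just for illustration) showing how overlapping operations clobber each other’s state:

void example(void) {
  sendToServerAndLog(1); // save_num is now 1
  sendToServerAndLog(2); // save_num is now 2, clobbering the first call's state
  // When the first response eventually arrives, its callback reads
  // save_num and logs 2 instead of 1.
}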

Adding context

A common pattern to get around the above problem is to have the library code accept an additional parameter that it passes back to our callback, which here we might name “context” because it represents the contextual information of the particular caller:

typedef void Callback(void* context);
// Whatever `context` is provided will be passed to the `Callback` untouched.
void sendDataToServer(int payload, Callback* callback, void* context);

Now we can use it like this:

void continue_sendToServerAndLog(void* context_);

void sendToServerAndLog(int num) {
  int* context = malloc(sizeof(int));
  *context = num;
  sendDataToServer(num, &continue_sendToServerAndLog, context);
}

void continue_sendToServerAndLog(void* context_) {
  int* context = (int*)context_;
  int num = *context;
  printf("This num was successfully received by the server: %i\n", num);
  free(context);
}

Now, if we call sendToServerAndLog multiple times, each call creates a distinct context that is completely independent of any other call.
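For instance, repeating the same hypothetical call sequence as before (again, example is just for illustration), the two operations now proceed independently:

void example(void) {
  sendToServerAndLog(1); // allocates a context holding 1
  sendToServerAndLog(2); // allocates a separate context holding 2
  // Each callback reads the number from its own context, so the first
  // operation logs 1 and the second logs 2, in whichever order the
  // responses arrive.
}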

Side note: Why make the context a void* instead of just an int, since we only need an int here? The reason is that we’re imagining here that sendDataToServer is part of a reusable library. Even if it’s a first-party library, it’s generally better practice to write it in such a way that it’s decoupled from the particular way you use the library. Making it a void* allows the user of the library to decide the size and shape of the preserved state, rather than coupling it to the library itself.

This is the same reason why we use a function pointer at all, rather than just hardcoding the library to invoke our particular callback directly: the library should not be coupled to the particular function that executes after it. And as the program evolves, we might end up calling it from multiple places that each require a different callback.

Applying the same pattern to sendToServerAndLog

We decided that a good interface for the library function sendDataToServer is one that takes a callback and a context, because the operation takes a long time, and a caller may want to continue doing something else after the operation completes. But similarly, sendToServerAndLog is also an operation that takes a long time, and its caller may also want to do something when the operation completes.

If we’re working with highly coupled code, then maybe we already know whether or not the caller of sendToServerAndLog needs to do anything else afterward, and exactly what it needs to do. But if we want sendToServerAndLog to be reusable code that is decoupled from its caller, then we should probably have it accept its own callback and context from its caller. If we do this, then we need to persist the caller’s callback and context until the whole operation completes, so let’s upgrade our context to a struct that includes these fields:

// Context for sendToServerAndLog
typedef struct sendToServerAndLog_Context {
  int num;
  Callback* caller_callback;
  void* caller_context;
} sendToServerAndLog_Context;

void continue_sendToServerAndLog(void* context_);

void sendToServerAndLog(int num, Callback* caller_callback, void* caller_context) {
  sendToServerAndLog_Context* context = malloc(sizeof *context);
  context->num = num;
  context->caller_callback = caller_callback;
  context->caller_context = caller_context;
  sendDataToServer(num, &continue_sendToServerAndLog, context);
}

void continue_sendToServerAndLog(void* context_) {
  sendToServerAndLog_Context* context = context_;
  int num = context->num;
  printf("This num was successfully received by the server: %i\n", num);
  context->caller_callback(context->caller_context);
  free(context);
}

Embedding the function pointer

Stylistically, it’s interesting to consider one further modification to this example. You may or may not agree with the following design change, but it leads nicely into the topic of closures in Microvium, which I’ll get to in a moment.

Rather than having the function save and copy around two pieces of state on behalf of the caller — the caller_callback and caller_context — we can combine them into one: we can just reference the caller_context and require that its first field is the callback function pointer, as in the following code. Also, rather than calling this a context, let’s now call it a closure, since it captures both a function pointer and some general state. The relationship between this and real closures will become clearer later.

// Forward declaration so that ClosureFunc can refer to Closure
typedef struct Closure Closure;
typedef void ClosureFunc(Closure* closure);

// A general closure is expected to take this shape
struct Closure {
  // Must be the first field
  ClosureFunc* invoke;

  /* ...other fields may follow... */
};

// Here we have a specific closure shape for our function
typedef struct sendToServerAndLog_Closure {
  // Must be the first field. If you're using C++, you might instead 
  // inherit `sendToServerAndLog_Closure` from `Closure`
  ClosureFunc* invoke;

  // Other fields:
  int num;
  // Note now that we don’t need to store a separate caller_context
  // since both the context and the function pointer are combined
  // into the single closure struct.
  Closure* caller_callback;
} sendToServerAndLog_Closure;

// The hypothetical library now takes a single closure in place of the
// separate callback and context parameters
void sendDataToServer(int payload, Closure* callback);

void continue_sendToServerAndLog(Closure* closure_);

void sendToServerAndLog(int num, Closure* caller_callback) {
  sendToServerAndLog_Closure* closure = malloc(sizeof *closure);
  closure->invoke = continue_sendToServerAndLog;
  closure->num = num;
  closure->caller_callback = caller_callback;
  sendDataToServer(num, closure);
}

void continue_sendToServerAndLog(Closure* closure_) {
  sendToServerAndLog_Closure* closure = (sendToServerAndLog_Closure*)closure_;

  int num = closure->num;
  printf("This num was successfully received by the server: %i\n", num);

  Closure* caller_callback = closure->caller_callback;
  caller_callback->invoke(caller_callback);

  free(closure);
}
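For completeness, here is a minimal sketch of what a caller of this closure-based sendToServerAndLog might look like. The Done_Closure type, onDone, and example are illustrative names, not part of the design above:

// A hypothetical caller-side closure whose only job is to log completion
typedef struct Done_Closure {
  // Must be the first field
  ClosureFunc* invoke;
} Done_Closure;

void onDone(Closure* closure) {
  printf("The whole operation is finished\n");
  free(closure);
}

void example(void) {
  Done_Closure* done = malloc(sizeof *done);
  done->invoke = onDone;
  sendToServerAndLog(42, (Closure*)done);
}

Note that the caller hands over just one pointer; sendToServerAndLog neither knows nor cares what other state, if any, the caller has packed into its closure.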

This final design is quite clean:

  • The memory for the transient state (e.g. num) is only allocated while the long-running operation is active. Remember that short-lived memory is cheaper memory.
  • It doesn’t block the thread, so it’s easier to parallelize multiple operations if needed, without each operation consuming a whole thread of memory.
  • Each layer is decoupled: sendDataToServer doesn’t need to know who its caller is, and similarly sendToServerAndLog doesn’t need to know who its caller is.
  • The callback is neatly encapsulated into a single pointer value that can be passed around as a first-class value. If you’re familiar with C++, this is a similar benefit to using the std::function<> type.

But there are some disadvantages to this design:

  • Although it represents the same behavior as the first design (the synchronous code), the code is now a whole lot more complicated.
  • Here we’ve only shown one closure signature. But what if we needed a return value to be passed to the closure? In general, each different return type is going to need its own type definitions for ClosureFunc and Closure (see the sketch after this list), which will add up to a lot of boilerplate.
  • The memory efficiency is not great because it uses malloc and free.
    • On my Windows machine with Visual C++, I measure malloc to have an overhead cost of 40 bytes per allocation (compiling for x86).
    • In FreeRTOS, each allocation has an overhead of 8 bytes on a 32-bit platform. With this figure, the closure in the example takes 20 bytes of heap space.
    • The heap in C/C++ can get fragmented, which costs additional memory.
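To illustrate the boilerplate point above, here is a hypothetical sketch of the extra definitions a result-carrying closure would need; IntClosure and IntClosureFunc are made-up names for illustration:

// A second closure "family" for callbacks that receive an int result
typedef struct IntClosure IntClosure;
typedef void IntClosureFunc(IntClosure* closure, int result);

struct IntClosure {
  // Must be the first field
  IntClosureFunc* invoke;

  /* ...other fields may follow... */
};

Every distinct callback signature needs its own pair of definitions like this, whereas the JavaScript version below gets the equivalent capability for free.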

Using Microvium Instead

We can write this same example in JavaScript using nested functions, as follows:

function sendToServerAndLog(num, caller_callback) {
  sendDataToServer(num, continue_sendToServerAndLog);

  function continue_sendToServerAndLog() {
    console.log(`This num was successfully received by the server: ${num}`);
    caller_callback();
  }
}

The nested function continue_sendToServerAndLog has access to variables in the outer function (in this case the parameters num and caller_callback). Here I tried to keep the function names consistent with the C example, but in practice, it may be more convenient to do the same thing using arrow function syntax, as follows:

function sendToServerAndLog(num, caller_callback) {
  sendDataToServer(num, () => {
    console.log(`This num was successfully received by the server: ${num}`);
    caller_callback();
  });
}

Either way, the values num and caller_callback are automatically captured into a closure on the JavaScript heap, making them available to the nested function automatically.

If you’re using the Microvium JavaScript engine, the closure created here has a very similar structure in memory to the final example we did in C: a single structure containing a function pointer and two other variables. You may now see why I called the struct in the earlier C example a “closure”. The C code is a more explicit way of representing the same runtime structure, with similar benefits from a decoupling and modularity perspective, although the JavaScript is clearly simpler syntactically.

This closure heap allocation in Microvium will have the following characteristics:

  • If num is an integer in the range¹ -8192 to 8191, the closure occupies 8 bytes of memory, including a 2-byte allocation header, compared to the 20 bytes consumed by the C example on a FreeRTOS heap.
  • There is no fragmentation overhead, since the Microvium heap is compacting.
  • Allocating the closure generally happens in constant time. Since the Microvium heap is contiguous, creating a new allocation is similar to just bumping a free pointer forward.

Conclusion

We’ve walked through an example that’s representative of a common situation in program development, especially when networking is involved: network requests take time, so we either block the current thread or we need another way to remember what we were doing so we can get back to it when the request completes. When writing your code in a modular and decoupled way, it’s best not to assume anything about the caller of your long-running operation, so you shouldn’t block the thread or hard-code which callback to run or what state to hold onto.

In this case, Microvium actually offers you a way to make your code more memory efficient than the equivalent C code, while also making it easier to follow, and preserving the nice decoupling characteristics. Depending on your situation, this might make Microvium a good choice for orchestrating this kind of high-level program flow, especially when long-running tasks are involved and when you need to keep track of state across those tasks.


  1. I’d say this is another advantage of using Microvium: numbers automatically grow in size as needed. Integers in the range -8192 to 8191 use 2 bytes of memory.
