The Microvium garbage collector is here!
The garbage collector handles the automatic freeing of unused heap memory in the VM.
Microvium makes some interesting and perhaps unique tradeoffs with garbage collection which I’ll focus on in this post.
I’ll divide this post into two parts to cater for different audiences: I’ll start with a summary of the features at a high level, for those who don’t care about the details and design decisions, and then I’ll go into more detail.
In no particular order, here are some features of the Microvium memory management system:
- Allocations are dynamically-sized, each having a 2-byte allocation header.
- Allocation headers are optional in some circumstances, so allocations can be even more compact.
- Allocations can be as small as 2 bytes. For example, single-character strings are stored as 2-byte allocations, holding the character and an implicit null terminator.
- The garbage collector is compacting (wiki: Mark-Compact). So, there is no heap fragmentation.
- Allocating on the heap is cheap and is constant-time — there is no searching through free lists.
- Data structures used for garbage collection are allocated only during garbage collection — no structures are persisted while the VM is idle.
- Memory is acquired from the host in chunks as-needed, so users don’t need to decide ahead of time how much memory they want to commit to a particular VM.
I’ll add in two limitations for completeness:
- The memory for a VM is currently limited to 16 kB.
- A VM temporarily uses about twice its memory during the garbage collection phase.
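To make the constant-time allocation idea concrete, here is a minimal bump-allocator sketch. This is not Microvium’s actual code — the name `gc_alloc` and the header layout (12 bits of size, 4 bits of type tag) are illustrative assumptions — but it shows why allocation needs no free-list search: a 2-byte header is written and a pointer is advanced, nothing more.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>
#include <string.h>

// Illustrative 2-byte allocation header: low 12 bits hold the payload
// size, high 4 bits hold a type tag. (This layout is hypothetical,
// not Microvium's actual header format.)
#define HEAP_SIZE 4096
static uint8_t heap[HEAP_SIZE];
static size_t bump = 0; // next free offset in the heap

// Constant-time allocation: write a header, advance the bump pointer.
// Sizes are rounded up to 2-byte alignment, so padding is at most
// 1 byte, and only for odd-sized allocations.
static void* gc_alloc(uint16_t size, uint16_t typeTag) {
    uint16_t rounded = (uint16_t)((size + 1u) & ~1u); // round up to 2 bytes
    if (bump + 2 + rounded > HEAP_SIZE)
        return NULL; // out of space -- in a real VM this would trigger a GC cycle
    uint16_t header = (uint16_t)((typeTag << 12) | rounded);
    memcpy(&heap[bump], &header, 2); // 2-byte allocation header
    void* payload = &heap[bump + 2];
    bump += 2 + rounded;
    return payload;
}
```

Note how two consecutive allocations of 3 and 2 bytes end up exactly 6 bytes apart: the 3-byte payload rounds up to 4, plus the 2-byte header.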
While Microvium can and does run fine on desktop-class machines, it’s optimized primarily for microcontrollers. Microcontrollers have quite different characteristics that influenced the design decisions of the garbage collector (GC):
RAM is small relative to processing power.
A 3 GHz desktop computer might have 16 GB of RAM, while a 3 MHz MCU might have 16 kB of RAM — a thousand times less processing power but a million times smaller RAM (obviously this is a very rough calculation to illustrate a point). This is relevant to the GC design because it should be heavily biased towards having a small RAM footprint, since RAM is relatively more expensive than CPU on a microcontroller.
There is no separate CPU cache
Most MCUs I’ve dealt with have no cache — all memory is accessed with constant time, and often on a single instruction cycle. This affects the design of a GC because things like locality, order-of-access, and prefetching are not factors that need to be considered.
Microvium is optimized for 16-bit platforms. Its native value type is a constant-sized 16-bit dynamically-typed value. Anything that doesn’t fit in this small size needs to be heap-allocated, including strings, 32-bit integers, floats, arrays, objects, etc.
Since Microvium expects these kinds of values to be frequently allocated on the heap instead of the stack, the heap is designed to be fast to allocate onto, allocations shouldn’t incur a lot of memory overhead, and allocations shouldn’t require a lot of padding.
For this reason, Microvium allocations are only 2-byte aligned, so that the most padding ever required will be 1 byte, and only for odd-sized allocations.
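As an aside, 2-byte alignment also enables a common tagging trick for 16-bit values: every aligned heap pointer has a zero low bit, so odd values are free to encode small integers inline without any heap allocation. The sketch below shows the general technique — the names and exact bit assignment are hypothetical, not necessarily Microvium’s actual value encoding.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical 16-bit dynamically-typed value. Because heap allocations
// are 2-byte aligned, a genuine pointer always has a zero low bit; an
// odd value can therefore carry a small signed integer inline.
typedef uint16_t Value;

static Value int_to_value(int16_t i) {
    return (Value)(((uint16_t)i << 1) | 1); // shift left, set the tag bit
}

static int value_is_int(Value v) {
    return v & 1; // tag bit set => inline integer, clear => pointer
}

static int16_t value_to_int(Value v) {
    // Arithmetic right shift restores the sign; this is
    // implementation-defined in C but universal on real compilers.
    return (int16_t)((int16_t)v >> 1);
}
```

Larger integers that don’t survive this shift (or floats, strings, etc.) are the values that spill onto the heap, which is why cheap heap allocation matters so much here.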
Another significant design consideration is that a Microvium program is not expected to typically be the main program running on a device. Rather, it is optimized for situations where there is an existing firmware application and a Microvium script is only consulted occasionally. The rest of the time, the script is “dormant” and should occupy as few resources as possible.
I think this point is important, so let me explain in a bit more detail…
The way I visualize this design consideration is to think that any one Microvium program is only one out of a number of “tasks” or “components” that exist in the firmware application as a whole. It doesn’t matter whether the other components of the application are other Microvium programs or are C modules — the design considerations are the same.
For argument’s sake, let’s say that some application has 12 components. In the following diagram, I’m representing the RAM space as the dark circle, and each of the colored circles is a “component” occupying RAM, and the space in the middle is vacant memory. Let’s say that each component in this diagram is dormant, but is designed to become active in a particular scenario.
You might think of these as 12 “modes” or “activities”. To use a concrete example, consider an application with Bluetooth support, where one of these 12 components is the part of the firmware that handles Bluetooth communication. Most of the time, there’s no Bluetooth activity, and so the component of the application which handles Bluetooth communication is dormant.
When there is some Bluetooth activity, the Bluetooth component can take the stage for a short time to do its work, and then afterward, pack itself up and go back to its idle/dormant state. While the component is active, it will likely consume more memory as it juggles Bluetooth messages, or whatever else it needs to perform its activities. We hope that in most cases, when it’s done processing the Bluetooth event, it can pack up and release memory before going dormant again until the next event. While it’s packed up, the “stage” is clear (there is free memory) for other components to wake up and do what they need to do to respond to other events as they come.
This is the kind of situation that Microvium has been optimized for.
Why do I bring this up?
In this model, the memory that an average component uses while dormant is 12 times more costly than the additional memory it uses while active, because the average dormant memory is multiplied by the number of dormant components, while in this model, there is only one active component at a time.
While this example is an oversimplification, I think it still illustrates the point I’m making: for applications with dynamic memory allocation, memory held by a component for a long time is typically more expensive than memory held for a short time. Microvium can be used in situations where this is not the case, but it’s optimized for situations where this is the case.
This is important when it comes to GC design: a VM that incurs a lot of memory overhead for a very short time is considered better than one that has much less memory overhead but maintains it forever.
What kind of memory overhead am I talking about? Here are some examples:
- Some GC algorithms store extra data in the allocation headers. This is permanent overhead proportional to the dormant heap size of the VM, and so is expensive. Microvium does not do this.
- Some managed heaps can get fragmented. Even if fragmentation only caused the VM to consume an additional 20% dormant memory, by our above example this is roughly equivalent to consuming 20% × 12 = 240% additional peak memory. So fragmentation is expensive. Microvium uses a compacting collector to fully eliminate fragmentation, “squeezing” out all the unused spaces before a VM goes dormant.
On the other hand, the Microvium garbage collector uses quite a lot of memory while collecting. In fact, during a typical collection, the amount of memory consumed by the VM may be double for a brief time.
This is because the Microvium collector is actually a semispace collector. At the time of collection, it allocates a whole new heap for the VM, and then copies reachable objects from the old heap to the new one, before discarding the old heap.
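The core of a semispace collection can be sketched with Cheney’s algorithm: copy the roots into the new space, then scan the new space left to right, copying each reachable child exactly once. The toy version below uses fixed-size two-field cells rather than Microvium’s variable-sized allocations, and the names (`Cell`, `collect`, etc.) are illustrative, but the shape of the algorithm is the same — and it shows why both heaps must exist at once during collection.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>
#include <string.h>

// Toy semispace collector over fixed-size two-field cells (illustrative;
// the real Microvium heap has variable-sized allocations with headers).
typedef struct Cell {
    struct Cell* fields[2]; // child pointers (NULL = no child)
    struct Cell* forward;   // forwarding pointer, used only during collection
    int value;
} Cell;

#define CAP 64
static Cell fromSpace[CAP], toSpace[CAP];
static size_t allocNext = 0; // bump pointer in from-space

static Cell* toy_alloc(int value) {
    assert(allocNext < CAP);
    Cell* c = &fromSpace[allocNext++];
    memset(c, 0, sizeof *c);
    c->value = value;
    return c;
}

// Copy one object into to-space, or return its existing copy if it
// was already moved (the forwarding pointer records this).
static Cell* copy(Cell* c, size_t* next) {
    if (!c) return NULL;
    if (c->forward) return c->forward; // already copied
    Cell* d = &toSpace[(*next)++];
    *d = *c;
    c->forward = d;
    return d;
}

// Cheney-style collection: copy the root, then scan to-space,
// copying each reachable child exactly once. Unreachable cells are
// simply never copied, so they are discarded with the old heap.
static size_t collect(Cell** root) {
    size_t next = 0, scan = 0;
    *root = copy(*root, &next);
    while (scan < next) {
        Cell* c = &toSpace[scan++];
        for (int i = 0; i < 2; i++)
            c->fields[i] = copy(c->fields[i], &next);
    }
    return next; // number of live cells in the new space
}
```

A real collector would then make the new space the active heap and release the old one; this sketch just returns the live count. Notice that live objects end up contiguous in to-space regardless of where they sat before — this is where the compaction comes from.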
The above design considerations are critical to understanding this design choice: for situations where a Microvium script is a small component in a larger firmware system, and is dormant during most of the firmware’s activities, the overwhelming metric to optimize is how well the VM can be “packed up” while it’s dormant, and Microvium achieves this with great success.