Author: Michael Hunter

Incidental vs Accidental Complexity

Incidental vs Accidental Complexity

Developers often talk about accidental vs essential software complexity. But is there a difference between accidental complexity and incidental complexity? I think there is.

I like this definition of essential complexity:

Essential complexity is how hard something is to do, regardless of how experienced you are, what tools you use or what new and flashy architecture pattern you used to solve the problem.

(source)

That same source describes accidental complexity as any complexity that isn’t essential complexity, and this may indeed be the way that the terms are typically used. But I’d like to propose a distinction between accidental and incidental complexity:

Accidental complexity: non-essential complexity you introduce into the design by mistake (by accident), and you can realize your mistake in an “oops” moment (and consider correcting or learning from it).

Incidental complexity: any non-essential complexity you introduce into the design, which may be by mistake or on purpose (it may or may not be accidental and may or may not be accompanied by an “oops” moment).

Accidental complexity is when I accidentally knocked a drinking glass off the table and it shattered into pieces, making my morning more complicated as I cleaned it up. It’s marked with “oops” and the realization of causing something I didn’t mean to.

Incidental complexity is when I choose to turn off the hotplate on my stove by pressing the digital “down” button 8 times until the level reaches zero, when my wife later informed me that pressing “up” once would achieve the same thing (since the level wraps around under certain circumstances). I did not accidentally press the “down” button 8 times, I did it intentionally, but was still ignorant of my inefficiency.

Clearly, there’s a fine line between the two. Defining something as incidental complexity but not accidental is a function of my ignorance and depends on my level of awareness, which changes over time and various from person to person.

Why introduce unnecessary complexity on purpose?

1. Simplicity takes effort

Paradoxically, often the simpler solution is also one that takes longer to design and implement, so it can be a strategic design choice to go with a more complicated solution than a simpler one.

This is especially true for POCs and MVPs in unfamiliar domains or where the business requirements aren’t completely clear. In these cases, you can’t be confident of all the details upfront and you’re quite likely to be rewriting the solution a few times before it reaches its final state. Spending time simplifying and organizing each iteration can be a wasted effort if it will just be thrown away.

There may also just not be time for the simpler solution. The added effort defining a clean abstraction and structure takes time and there is an opportunity cost to taking longer on something, maybe through losing business revenue when the feature is delivered later, or through the cost of not working on the next feature. There are tradeoffs to be made.

2. Ignorance: You don’t know any better

This might be a little controversial, but I would argue that if you design something as simple as you can possibly conceive, then any remaining incidental complexity is not “accidental”. Like the example earlier of me pressing “down” 8 times instead of “up” once.

As a concrete example in software, JavaScript never used to have a async/await or the Promise type, and as a result, code was often structured with nested callbacks to handle sequences of asynchronous actions (known as “callback hell”). This kind of code is clearly unnecessarily complicated, no matter how hard people tried to contain the complexity. But designs based on callbacks are not accidentally complicated — they have exactly the design intended by the author which is believed to be the best possible design.

But yet, promises could be (and were eventually) implemented as a third-party library, and so could the async/await pattern (through the abuse of generator functions). So the potential was there from the beginning for every developer to structure their code in a simpler way if they could just realize the potential to do so. The complexity was not imposed as a restriction in the programming language, it was imposed by ourselves because we hadn’t been awakened to a better way of seeing things.

It’s worth noting one more thing: if you’ve ever tried to explain JavaScript Promises to someone, you’ll know that an explanation of what a Promise is or does is unconvincing. Understanding how to think in terms of Promises requires a mental paradigm shift, which doesn’t occur merely by explanation. Even a concrete demonstration may be unconvincing since the code may be “shorter”, but is it really “simpler” when it’s full of all these new and unfamiliar concepts? The way to internalize a new paradigm like this is to use it and practice thinking in terms of it. Only then can we really see what we were missing all along.

Why does it matter?

I think the distinction between accidental and incidental complexity matters because it should keep us humble and aware that the way we write software today is probably ridiculously inefficient compared to the techniques, patterns, and paradigms of the future.

It shows that certain kinds of incidental complexity can look a lot like essential complexity if we don’t yet have the mental tools to see those inefficiencies. The biggest productivity boosts we could gain in our designs might be those that aren’t obvious from the perspective of our current ways of thinking, and which might take a lot of time and effort to see clearly.

It shows that finding incidental complexity is not just a case of looking at a project and identifying what we think is needless, since there’s a good chance that we’re blind to a lot of it and all we will just see are things that make us say “yes, that thing is important for reason xyz — we can’t remove that”. Like all the callbacks in callback hell: every callback seems important, so it seems like it is all essential complexity, even though there exists a completely different way of structuring the code that would make it easier to understand and maintain.

But on the positive side, it shows that we are capable of identifying a lot of incidental complexity, if we spend enough time and effort thinking through things carefully, and if we remain open and curious about better ways of doing things.

Immutable.js vs Immer

Immutable.js vs Immer

This week I came across the library Immer as a convenient way of manipulating immutable data in JavaScript. After reading this Reddit thread that raves about how much better Immer is than Immutable.Js, I was worried I’d made the wrong decision to use Immutable.js in Microvium. But some performance tests quickly cleared up my concerns.

Immer certainly seems convenient to use. It uses a combination of proxies and copy-on-write to allow you to use the normal mutating syntax in JavaScript and it handles the underlying immutability automatically.

But “copy-on-write” raised a red flag in my mind. When dealing with small changes to large collections, as Microvium does, I can’t possibly imagine how copying the whole collection on every write can be efficient. And indeed, a quick performance test shows how bad the situation is:

import { produce } from "immer";
import immutable from 'immutable';

const count = 100;

// Test Immutable.JS
let map1 = immutable.Map();
for (let i = 0; i < count; i++)
  map1 = map1.set(i, i);

// Test Immer
let map2 = new Map();
for (let i = 0; i < count; i++)
  map2 = produce(map2, m => m.set(i, i));

For different values of count, here are my results:

100100010k100k1M10M
Immer11 ms94 ms8.8 sec17 min 47 sec??????
Immutable.JS3 ms9 ms47 ms257 ms2.4 s41 s
Builtin Map37 µs180 µs2 ms18 ms170 ms3.4 s

Insertion into the map using Immer is unsurprisingly O(n) relative to the size of the map, making the whole test O(n²). Even though I had a hunch that this was going to be the case, it was worth checking that Immer wasn’t doing some other clever tricks under the hood to improve the performance.

I’ve never actually tested the performance of Immutable.JS before, so I’m quite pleased to see how well it scales. It seems only slightly worse than O(1) insertion time.

Both libraries are quite a bit slower than using the built-in Map class, so my conclusions are as follows:

  • Use Immer if readability and ease-of-use is more important than performance, which is often the case. But be aware that it’s not a simple process to switch to Immutable.JS later because of how different the API is.
  • Use Immutable.JS if you need high-performance persistent collections that scale well to large sizes, if it’s worth the decrease in readability, type safety, and increased verbosity of the code.
  • Use the mutable built-in collections for performance-critical code that doesn’t strictly need immutability for the algorithm.

TC39 is looking to add built-in immutable collections to the JavaScript standard (see the records and tuples proposal). I’m excited to see how those perform.

Distributed apps in JavaScript

Distributed apps in JavaScript

This post is different from my usual topic of Microvium. I’ve been recently frustrated with the way that distributed applications are written and I’ve been brainstorming ways it could be improved. In my last post on the topic, I suggested that maybe a new programming language would be useful for solving the problem, but I ended the post thinking about how the Microvium snapshotting paradigm might also be a suitable solution. This time, I’m going to consider a solution that doesn’t require Microvium.

A quick summary of the objectives

I want the experience of developing a distributed application to be basically as seamless as that of developing a single-process desktop application, such as the ability to get type checking between services, to debug-step directly from one service to another during development, to write regression tests that encompass multiple services, etc. And I want this all with minimal boilerplate.

My last post on the topic goes into more detail about the issues I have with the current state of affairs and the improvements that could be made.

A proposed solution

Here’s the summary: I think the Microvium snapshotting paradigm is indeed the answer, and I think we can in a sense “polyfill” the snapshotting ability in a limited context by deterministically replaying IO.

I’ll talk about this solution a few parts:

  1. What is snapshotting and how could we use it to solve the problem?
  2. How can we implement snapshotting on a modern JavaScript engine?
  3. Is it necessary to do this JavaScript?
  4. More details

What is snapshotting and how does it help?

I’m using the term “snapshotting” here to mean the ability to “hibernate” a process to a file and then resume it later on the same or a different machine. This means capturing to a file the full state of the process, such as the global variables, the stack and heap state, and machine registers. And the ability to restore the full state in a completely new context.

Normally applications are deployed as either a compiled executable or as source code depending on whether the programming language is compiled or interpreted. Having the snapshotting capability gives us a third option: run the application code at build-time and deploy a snapshot of its final initialized state.

Doing it the latter way allows application code to have pre-runtime effects, such as defining infrastructure, and for pre-runtime code to pass information seamlessly to runtime code, such as secrets and connection strings for the aforementioned infrastructure.

When I talk about coding in JavaScript, I’m really talking about coding in TypeScript, and so another you get with this paradigm is type-checking across different infrastructural components and type checking between infrastructure and runtime code. There are other solutions like Pulumi that allow you to write IaC code in the same language as your application code, but to my knowledge, there are none that allow you to pass state directly from IaC code to runtime code.

How to implement snapshotting?

Microvium is a JavaScript engine that implements snapshotting at the engine level, but Microvium doesn’t yet implement much of the JavaScript spec and its virtual machines are size-limited to 64kB. It’s in no way practical for writing a distributed cloud application.

Node on the other hand is used for cloud software all the time, and I’m a huge fan of it, but it doesn’t support snapshotting in the way described.

But I think we can emulate the snapshotting feature on Node by having a membrane around the application that records all IO while it runs at build-time, and silently replays the IO history as an initialization step at runtime to recreate the final, initialized state. Although this will be slower than just recovering a direct dump of the memory state (if the engine had supported that), in theory, it should work.

This will only be reliable if JavaScript is deterministic (that its state and behavior deterministically follow from its incoming IO), which it almost is already. Work by Agoric on SES and Compartments makes this even more so, by allowing the creation of a sandboxed environment for JavaScript modules, in which non-deterministic JavaScript library functions can be removed by default (e.g. Math.random and new Date), or replaced with deterministic implementations. They do this because they run JavaScript on the blockchain, where a collection of redundant nodes needs to process the same script and reach the exact same conclusion about the new state of the blockchain — a strong determinism requirement.

Does it need to be JavaScript?

Honestly, I can’t think of a way to do this outside of JavaScript. JavaScript has a number of key features that I think are really useful for this solution:

  • JavaScript is already almost completely deterministic. There are only a handful of non-deterministic operations like Math.random, and JavaScript’s flexibility allows us to easily remove these or replace them with deterministic proxies. Contrast this with C#, where it’s impossible to prevent some block of code from performing non-deterministic IO or having non-deterministic behavior by using threads.
  • JavaScript allows us to create perfect proxies and membranes. You can create proxies for classes and not just instances of the class, and you can create proxies for modules. In C#, there is no way to put a membrane around an assembly, namespace, or class.
  • Related to creating perfect membranes, JavaScript allows you to hook into all IO, including the importing a dependent module. This would allow the framework to trace the depenency tree of each individual service and create the appropriate bundles for each.

I think it would be possible to get some approximations of the idea in other programming languages, but it certainly appears to me that JavaScript is particularly well suited.

More Details

Most of the above has been quite abstract. I think it would be worth explaining the vision in a little more concrete detail.

I imagine there to be a thing, which I might call a “service”, which is the minimal distribution unit (we’ll say a service is “a thing that can be deployed”). A service might be a single microservice/lambda, or a database, or a whole distributed application made up of nested services.

A typically SaaS company using this solution would probably just have one root “service” that represents their whole distributed application, and that service would be composed of subservices1, which could each have their own subservices, etc.

A service is a JavaScript module (file), which runs at build time on the build machine (or dev machine), and then with snapshotting is moved to the runtime machine(s) when it’s finished its initialization (when all the top-level code has run, including its transitive function calls, etc).

While executing at build time, the service code is then able to configure its own runtime “hardware” that it will eventually be moved to. For example, it might call a host function to declare “I’d like to run as a lambda with automatic scaling”, etc. The host can record this information and provision the necessary resources before moving the service to its own desired runtime environment.

A special case of this general rule is that a service can choose a runtime manifestation that doesn’t support code execution at all. For example, a database, queue, or pub-sub system.

A service can instantiate other services at build time. Concretely, this might look something like:

var myService = importService('./my-service-code.js');

I expect this to be roughly analogous to just importing another module into the current service, similar to using node’s require() function. It will import and execute the given JS module (in the same machine process), and return an object representing the script’s exports. But with the distinction that new module is to be loaded inside a membrane, and the return value is therefore a proxy for the actual exports inside the membrane.

The membrane serves a few different purposes:

  • When the running services are “moved” to their runtime environments, services in different membranes will be on different physical runtime hardware. Service code may at build time configure its future runtime hardware.
  • At build time, the membrane records all IO exchanges such that it can deterministically replay the IO at runtime during initialization, to restore the service to its exact “snapshotted” state, as mentioned earlier. It can verify and silently absorb outgoing IO, and deterministically replay incoming IO.
  • At build time, services are allowed to pass around references to other services, as a kind of “dependency injection” phase. At runtime, the membranes around each service need to handle the marshalling of calls between services as they use these injected dependencies.

The importService function can construct a new Compartment which allows it virtualize the environment in which the service code runs. This can have a variety of effects:

  • We can provide a replayable variation of non-deterministic builtins, such as Math.random and new Date. These can be treated as a special case of IO. They can return real dates and random numbers at build time, as long as they return the same dates and random numbers when replayed at runtime.
  • We can provide service-specific APIs. For example, a function akin to pleaseLetMeRunAsALambda() could be injected into the globals for the service, or we could expose additional builtin-modules to the service code.
  • We can intercept module imports so that we can bundle the service with its module dependencies into a single package for deployment, without the overhead of bringing in dependencies used by other services.

Pulumi

I haven’t done much research on this yet, but Pulumi seems to provide an API for defining infrastructure in JavaScript code. I speculate that the Pulumi API could be brought in as a low-level build-time API for services to define things like “I want to be a lambda when I grow up”.

Conclusion

I’m fairly confident that something like this would work, and I think it would make for much cleaner code for distributed applications. I also don’t think it would take that long to implement, given that many of the pieces already available off-the-shelf. But even so, I probably shouldn’t get too distracted from Microvium.


  1. This kind of infinitely recursive hierarchical organization of the architecture is something I’ve seen missing from AWS and Azure 

Quality vs Quantity

Quality vs Quantity

It’s an age-old debate, but as I get older, my perspective is changing.

I feel increasingly frustrated by tools that don’t work smoothly; tools that weren’t thought out very well. And I ask myself, wouldn’t my life be better without these?

TL;DR The economic incentive for product development is not always aligned with what makes the world a better place. Tools that make people more productive do not necessarily make them happier. As creators, we should first-and-foremost create things that are pleasant to use, even when it’s not economically optimal for us or the companies we work for — economics should serve humankind, not be our master.


I’d be remiss creating such a post without whining about some concrete examples:

My Washing Machine

My washing machine1 has a “quick wash” setting that can do a load of washing in about 30min — great for those in a rush. And it allows you to further customize the quick wash with things like temperature and spin speed.

But for some unknown reason, it doesn’t allow you to select the maximum spin speed while in the “quick wash” setting!

If you want your clothes well-spun, you need to select another setting, the quickest of which is “whites”, and takes three-and-half-hours!

There’s a work-around, which I use most times — you run the machine on a quick-wash cycle (30 min), and then afterward change the setting to “whites” and then select the sub-setting “spin only”.

There are many other confusing and frustrating choices made for the interface and control of this machine, but it would take a whole post to go into them all!

How many extra man-hours would it have taken that company to properly think through the HMI2 design of the washing machine? How much extra would that have added to the price of each machine?

And here’s a deeper question: does the existence of this washing machine design make the world a better or worse place? If we could magically remove all washing machines with this design from history, would we be better or worse off?

What does it even mean to be better or worse off? This depends heavily on your world view. If you are a hedonist, then happiness/pleasure is the ultimate goal, and you can mentally compare the global happiness with or without this washing machine design in it, and make a value judgement about whether the washing machine is a net win or loss.

The difference between the two hypothetical worlds is subtle and complicated. The annoyance I feel when I do the washing is brief and small, but it’s multiplied by the thousands of people who have a similarly-designed machine.

And if we magically removed the design from the world timeline, we’d also be removing the satisfaction I felt when I discovered the work-around, and the catharsis of complaining about it in this blog post, or the social bonding it provides between people with the same issue who can enthusiastically affirm each other’s miserable experiences of such a damned machine.

But even with all these “benefits” of bad design, I honestly believe that we’d all be better off, in at least a small way, if this design didn’t exist.

So, what does this mean? Should humankind band together to create the perfect washing machine — a design which can be used for hundreds of years into the future, making people all of the world happy or even excited to do their washing?

In principle, I’d say “yes”. If we as humankind had collectively spent our efforts creating just one “great” washing machine design instead of a mix of good and bad designs across many companies, we’d probably all be better off overall.

The reality, of course, is more complicated. Different people’s ideas of the “perfect washing machine” are different. Technology makes this a moving target — the best washing machine 1000 years from now will be much better than the best we can do now, but we don’t expect a washing machine company to internally do a thousand years of technological research before creating their first product!

Adobe After Effects

Adobe After Effects is software for creating and editing video and animation. I learnt how to use After Effects this year so I could create little animated diagrams like that in my snapshot-vs-bundler post.

I find After Effects to be a really interesting example when it comes to armchair-philosophy about product quality. After Effects is, in my opinion, incredibly poorly thought out and frustrating to use. But it is also the only software that can do everything that it does, and using it makes you 100x more productive than if you were to try to do the same animations/effects manually.

I won’t go into a lot of specifics of what I think is bad design in After Effects, but I can’t help but mention one example: you can’t copy-paste something from one project to another (in fact, you can’t even open two projects at the same time). If you designed something great in one project and want to use it in the next (reuse, DRY), it’s a complicated and time-consuming process to export parts of one project into a temporary project and then import the temporary project as a nested child of the second project, from which you can now copy and paste.

You’re welcome to disagree with me and think that it’s a great design. The thought-experiment remains valid: in cases where a tool achieves something that was previously unachievable, or boosts their user’s productivity by some factor, is it better for the tool to have more capability or to be more pleasant to use?

The answer is not completely obvious, and it really got me thinking. My mind is in the weird paradox of simultaneously thinking that my life is worse off with After Effects while also I choose to use it because nothing else is available to do the job.

I wonder how many other people might also be in a similar position, perhaps frustrated by tools they use, but pushed to use them because the tools make their competitors more productive. It’s a prisoner’s dilemma situation (wiki).

50 years ago, nobody was complaining about the lack of After Effects software — they were fine with its absence — but today, we complain about its flaws. We’re “forced” to use it, either because our competitors in the space are using it or, in my case, because the mere knowledge of its existence makes work without it feel much slower, which itself is frustrating.

I honestly think the world would be a better place if After Effects didn’t exist and if knowledge of its absence also didn’t exist, nor a craving for something like it to exist. Or even better, the world would be a better place if After Effects was created with more design thought, making common workflows easy and pleasant, even at the expense of its variety of features.

Lessons

As a software engineer, I’m one of the lucky few who are actually contributing to the development and design of these kinds of tools. So what kind of lessons can I learn from some of my frustrations with other tools, like the washing machine or After Effects?

For me personally, the lesson is that quality is far more important than quantity. I think that humankind will be better off with fewer tools and less productivity, in exchange for tools that are more pleasant to work with. It’s better if technological development is slower, maybe 20 years behind where it would otherwise be, but the technologies developed are robust, dependable, and a pleasure to use.

In some ways, this goes against modern business practice. In our capitalistic world, there’s a temporary advantage gained for using a tool that makes the business more productive, even if it makes its employees less happy — the benefit is negated when the competitors do the same thing, removing the competitive advantage of the productivity boost and driving down overall prices of the product (but leaving employees still unhappy with the status quo).

Likewise, on the other side, it’s advantageous for a software company to sell tools that deliver the maximum economic value to their users, not necessarily maximum pleasure and peace. Remember that automating the repetitive parts of a user’s job does not necessarily make their life better, it just changes the nature of their work (now their job is about running the automation software). If a new piece of software allows users to complete their work in half the time, they don’t take the other half of their time to sit in the park and watch the birds play in the fountain!

If you create a tool that delivers more productivity in a way that’s more frustrating to use than the less-productive way, the machine of capitalism forces people to use your more-productive tool (which may make you rich in the process!) at the expense of increasing net frustration in the world.

Microvium

(Non-technical audiences may want to skip this section)

Microvium has parallels with After Effects, in that it’s a new tool that achieves something that was previously unachievable (it brings JavaScript to devices with only 16kB of flash). It boosts productivity significantly over alternatives such as EmbedVM.

So would it be ok if Microvium brought these benefits but was frustrating to use?

I think not. I’ve spent considerable effort in making Microvium as easy and pleasant to use as possible. For example:

  • I take great care in the full experience of integrating and using Microvium.
    • The one-liner install for node hosts, or the two-file copy-and-paste for C hosts (one h file for the interface, and one c file for the library).
    • The getting-started guide demonstrates this to the extreme, leading the user incrementally through all the concepts, building up to a full native host example.
    • The getting-started guide is even part of the regression tests — the automated test infrastructure literally extracts the example code from the getting-started markdown file and make sure it all still works. There’s little more frustrating than an outdated or incorrect introduction to a new tool.
  • I spent a great deal of effort making the FFI3 design super simple.
    • I hope that the way I designed it seems “obvious” and “intuitive” to people looking at it in hindsight — this is the way that designs should be. When you read through the getting-started guide, you probably don’t think at all about how good or bad the FFI system is. It’s hopefully invisible to the user, in the same way that a good washing machine should be invisible. It should just do its job, in the least-frustrating way possible.
    • Compare the FFI design of Microvium with that of Node.js for example, where one requires complicated binding files, additional configuration files (e.g. binding.gyp), the need to understand N-API vs NAN vs the V8 API, etc. In Microvium, you can export a JavaScript function to C code using a single line of JavaScript code and no configuration file is needed:
      vmExport(1234, sayHello);
  • Even for something like the module API for Microvium (for node.js hosts), which many people won’t encounter directly, I put a great deal of thought into making the model as conceptually simple as possible.
    • Compare to something like the Compartment API for JavaScript (under development), where the explanation of its module API necessitates the definition of 6 different kinds of “module specifier”, and the distinction between “resolving”, “loading”, “binding”, and “importing”. While Microvium does all of these, it doesn’t require the user to understand these concepts in order to use it.

This is to say that I’m practicing what I preach. However much I may fall short of the ideal, I put a lot of time and energy into making Microvium a pleasure to use and simple to understand, at the expense of not delivering as many features and it taking longer to develop.

I believe this was the right choice, and I’ll definitely apply it to other projects moving forward.

What are you optimizing for?

I’ll conclude by asking the question: what are you optimizing for?

If you are a manager or leader at a company, you may be inclined optimize the company to produce profits. You may even have a fiduciary duty to do so on behalf of your investors.

Sometimes the profit incentive is aligned with making good products: after all, happy customers are returning customers.

But clearly that’s not always the case — it’s common practice to release buggy software early, so you can iterate on it and use your users as a testbed to weed out the bugs. It’s common practice to “fail fast” in small startups — getting new ideas out the door quickly so you can establish their efficacy in the market, rather than spending significant time designing something that might be pleasant to use but nobody wants it because the idea was flawed. It’s common practice to aim for a first-mover advantage, especially in markets with a network effect, to trap people in your ecosystem before competitors (who may have a more pleasant product but were slower to get there) have a chance to penetrate the market.

Should fiduciary duty or duty as an employee transcend the betterment of mankind?

What goals are you optimizing your products for? What are you optimizing your company for? What are you optimizing your life for?


  1. Actually, my housemate’s washing machine 

  2. Human-machine interface 

  3. Foreign Function Interface — the way in which script code communicates host code and vice versa 

A Distributed Language

A Distributed Language

TL;DR I think it would be great to have a language where single instance of a program can run non-stop for 10 years across a dynamic, heterogeneous mix of hardware, where the program code can perform its own allocation and management of infrastructure resources such as databases, VMs, and public IP addresses, as well as allowing maintainers to safely hot-swap new code into the running program to update it.


This is a deviation from my normal topic — Microvium — to talk about a random idea1.

I recently got a new job at Flip Logistics. Like many businesses an engineer might get a job at, they offer a SaaS solution: a cloud-hosted application with a web front-end, some native apps for various devices, some public APIs for integration with 3rd parties, and backend storage and logic. So far, this aspect is the same as every company I’ve worked at, and I’m sure the same as many others.

Being a newcomer is a great time to see things with fresh eyes. In this post, I’ll go over some of the issues I see with the construction of SaaS software in general (none of this is specific to Flip in any way, and I don’t speak on behalf of Flip in this post), and propose a direction that may resolve a lot of the complexity.

The End Goal

I think the best way to express the issues I see with SaaS2 implementation is to contrast it to the implementation of a single native application. I believe that much of the implementation inefficiency/complexity of a cloud application comes from the fact that it’s distributed: it exists beyond a single OS process. A typical programming language only has first-class support for things pieces running within a single process.

Self-Contained

A native application is self-contained: it starts, it allocates all the resources it needs, including all the interfaces it uses to communicate to the outside environment (e.g. the user).

Contrast this with distributed applications where resources are imposed externally: virtual machines, databases, IP addresses, certificates, firewall rules, etc.

Imagine if simple applications worked this way: to start MS Word, you first need to go to your OS Dashboard where you instantiate a (blank) GUI window for MS Word to eventually use; then you instantiation a file for it work on; and reserve the amount of RAM you want it to run in, and the core of the processor, etc. Then you run your configuration manager to associate the MS Word “process” with the RAM, the file, the GUI window, etc. Then finally you boot the whole thing up and now you can edit your Word document.

Conversely, imagine a cloud “application” that you could just run (instantiate) by the equivalent of just “double-clicking” on it, and letting it allocate its own resources: VMs, network configuration, firewall rules, load balancing, etc.

Documentation, APIs, and Type Safety

In most application software I’ve written, little or no technical documentation needs to exist outside the source code itself. I’m a huge fan of self-documenting code, where the name and type signature of a function or class describes all you really care to know about what it does3.

The information in interface signatures is not only useful for humans, it’s useful because the type system can ensure that implementations and clients of the interface are in agreement about the contract that must be met. This is safe.

But this only applies to interfaces within the context of a single application. As soon as you enter the realm of distributed applications, you now have the issue of defining contracts between the individual distributed components. Much of the time, these are not type-checked, so producers and consumers have no guarantee that contacts are met.

These kinds of inter-service contacts are normally documented externally to the code, where it’s easy to have them get out of date. And hitting “navigate to definition” in your favorite IDE doesn’t take you from the client code to the server code where you can actually see the implementation of the API you might be invoking.

Here’s an example to illustrate the lack of type checking:
In C#, I can bring a queue into existence by new Queue<T>(). Or I could use a 3rd party queue library with similar syntax. The type checker enforces that the contract between the producer and consumer side of the queue. Contrast this with an AWS or Azure distributed application, where instantiating a queue service is done outside your program code and there is no type checking to verify the contract between producers and consumers of the queue, or to self-document the kind of things that consumers should expect to get out of the queue.

This isn’t a straightforward problem to solve. A good type system for distributed applications would probably need to account for tearing between the client and server, where the client and server are on different versions — so contracts defined by type signatures would need to span into the past and future.

Debugging

I’m not sure that any solution in the near future can solve this problem, but the debugging experience for a distributed application is severely lacking.

One area of issue is the fact that you cannot step across service boundaries. While debugging a client of an API, I can step into other functions within that client, but I cannot step from the client to the server code and see both the client and server in the same “call stack”.

Prior Solutions

I’m sure many people have tried to solve these kinds of problems before, to varying success. Two existing solutions off the top of my head:

ASP.NET Web Forms. Within the limited problem-space of a client web application and its backend server, Web Forms attempts to bridge the gap between client and server, having both in the same project and a fair amount of type safety between them. This only works for certain types of applications, and is far from a general solution to the issue.

Infrastructure as Code (IaC), in various shapes — this allows code to specify the so-called “infrastructure”, such as what VMs to create and how to route network traffic to them. This gets part of the way there, but it falls short of the ideal.

To go back to the MS Word analogy, IaC seems akin to having a script that you must run prior to running MS Word, so that the script can set up the “infrastructure” required for MS Word (reserving RAM, creating the file, instantiating the GUI, etc). This still seems unnatural, and it doesn’t give you type checking between the “infrastructure” and the other code.

My Solution

Starting at the end and working backwards, I want a solution where allocating a queue service is as easy as allocating a queue data structure. Something like:

// Instantiate a queue data structure in memory
let localQueue = new Queue<int>();
// Instantiate a queue service, which may in turn instantiate the VM it needs, etc
let infrastructureQueue = new QueueService<int>();

(I’m using a TypeScript-esque language here for my examples)

Similarly for a database:

let db = new DynamoDB<MySchema>({ ...options });

I don’t mean this to create a connection to a database, I mean that these actually create a database. The returned db variable would therefore be a local object representing the remote resource.

Application Lifetime

I also believe that the code that creates the database should be part of the application code. Just like in a simple native application, if a resource is needed by the application, the code at startup should acquire that resource (or it can be acquired on-demand when it’s later needed).

MS Word is instantiated when you double-click the icon (and this is when it acquires many of its resources, such as creating the GUI window), and it’s lifetime might be minutes, hours, or maybe days, before you shut it down and it reclaims all it’s resources.

The lifetime of a distributed cloud service is much longer — you might start it up in production one day, and then retire it 10 or 20 years later. If it allocates a database at startup, you would expect that database to be around for the full 20 years.

This should drive home a paradigm shift I’m trying to communicate: a distributed application, as I’m using the term, is not the thing we currently call a “service” or “microservice”. Rather, the code for the “distributed application” is something that might run for a decade, or a century.

An example is “the Google search engine”, which from the outside is just an application available at google.com which has been running continuously since 1998.

Today’s languages and application environments are simply not suited for this paradigm. We’re used to application code being immutable for the duration of the application: it needs to be right before you run the application. If you need to update the application code, you need to stop the application, and it releases all its resources to deploy the update.

When we move into the domain of distributed applications, and these resources include things like databases, message queues, load balancers, etc., we can’t afford to have the application tear them down when we “close” the application to update its code (nor can we afford to close the application to update its code).

But we may be heading towards a world where it’s possible to mutate a running application. An example that comes to mind with JavaScript is React-JS Hot Reloader (I think this video shows it in action). It is intended as a development tool: changes to your code files are detected when you hit “save”, and the changes are injected into the running application without stopping it. In particular, the in-memory state of your components and store are preserved.

I’m not suggesting that Hot Reloader is the solution here, but just that it shows that the paradigm is unfolding. It’s not unreasonable to think of a future where a single distributed program (single code project with a single entry point) can run for years on a cloud of physical machines and other hardware, allocating and deallocating resources according to the rules it defines in its code — such as making a new database for each client, and spinning up VMs according to its dynamic load4 — and allowing us to maintain it while it continues to run by hot-swapping code into it.

I think that likely the ultimate solution here involves a new language that is designed to handle this kind of thing. A language in which the capability of hot-swapping pieces of code is assumed from the start. A language with type-safety across a non-atomic release (patching different physical machines at different times). A language which can describe processes that run on unreliable hardware and with unreliable communication.

But possibly a lot of progress can also be made with existing languages. Maybe in future I’ll go into more detail about what that might look like.

Microvium

I know that this post isn’t about Microvium, but I want to tie this to Microvium as well, because there is certainly some overlap. I won’t dwell on it because Microvium was not designed to solve this problem.

The key feature of Microvium that has relevance here is the snapshotting feature. It plays with the relationship between code and physical hardware in a novel way:

  • A Microvium process (a running instance of a Microvium program) doesn’t run on just one machine. It is in some sense a “distributed” application because it runs first in one environment and then is suspended and moved to a different environment. In a typical use, it will first run on a server or development machine, and then resume execution on a client or microcontroller.
  • A Microvium process, and by extension the resource is it allocates (such as memory structures) can transcend the physical hardware on which it runs. For example, you can kill power to the machine, releasing all physical RAM, but yet the Microvium app can still have live objects in that powerless state. The program cannot make forward progress without some CPU, but because of the snapshotting feature, the machine (CPU and RAM) that continues execution is not necessarily the same that starts it — the program can jump between machines while keeping it’s live state.
  • A Microvium program can control its own compilation5, because compilation == snapshotting. I previously wrote about how snapshotting can supersede bundling and be superior to it. But looking at this more abstractly, we can think of this as internalizing an externally-imposed workflow, which is similar in principle to my proposal to bring IaC scripts into the program that runs on that infrastructure: giving a program its own control over the infrastructure and perhaps even the build process.

While Microvium may touch on some of the concepts required for a solution, just like Hot Reloading, it’s clearly not the complete picture.


  1. Although I actually do talk about Microvium at the bottom 

  2. I’m going to use the terms “SaaS application”, “cloud application” and “distributed application” interchangeably in this post, on the assumption that most large SaaS system are complicated because they run over distributed hardware 

  3. Or at least I strive toward this end goal 

  4. All by rules defined _in the program code_, or delegated to your favorite third party library, not necessarily baked into the platform 

  5. I’m using the term “compilation” here to mean compiling the source code to bytecode 

New Garbage Collector!

New Garbage Collector!

It’s only been a few weeks since the Alpha release of Microvium and I’ve already almost rewritten the engine! There are two major changes I wanted to make to improve performance:

  1. I’ve moved to a great new garbage collection algorithm which is much faster and has a smaller memory footprint in both RAM and codespace.
  2. I’ve changed the data model to allow Microvium values to embed native pointers on 16-bit platforms.

The Old Data Model

To understand the new data model, it will help if I first explain the old one, so you can see how they compare.

Microvium “slots” are 16-bit. The term “slot” is what I’m using to refer to a memory location that holds a value. For example, a variable is a “slot”, and an object property has a slot for its value (and another slot for its key), etc.

In the old data model, Microvium reserved the top 2 bits of the 16-bit value to be a tag, leaving a 14-bit space for data associated with that tag.

The 2-bit tag tells you how the 14-bits should be interpreted:

  • Tag 00₂: the data is a 14-bit signed integer
  • Tag 01₂: the data is a 14-bit offset into heap memory
  • Tag 10₂: the data is a 14-bit offset into data memory
  • Tag 11₂: the data is a 14-bit offset into the bytecode

Having values fit into 16-bits is convenient for the kind of device that Microvium targets: small microcontrollers with < 64kB of memory. Many of these are 16-bit architectures, because 16-bits is all you need to address all your memory, and moving to 32-bits increases the silicon size and power consumption.

14-bit integers are large enough for many purposes, such as counters, loop variables, indexes, byte manipulation, etc. For situations where more than 14 bits are required, Microvium has the above-mentioned “offset” values which point from the 16-bit slot to a larger memory allocation, somewhere in ROM or RAM. These larger memory allocations can hold any number of other value types, such as arrays, strings, 32-bit integers, function bytecode, etc.

Depending on the characteristics of the referenced data, they could be stored in either ROM or RAM. This design uses offsets rather than explicit pointers for 2 reasons:

  1. Not all devices can fit a pointer into 14-bits. In fact, most can’t.
  2. When a pointer is stored in ROM (i.e. in the bytecode image), it can’t know ahead of time what its address is (since the image could be placed anywhere in ROM).

Side point: what I’ve called “data memory” and “heap memory” here are both memory regions in RAM, but data memory is permanently allocated while heap memory is garbage collected. This is analogous to the difference between the following two allocations in C:

MyStruct myVariable1; // Permanently allocated
MyStruct* myVariable2 = malloc(sizeof(MyStruct)); // Could be freed later

This 16-bit data model seems pretty clever (or at least to me it did), but there are a number of issues with it and optimizations we can make. The first issue is that each of these respective offsets can only be 14-bit, giving only an addressable space of up to 16kB. On the whole, I don’t mind Microvium targetting use cases where applications can’t exceed 16kB, but it does feel awfully tight. This is compounded by the fact that the engine itself compiles to about 7kB1, so the engine is necessarily quite chunky relative to the largest bytecode programs it can run.

The other issues are easiest to see when comparing it to the new data model:

The New Data Model

The new data model uses a number of clever tricks to improve both performance and addressable size on 16-bit architectures.

In the new model, the tag is not a fixed size. It is 1 or 2 bits, and now in the lower position of the word rather than the upper.

If the lowest bit is a 0, the word is interpreted as a 16-bit pointer:

This makes the assumption then that pointers are always even, which is a good assumption since Microvium heap allocations are always 2-byte aligned. This means that once checking the low bit, we don’t need to do any bit manipulation to extract the pointer value: the value is already a 16-bit pointer.

This means that — at least on 16-bit architectures — a native pointer can be stored directly in a memory slot (variable) without any need for encoding and decoding. This speeds up a lot of things, but especially the garbage collector (which I’ll talk about in a moment). This also means that Microvium can now address 4x the amount of RAM than it could previously since it can address 64kB of space.

This speedup is quite significant. The heap in Microvium is not allocated contiguously from the host (doing so would make it difficult to expand the heap dynamically when more space is needed), so resolving a 14-bit heap offset, as per the old scheme, involved a function call with fairly complex logic to determine what heap fragment the offset belongs to, and to find the allocation being referenced. With the new scheme, this all goes away.

Bytecode (ROM) Pointers

Not everything can be addressed by 16-bits, even on a 16-bit architecture. In particular, ROM is not necessarily addressable with 16-bit pointers, since ROM may be larger than RAM, or it may be stored in a completely different address space (or even on an external device such as a serial flash).

Microvium intends to cater for these common situations, so the next kind of value that can be stored in a slot is as follows, having the lower 2 bits with a value of 01₂ and the upper 15 bits be an offset into the bytecode image:

Again, this makes use of the fact that bytecode addresses will be even (this was not necessarily true before, but I’ve made it true in this latest version). So to convert this value to an offset, we just have to shift right by 1 bit.

This only gives us up to 32kB of addressable bytecode ROM, but this is still pretty good. The bytecode file itself can be up to 64kB in size, but the ROM segment is limited to 32kB. I think this is more than acceptable for the kind of use cases that Microvium is currently intended for.

This leaves us with the last tag, 11₂, to represent integers:

We still only have signed 14-bit integers, like before. But even for these, this scheme is more efficient than the old! In the old scheme, to decode a 14-bit integer from the 16-bit word, we needed to perform a sign-extension, since the upper 2 bits (tag bits) were always zero, even when representing negative values. A 14-bit-to-16-bit sign extension typically involves multiple operations. In this new scheme, we need only perform a single signed-right-shift by 2 bits, and the sign will automatically be correct!

The New Garbage Collector

I like it when code gets smaller instead of bigger, and the new garbage collector (GC) is one of those cases. And with the new data model, the garbage collector is an order of magnitude faster than it used to be.

All the principles and objectives from my previous garbage-collection post still apply. In fact, I was well into the new design by the time I finished the implementation of the old GC, but I wanted to get the alpha release out before embarking wholesale on a major overhaul. This animation from last time summarizes the kind of environment that Microvium is meant to be operating in, where the circles represent different modules or tasks, and only one or a few at a time are expected to be active/scheduled, taking center stage and consuming their peak memory2:

The key takeaway from the aforementioned post is that the memory consumed by a task for only a short burst is typically significantly cheaper than the amount of memory it permanently reserves. This is because the average permanent memory per task is multiplied by the total number of tasks, while average peak memory per task is only multiplied by the number of simultaneously-active tasks running at their peak.

Single-threaded environments, of which I’m a huge fan, are even better for this since they enforce that there is only one “simultaneously-active” task — micro-tasks are scheduled serially and so each can consume all the resources for a short time, as long as they clean up thoroughly before the next microtask in the queue. If microtasks are kept short (e.g. less than 1ms) then responsiveness is still good and the system can still be “real-time-enough” for many real-world applications.

The new garbage collector is based on Cheney’s Algorithm (wiki). The general idea is this:

  • When it’s time to garbage collect, the GC finds reachable (live) allocations and moves them to a new virtual heap.
  • It then discards the old heap all at once.

The coolest new feature of the new Microvium GC is that it costs nothing at all to free an allocation! Dead allocations are simply never visited by this algorithm, and just get freed all in one go when the algorithm discards the old heap.

All living allocations have a cost during collection (this is true of all GC algorithms3 ). But also recall from my last post on GCs, that microcontrollers can have a 1000x more processing power than memory4, relative to desktop machines, so the fact that living allocations incur some (small) amount of processing time during every collection is probably not as impactful as the fact that living allocations continue to consume memory after the collection. In other words, don’t worry about how much CPU overhead a garbage collector has, and instead worry about designing your scripts such that their idle memory consumption is small, as you should be doing regardless.

This algorithm makes it fairly cheap to make tons of temporary allocations for intermediate computations or for passing data between functions, etc. And then when the task’s turn completes and the GC is run to clean up, these temporaries are discarded with no processing or memory overhead.

Since allocations in the Microvium heap are contiguous5, creating new allocations is also cheap — it’s almost just bumping a pointer forward6.

In the following animation, blocks represent allocations, and a block turns gray to represent that the allocation has become unreachable7 — blue blocks are live (reachable) allocations. The program8 can allocate quickly and cheaply by just bumping the allocation cursor/pointer forwards. Dead objects remain on the heap until collection. The collector then copies the live allocations to a new space and the old space is discarded in one fell swoop.

With the new data model, tracing the reachability graph is almost trivial: for each variable, check if the lowest bit is a zero. If it is, then the variable holds a 16-bit pointer to a reachable allocation. If not, the variable holds anything else — an integer or a pointer to ROM (which doesn’t need tracing). Reachable objects are copied into the new memory space and their internal slots are in turn investigated the same way.

Collection with this algorithm is quick and proceeds in a single pass with constant memory overhead — no mucking about with mark bits, free lists, work queues/stacks, etc.

Bonus: Array and Object compaction

As a bonus, this new implementation of the GC also compacts arrays and objects on the fly.

Dynamically-sized arrays consume more space than needed so that appending to them is quick: every time they run out of space, their capacity is doubled to make room for twice as many elements. The GC algorithm truncates the capacity of these arrays on the fly, so they use the smallest possible amount of space.

The case is similar for objects. Dynamic objects in Microvium are represented in memory by a linked-list of property cells (key-value pairs with a next pointer), so that appending new properties is quick. But the linked list consumes twice the amount of space of an equivalent contiguous array, so the GC compacts these linked lists into contiguous property arrays on the fly. This gives the best of both worlds, making it efficient to link a new property cell onto an existing object, but having that property cell subsumed into to the contiguous arrays during GC compaction so that each property takes the smallest possible space while dormant9.


  1. This is measured on an msp430, with floating-point support disabled. Presumably this number will grow over time 

  2. Read the full post for a more complete explanation 

  3. But can be mitigated on desktop/server machines by the use of generational garbage collectors, an unnecessary complexity for MCUs. 

  4. To put it another way, consider that many MCUs have the CPU power to loop through their entire RAM in the order of milliseconds, while desktop-class machines would do the same in the order of seconds 

  5. At least, contiguous within the large chunks of memory allocated from the host as the heap grows 

  6. Bumping the allocation pointer forward, checking for overflow, and assigning the 2-byte allocation header. 

  7. Unreachable means that there is no way for your program/script to access it anymore, so it is subject to garbage collection 

  8. The program is called the mutator in GC parlance. 

  9. A property in compacted form is 4 bytes: a 2-byte key and a 2-byte value. A property in linked-list form consumes extra bytes for the linked list pointers and the allocation header of each cell. 

Microvium Boost
It's like magic

Microvium Boost
It's like magic

TL;DR: This Microvium plugin (in development) optimizes Microvium bytecode by statically determining which variables and properties are accessible and how they might be accessed (read vs write), to decide whether to safely store them in ROM or remove them completely from the bytecode.


With the recent alpha release of Microvium, I’ve since turned my attention to a complimentary piece of the puzzle, which I’m calling Microvium Boost (the name may change). Microvium Boost might still be a few months from release, but some important and exciting parts are up and running, and I wanted to share some of the details here for anyone interested.

Microvium Boost is an optimization plugin for Microvium1. It hooks into the Microvium pipeline to optimize a snapshot of bytecode. As a recap, the snapshot is the file that you would download to target MCU; it’s a capture of the full running state of the Microvium virtual machine — see the Concepts page in the Microvium documentation.

Like a typical executable file, a bytecode file contains separate sections for data variables versus constants and functions. I’ve called these sections data and rom2 (these are closely analogous to the .data and .text segments produced by a C compiler). When a virtual machine is restored (loaded) on the target MCU, the data section is copied into RAM so the program can execute it, while the rom section is accessed directly from the bytecode image in ROM during execution (assuming you store the bytecode in ROM).

If I can summarize what Microvium Boost does in a single sentence, it’s this:

Microvium Boost determines the best memory section for each element in the snapshot bytecode, or if it can be completely discarded.

For some things, the best memory section is obvious. For example, function code is immutable and will always be stored in ROM. In the future, frozen objects could also probably be put straight into ROM without any analysis, but Microvium does not currently support freezing.

For all other variables and objects in the snapshot, we can store them in ROM if they are not going to ever be mutated at runtime3. It’s the job of Microvium Boost to determine this ahead of time through static analysis.

A related problem that Microvium Boost solves is whether an element in the snapshot needs to be kept at all. For example, it removes functions that are never called, and objects and variables that are never accessed.

This is a notoriously difficult problem to solve. Take a look at a previous post of mine where I demonstrated that Webpack’s static analysis for tree shaking is quite easy to fool into producing erroneous results.

The best way to show the kind of thing that Microvium Boost can do is by a few examples.

Examples

Moving values to ROM

The following is a simple example4. Here we have a script that creates an object with two properties and then exports a function to the host5 which returns one of those properties.

var obj = { // Stored in ROM by optimization
  x: 5,
  y: 6  // Removed by optimization
};

function run() {
  var temp = obj.y;  // Removed by optimization
  return obj.x;
}

// Export the `run` function to be called by the MCU host
exportValue(0, run); 

In the above case, Microvium Boost determines that object obj is not mutated (not just the variable, but the object itself), and that property y and variable6 temp are not used at all in a particular application.

It’s worth highlighting that var temp = obj.y does not count as a use of property y, since temp is not used. What counts as “usage” is determined lazily, with only those values used in I/O being real usage7, and all other intermediate values only being tentatively needed in case they aid in the computation of I/O.

Built-in Functions and Objects

There is another optimization that Microvium Boost does in the previous example. There are actually a number of built-in library functions that are also culled from the bytecode because they aren’t used (for example, Array.push is not used anywhere here).

This is quite an important feature of Microvium Boost: the ability to have a rich common library of useful functions that any particular script might employ, and have the unused parts of it dropped from the snapshot on a case-by-case basis depending on usage.

Function Parameters

Here’s another interesting example, with the optimization results noted a comments:

var w = 1; // Removed by optimization
var x = 2;
var y = 3;
var z = 4; // Removed by optimization

function run() {
  // Arguments `w` and `z` are removed by optimization
  var result = bar(w, x, y, z); 
  return result;
}

// Parameters `a`, `d` and `e` are removed by optimization
function bar(a, b, c, d, e, f) {
  return b + c + f;
}

exportValue(0, run);

These optimization opportunities might seem obvious when you read the code, but it’s actually quite a difficult problem to solve in the face of multiple possible callers and ambiguity in the function target. To highlight this complexity, consider if we changed run to have the following implementation, which also optimizes just fine with Microvium Boost:

function run(b) {
  var f;

  if (b) f = bar;
  else f = foo; // (Assuming that we define a function `foo`)

  // Arguments `w` and `z` are removed by optimization
  var result = f(w, x, y, z); 
  return result;
}

In this example, the call f(w, x, y, z) might either be calling bar or foo, and it’s impossible to know ahead of time which it is because it depends on the value b passed from the host. If we (the optimizer) remove a parameter from bar, we equally need to remove it from foo so that the parameters for f are consistent. And in general, bar and foo might similarly be called from multiple call sites in the program, so removing a parameter from bar may have implications that affect calls to foo elsewhere.

How does it work?

Microvium Boost is a whole-program optimizer that works differently to any other optimizer I know of. I’ve been working on the concepts behind the algorithm for some years now — they are borrowed from the type inference algorithm for MetalScript.

Naturally, an algorithm like this is beyond what I can explain properly in a blog post, but here is the general idea.

There are two steps to the algorithm:

  1. Determine the dependency relationship between elements of the snapshot (i.e. the program), by stepping sequentially through the source code8. For example, a call operation is related to the corresponding function(s) it calls, such that both have consistent expectations about the parameters to be passed.
  2. Use the relationships to determine consistent facts about the snapshot. For example, to decide whether a particular parameter can be culled from a particular function signature.

The following diagram is the resulting dependency graph for the “Function Parameters” example earlier. I’ve left the labels off the graph just to give a general sense of it9.

In simplified terms, each node in the graph corresponds to an element in the snapshot, such as an instruction, parameter, property, or global variable. The arrows are the inferred directional relationships between these elements. For example, an instruction which reads a global variable is related to the variable that it reads (if the former is not culled, the latter cannot be culled).

Nodes colored solid blue are those determined to be required in the final, optimized snapshot, while the empty circles represent elements that can be culled (much of the culled program is off-screen).

In this particular diagram, as annotated below, the solid-blue root node on the left is the particular IL instruction that returns control to the host: it passes the final value to the host (it is a point of output IO), and so the instruction that generates result cannot be culled from the snapshot. The dependency graph then pulls in a cascade of other operations and parameters that are transitively required in order to produce the single return value, ending on the far right of the graph with the globals variables x and y, which are the ultimate leaf dependencies required to compute the aforementioned output return value to the host. The rest of the nodes are not too important to this discussion at a high level except that they connect the result to the variables x and y from which the result is ultimately derived.

Here’s the corresponding source code again for reference:

var w = 1; // Removed by optimization
var x = 2;
var y = 3;
var z = 4; // Removed by optimization

function run() {
  // Arguments `w` and `z` are removed by optimization
  var result = bar(w, x, y, z); 
  return result;
}

// Parameters `a`, `d` and `e` are removed by optimization
function bar(a, b, c, d, e, f) {
  return b + c + f;
}

exportValue(0, run);

If you changed the source code by removing return result, indeed Microvium Boost will now safely cull global variables x and y, since that whole island of the dependency graph is only anchored by the data required by return operation.

It’s Magic

I subtitled this post “It’s Magic” because of Microvium Boost’s ability to deal with dynamic information, which seems almost impossible to me, despite the fact that I wrote the code for it!

We could make the example a bit more dynamic to illustrate this by making the call to bar indirect, as in the following code:

var w = 1;
var x = 2;
var y = 3;
var z = 4;

function run() {
  var result = call(bar, w, x, y, z);
  return result;
}

function call(func, arg1, arg2, arg3, arg4) {
  return func(arg1, arg2, arg3, arg4);
}

function bar(a, b, c, d, e, f) {
  return b + c + f;
}

exportValue(0, run);

The inferred dependency graph for this looks roughly the same, with an extra layer of nodes between the root return operation and the leaf global variables from which the returned information ultimately feeds. And just like before, it determines that w and z are not used, along with the corresponding arguments to call and bar.

Why Microvium Boost?

This kind of analysis is expensive to compute, but I think it’s well suited to the kind of scenario that Microvium is targeting, where the machine performing the optimization is thousands of times more powerful than the microcontroller device on which the optimized bytecode will run; where every byte and every instruction counts.

I think Microvium Boost enables a style of script programming that was previously infeasible for these kinds of situations.


  1. Microvium Boost is not part of the open-source Microvium codebase 

  2. There is actually a third section in Microvium which you don’t see in a C compiler, which is called gc and is the initial state of the heap in the same way that data is the initial state of the global variables. The principles that apply to data extend to gc 

  3. I believe the XS engine from Moddable takes a different approach, which is to store everything in ROM *until* it’s mutated at runtime, but this requires extra runtime overhead and indirection, and I didn’t choose this approach with Microvium 

  4. All these are working examples, mostly copied from the Microvium Boost automated test cases 

  5. The host of a typical Microvum program will be the surrounding C firmware on which the VM runs. 

  6. My choice of var over let or const for these examples is for readers who aren’t familiar with JavaScript, and may mistake the meaning of const or be confused by the name let

  7. Values used to discriminate control paths are counted as a special case of I/O, since most control paths eventually lead back to the host. 

  8. Actually, by stepping through the instructions in the intermediate language (IL)  

  9. Just ask me if you want to see the full, labeled graph. 

Microvium Alpha Release!

Microvium Alpha Release!

Microvium version 0.0.9 is published on npm, marking the first alpha release of Microvium! ?

I’m pleased with how quickly this has come together in the 4 months since I started the project back in February. Be ready for good things still to come! Subscribe to my blog if you want to be notified of new developments.

Please feel free to take a look, play around with it, and let me know your thoughts:

Here I have it running on a 16-bit device with 16 kB of ROM1 and 2 kB of RAM. This is roughly the scale of device that Microvium is primarily focused on.

The basic functionality is complete, but be aware that it’s not intended for production systems as of yet — I’m calling this an “alpha” release because the functionality is present but the testing is not complete and there are likely a few inevitable wrinkles still to be surfaced and ironed out.


  1. This device actually has 16 kB of FRAM, not flash, but I’m not using the read-write capabilities of the FRAM in this example. 

Microvium Garbage Collector

Microvium Garbage Collector

The Microvium garbage collector is here!

The garbage collector handles the automatic freeing of unused heap memory in the VM.

Microvium makes some interesting and perhaps unique tradeoffs with garbage collection which I’ll focus on in this post.

I’ll divide this post into two parts to cater for different audiences: I’ll start with a summary of the features at a high level, for those who don’t care about the details and design decisions, and then I’ll go into more detail.

Summary of Features

In no particular order, here are some features of the Microvium memory management system:

  1. Allocations are dynamically-sized, each having a 2-byte allocation header.
  2. Allocation headers are optional in some circumstances, so allocations can be even more compact1
  3. Allocations can be as small as 2 bytes2. For example, single-character strings are stored as 2-byte allocations, having a character3 and an implicit null terminator.
  4. The garbage collector is compacting (wiki: Mark-Compact). So, there is no heap fragmentation.
  5. Allocating on the heap is cheap and is constant-time — there is no searching through free lists.
  6. Data structures used for garbage collection are allocated only during garbage collection — no structures are persisted while the VM idle.
  7. Memory is acquired from the host in chunks as-needed, so users don’t need to decide ahead of time how much memory they want to commit to a particular VM.

I’ll add in two limitations for completeness:

  1. The memory for a VM is currently limited to 16 kB
  2. A VM temporarily uses about twice its memory during the garbage collection phase.

Design Considerations

Microvium is for MCUs

While Microvium can and does run fine on desktop-class machines, it’s optimized primarily for microcontrollers. Microcontrollers have quite different characteristics that influenced the design decisions of the garbage collector (GC):

RAM is small relative to processing power.
A 3 GHz desktop computer might have 16 GB of RAM, while a 3 MHz MCU might have 16 kB of RAM — a thousand times less processing power but a million times smaller RAM (obviously this is a very rough calculation to illustrate a point). This is relevant to the GC design because it should be heavily biased towards having a small RAM footprint, since RAM is relatively more expensive than CPU on a microcontroller.

There is no separate CPU cache
Most MCUs I’ve dealt with have no cache — all memory is accessed with constant time4, and often on a single instruction cycle. This affects the design of a GC because things like locality, order-of-access, and prefetching are not factors that need to be considered.

Microvium expects small allocations

Microvium is optimized for 16-bit platforms. Its native value type is a constant-sized 16-bit dynamically-typed value. Anything that doesn’t fit in this small size needs to be heap-allocated, including strings, 32-bit integers5, floats, arrays, objects, etc.

Since Microvium expects these kinds of values to be frequently allocated on the heap instead of the stack, the heap is designed to be fast to allocate onto, allocations shouldn’t incur a lot of memory overhead, and allocations shouldn’t require a lot of padding.

For this reason, Microvium allocations are only 2-byte aligned, so that the most padding ever required will be 1 byte, and only for odd-sized allocations.

Microvium is not the main show

Another significant design consideration is that a Microvium program is not expected to typically be the main program running on a device. Rather, it is optimized for situations where there is an existing firmware application and a Microvium script is only consulted occasionally. The rest of the time, the script is “dormant” and should occupy as few resources as possible.

I think this point is important, so let me explain in a bit more detail…

The way I visualize this design consideration is to think that any one Microvium program is only one out of a number of “tasks” or “components” that exist in the firmware application as a whole. It doesn’t matter whether the other components of the application are other Microvium programs or are C modules — the design considerations are the same.

For argument’s sake, let’s say that some application has 12 components. In the following diagram, I’m representing the RAM space as the dark circle, and each of the colored circles is a “component” occupying RAM, and the space in the middle is vacant memory. Let’s say that each component in this diagram is dormant, but is designed to become active in a particular scenario.

You might think of these as 12 “modes” or “activities”. To use a concrete example, consider an application with BlueTooth support, where one of these 12 components is the part of the firmware that handles BlueTooth communication. Most of the time, there’s no BlueTooth activity, and so the component of the application which handles BlueTooth communication is dormant.

When there is some Bluetooth activity, the Bluetooth component can take the stage for a short time to do its work, and then afterward, pack itself up and go back to it’s idle/dormant state. While the component is active, it will likely consume more memory, as it juggles BlueTooth messages in memory, or whatever else it needs to perform its activities. We hope that in most cases, when it’s done processing the BlueTooth event, it can pack up and release memory before waiting dormant until the next event. While it’s packed up, the “stage” is clear (there is free memory) for other components wake up and do what they need to do to respond to other events as they come.

This is the kind of situation that Microvium has been optimized for.

Why do I bring this up?

In this model, the memory that an average component uses while dormant is 12 times more costly than the additional memory it uses while active, because the average dormant memory is multiplied by the number of dormant components, while in this model, there is only one active component at a time.

While this example is an oversimplification, I think it still illustrates the point I’m making: for applications with dynamic memory allocation, memory held by a component for a long time is typically more expensive than memory held for a short time. Microvium can be used in situations where this is not the case, but it’s optimized for situations where this is the case.

This is important when it comes to GC design: a VM that incurs a lot of memory overhead for a very short time is considered better than one that has much less memory overhead but maintains it forever.

What kind of memory overhead am I talking about? Here are some examples:

  • Some GC algorithms store extra data in the allocation headers6. This is permanent overhead proportional to the dormant heap size of the VM, and so is expensive. Microvium does not do this.
  • Some managed heaps can get fragmented. Even if fragmentation only caused the VM to consume an additional 20% dormant memory, by our above example, this is roughly equivalent to consuming 20%*12 = 240% additional peak memory. So fragmentation is expensive. Microvium uses a compacting collector to fully eliminate fragmentation; to “squeeze” out all the unused spaces before a VM goes dormant.

On the other hand, the Microvium garbage collector uses quite a lot of memory while collecting. In fact, during a typical collection, the amount of memory consumed by the VM may be double for a brief time.

This is because the Microvium collector is actually a semispace collector. At the time of collection, it allocates a whole new heap for the VM, and then copies reachable objects from the old heap to the new one, before discarding the old heap.

The above design considerations are critical to understanding this design choice: for situations where a Microvium script is a small component in a larger firmware system, and is dormant during most of the firmware’s activities, the overwhelming metric to optimize is how well the VM can be “packed up” while it’s dormant, and Microvium achieves this with great success.


  1. Property cells, for example, are 6 bytes each and do not use an allocation header 

  2. Excluding the allocation header 

  3. Strings are utf-8 encoded 

  4. For internal SRAM 

  5. Or any integer larger than 14-bits 

  6. Such as mark bits and forwarding pointers 

Microvium Status Update
June 2020

Microvium Status Update
June 2020

I’ve made a lot of progress on Microvium over the past month and will share some of the details here for those who are interested. I think it’s really close to being ready for an alpha release, with the garbage collector being the only remaining piece of work before it can start being used.

Summary

In no particular order, here are some of the recent developments, which I’ll discuss in more detail below for those who are interested.

  • Dynamically-sized arrays!
  • Bytecode disassembly (now you can see what bytecode Microvium is producing!)
  • Code coverage analysis of the native engine implementation
  • Control over integer overflow behavior

Additionally, Microvium now has a lot more “flesh” than it used to — a wide variety of scripts now run and compile correctly which didn’t a month ago, making the engine usable for real-world scripts.

For readers interested in the details, read on! For everyone else, thanks for stopping by, and see you next time!

Dynamic Arrays

Microvium now has the much-needed support for dynamically-sized arrays.

Arrays are simple to create. For example, the following code creates an array of 3 numbers:

arr = [1, 2, 3];

As with full JavaScript, you can change the length of the array simply by assigning to indexes beyond the current upper bound of the array, for example arr[3] = 4 to extend the array to 4 elements. Nice and simple. (You can also assign to the length property of the array, or call arr.push)

Arrays are implemented internally using a structure like the following diagram, where each square is a 2-byte word (this example array consumes 20 bytes of memory to hold 4 numbers):

An array consumes two allocations on the heap: one which has the metadata describing the array, and another allocation for the array elements, with the former pointing to the latter.

Originally, I was planning to use a single allocation, but the issue is that when arrays grow, they need to be reallocated, which involves updating all the pointers to them. This is possible, and I was brainstorming some ways of doing so, but in the end the cleaner solution seemed to be the one above with two distinct allocations, and only one of them needing to grow.

To make the growth of arrays efficient, arrays have both a “capacity” and a “length”. The capacity of an array is the number of available slots in memory before the array will need to be reallocated. The capacity is invisible to the user. The length is the .length property of arrays and accessible to the user.

For array literals, the allocated capacity is exactly the length of the literal array. For example, the literal [1, 2, 3] has both a length and a capacity of 3. Thereafter, every time an array’s capacity needs to grow, it will be reallocated with twice the previous capacity. In our previous example, extending the array from 3 to 4 elements causes the capacity to go from 3 to 6, resulting in 2 words of padding (the 2 gray squares in the diagram). Until adding a 7th element (or higher index), the capacity would not need to grow again.

Advanced readers may observe that the memory structure here has no space for non-index properties. This is an intended design limitation of Microvium, that you can’t add arbitrary properties to arrays as you can in JavaScript. For example, you can’t say arr.p = someValue. I think this is totally acceptable for most applications.

I’m quite happy with this design. Initially, the 8-byte (4-word) overhead really plagued me, but I’ve come to terms with it, and certainly, this design is very clean.

Bytecode Disassembly

A really great feature that I added to Microvium recently is the ability to decode snapshot bytecode and, in particular, to generate disassembly for bytecode files. This is great because it allows us to really get a close look at the structure of a snapshot bytecode file.

Let’s say we have the following script:

vmExport(42, run);
function run() {
  console.log('Hello, World');
}

We can get a bytecode disassembly for this by running the following command in the terminal:

microvium microvium script.js --map-file bytecode.txt 

I called it a “map-file” because it creates a human-readable map of the bytecode. You can see the generated output here if you’re interested.

The main thing is that it allows us to see byte-for-byte what’s occupying a bytecode file. Remember that these bytecode files fully describe the virtual machine state at the time the snapshot was taken, including all the variables, functions, and heap state. This is one of reasons I developed this feature: so I can see inside the working state of a virtual machine, taking snapshots at various points to see a complete breakdown of memory usage.

I think this feature will also help to demystify snapshots for end users, since it makes it easy to see what’s inside them.

Code Coverage Analysis

One of the concerns that was plaguing the back of my mind was that I had done a lot of development work for the native engine but had little idea how much of it was tested and working correctly. For something as critical as a virtual machine, this was unacceptable.

I thought that something like code coverage would probably resolve this. I have a test suite with a bunch of self-testing Microvium scripts, and it would be great for some off-the-shelf tool to tell me how much of my C code was exercised by these tests.

Unfortunately, I couldn’t find any off-the-shelf solutions that were adequate while not being out of my price range. But the solution I came up with is even better, if I may say so myself!

It’s not the prettiest, but you get used to the syntax quite quickly. Here’s a sample from the C code that shows what I’m talking about. This is the code that handles the ObjectSet opcode, which is for setting a property on an object. Take a look at the CODE_COVERAGE macro calls:

Basically, I’ve generously sprinkled these CODE_COVERAGE markers all over the code. When running in production, these marker macros don’t expand to anything and so don’t consume resources. When running in the testing environment, these macros expand to function calls which record that each marker is hit at runtime.

But wait! That’s not all!

I have a script which parses the C code in search of these markers, and automatically fills in information about each:

  • It automatically fills in the unique identifier for each marker (in the selected line in the example, this is 124). I used identifiers instead of just relying on line numbers because they remain consistent during code movement. The script automatically cleans up duplicate or missing identifiers.
  • It automatically appends the // Hit or // Not hit comment, depending on whether the marker is hit or not during the tests (and you can see in the screen capture that I’ve tweaked my IDE to colorize these distinctly).

But even better than typical code coverage tests, we extend the idea to test for value cases as well. For example, there is a bytecode opcode for loading some common literal values, and there are 8 different possible literals that it can produce, such as null, undefined, true, false, etc. (the exact values are not relevant to the point I’m about to make). These options might have been implemented in a switch statement (handling the null case, the true case, etc), in which case a CODE_COVERAGE marker on each switch case would tell us whether the particular literal value has been tested. However, it was more efficient to implement this as a table of literal values, with a simple lookup. To get test coverage data for this kind of implementation, I created a different macro which I call TABLE_COVERAGE:

The details are not important. What’s relevant here is that the test runner has identified that there are 8 values in the lookup table, and that the code is hitting all 8 of them (see the highlighted portion of the image above). This is really valuable information, to know that there exists some code in the test suite which is exercising all these options, even though they don’t appear as distinct code paths.

Integer Overflow Control

Microvium is all about being micro — having a small footprint in terms of RAM and ROM. When it comes to squeezing the most out of the ROM footprint of the engine itself, the overflow behavior of numbers in C and JavaScript is frustrating, to say the least.

In JavaScript, all number operations are conceptually performed on 64-bit floating point numbers. In a C implementation of these semantics, we don’t need to store the full 64-bit float. The Microvium engine represents the number 1.0 with a 14-bit integer and the number value 2_000_000_000 (2 billion) with a 32-bit integer in memory. It’s not until you exceed the signed 32-bit range that Microvium needs to resort to the full floating-point representation in memory, and expectation is that this will almost never happen in practice.

However, the challenge is that most operations on numbers theoretically need to check for overflow in order to correctly do the promotion from int to float in the rare case it occurs, and this takes up precious program space as well as degrading performance.

To make this more concrete with an example, consider that the smallest signed 32-bit integer is -0x80000000. If we negate this using the operator -x, in JavaScript we should get the number 0x80000000, but this is no longer within the signed 32-bit number space, and so in C, we would consider this to be an overflow. The overflow in C is technically undefined behavior, but in practice, -0x80000000 will almost always give us exactly what we started with, which is -0x80000000.

Let me say that again: in C, -(-0x80000000) == -0x80000000, when dealing with signed 32-bit integers.

This is the only integer in C that can overflow an int32 during negation (-). To implement the correct JavaScript semantics in Microvium, we need to have a test for exactly this case.

To make things worse, C does not provide any standard way of checking for overflow. GCC provides some language extensions for such, but a cross-platform engine can’t assume they exist.

If this were the only overflow case, then a few extra checks and branches wouldn’t be worth worrying about. But the reality is that many or most number operations can overflow, and they sometimes do so in cases or ways that you wouldn’t necessarily expect or care about.

I’ll briefly give 3 more examples to drive home the point:

Negating Zero

I said above that 0x80000000 is the only integer that can overflow in negation in C. But in Microvium, this isn’t true. In particular, in Microvium, negative zero is not an integer, because it is distinct from positive zero. The behavior of Microvium in the case of negative zero is best illustrated by the corresponding test case:

  // Without overflow checks enabled, the negation of integer zero is integer zero
  assertEqual(8 / -(1-1), overflowChecks ? -Infinity : Infinity);

As you can see above, if overflow checks are enabled, then negating integer zero results in the floating point -0, which when used in the denominator above will give us the answer -Infinity.

But if overflow checks are disabled, then the result of negating zero is still zero, as most C programmers would probably expect.

The reason for the (1-1) gymnastics is because -0 is treated as a literal with value negative zero, whereas -(1-1) is treated as the negating operator acting on integer zero.

Side note: You might think that division by any kind of zero is an overflow, but in Microvium, the naked division operator / is always a floating-point operator. There is a separate integer division opcode in Microvium which is syntactically represented as x / y | 0. I might discuss this more at a later stage.

Integer Addition

We all know that integer addition can overflow, but I bring this up as an example of how hard it can be to test for integer overflow. After a lot of research, the fastest way I could find to check for overflow on integer addition seemed to be presented in a comment on this blog post. Basically, if a = b + c, then overflow can be tested as:

if ( (a ^ c) & (b ^ c) ) < 0) /* overflow */;

On a 16-bit machine, doing four 32-bit operations and a branch is not the cheapest, when all you wanted to do was add two numbers together.

To make matters worse, this particular solution is determining overflow retroactively, which is technically undefined behavior. But pragmatically, I expect this to work in an overwhelming number of real-world C environments, so I’m using it anyway.

Right Shift

In JavaScript, the left and right shift operators (<<, >> and >>>) always yield a result that fits into the signed 32-bit range, except for one case, which is zero-fill right shift by zero (>>> 0) of a negative number . Take a look at these examples, and see if you can spot the odd one out (hint: it’s the last one!):

// For positive numbers
1 | 0   // = 1          Bitwise OR
1 & 1   // = 1          Bitwise AND
1 << 0  // = 1          Bitwise left shift
1 >> 0  // = 1          Bitwise sign-propagating right shift
1 >>> 0 // = 1          Bitwise zero-fill right shift

// For negative numbers
-1 | 0   // = -1          Bitwise OR
-1 & -1  // = -1          Bitwise AND
-1 << 0  // = -1          Bitwise left shift
-1 >> 0  // = -1          Bitwise sign-propagating right shift
-1 >>> 0 // = 0xFFFFFFFF  Bitwise zero-fill right shift

Since all other cases yield correct results when interpreting the result as if cast to a signed 32-bit integer, Microvium treats the last case an overflow. If overflow checks are disabled, the result of the last case will be -1 instead of 0xFFFFFFFF.

I can understand the reasoning behind the defined behavior in JavaScript, since doing a zero-fill right shift then always consistently produces an unsigned result, no matter how much you’re shifting by. But it breaks my intuition about how bitwise operators work in JavaScript. Why does 0xFFFFFFFF | 0 not also produce 0xFFFFFFFF (it produces -1)? Why is it such that the result of most bitwise operators is interpreted as if the 32nd bit was a sign bit, even when the operator has no special understanding of signedness? And then why break that rule with the zero-fill right shift, which is equally just a bit manipulation operation like bitwise AND and OR?

In Microvium overflow behavior is configurable

My solution in Microvium, if you haven’t already figured it out from some of these examples, is to make overflow checks optional. It’s a compilation flag presented as a #define in the port file, which is accessible specifically to the unit tests for reflection so it can determine the expected result of various operations.