
Microvium has Classes!

TL;DR: Microvium now has support for classes. Here are some details for the curious.


I’ve been putting off adding support for classes in Microvium because you can do a lot without them, and other features have been more important in my opinion. I’m also personally more of a fan of functional-style programming than object-oriented programming, so I seldom write classes myself.

But I kept running into little things that are difficult to do without support for classes. With the recent addition of try-catch, for example, you could now throw, but you couldn’t throw new Error(...) because Error is a class. Another example is the ECMA-419 API for embedded systems, which is an API standard for things like GPIO, but which relies heavily on support for classes.

Microvium now has support for classes, which I think makes it the only JavaScript engine under 100kB of flash that has all the features required to implement ECMA-419 (and Microvium itself is currently under 10kB).

Microvium functions are not objects

Classes in JavaScript are more or less just syntactic sugar for old-style prototype-based code. For example, the following are more or less equivalent in normal JavaScript (but the prototype style doesn’t work in Microvium):

// ES6 class style
class Foo {
  bar() { console.log('Method bar called') } 
}

// Old prototype-style
function Foo() {}
Foo.prototype.bar = function() { console.log('Method bar called') }

Both styles are setting up the Foo.prototype object, but with the class syntax, it is implicit. Then if you evaluate new Foo(), the engine will create a new object whose __proto__ is Foo.prototype.
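To make that concrete, here is a rough sketch of what new Foo() does under the hood (simplified, and ignoring details such as constructors that return their own object):

// Roughly what `new Foo()` does (simplified sketch)
function construct(Foo, args) {
  const instance = Object.create(Foo.prototype); // instance.__proto__ === Foo.prototype
  Foo.apply(instance, args);                     // run the constructor with `this` set to the new instance
  return instance;
}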

This is all possible because functions in JavaScript are also objects, and can hold properties like any other object.

But Microvium functions do not support properties, so the function Foo cannot have a prototype property (or any other static class properties like Foo.myStaticProperty).

I made the choice to omit this feature from the engine because this pattern is relatively uncommon outside of classes, and it would be a massive waste of RAM to have every function also be an object. A closure function in Microvium currently uses only 6 bytes of RAM, and most functions will use no RAM at all if the static analysis determines that they are not closures (e.g. all top-level functions). By contrast, a pair of normal objects like { prototype: {} } is already 16 bytes of RAM (6 bytes for each object plus 4 bytes for each property). It would be madness to require that every function declaration uses 16 bytes of RAM just in case someone wants to use it as a constructor.

So how do classes work in Microvium?

The way it works is quite simple: when you use the class syntax, Microvium creates a function that supports properties (i.e. “a class”), but when you use the normal function syntax, it will keep its old behavior.

A class in Microvium is a distinct type. Although typeof MyClass will return "function", as per the JS spec, if you use the C function mvm_typeOf to probe the type, it will tell you VM_T_CLASS not VM_T_FUNCTION.

Internally, a class is a tuple of a props object and constructor function, as shown in the following diagram. When you access the class like an object (e.g. MyClass.myStaticProperty or MyClass.prototype), it delegates the property-access operation to the props object.

Memory layout for a minimal empty class and an instantiation of the class

When you construct an instance of the class using new MyClass(), it creates a new object whose prototype (__proto__) is props.prototype, and then calls the constructor function with the new object as the this parameter.

The constructor function itself could just be a bytecode function in ROM, but it can, in general, be any function because it delegates the call to the normal calling machinery of Microvium. For example, if you have a class declaration nested inside a function then the constructor will be a closure, with access to the locals of the surrounding function. Or the constructor can be an imported host function. But the simplest case is a class declared at the top level of the code, in which case the constructor part of the tuple just points directly to a function in the bytecode ROM image.

The key-value pair in the above diagram (where the key is "prototype") is how Microvium implements object properties. Each property is a 4-byte pair containing a 2-byte property key and a 2-byte value. Here I’m only assuming one property (MyClass.prototype) but other properties would follow contiguously if there were more.

The next field is unimportant for this example. When you add properties to an object, they’re actually added using a linked list rather than resizing the object structure (for CPU efficiency) but then the garbage collector compacts these linked lists into a contiguous form (for memory efficiency).

Classes and objects are expensive

Classes (and objects) are one of the more expensive features of Microvium. The above minimal/empty class is already 22 bytes of RAM. While each instance of the class is only 6 bytes of RAM, every property on an instance is an additional 4 bytes. So an object with just 2 properties is already 14 bytes.

Property lookup is also quite expensive:

  • It is linear-time because it has to search through the property list.
  • Many different things support property lookup, such as arrays, classes, buffers, and of course objects. The lookup code needs to figure out what it is in order to know how to find its properties.
  • Properties use string keys. For hard-coded literal strings, like x.y or x['y'], these strings don’t incur RAM overhead but they do add to the bytecode size.
  • For computed string properties like x['a' + 'b'], there is additional overhead to perform string interning — string interning is the process of consolidating different strings with the same content so that they also have the same reference identity, which makes property lookups more efficient.
  • String interning can potentially trigger a garbage collection cycle because it’s growing the intern table. Apart from the overhead of the collection itself, just the possibility of a garbage collection means that the implementation of property lookup needs to deal with the fact that all the memory may be shuffled around during the property access (e.g. all the temporaries need to be properly GC-reachable), which itself adds overhead.
  • A property lookup instruction in bytecode involves multiple steps consuming at least 5 bytes of bytecode:
    1. Loading the object from which to get the property (a 1-byte instruction, at least)
    2. Loading the string that represents the property (typically a 3-byte instruction because it embeds a reference to the string)
    3. An instruction to actually trigger the property loading (a 1-byte instruction)

Closures, on the other hand, are asymptotically more efficient. A closure that closes over 2 variables is 14 bytes — the same size as an object with 2 properties. But:

  • Each additional variable closed over is another 2 bytes, rather than the 4 bytes of a property.
  • Closure variable access is typically O(1) because closures are random-access structures and the indexes are computed at compile time by static analysis (see my post on closure variable indexing).
  • Up to 15 different closure variables can be accessed by a single 1-byte instruction, compared to a minimum of 5 bytes for the instruction sequence for object property access.

Can objects be more efficient?

The experimental static analysis technique I designed for Microvium Boost a while back computes all the information that would be required to convert an object to a fixed-length array, in cases where it’s possible to infer the closed set of property keys that are ever used to access an object. My hope is that in the future I could implement this optimization (transforming objects and classes into fixed-length arrays) which could have some significant performance benefits:

  • An object with 2 properties would be 6 bytes instead of 14.
  • Each additional object property would take another 2 bytes.
  • Property lookup would be O(1).
  • Up to 15 different properties could be accessed using a single-byte instruction.

Conclusion

As it stands, classes are not a feature that I would want to use very often since they are so expensive. But having classes opens the door to different kinds of libraries and APIs, and in particular, is a stepping-stone towards implementing other standards-compliant features in Microvium.

Inside Microvium Closures

Inside Microvium Closures

TL;DR: Support for closures in Microvium sets it apart from other JS engines of a similar size. Closures simplify state machines and enable functional-style code. Closures in snapshots are a new way of sharing compile-time state with the runtime program. This post goes through some examples and design details.

What is a closure?

MDN has already done a great job of explaining what a closure is, so I’m just going to borrow their explanation:

A closure is the combination of a function bundled together (enclosed) with references to its surrounding state (the lexical environment). In other words, a closure gives you access to an outer function’s scope from an inner function. In JavaScript, closures are created every time a function is created, at function creation time.

Here is a simple example:

function makeCounter() {
  let x = 0;
  function incCounter() {
    x++;
    return x;
  }
  return incCounter;
}

const myCounter1 = makeCounter();
const myCounter2 = makeCounter();

console.log(myCounter1()); // 1
console.log(myCounter1()); // 2
console.log(myCounter1()); // 3

// myCounter2 is an independent counter
console.log(myCounter2()); // 1
console.log(myCounter2()); // 2
console.log(myCounter2()); // 3

In the above example, the function named incCounter is a closure because it closes over variable x in its outer lexical scope. A closure is just a function, nested in another function, which accesses variables in the outer function.

Microvium also supports the arrow function syntax, so the following example has the same output but is more concise:

const makeCounter = x => () => ++x;
const myCounter = makeCounter(0);
console.log(myCounter()); // 1
console.log(myCounter()); // 2
console.log(myCounter()); // 3

For more detail, take a look at the above-mentioned MDN article. The rest of this post will assume that you know what a closure is.

Why are closures useful?

Closures in snapshots

Let’s say that we want a script that exports two functions: one that prints “hello” and the other that prints “world”. Without closures, we could implement this in Microvium as follows:

vmExport(0, printHello);
vmExport(1, printWorld);

function printHello() {
  console.log('hello');
}

function printWorld() {
  console.log('world');
}

(Recall that vmExport is a function that the script calls to export a function to the host).

In this example, printHello and printWorld are each functions that take no arguments and will print the corresponding string to the console1.

With the introduction of closures, we could factor out the commonality between printHello and printWorld and just have a printX that can print either one:

const printHello = makePrinter('hello');
const printWorld = makePrinter('world');
vmExport(0, printHello);
vmExport(1, printWorld);

function makePrinter(thingToPrint) {
  return printX;
  function printX() {
    console.log(thingToPrint);
  }
}

This refactors the code so that console.log only appears once but is shared by both printHello and printWorld. For the simple case of console.log this doesn’t add much benefit, but you can imagine cases where printX is a lot more complicated and so this refactoring may be beneficial.

Another thing to note in this example is that makePrinter is called at compile time since it’s in the top-level code. The resulting closures printHello and printWorld instantiated at compile time are carried to runtime via the snapshot, along with their state (the value of thingToPrint). The closures feature here plays nicely with snapshotting as a new way to share compile-time state with the runtime program.

Depending on the style of code you’re familiar with, we can also write the same example more concisely as:

const makePrinter = s => () => console.log(s);
vmExport(0, makePrinter('hello'));
vmExport(1, makePrinter('world'));

If you’re not comfortable with the idea of functions returning other functions, here’s another variant of the example that does the same thing:

function exportPrinter(id, textToPrint) {
  vmExport(id, () => console.log(textToPrint));  
}
exportPrinter(0, 'hello');
exportPrinter(1, 'world');

We could also get the list of things to export from a dynamic source, such as an array (or even something read from a file at compile time using fs.readFileSync):

const printers = [
  { id: 0, textToPrint: 'hello' },
  { id: 1, textToPrint: 'world' },
];
// OR:
// printers = JSON.parse(fs.readFileSync('printers.json', 'utf8'));

for (let i = 0; i < printers.length; i++) {
  const id = printers[i].id;
  const textToPrint = printers[i].textToPrint; 
  vmExport(id, () => console.log(textToPrint));  
}

Side note: the above example also demonstrates the usefulness of vmExport being a normal function, rather than some special syntax. Think about your favorite language or engine and how you would implement the above in that language. You can’t define an extern void foo() {} inside a for-loop in C, or public static void foo() {} inside a for-loop in C#. The only way to export a programmatically-defined set of functions in those environments would be to use a code generator, and the result would be much more complicated.

Closures for state machines

Closures make it much easier, and more efficient, to implement a finite state machine. Consider the following two-state state machine:

The above state machine has 2 states, which I’ve called stateA and stateB. When event 1 is received while in stateA, the machine will transition to stateB, but if any other event (e.g. event 2) is received while in stateA, the machine will not transition to stateB. See Wikipedia for a more detailed description of FSMs.

The ability to use closures allows us to implement this state machine using a function for each state. In the following example, stateA is a function that receives events by its event parameter. We “make” stateA only when we need it, by calling enterStateA(). The example includes an eventCount as part of state A to show how states can have their own variables that are persisted across multiple events.

function enterStateA() {
  console.log('Transitioned to State A!');
  let eventCount = 0; // Some internal state only used by stateA

  // A function that handles events while we're in stateA
  function stateA(event) {
    if (event === 1) {
      currentState = enterStateB();
    } else {
      eventCount++;
      console.log(`Received ${eventCount} events while in state A`);
    }
  }

  return stateA;
}

function enterStateB() {
  console.log('Transitioned to State B!');
  return event => {
    if (event === 2) {
      currentState = enterStateA();
    }
  }
}

// We'll start in stateA
let currentState = enterStateA();

// Every time an event is received, we send it to the current state
const processEvent = event => currentState(event);

// Allow the host firmware to send events to the state machine
vmExport(0, processEvent);

// Some example events:
processEvent(5); // Received 1 events while in state A
processEvent(5); // Received 2 events while in state A
processEvent(5); // Received 3 events while in state A
processEvent(1); // Transitioned to State B!
processEvent(1); //
processEvent(2); // Transitioned to State A!
processEvent(2); // Received 1 events while in state A

In the above example, state A has the eventCount counter which is part of its closure. When the system transitions to state B, the counter can be garbage collected. This might not be very useful when only considering a single counter variable, but the pattern generalizes nicely to systems that have more expensive states that may hold buffers and other resources.

Once you understand closures and higher-order functions, this is a very natural way to represent state machines.

Closures under the hood

Let’s go back to the simple “counter” example for illustration:

function makeCounter() {
  let x = 0;
  function incCounter() {
    return ++x;
  }
  return incCounter;
}

const myCounter = makeCounter();
console.log(myCounter()); // 1
console.log(myCounter()); // 2
console.log(myCounter()); // 3

The runtime memory layout for this example is as follows:

The global variable myCounter is a 2-byte slot, as are all variables in Microvium. The slot contains a pointer to the closure, which is an immutable tuple containing a reference to the function code (incCounter, in this example) and the enclosing lexical environment which in Microvium is called the scope.

The closure and scope are allocated on the garbage-collected heap — if and when the closure is no longer reachable, it will be freed.

When the closure myCounter is called, the VM sees that the callee is a closure and sets a special scope register in the VM to the closure’s scope before running the target bytecode. The bytecode can then interact with the scope variables through special machine instructions that leverage the scope register.

For a look into how variables are accessed by closure code, see my post on Closure Variable Indexing.

More efficient than objects

Closures are much more efficient than objects in Microvium:

  • Every closure variable is one word (2 bytes) while an object property is 2 words (4 bytes) because a property is stored as a key-value pair. Variables aren’t stored with their name because the static analysis can determine an index for them.
  • Closure variables can be accessed with single-byte instructions (a 4-bit opcode and 4-bit literal variable index) whereas typical object properties take at least 5 bytes of bytecode instructions to access. This is in part because the “current” closure scope is implicit, while there is no such thing as the “current” object (the object needs to be specified explicitly), and also because object property access requires specifying the property key which is a reference to a string.
  • All the object key strings take up memory in the bytecode image.
  • Most closure variables are accessed in O(1) time — adding more variables does not slow down the access time, whereas adding more properties to an object makes every lookup slower because property access is O(n).

Conclusion

Closures are useful and form the backbone of a lot of JavaScript programming. I’ve talked before about how closures are useful for callback-style asynchronous code, and in this post, I’ve also shown how they are useful for modeling state machines and make certain kinds of refactoring possible.

Other tiny JavaScript engines such as Elk and mJS don’t support closures, so this feature sets Microvium apart from the crowd2.

The closures feature has been the single most complicated and time-consuming feature to implement in Microvium because the static analysis requires multiple passes and completely changed how the compiler front-end works. It’s really not as simple as one would first imagine. Consider, as just one example, that the let i binding in for (let i = 0 ...) has multiple instances of i relative to the containing function, and that on each iteration the previous value is copied to the new instance.

But after a long journey, I can say that I’m happy with the result.


  1. This example assumes that you’ve added `console.log` to the global scope 

  2. Larger engines such as Espruino and XS support closures along with many other features, but come at a serious size premium. 

Microvium has try-catch!

TL;DR: Microvium now supports exception handling, with just 4 bytes of overhead per active try block. This post covers some of the details of the design, for those who are interested.


Note: if you’re new to try-catch, see the MDN article on the subject. Note however that Microvium does not yet support finally, but in my experience finally is a less-used feature of JS.

Why try-catch?

One of my objectives with Microvium is to allow users to write JS code that is stylistically similar to the kind of code they would write for a larger engine like V8/Node.js. I want it to be easy to write JS libraries that run on both Microvium and Node, and the public interface of these libraries especially should not be contorted to fit the Microvium feature subset.

A major thing that’s been missing from this vision is the ability to use exceptions to pass errors. A library that can’t use exceptions to pass errors must use some other technique such as error codes, which are not typical JavaScript style and would clutter up the API of such a library.
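To illustrate the difference (parseConfig, use, and handleError here are just hypothetical placeholders, not functions from any real library):

// Error-code style: workable, but not idiomatic JavaScript
const result = parseConfig(text);
if (result.error) {
  handleError(result.error);
} else {
  use(result.value);
}

// Exception style: what you'd naturally write for Node.js, and now also for Microvium
try {
  use(parseConfig(text));
} catch (e) {
  handleError(e);
}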

A look at alternatives

Microvium is implemented in C. How does one implement exceptions in a language like C? In the journey of finding the best way to do exceptions, I thought of a number of different ideas.

The most obvious one to come to mind is something like setjmp/longjmp. This is a common way to implement exception-like behavior in a language like C, since it’s the only way to jump from one C function into another that’s earlier on the stack. At face value, this sounds sensible since both the engine and host are running on C.

Looking at other embedded JavaScript engines, mJS and Elk don’t support exceptions at all, but Moddable’s XS uses this setjmp/longjmp approach. At each try, XS mallocs a jump structure which is populated with the state of current virtual registers and physical registers, as well as a pointer to the catch code that would handle the exception. Upon encountering a throw, all the virtual and physical registers are restored to their saved state, which implicitly unwinds the stack, and then control is moved to the referenced catch block.

In my measurements1, one of these jump structures in XS is 120-140 bytes on an embedded ARM architecture and 256-312 bytes on my x64 machine2. Even the jmp_buf on its own is 92 bytes on a Cortex M0 (the architecture to which I’ve targeted all my size tests in Microvium). That’s quite heavy! Not to mention the processing overhead of doing a malloc and populating this structure (each time a try block is entered).

How it works in Microvium

After thinking about it for some time, the following is the solution I settled on. Consider the following code, where we have 2 variables on the stack, and 2 try-catch blocks:
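(The listing below is a representative sketch; the line numbers in the comments are the ones referred to next.)

let x;               // line 1: x stays on the stack throughout
try {                // line 2: outer try, paired with catch (e2)
  try {              // line 3: inner try, paired with catch (e1)
    let y;           // line 4: y only exists inside the inner try
    throw 'oops';    // line 5: the throw we're considering
    // ...
  } catch (e1) {     // line 7: the nearest catch clause
    // ...
  }
  // ...
} catch (e2) {       // line 11: the next outer catch block
  // ...
}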

If we were to throw an exception on line 5, what needs to happen?

  1. We need to pop the variable y off the stack (but not x), since we’re exiting its containing scope. This is called unwinding the stack.
  2. We need to pass control to the e1 catch clause (line 7)
  3. We need to now record that the catch(e2) on line 11 is the next outer catch block, in case there is another throw

Microvium does this by keeping a linked list of try blocks on the stack. Each try block in the linked list is 2 words (4 bytes): a pointer to the bytecode address of the corresponding catch bytecode, plus a pointer to the next outer try block in the linked list. For the previous example, there will be 2 try blocks on the stack when we’re at line 5, as shown in the following diagram:

Each of the smaller rectangles here represents a 16-bit word (aka slot). The arrows here represent pointers. The red blocks (with an extra border) each represent an active try block consisting of 2 words (catch and nextTry). They form a linked list because each has a pointer to the next outer try block.

The topTry register points to the inner-most active try block — the head of the linked list. Each time the program enters a try statement, Microvium will push another try block to the stack and record its stack address in the topTry register.

The topTry register serves 2 purposes simultaneously. For one, it points to the head of the linked list, which allows us to find the associated catch block when an exception is thrown. But it also points to exactly the stack level we need to unwind to when an exception is thrown. This is similarly true for each nextTry pointer.

When an exception is thrown, Microvium will:

  1. Take the current topTry and unwind the stack to the level it points to.
  2. Transfer control (the program counter) to the bytecode address of the catch block which is now sitting at the top of the stack (i.e. pop the program counter off the stack).
  3. Save the nextTry pointer as the new topTry (i.e. it pops the topTry register off the stack).

Et voila! We’ve implemented try-catch in Microvium with just 4 bytes of overhead per try.
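In pseudo-JavaScript, the throw sequence above is conceptually something like the following (an illustrative sketch only; the real engine is C operating on 16-bit slots, and delivering the thrown value to the catch block is omitted here):

function vmThrow(vm) {
  vm.stackPointer = vm.topTry;   // 1. unwind the stack to the innermost active try block
  vm.programCounter = vm.pop();  // 2. jump to the catch block's bytecode address
  vm.topTry = vm.pop();          // 3. the next outer try block becomes the new topTry
}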

This also works when throwing from one function to a catch in a caller function. The only difference is that the unwind process in step 1 then needs to restore the registers of the caller (e.g. argument count, closure state, return address, etc). This information is already on the stack since it’s also used by the return instruction — we don’t need to save it separately during a try. This works for any level of nested calls if we just unwind repeatedly until reaching the frame containing the catch target.

Conclusion

I’m pretty satisfied with this design.

  • Only one new virtual register (topTry)
  • Only 4 bytes of RAM per active try3
  • No heap overhead
  • Conceptually simple
  • Only makes the engine 156 bytes larger in ROM

The static analysis to implement this is quite hairy — it’s complicated by the interaction between try-catch and return, break, and closures, among other things. For those interested, see the test cases for examples of the kinds of scenarios that try-catch needs to deal with.

On a meta-level, this is a good reminder of how the first idea that comes to mind is often not the best one, and of the benefit of thinking through a problem before jumping right in. I’ve been brainstorming ideas about how exceptions might work in Microvium since July 2020 — almost 2 years ago — and I kept returning to those ideas to revise and reconsider, going back and forth between different options. Sometimes, mulling over an idea over a long period of time can help you get to a much better solution in the end.

Another lesson is that simplicity is deceptively time-consuming. Looking only at this apparently-simple final design, you might not guess at the amount of work required to get there. This is a general tragedy of software engineering: it often takes more time to get to a simpler solution, which gives the appearance of achieving less while taking longer to do so.


  1. Please, anyone correct me if I’ve made a mistake here or misrepresented XS’s design in any way. 

  2. The ranges in sizes here depend on whether I use the builtin jmp_buf type or the one supplied by XS 

  3. And 6 bytes of bytecode ROM per try block in the code. 

Snapshotting is like compiling but better

TL;DR: The final output of a traditional compiler like GCC bears a family resemblance to a Microvium snapshot, but the snapshotting paradigm is both easier to use and more powerful because it allows real application code to run at build time and its state to persist until runtime.

What is snapshotting?

My Microvium JavaScript engine is built on the paradigm of creating a VM snapshot as the deployable build artifact rather than creating a traditional compiled binary. As a developer, when you run the Microvium engine on your desktop with a command like microvium main.js it will execute the script until all the top-level code is complete and then output a snapshot file containing the final VM state. The snapshot file can then be “resumed” on an embedded device using Microvium’s embedded C library (for more details, see Getting Started). Although Microvium is designed especially for microcontrollers, the principle of snapshotting goes beyond the embedded space.

Comparing to GCC

For this post, I’ll mostly compare Microvium to GCC:

gcc main.c         # Compile a C program with GCC
microvium main.js  # "Compile" a Microvium program

These two commands are analogous. Both produce a single file as the result, and this file is what you want to deploy to the target environment. In the case of GCC, the output is of course the executable (e.g. a.out), while in the case of Microvium, it’s the snapshot (main.mvm-bc).

Both of these commands do some kind of compilation as part of the process. GCC translates your function code to machine instructions, while Microvium translates to virtual machine instructions (bytecode instructions).

Constants

Both the GCC output and the Microvium output have a section for constants, including function code. You may be familiar with this as the .text section. Among other things, this contains constant values, such as:

// JavaScript
const x = 42;
// C
const int x = 42;

… but better

You can do this in Microvium but not in C:

const x = foo();

function foo() {
  return 42;
}

In C, it’s a compile-time error to call runtime functions for the calculation of constants. But in Microvium, this is perfectly legal since there is no distinction between compile-time and runtime — there is only really runtime before the snapshot and runtime after the snapshot. Although informally we may refer to the former as “compile-time” and the latter as “runtime”.

Apart from just being cleaner and easier to use, the Microvium snapshotting paradigm here allows computationally-intensive constants to be calculated at build time, using arbitrary functions and libraries that might also be useful at runtime.
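As a sketch of what this enables (the CRC table here is just an illustrative example of a computationally-intensive constant, not code from the original post):

// Top-level code runs at build time, so only the finished 256-entry table
// is carried into the snapshot, not the work of computing it.
const crc8Table = makeCrc8Table(0x07);

function makeCrc8Table(poly) {
  const table = [];
  for (let i = 0; i < 256; i++) {
    let crc = i;
    for (let bit = 0; bit < 8; bit++) {
      crc = (crc & 0x80) ? ((crc << 1) ^ poly) & 0xff : (crc << 1) & 0xff;
    }
    table.push(crc);
  }
  return table;
}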

Variable initializers… but better

For non-constant variables, both GCC and Microvium have an output section for the initial1 value of all the variables, which is copied into RAM at runtime. You may know this traditionally as the .data section.

But a major difference between them is that a Microvium snapshot also contains heap data.

// JavaScript
let arr = [1, 2, 3];

// C
int* arr = malloc(3 * sizeof(int));  // ! Can't do this (in top-level code)

The best you could do in C for the above example would be to have an init function that runs early in the program to set up the initial runtime state of the program. Snapshotting is better because this initial structure can be established at build time.

Modules … but better

Microvium and C both have support for structuring your code in multiple files which get bundled into the same output artifact:

import { foo } from './foo.js'
#include "foo.h"

With the #include here, the dependent module implementation (e.g. foo.c) is not automatically compiled and linked into the program by GCC — you need to separately list foo.c to be compiled by GCC, or orchestrate the dependencies using a makefile.

But in the case of a Microvium import, the import statement itself is executed at build time, performing the module resolution, loading, parsing, and linking at build time, as well as executing the top-level code of the imported module. The top-level code of the imported module may in turn import other modules, transitively importing the whole module graph and executing its top-level initialization code.
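As a small illustration (lookup.js here is a hypothetical module, not one from the post):

// lookup.js: this top-level code runs at build time, when the module is first imported
export const table = buildTable();
function buildTable() { return [1, 2, 3]; /* could be something expensive */ }

// main.js
import { table } from './lookup.js'; // resolved, loaded, and executed at build time
vmExport(0, () => table.length);     // `table` is carried to runtime in the snapshot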

Preprocessor… but better

Both C and Microvium support “compile-time” logic:

// C
#if USE_FOO_1
#define foo foo1
#else
#define foo foo2
#endif
// JavaScript
const foo = USE_FOO_1 ? foo1 : foo2;

Of course, you already knew that, because all the examples so far have demonstrated the fact that snapshotting allows you to run JavaScript at compile time. But I want to emphasize some of the key reasons why the snapshotting paradigm is better for this:

  • You don’t need two different programming languages (e.g. the preprocessor language and the C language).
  • Your “compile-time” code has the full power of the main language.
  • Your runtime code carries over the state from your compile-time code.

So in some sense, this unifies the preprocessor language with the main language. This applies similarly to other “compile-time languages”, such as makefiles and linker scripts. Or if you’re coming from the JS world, consider how the snapshotting paradigm obviates the need for a webpack.config (see Snapshotting vs Bundling).

But what about using the preprocessor to conditionally include different runtime logic, such as for different devices? For example, consider the following C code:

int myFunction() {
#if SOME_CONDITION
  doX();
#else
  doY();
#endif 
}

Of course, it’s easy to see how this example might translate to JS:

function myFunction(someCondition) {
  if (someCondition) {
    doX();
  } else {
    doY();
  }
}

Now we have the bonus that our unit tests can inject someCondition to test both cases.

But doesn’t this code mean that now we have both doX and doY branches at runtime, taking up ROM space? (and someCondition)

That’s why I’ve also been developing the experimental Microvium Boost: an optimizer that analyzes a snapshot and removes unused branches of code. For example, if the analysis shows that someCondition is always true in your program, it can remove it as a parameter from myFunction and also remove the call to doY as dead code. This is still experimental but has shown significant success so far.

Host exports… but better

So far we’ve been considering an executable output from GCC, but it would be more accurate to compare a Microvium snapshot with a compiled shared library (e.g. DLL). Like a shared library, Microvium snapshots do not have a single entry point but may contain many exported functions to be resolved at runtime on the device.

A Windows DLL suits the analogy better than a Linux shared library, so in this section the examples will use msbuild rather than GCC.

Both a Microvium snapshot and a DLL binary contain a section for dynamic linking information — a table that associates relevant functions in the DLL with a number2 so that the host program using the DLL/snapshot at runtime can find them.

In the case of a DLL, you can provide the compiler with a DEF file that tells the compiler what to put into the DLL export table. If you wanted to export the functions foo, bar, and baz from the DLL with IDs 1, 2, and 3 respectively, your DEF file might look like this:

LIBRARY   MyLibrary
EXPORTS
   foo   @1
   bar   @2
   baz   @3

// C
int foo() { return 42; }
int bar() { return 43; }
int baz() { return 44; }

The equivalent in Microvium would be as follows:

// JavaScript
function foo() { return 42; }
function bar() { return 43; }
function baz() { return 44; }

vmExport(1, foo);
vmExport(2, bar);
vmExport(3, baz);

You may have noticed a recurring theme in this post: the Microvium snapshotting paradigm doesn’t require a whole new language in order to do different build-time tasks. In this case, Microvium doesn’t require a DEF file (or a special __declspec(dllexport) language extension), since vmExport is just a normal function. This is just simpler and more natural.

Another recurring theme here is that the snapshotting approach is more powerful, allowing you to do things that are impossible or impractical in the traditional paradigm. Take a look at the following example in Microvium:

// JavaScript
for (let i = 1; i <= 3; i++) {
  vmExport(i, () => 41 + i);
}

This has the same overall effect as the previous code, adding 3 functions to the export table of the deployed binary with IDs 1, 2, and 3 and which return 42, 43, and 44 respectively. But having vmExport be a normal function means that now we have the full power of the language for orchestrating these exports, or for writing an abstraction layer over the export system, or outsourcing this logic to a third-party library.

Side note: a more subtle point in this example for advanced readers is its code cohesion. The single line of code mixes both compile-time and runtime code (vmExport(i,...) and () => 41 + i respectively), but keeps the related parts of both in close proximity. This is the difference between temporal cohesion (grouping code by when it’s run) vs functional cohesion (grouping code by what feature it relates to) (see Wikipedia). A common disadvantage of having separate build-time or deploy-time code (e.g. a DEF file, makefile, linker script, webpack.config, dockerfile, terraform file, etc.) is that it pushes you into temporal cohesion, which in turn damages modularity and reusability.

Conclusion

The idea of deploying a snapshot rather than a traditional compiled binary opens up a whole new paradigm for software development. The end result is very similar — a binary image with sections for different memory spaces, compiled function code, constants, initial variable values, and export/import tables — but the snapshotting paradigm is both simpler and more powerful.


  1. In the context of Microvium, the word “initial” here refers to the initial state when the snapshot is resumed, not the initial state when the program is started, since the program starts at build time, and variables in JavaScript start with the value undefined. 

  2. DLL exports can be by name or number, but Microvium exports are only by number, for efficiency reasons, so that’s what I’m using in the analogy here. 

Single-threading is more memory-efficient

TL;DR: Single-threading with super-loops or job queues may make more efficient use of a microcontroller’s memory over time, and Microvium’s closures make single-threading easier with callback-style async code.

Multi-threading

In my last post, I proposed the idea that we should think of the memory on a microcontroller not just as a space but as a space-time, with each memory allocation occupying a certain space for some duration of time. I therefore suggested that we should measure the cost of an allocation in byte-seconds (the area of such a rectangle, bytes × seconds), so long as we assumed that allocations were each small and occurred randomly over time. Randomness like this is a natural byproduct of a multi-threaded environment, where at any moment you may coincidentally have multiple tasks doing work simultaneously and each taking up memory. In this kind of situation, tasks must be careful to use as little memory as possible because at any moment some other tasks may fire up and want to share the memory space for their own work.

The following diagram was generated by simulating a real malloc algorithm with random allocations over time (code here, and there’s a diagram with more allocations here):

A key thing to note is that the peak memory in this hypothetical chaotic environment can be quite a bit higher than the average, and that these peaks are not easily predictable and repeatable because they correspond to the coincidental execution of multiple tasks competing for the same memory space1.

This leaves a random chance that at any moment you could run out of memory if too many things happen at once. You can guard against this risk by just leaving large margins — lots of free memory — but this is not a very efficient use of the space. There is a better way: single threading.

Single threading

Firmware is sometimes structured in a so-called super-loop design, where the main function has a single while(1) loop that services all the tasks in turn (e.g. calling a function corresponding to each task in the firmware). This structure can have a significant advantage for memory efficiency. In this way of doing things, each task essentially has access to all the free memory while it has its turn, as long as it cleans up before the next task, as depicted in the following diagram. (And there may still be some statically-allocated memory and “long-lived” memory that is dynamic but used beyond the “turn” of a task).

Overall, this is a much more organized use of memory and potentially more space-efficient.

In a multi-threaded architecture, if two memory-heavy tasks require memory around the same time, neither has to wait for the other to be finished — or to put it another way, malloc looks for available space but not available time for that space. On the other hand, in a super-loop architecture, those same tasks will each get a turn at different times. Each will have much more memory available to them during their turn while having much less impact on other tasks the rest of the time. And the overall memory profile is a bit more predictable and repeatable.

An animated diagram you may have seen before on my blog demonstrates the general philosophy here. A task remains idle until its turn, at which point it takes center stage and can use all the resources it likes, as long as it packs up and cleans up before the next task.

So, what counts as expensive in this new memory model?

It’s quite clear from the earlier diagram:

  • Memory that is only used for one turn is clearly very cheap. Tasks won’t be interrupted during their turn, so they have full access to all the free memory without impacting the rest of the system.
  • Statically-allocated memory is clearly the most expensive: it takes away from the available memory for all other tasks across all turns.
  • Long-lived dynamic allocations — or just allocations that live beyond a single turn — are back to the stochastic model we had with multi-threading. Their cost is the amount of space × the number of turns they occupy the space for. Because these are a bit unpredictable, they also have an additional cost because they add to the overall risk of randomly running out of memory, so these kinds of allocations should be kept as small and short as possible.

Microvium is designed this way

Microvium is built from the ground up on this philosophy — keeping the idle memory usage as small as possible so that other operations get a turn to use that memory afterward, but not worrying as much about short spikes in memory that last only a single turn.

  • The idle memory of a Microvium virtual machine is as low as 34 bytes2.
  • Microvium uses a compacting garbage collector — one that consolidates and defragments all the living allocations into a single contiguous block — and releases any unused space back to the host firmware. The GC itself uses quite a bit of memory3 but it does so only for a very short time and only synchronously.
  • The virtual call-stack and registers are deallocated when control returns from the VM back to the host firmware.
  • Arrays grow their capacity geometrically (they double in size each time) but a GC cycle truncates unused space in arrays when it compacts.

See here for some more details.

Better than super-loop: a Job Queue

The trouble with a super-loop architecture is that it services every single task in each cycle. It’s inefficient and doesn’t scale well as the number of tasks grows4. There’s a better approach — one that JavaScript programmers will be well familiar with: the job queue.

A job queue architecture in firmware is still pretty simple. Your main loop is just something like this:

while (1) {
  if (thereIsAJobInTheQueue) 
    doNextJob();
  else
    goToSleep();
}

When I write bare-metal firmware, often the first thing I do is to bring in a simple job queue like this. If you’re using an RTOS, you might implement it using RTOS queues, but I’ve personally found that the job-queue style of architecture often obviates the need for an RTOS at all.

As JavaScript programmers may also be familiar with, working in a cooperative single-threaded environment has other benefits. You don’t need to think about locking, mutexes, race conditions, and deadlocks. There is less unpredictable behavior and fewer heisenbugs. In a microcontroller environment especially, a single-threaded design also means you save on the cost of having multiple dedicated call stacks permanently allocated for different RTOS threads.

Advice for using job queues

JavaScript programmers have been working with a single-threaded job-queue-based environment for decades and are well familiar with the need to keep jobs short. When running JS in a browser, long jobs mean that the page becomes unresponsive, and the same is true in firmware: long jobs make the firmware unresponsive — unable to respond to I/O or service accumulated buffers, etc. In a firmware scenario, you may want to keep all jobs below 1ms or 10ms, depending on what kind of responsiveness you need5.

As a rule of thumb, to keep jobs short, they should almost never block or wait for I/O. For example, if a task needs to power on an external modem chip, it should not block while the modem boots up. It should probably schedule another job to handle the powered-on event later, allowing other jobs to run in the meantime.

But in a single-threaded environment, how do we implement long-running tasks without blocking the main thread? Do you need to create complicated state machines? JavaScript programmers will again recognize a solution…

Callback-based async

JavaScript programmers will be quite familiar with the pattern of using continuation-passing style (CPS) to implement long-running operations in a non-blocking way. The essence of CPS is that a long-running operation should accept a callback argument to be called when the operation completes.

The recent addition of closures (nested functions) as a feature in Microvium makes this so much easier. Here is a toy example one might use for sending data to a server in a multi-step process that continues across 3 separate turns of the job queue:

function sendToServer(url, data) {
  modem.powerOn(powerOnCallback);

  function powerOnCallback() {
    modem.connectTo(url, connectedCallback);
  }

  function connectedCallback() {
    modem.send(data);
  } 
}

Here, the data parameter is in scope for the inner connectedCallback function (closure) to access, and the garbage collector will automatically free both the closure and the data when they aren’t needed anymore. A closure like this is much more memory-efficient than having to allocate a whole RTOS thread, and much less complicated than manually fiddling with state machines and memory ownership yourself.

Microvium also supports arrow functions, so you could write this same example more succinctly like this:

function sendToServer(url, data) {
  modem.powerOn( () => 
    modem.connectTo(url, () => 
      modem.send(data)));
}

Each of these 3 stages — powerOn, connectTo and send — happens in a separate job in the queue. Between each job, the VM is idle — it does not consume any stack space6 and the heap is in a compacted state7.

If you’re interested in more detail about the mechanics of how modem.powerOn etc. might be implemented in a non-blocking way, take a look at this gist where I go through this example in more detail.

Conclusion

So, we’ve seen that multi-threading can be a little hazardous when it comes to dynamic memory management because memory usage is unpredictable, and this also leads to inefficiencies because you need to leave a wider margin of error to avoid randomly running out of memory.

We’ve also seen how single-threading can help to alleviate this problem by allowing each operation to consume resources while it has control, as long as it cleans up before the next operation. The super-loop architecture is a simple way to achieve this, but an event-driven job-queue architecture is more modular and efficient.

And lastly, we saw that the Microvium JavaScript engine for embedded devices is well suited to this kind of design, because its idle memory usage is particularly small and because it facilitates callback-style asynchronous programming. Writing code this way avoids the hassle and complexity of writing state machines in C, of manually keeping track of memory ownership across those states, and the pitfalls and overheads of multithreading.


  1. This simulation with random allocations is not a completely fair representation of how most firmware allocates memory during typical operation, but it shows the consequence of having many memory-consuming operations that can be preempted unpredictably or outside the control of the firmware itself. 

  2. Or 22 bytes on a 16-bit platform 

  3. In the worst case, it doubles the size of heap while it’s collecting 

  4. A super-loop also makes it more challenging to know when to put the device to sleep since the main loop doesn’t necessarily know when there aren’t any tasks that need servicing right now, without some extra work. 

  5. There will still be some tasks that need to be real-time and can’t afford to wait even a few ms in a job queue to be serviced. I’ve personally found that interrupts are sufficient for handling this kind of real-time behavior, but your needs may vary. Mixing a job queue with some real-time RTOS threads may be a way to get the best of both worlds — if you need it. 

  6. Closures are stored on the virtual heap. 

  7. It’s in a compacted state if you run a GC collection cycle after each event, which you would do if you cared a lot about idle memory usage. 

Short-lived memory is cheaper

TL;DR: RAM on a microcontroller should not just be thought of as space but as space-time: a task that occupies the same memory but for a longer time is more expensive.

MCU memory is expensive

If you’re reading this, I probably don’t need to tell you: RAM on a microcontroller is typically very constrained. A 3 GHz desktop computer might have 16 GB of RAM, while a 3 MHz MCU might have 16 kB of RAM — a thousand times less processing power but a million times smaller RAM. So in some sense, RAM on an MCU may be a thousand times more valuable than on a desktop machine. Regardless of the exact number, I’m sure we can agree that RAM is a very constrained resource on an MCU1. This makes it important to think about the cost of various features, especially in terms of their RAM usage.

Statically-allocated memory

It’s common especially in smaller C firmware to just pre-allocate different pieces of memory to different components of the firmware (rather than using malloc and free). For example, at the global level, we may declare a 256-byte buffer for receiving data on the serial port:

uint8_t rxBuffer[256];

If we have 1kB of RAM on a device for example, maybe there are 4 components that each own 256 B. Or more likely, some features are more memory-hogging than others, but you get the idea: in this model, each component in the code owns a piece of the RAM for all time.

Dividing up RAM over time

It’s of course a waste to have a 256-byte buffer allocated forever if it’s only used occasionally. The use of dynamic memory allocation (malloc and free) can help to resolve this by allowing components to share the same physical memory at different times, requesting it when needed, and releasing it when not needed.

This allows us to reconceptualize memory as a mosaic over time, with different pieces of the program occupying different blocks of space-time (occupying memory space for some time).

When visualizing memory like this, it’s easy to start to feel like the cost of a memory allocation is not just the amount of memory it locks, but also the time that it locks it for (e.g. the area of each rectangle in the above diagram). In this sense, if I allocate 256 bytes for 2 seconds then it costs 512 byte-seconds, which is an equivalent cost to allocating 512 bytes for 1 second or 128 bytes for 4 seconds.

Being a little bit more rigorous

Skip this section if you don’t care about the edge cases. I’m just trying to be more complete here.

This measure of memory cost is of course just one way of looking at it, and it breaks down in edge cases. For example, on a 64kB device, a task2 that consumes 1B for 64k-seconds seems relatively noninvasive, while a task that consumes 64kB for 1s is much more costly. So the analogy breaks down in cases where the sizes of the allocations are significant compared to the total memory size.

Another way the model breaks down is if many of the tasks need memory around the same time — e.g. if there is some burst of activity that requires collaboration between many different tasks. The typical implementation of malloc will simply fail if there is no memory available right now, as opposed to perhaps blocking the thread until the requested memory becomes available, as if the memory were like a mutex to be acquired.

But the model is accurate if we make these assumptions:

  • The size of the individual allocations is small relative to the total memory size
  • Allocations happen randomly over time and space

Under these assumptions, the total memory usage becomes a stochastic random variable whose expected value is exactly:

The expected allocation size × the expected allocation duration × the expected allocation frequency
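For example (made-up numbers, purely to illustrate the formula): if allocations average 64 bytes, last on average 0.5 seconds, and occur on average 10 times per second, then the expected steady-state memory usage is 64 × 0.5 × 10 = 320 bytes.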

We could also calculate the probability that the device runs out of memory at any given point (I won’t do the calculations here).

Conclusion

In situations where memory allocations can be approximated as being small and random, the duration of a memory allocation is just as important as its size. Much more care must be taken to optimize memory usage for permanent or long-lived operations.

I’m not very happy with this stochastic viewpoint and all these edge cases. It means that at any point, we could randomly exceed the amount of available memory and the program will just die. Is there a better way to organize memory space-time so we don’t need to worry as much? I believe there is… and I’ll cover that in the next post.


  1. Another thing that makes it a constrained resource is the lack of virtual memory and the ability to page memory in and out of physical RAM, so the RAM size is a hard limit 

  2. When talking about this space-time model, I think it’s easier to talk about a “task” than a “component”, where a task here is some activity that needs to be done by the program over a finite stretch of time, and will consume resources over that time. 

Incidental vs Accidental Complexity

Developers often talk about accidental vs essential software complexity. But is there a difference between accidental complexity and incidental complexity? I think there is.

I like this definition of essential complexity:

Essential complexity is how hard something is to do, regardless of how experienced you are, what tools you use or what new and flashy architecture pattern you used to solve the problem.

(source)

That same source describes accidental complexity as any complexity that isn’t essential complexity, and this may indeed be the way that the terms are typically used. But I’d like to propose a distinction between accidental and incidental complexity:

Accidental complexity: non-essential complexity you introduce into the design by mistake (by accident), and you can realize your mistake in an “oops” moment (and consider correcting or learning from it).

Incidental complexity: any non-essential complexity you introduce into the design, which may be by mistake or on purpose (it may or may not be accidental and may or may not be accompanied by an “oops” moment).

Accidental complexity is when I accidentally knocked a drinking glass off the table and it shattered into pieces, making my morning more complicated as I cleaned it up. It’s marked with “oops” and the realization of causing something I didn’t mean to.

Incidental complexity is when I turned off the hotplate on my stove by pressing the digital “down” button 8 times until the level reached zero, only for my wife to later inform me that pressing “up” once would have achieved the same thing (since the level wraps around under certain circumstances). I did not accidentally press the “down” button 8 times; I did it intentionally, but I was still ignorant of my inefficiency.

Clearly, there’s a fine line between the two. Defining something as incidental complexity but not accidental is a function of my ignorance and depends on my level of awareness, which changes over time and varies from person to person.

Why introduce unnecessary complexity on purpose?

1. Simplicity takes effort

Paradoxically, the simpler solution is often the one that takes longer to design and implement, so it can be a strategic design choice to go with a more complicated solution rather than a simpler one.

This is especially true for POCs and MVPs in unfamiliar domains or where the business requirements aren’t completely clear. In these cases, you can’t be confident of all the details upfront and you’re quite likely to be rewriting the solution a few times before it reaches its final state. Spending time simplifying and organizing each iteration can be a wasted effort if it will just be thrown away.

There may also just not be time for the simpler solution. The added effort of defining a clean abstraction and structure takes time, and there is an opportunity cost to taking longer on something, maybe through losing business revenue when the feature is delivered later, or through the cost of not working on the next feature. There are tradeoffs to be made.

2. Ignorance: You don’t know any better

This might be a little controversial, but I would argue that if you design something as simple as you can possibly conceive, then any remaining incidental complexity is not “accidental”. Like the example earlier of me pressing “down” 8 times instead of “up” once.

As a concrete example in software, JavaScript never used to have async/await or the Promise type, and as a result, code was often structured with nested callbacks to handle sequences of asynchronous actions (known as “callback hell”). This kind of code is clearly unnecessarily complicated, no matter how hard people tried to contain the complexity. But designs based on callbacks are not accidentally complicated — they have exactly the design intended by the author, who believed it was the best design available.

And yet promises could be (and eventually were) implemented as a third-party library, and so could the async/await pattern (through the abuse of generator functions). So the potential was there from the beginning for every developer to structure their code in a simpler way, if only they could realize the potential to do so. The complexity was not imposed as a restriction of the programming language; it was imposed by ourselves, because we hadn’t been awakened to a better way of seeing things.
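To make the contrast concrete, here’s a rough sketch (all the names are made up for illustration, and the asynchrony is faked with setTimeout): the same two-step asynchronous sequence written first with nested callbacks and then with promises.

// Callback style: each step nests one level deeper than the last ("callback hell").
// stepOneCb and stepTwoCb are stand-ins for any callback-based async operations.
function stepOneCb(callback) { setTimeout(() => callback(null, 1), 10); }
function stepTwoCb(input, callback) { setTimeout(() => callback(null, input + 1), 10); }

stepOneCb((err, a) => {
  if (err) return console.error(err);
  stepTwoCb(a, (err, b) => {
    if (err) return console.error(err);
    console.log('callback result:', b);
  });
});

// Promise style: the same sequence reads top-to-bottom as a flat chain.
function stepOne() { return Promise.resolve(1); }
function stepTwo(input) { return Promise.resolve(input + 1); }

stepOne()
  .then(a => stepTwo(a))
  .then(b => console.log('promise result:', b))
  .catch(err => console.error(err));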

It’s worth noting one more thing: if you’ve ever tried to explain JavaScript Promises to someone, you’ll know that an explanation of what a Promise is or does is unconvincing. Understanding how to think in terms of Promises requires a mental paradigm shift, which doesn’t occur merely by explanation. Even a concrete demonstration may be unconvincing since the code may be “shorter”, but is it really “simpler” when it’s full of all these new and unfamiliar concepts? The way to internalize a new paradigm like this is to use it and practice thinking in terms of it. Only then can we really see what we were missing all along.

Why does it matter?

I think the distinction between accidental and incidental complexity matters because it should keep us humble and aware that the way we write software today is probably ridiculously inefficient compared to the techniques, patterns, and paradigms of the future.

It shows that certain kinds of incidental complexity can look a lot like essential complexity if we don’t yet have the mental tools to see those inefficiencies. The biggest productivity boosts we could gain in our designs might be those that aren’t obvious from the perspective of our current ways of thinking, and which might take a lot of time and effort to see clearly.

It shows that finding incidental complexity is not just a case of looking at a project and identifying what we think is needless, since there’s a good chance that we’re blind to a lot of it, and all we will see are things that make us say “yes, that thing is important for reason xyz — we can’t remove that”. Like all the callbacks in callback hell: every callback seems important, so it all looks like essential complexity, even though there exists a completely different way of structuring the code that would make it easier to understand and maintain.

But on the positive side, it shows that we are capable of identifying a lot of incidental complexity, if we spend enough time and effort thinking through things carefully, and if we remain open and curious about better ways of doing things.

Quality vs Quantity

Quality vs Quantity

It’s an age-old debate, but as I get older, my perspective is changing.

I feel increasingly frustrated by tools that don’t work smoothly; tools that weren’t thought out very well. And I ask myself, wouldn’t my life be better without these?

TL;DR The economic incentive for product development is not always aligned with what makes the world a better place. Tools that make people more productive do not necessarily make them happier. As creators, we should first and foremost create things that are pleasant to use, even when it’s not economically optimal for us or the companies we work for — economics should serve humankind, not be our master.


I’d be remiss to write such a post without whining about some concrete examples:

My Washing Machine

My washing machine1 has a “quick wash” setting that can do a load of washing in about 30min — great for those in a rush. And it allows you to further customize the quick wash with things like temperature and spin speed.

But for some unknown reason, it doesn’t allow you to select the maximum spin speed while in the “quick wash” setting!

If you want your clothes well-spun, you need to select another setting, the quickest of which is “whites”, which takes three and a half hours!

There’s a work-around, which I use most times — you run the machine on a quick-wash cycle (30 min), and then afterward change the setting to “whites” and then select the sub-setting “spin only”.

There are many other confusing and frustrating choices made for the interface and control of this machine, but it would take a whole post to go into them all!

How many extra man-hours would it have taken that company to properly think through the HMI2 design of the washing machine? How much extra would that have added to the price of each machine?

And here’s a deeper question: does the existence of this washing machine design make the world a better or worse place? If we could magically remove all washing machines with this design from history, would we be better or worse off?

What does it even mean to be better or worse off? This depends heavily on your world view. If you are a hedonist, then happiness/pleasure is the ultimate goal, and you can mentally compare the global happiness with or without this washing machine design in it, and make a value judgement about whether the washing machine is a net win or loss.

The difference between the two hypothetical worlds is subtle and complicated. The annoyance I feel when I do the washing is brief and small, but it’s multiplied by the thousands of people who have a similarly-designed machine.

And if we magically removed the design from the world timeline, we’d also be removing the satisfaction I felt when I discovered the work-around, and the catharsis of complaining about it in this blog post, or the social bonding it provides between people with the same issue who can enthusiastically affirm each other’s miserable experiences of such a damned machine.

But even with all these “benefits” of bad design, I honestly believe that we’d all be better off, in at least a small way, if this design didn’t exist.

So, what does this mean? Should humankind band together to create the perfect washing machine — a design which can be used for hundreds of years into the future, making people all over the world happy or even excited to do their washing?

In principle, I’d say “yes”. If we as humankind had collectively spent our efforts creating just one “great” washing machine design instead of a mix of good and bad designs across many companies, we’d probably all be better off overall.

The reality, of course, is more complicated. Different people’s ideas of the “perfect washing machine” are different. Technology makes this a moving target — the best washing machine 1000 years from now will be much better than the best we can do now, but we don’t expect a washing machine company to internally do a thousand years of technological research before creating their first product!

Adobe After Effects

Adobe After Effects is software for creating and editing video and animation. I learnt how to use After Effects this year so I could create little animated diagrams like the one in my snapshot-vs-bundler post.

I find After Effects to be a really interesting example when it comes to armchair-philosophy about product quality. After Effects is, in my opinion, incredibly poorly thought out and frustrating to use. But it is also the only software that can do everything that it does, and using it makes you 100x more productive than if you were to try to do the same animations/effects manually.

I won’t go into a lot of specifics of what I think is bad design in After Effects, but I can’t help but mention one example: you can’t copy-paste something from one project to another (in fact, you can’t even open two projects at the same time). If you designed something great in one project and want to use it in the next (reuse, DRY), it’s a complicated and time-consuming process to export parts of one project into a temporary project and then import the temporary project as a nested child of the second project, from which you can now copy and paste.

You’re welcome to disagree with me and think that it’s a great design. The thought-experiment remains valid: in cases where a tool achieves something that was previously unachievable, or boosts their user’s productivity by some factor, is it better for the tool to have more capability or to be more pleasant to use?

The answer is not completely obvious, and it really got me thinking. My mind is caught in the weird paradox of simultaneously thinking that my life is worse off with After Effects, while I also choose to use it because nothing else is available to do the job.

I wonder how many other people might also be in a similar position, perhaps frustrated by tools they use, but pushed to use them because the tools make their competitors more productive. It’s a prisoner’s dilemma situation (wiki).

50 years ago, nobody was complaining about the lack of After Effects software — they were fine with its absence — but today, we complain about its flaws. We’re “forced” to use it, either because our competitors in the space are using it or, in my case, because the mere knowledge of its existence makes work without it feel much slower, which itself is frustrating.

I honestly think the world would be a better place if After Effects didn’t exist and if knowledge of its absence also didn’t exist, nor a craving for something like it to exist. Or even better, the world would be a better place if After Effects was created with more design thought, making common workflows easy and pleasant, even at the expense of its variety of features.

Lessons

As a software engineer, I’m one of the lucky few who are actually contributing to the development and design of these kinds of tools. So what kind of lessons can I learn from some of my frustrations with other tools, like the washing machine or After Effects?

For me personally, the lesson is that quality is far more important than quantity. I think that humankind will be better off with fewer tools and less productivity, in exchange for tools that are more pleasant to work with. It’s better if technological development is slower, maybe 20 years behind where it would otherwise be, but the technologies developed are robust, dependable, and a pleasure to use.

In some ways, this goes against modern business practice. In our capitalistic world, there’s a temporary advantage gained by using a tool that makes the business more productive, even if it makes its employees less happy — the benefit is negated when competitors do the same thing, removing the competitive advantage of the productivity boost and driving down the overall prices of the product (but leaving employees still unhappy with the new status quo).

Likewise, on the other side, it’s advantageous for a software company to sell tools that deliver the maximum economic value to their users, not necessarily maximum pleasure and peace. Remember that automating the repetitive parts of a user’s job does not necessarily make their life better, it just changes the nature of their work (now their job is about running the automation software). If a new piece of software allows users to complete their work in half the time, they don’t take the other half of their time to sit in the park and watch the birds play in the fountain!

If you create a tool that delivers more productivity in a way that’s more frustrating to use than the less-productive way, the machine of capitalism forces people to use your more-productive tool (which may make you rich in the process!) at the expense of increasing net frustration in the world.

Microvium

(Non-technical audiences may want to skip this section)

Microvium has parallels with After Effects, in that it’s a new tool that achieves something that was previously unachievable (it brings JavaScript to devices with only 16kB of flash). It boosts productivity significantly over alternatives such as EmbedVM.

So would it be ok if Microvium brought these benefits but was frustrating to use?

I think not. I’ve spent considerable effort in making Microvium as easy and pleasant to use as possible. For example:

  • I take great care in the full experience of integrating and using Microvium.
    • The one-liner install for node hosts, or the two-file copy-and-paste for C hosts (one h file for the interface, and one c file for the library).
    • The getting-started guide demonstrates this to the extreme, leading the user incrementally through all the concepts, building up to a full native host example.
    • The getting-started guide is even part of the regression tests — the automated test infrastructure literally extracts the example code from the getting-started markdown file and makes sure it all still works. There’s little more frustrating than an outdated or incorrect introduction to a new tool.
  • I spent a great deal of effort making the FFI3 design super simple.
    • I hope that the way I designed it seems “obvious” and “intuitive” to people looking at it in hindsight — this is the way that designs should be. When you read through the getting-started guide, you probably don’t think at all about how good or bad the FFI system is. It’s hopefully invisible to the user, in the same way that a good washing machine should be invisible. It should just do its job, in the least-frustrating way possible.
    • Compare the FFI design of Microvium with that of Node.js for example, where one requires complicated binding files, additional configuration files (e.g. binding.gyp), the need to understand N-API vs NAN vs the V8 API, etc. In Microvium, you can export a JavaScript function to C code using a single line of JavaScript code and no configuration file is needed:
      vmExport(1234, sayHello);
  • Even for something like the module API for Microvium (for node.js hosts), which many people won’t encounter directly, I put a great deal of thought into making the model as conceptually simple as possible.
    • Compare to something like the Compartment API for JavaScript (under development), where the explanation of its module API necessitates the definition of 6 different kinds of “module specifier”, and the distinction between “resolving”, “loading”, “binding”, and “importing”. While Microvium does all of these, it doesn’t require the user to understand these concepts in order to use it.

This is to say that I’m practicing what I preach. However much I may fall short of the ideal, I put a lot of time and energy into making Microvium a pleasure to use and simple to understand, at the expense of delivering fewer features and taking longer to develop them.

I believe this was the right choice, and I’ll definitely apply it to other projects moving forward.

What are you optimizing for?

I’ll conclude by asking the question: what are you optimizing for?

If you are a manager or leader at a company, you may be inclined to optimize the company to produce profits. You may even have a fiduciary duty to do so on behalf of your investors.

Sometimes the profit incentive is aligned with making good products: after all, happy customers are returning customers.

But clearly that’s not always the case — it’s common practice to release buggy software early, so you can iterate on it and use your users as a testbed to weed out the bugs. It’s common practice to “fail fast” in small startups — getting new ideas out the door quickly so you can establish their efficacy in the market, rather than spending significant time designing something that might be pleasant to use but that nobody wants because the idea was flawed. It’s common practice to aim for a first-mover advantage, especially in markets with a network effect, to trap people in your ecosystem before competitors (who may have a more pleasant product but were slower to get there) have a chance to penetrate the market.

Should fiduciary duty or duty as an employee transcend the betterment of mankind?

What goals are you optimizing your products for? What are you optimizing your company for? What are you optimizing your life for?


  1. Actually, my housemate’s washing machine 

  2. Human-machine interface 

  3. Foreign Function Interface — the way in which script code communicates with host code and vice versa 

Be a multiplier

Be a multiplier

You may have heard the analogy that some software engineers add productivity, while some multiply productivity. Today I’d like to dig a little deeper into this and share my own thoughts.

What does it mean?

For those who haven’t heard the phrase before, let me try to unpack my understanding of it. Consider a tale of two programmers – let’s call them Alice and Bob, to pick two arbitrary names. Alice’s boss gives her a task to do: she is told to add a new thingamajig to the whatchamacallit code. She’s diligent, hardworking, and knows the programming language inside out. She’s had many years of experience, and especially knows how to add thingamajigs to whatchamacallits, because she’s done it many times before at this job and her last job. In fact, she was hired in part because of her extensive experience and deep knowledge with whatchamacallits, and at this company alone must have added over a hundred thingamajigs.

Because of her great skill and experience, she gives her boss an accurate time estimate: it’s going to take her one week. She knows this because she did almost exactly the same thing two weeks ago (and many times before), so she’s fully aware of the amount of work involved. She knows all the files to change, all the classes to reopen, and all the gotchas to watch out for.

One week later, she’s finished the task exactly on schedule. Satisfied with the new thingamajig, her boss is happy with the value she’s added to the system. Her boss is so grateful for hiring her, because she’s reliable, hard working, and an expert at what she’s doing.

Unfortunately for the company, Alice’s great experience gets her head-hunted by another company, where she’s offered a significantly higher salary and accepts immediately. The company mourns the loss of one of their greatest, who soon gets replaced by the new guy – Bob.

Bob is clearly wrong for the job by all standards, but some quirk of the job market and the powers-that-be ended with him taking Alice’s place. He has no prior experience with whatchamacallits, let alone thingamajigs. And he doesn’t really know the programming language either (but he said he knows some-weird-list-processing-language-or-something-I-don’t-remember-what-he-said-exactly, and said that he’d catch on quickly). His new boss is very concerned and resents hiring him, but the choice was out of his hands.

On his first week, his boss asks him to add a thingamajig to the whatchamacallit code, as Alice had done many times. He asks Bob how long it will take, but Bob can’t give a solid answer – because he’s never done it before. It takes Bob an abysmal 2 weeks just to figure out what thingamajigs are exactly, and why the business needs them. He keeps asking questions that seem completely unnecessary, digging into details that are completely irrelevant to the task. Then he goes to his boss and says it will take him 3 weeks to do it properly. “3 weeks! OMG, what have I done? Why did we hire this idiot?”

There’s not much to be done except swallow the bitter pill. “OK. 3 weeks”. It’s far too long. The customers are impatient. But, “oh well, what can you do?”

3 weeks later Bob is not finished. Why? Well again, he’s never done this before. He’s stressed. He’s missing deadlines in his first months on the job, and everyone’s frustrated with him. When all is said and done, and all the bugs are fixed, it takes him 2 months to get this done.

By now there is a backlog of 5 more thingamajigs to add. His boss is ready to fire him, but he optimistically dismisses the 2 months as a “learning curve”, and gives Bob another chance. “Please add these 5 thingamajigs. How long will it take you?”

Bob can’t give a solid answer. He swears it will be quicker, but can’t say how long.

The next day Bob is finished adding the 5 more thingamajigs. It took him 30 minutes to add each one, plus a few hours debugging some unexpected framework issues. What happened? What changed?

What happened is that during those first 10 weeks at his new job, Bob noticed a big problem. There were 150 thingamajigs in the whatchamacallit codebase, and they all followed a very similar pattern. They all changed a common set of files, with common information across each file. The whole process was not only repetitive, but prone to human error because of the amount of manual work required. Bob did the same thing he’s always done: he abstracted out the repetition, producing a new library that lets you define just the minimal essence of each thingamajig, rather than having to know or remember all the parts that need to be changed manually.

To make things even better, another employee who was also adding thingamajigs, Charlie, can use the same library and achieve similar results, also taking about 30 minutes to add a thingamajig. So now Charlie can handle the full load of thingamajig additions, leaving Bob to move on to other things.

Don’t do it again

The development of the new library took longer than expected, because Bob had never done it before. This is the key: if you’ve done something before, and so you think you have an idea of the work involved in doing it again, this may be a “smell” – a hint that something is wrong. It should light a bulb in your mind: “If I’ve done this before, then maybe I should be abstracting it rather than doing almost the same thing again!”

You could say, in a way, that the best software engineers are the ones that have no idea what they’re doing or how long it will take. If they knew what they were doing, it would mean they’ve done it before. And if they’ve done it before, then they’re almost by definition no longer doing it – because the best software engineers will stop repeating predictable tasks and instead get the machine to do the repetition for them1.

Adding and Multiplying

In case you missed the link to adding and multiplying, let’s explore that further. Let’s assign a monetary value to the act of adding a thingamajig. As direct added value to the customer, let’s say the task is worth $10k, to pick a nice round number ($1k of that goes to Alice, and the rest goes to running expenses of the company, such as paying for advertising). Every time Alice completed the task, which took her a week, she added $10k of value. This means that Alice was adding productive value to the company at a rate of $250 per hour.

Now Bob doesn’t primarily add value by doing thingamajigs himself, but instead develops a system that reduces an otherwise 40 hour task to 30 minutes. After that, every time a thingamajig is added, by anyone, $10k of value is added in 30 minutes. Bob has multiplied the productivity of thingamajig-adders by 80 times. In a couple more weeks, Bob would be able to add more value to the company than Alice did during her entire career2.

Is it unrealistic?

The short answer is “no”. Although the numbers are made up, the world is full of productivity multipliers, and you could be one of them. Perhaps most multipliers don’t add 7900% value, but even a 20% value increase is a big difference worth striving for.

The laws of compound interest also apply here. If every week you increase 10 developers’ productivity by just 1%, then after a year you’d be adding the equivalent value of more than 6 extra developers’ work every day (1.01^52 ≈ 1.68, so those 10 developers are now doing the work of nearly 17).

The alternative

What happens if Bob was never hired? Would the company crash?

Perhaps, but perhaps not. What might happen is that Microsoft, or some big open source community, would do the multiplying for you. They would release some fancy new framework that does thingamajigging even better than the way Bob did it, because they dedicate many more people to the task of developing the library. The company will take 5 years before deciding to start using the fancy new framework, in part because nobody on the team knew about it, and in part because they now have 250 thingamajigs to migrate and the expected risks are too high for management to accept. But in the end, most companies will catch on to new trends, even if they lag behind and get trodden on by their competitors.

Final notes

In the real world, it’s hard to tell Alice from Bob. They’re probably working on completely different projects, or different parts of the same project, so they often can’t be directly compared.

From the outside it just looks like Bob is unreliable. He doesn’t finish things on time. A significant amount of his work is a failure, because he’s pushing the boundaries on the edge of what’s possible. The work that is a success contributes to other people’s success as much as his own, so he doesn’t appear any more productive relative to the team. He also isn’t missed when he leaves the company, because multiplication happens over time. When he leaves, all his previous multiplicative tools and frameworks are still effective, still echoing his past contributions to the company by multiplying other people’s work. Whereas when an adder leaves the company, things stop happening immediately.

Who do you want to be – an adder or a multiplier?


  1. This is not entirely true, since there is indeed some pattern when it comes to abstracting code to the next level, and those who have this mindset will be able to do it better. Tools that should come to mind are those such as the use of generics, template metaprogramming, proper macro programming, reflection, code generators, and domain specific languages 

  2. How much more do you think Bob should be paid for this? 

Sequences: Part 4 – Push and Pull

Sequences: Part 4 – Push and Pull

Welcome back! This week in our exploration of sequences we’re going to do something a little different. Up until now we’ve been looking at a single running example of a function that interacts with sequences, and progressively developing its interface and considering the consequences of each choice. But this week I’d like to delve a little deeper into the meaning of these strange concepts of push and pull, as they apply to sequences.

What is push and pull?

Wikipedia doesn’t talk directly about “push” and “pull” as coding patterns, but does talk about the related concept of a push technology, which it says is a

…style of Internet-based communication where the request for a given transaction is initiated by the publisher or central server.

There is also a corresponding Wikipedia entry for pull technology, where requests are initiated by the client.

So here it seems that these terms are, in a sense, used to say which side of the communication interface has control, or initiates an action. If the server has control, and is the one to initiate or “cause” the transaction, then the server is pushing to the client.

We can apply the same principles to software design at a smaller scale. When your code calls a function, it could be said to be a client of that function. When your code uses an object, it could be said to be a client of that object. So our server-client relationship applies in these cases as well.

Pull

Consider the following simple piece of C code which might be used to print a sequence of values to the console:

initProducer();
for (int i = 0; i < numValues; i++) { 
    int value = produceValue();
    printf("Value %i: %i\n", i, value);
}

Loop Pull

This code uses the values from the sequence, so it’s a client of the sequence. It accesses items from the sequence by calling produceValue, which pulls the next item from the sequence. The pulled value is available to be used on the very next line of code, where in turn it’s pushed to the console output. Whatever is producing the sequences responds to the pull by providing the pulled values as they’re requested. This is the same iterator pattern that we’ve been looking at in past posts1.

Push

What happens if the producer of the sequence needs to provide these values by pushing them to the consumer?  Now our consumer code might look like this:

int i;

void initConsumer() {
    i = 0;
}

void consumeValue(int value) {
    printf("Value %i: %i\n", i, value);
    i++;
}

ActiveProducer

All the same elements of the code are here: the declaration of int i, the initialization of i to 0, printing the values to console by printf, and incrementing i after every value is consumed. The producer just has to call initConsumer before it starts producing values, and it has to call consumeValue every time it produces a new value. So here the producer is in control, and decides when new values are available.

Strangely enough, by writing the consumer in such a way that it doesn’t pull values, the producer code might look much like our original consumer code:

initConsumer();
for (int i = 0; i < numValues; i++) { 
    consumeValue(i * 10); // produces values 0, 10, 20, 30, ...
}

It’s clearly much easier to write producer code that pushes, and consumer code that pulls. But these two things seem to be at odds with each other – either the producer or the consumer is “in charge”, and the other one loses out. Many sequence-manipulating functions both produce and consume sequences, since they have an input and an output. It’s much easier to write these kinds of functions if they can pull from a data source, and push to a data sink.

C++ seems to deal with this by dividing “sequence processors” broadly into 2 categories: on the one hand we have containers (and their corresponding iterators) which are happy to be both pushed-to and pulled-from. On the other hand we have algorithms and application-specific functions, which both push and pull from containers in order to get their job done. These application-specific functions can be written more simply because now they don’t have to worry about being pushed-to or pulled-from.

C++ Pattern

It’s as if the containers act as peace-makers, mediating between all these functions that all want to be in charge, that push and pull the data whenever they find it most convenient. But as I’ve mentioned before, containers almost always come at a cost, and it would be better if they weren’t the default solution to the problem of manipulating sequences.
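The same shape can be sketched in a few lines of JavaScript (rather than C++, and with a plain array standing in for the container): the producer runs its own loop and pushes, the consumer runs its own loop and pulls, and a sequence-processing function pulls from one container and pushes into another. None of them has to be written “inside out”.

// The array acts as the mediating container.
const buffer = [];

// Producer: in charge of its own loop, pushes values into the container.
for (let i = 0; i < 5; i++) {
  buffer.push(i * 10);
}

// Consumer: also in charge of its own loop, pulls values from the container.
for (const value of buffer) {
  console.log(value);
}

// A sequence-processing function can pull from one container and push into
// another, which is what makes it simple to write.
function doubleAll(input) {
  const output = [];
  for (const value of input) {
    output.push(value * 2);
  }
  return output;
}

console.log(doubleAll(buffer));

The cost, as noted above, is the container itself: the whole intermediate sequence has to be materialized in memory just to keep the two loops out of each other’s way.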

Harmony between Push and Pull

But consider what happens if we take our first example, and replace produceValue with getchar from the stdio library:

for (int i = 0; i < numValues; i++) {
    int value = getchar();
    printf("%i\n", value);
}

This code instead reads the values from the console2. This is still a pull, since the code appears to be actively causing the value to be acquired, and just like before, the value is available to be used on the very next line of code.

And yet we also know where these values ultimately come from: they come when the user pushes keys on the keyboard (this is a “push” in two senses of the word). How can it be that both sides of the interface think that they initiated the transaction?

Push and Pull Keys

In many ways I think it’s equally true that both sides did really initiate the transfer. This example shows that it’s possible for the parties on both sides of the interface to feel like they’re both in complete control, calling the shots and deciding when the next value should appear. In this particular case what’s happening is that the thread is suspended or blocked until the value becomes available.

Let’s step out of reality for a moment, and imagine a strange world with different laws of physics and time. Imagine that I decide I want to talk to my friend, and after dialing his number on my phone, the phone freezes me in time, without forwarding the call to my friend. There I am, suspended in time in my room for days, not knowing that the world is continuing around me. A few weeks later, of his own accord, my friend also spontaneously decides it would be a good time to give me a call. After dialing my number, time unfreezes for me, and I hear his voice as if no time had passed at all for me. To me, I feel like I called him, but to him, he feels like he called me. In fact both are true: we both initiated the call.

This is very convenient for both my friend and myself, since we both waited till the perfect moment to call each other. Neither of us were interrupted by “receiving” the other person’s call (possibly while we were in an inconvenient state, such as in the shower or on another call).

The same is true in the world of code. Code which pushes and pulls is much easier to write than code which is pulled-from or pushed-to. State management is much easier, because code that is pushed-to has to be prepared to be interrupted (read: race conditions and re-entrancy problems).

Just like freezing time is difficult in the real world, multithreading in software is not easy either (although it’s obviously much easier to suspend thread-time than to suspend physical time!). But let’s take a look at another example, this time in C#:

static void Main()
{
    foreach (int value in Values())
    {
        Console.WriteLine(value);
    }
}

static IEnumerable<int> Values()
{
    for (int i = 42; i < 52; i++)
        yield return i;
}

The example is simple. In our Main function we have a loop that pulls values from the sequence returned by the Values function. The Values function, which produces the data, also has a loop, and pushes values to the consumer (Main). Don’t let the keyword “return” deceive you into thinking that the Values function is not pushing: the act of yielding is initiated by the called function, and the function progresses to its next line of code when the consumer comes back for the next value. The code would look pretty much the same if yield return i was instead produceValue(). Or to put it another way, the code would look horribly different if Values was just responding to pulls rather than being structured in such a way that it’s pushing.

Neither the Main function nor the Values function is more in control than the other – they are like a pair of coroutines, cooperatively yielding control to each other. Both are written as if they’re in control, which makes the code much simpler to write. And best of all, it does this without using any extra threads! This is the power of the generator – to be able to write functions  which produce values by pushing them, while their consumers are able to pull them. The best of both worlds.
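For what it’s worth, JavaScript generators give you the same trick, so a rough equivalent of the C# example above might look something like this sketch:

function* values() {
  for (let i = 42; i < 52; i++) {
    yield i; // the producer "pushes" a value and is suspended until the next pull
  }
}

for (const value of values()) {
  console.log(value); // the consumer "pulls" values at its own pace
}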


  1. Except that the producer state is statically allocated, which may not be a good thing in general but it makes for a simpler example 

  2. Or other standard input