Author: Michael Hunter

Snapshotting vs Bundling

TL;DR
Bundling and snapshotting are two different ways of packaging a program for deployment. This post is a somewhat-biased overview of why snapshotting is clearly superior in many respects.


I recently wrote up an explanation of Microvium snapshotting, along with this hopefully-helpful animated diagram:

In short: the diagram depicts a running Microvium (JavaScript) application that is suspended and transferred to a new environment where it continues where it left off. For the typical use of Microvium, the original environment might be a development machine or back-end server, and the new environment might be a low-resource microcontroller.

I want to expand on snapshotting a bit more in this post and, in particular, I want to contrast it to an alternative: bundling.

Background

JavaScript doesn’t have a preprocessor as C does. I’ve argued before that this is a great thing: all behavior is represented the same way, in the same language, rather than being spread across a separate preprocessor language, template metalanguage, linker script language, makefile language, and so on. It also means there is no distinction between compile time and runtime: everything is runtime. It eliminates needless concepts. I like this ideal and I want to stick to it.

But if you don’t have #include, how do you access behavior declared in other source files (other modules)?

In JavaScript, unfortunately, there are lots of different ways of importing other modules, and they don’t have consistent semantics. Today, let’s talk about probably the most common one, at least in node.js applications: the function called require(). Importing a module like this means you are using the CommonJS module system.
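To make those semantics concrete, here is a toy sketch of how a CommonJS-style require works under the hood. The in-memory moduleTable and the name toyRequire are illustrative scaffolding only; node.js actually resolves and reads files on disk:

```javascript
// A toy CommonJS-style loader. `moduleTable` stands in for the file
// system; node's real `require` resolves specifiers to files on disk.
const moduleTable = {
  './b': `exports.bValue = 5;`,
};

const cache = {};

function toyRequire(specifier) {
  // A module is executed at most once; later imports get the cached exports.
  if (cache[specifier]) return cache[specifier].exports;
  const module = { exports: {} };
  cache[specifier] = module; // cache before executing (handles cycles)
  // CommonJS wraps the source so `exports`, `require` and `module`
  // appear as local variables inside the module body.
  const wrapper = new Function('exports', 'require', 'module', moduleTable[specifier]);
  wrapper(module.exports, toyRequire, module);
  return module.exports;
}

const b = toyRequire('./b');
console.log(b.bValue); // 5
```

Note that the cache means importing './b' a second time returns the very same exports object without re-running the module code.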

Bundling

A bundler is a tool that pre-combines all the JavaScript files in your project (and other files, to some extent) into one file (or a few files) for distribution. This is particularly needed in web applications, which can consist of hundreds or thousands of source files, and you don’t want the user’s web browser to have to download each of these files separately.

Bundling is not a passive operation like creating a zip file. A typical bundler will actually analyze your code to figure out which files in your file system it imports, and repeat this transitively to pull in all the required dependencies.

Bundling sounds an awful lot like preprocessing or compilation — it’s a step that happens before runtime, that needs to statically analyze your code in order to produce a distributable output. Something smells funny here.

JavaScript is not a language designed for such static analysis, and I argue that it never should be.

Ok, let’s look at a concrete example of a bundler: Webpack.

In the following code, module a, which we’ll assume is the entry point, imports module b, and prints a value from it.

// a.js
const b = require('./b');
console.log(b.bValue);
// b.js
exports.bValue = 5;

This does what you expect it to do. Now let’s pack it with webpack1.

webpack --entry ./a

On my machine, this outputs a single js file, which when run, correctly prints the value 5 to the console.

Ok, but, how did it know to bundle module b into the result, so that it actually runs correctly?

The answer appears to be: it makes an educated guess by looking at the source code and seeing that it calls require with an argument of ./b.

This sounds so dodgy. But I guess that people had JavaScript that needed bundling, and bundlers seemed like the only way to fix it.

It’s easy to mess this process up. Let’s change a to the following code which does the same thing as before if evaluated directly in node.js, but which causes completely different Webpack behavior:

// a.js
const b = require('./' + 'b'[0]);
console.log(b.bValue);

What does webpack do with this?

Err, well, the resulting bundle still correctly outputs 5 to the console when executed, but now the bundle is 105 kB instead of 5 kB, at least on my machine. Why? Webpack has clearly given up trying to figure out statically what I’m importing and has instead included everything in the folder as part of the bundle, which in my case happens to include both the output files and webpack itself (so Webpack is packing itself into the bundle!).

But, it gets worse. Let’s change module a to the following code, which again does the same thing when run in node.js, but causes completely different webpack behavior:

// a.js
const b = arguments[1]('./b');
console.log(b.bValue);

This source runs in node.js because require is actually a parameter passed to the CommonJS module wrapper function, and it happens to be the second parameter.
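Node's documented module wrapper makes this visible: every CommonJS module body is wrapped in a function whose parameters are (exports, require, module, __filename, __dirname), so require really is the second argument. A minimal simulation (fakeRequire is a stand-in, not node's real require):

```javascript
// Node wraps every CommonJS module body in a function shaped like this
// (see the "module wrapper" section of the Node.js modules docs):
function wrapper(exports, require, module, __filename, __dirname) {
  // Inside the module body, `arguments[1]` and `require` are the same binding:
  return arguments[1] === require;
}

// Simulate node invoking the wrapper with a stand-in require function:
const fakeRequire = specifier => ({ bValue: 5 });
console.log(wrapper({}, fakeRequire, { exports: {} }, '/a.js', '/')); // true
```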

Webpack doesn’t see require at all in this source, so it assumes it doesn’t need to include anything else in the bundle. Consequently, the bundle throws an exception when run in node.js.

Sure, this example here is abusive towards Webpack — I’m picking pathological cases to expose Webpack’s Achilles’ heel.

A better solution: snapshotting

It’s my blog; I’m allowed to make bold, opinionated claims about what things are “better” than others. Snapshotting is better than bundling.

A Microvium snapshot fulfills the same purpose as a Webpack bundle, in the sense that the snapshot is a highly-efficient representation of all the resources that the JavaScript application needs in order to run in a future environment. But it’s better than a bundle because:

  1. It does not require a third-party tool to make it
  2. It does not depend on shady guesswork about what’s being imported. Rather, the application code actually runs and performs its imports using the full semantics of the language.
  3. It also does not require any declarative representation of what should or should not be included in the bundle, ala tsconfig.json or webpack.config.json. I really dislike these external declarative files, because they add needless complexity to the application.

Let’s get concrete, translating the same example as earlier but to leverage Microvium snapshotting:

// a.js
import * as b from './b';
import ffi from 'ffi';

ffi.export('restore', whenRestored);
function whenRestored() {
  console.log(b.bValue);
}
// b.js
export const bValue = 5;

When run on a suitable host, the above will print 5 to the console, the same as the Webpack equivalent does. It does this by performing the imports at runtime before the snapshot is taken, and then the snapshot can be deployed to an environment where importing might not even be supported at all.

Let me see if I can draw another diagram to illustrate this:

Application imports dependencies (e.g. other modules) and then moves to a new environment

Please note that the ES6 module import syntax is not the reason why this works. It will work equally well with dynamic import() when that’s been implemented. In fact, there are working test cases that interact with resources directly using node’s fs module.

The pattern isn’t new

Web developers are already familiar with the idea of two distinct phases of runtime execution: there is before the page has loaded, and there is after the page has loaded. It is the same application executing across both phases, but the application has access to different resources after the page has loaded (for example, all the DOM elements and images).

To run part of your application after the page has loaded, you simply subscribe a callback to the load event:

window.addEventListener('load', whenLoaded);
function whenLoaded() {
  console.log('The page is fully loaded');
}

The similarity between this and the Microvium example should be clear. Snapshotting is just the mechanism used to transfer the running application from the development/build environment to where it will run on the user’s device.


  1. Assuming you’ve installed webpack globally 

Microvium Modules
Design thoughts


Today, I completed the first working version of the module system for Microvium, and I want to share some of the design considerations and the conclusions I came to. The design as it stands is the result of quite a few iterations. It is likely to change again in the future as we iterate on Microvium before its first proper release. Any feedback and suggestions are welcome.

This will be a long and technical post. Most of my readers will probably want to skip this one. There’s also a more succinct overview of the module API available here:

https://github.com/coder-mike/microvium/blob/master/doc/js-host/module-api.md

There is some overlap between what Microvium achieves and what SES Compartments achieve, in the sense that both create isolated environments that allow JavaScript code1 running inside the respective environments to gain access to other modules through configurable mechanisms (e.g. hooks) specified from outside the environment (e.g. in the host). So naturally, some of the inspiration for the Microvium API has come from the Compartment API. But I’ve also diverged somewhat from some of the decisions made for the Compartment API to date and made different trade-offs, and I wanted to informally document my thought process.

Note that modules are only a consideration when running Microvium on a desktop host, for example in node.js, so the API I’m referring to here is the JavaScript host Microvium API. On the microcontroller host, there is neither the ability to import modules nor parse source text, and interaction with the firmware host is not done through modules for efficiency reasons.

Design Objectives

I want the design of the Microvium API to be clean and simple to understand and use. Bear in mind that, at least initially, the target users for Microvium are probably firmware developers who are looking at potentially adopting Microvium as a new and foreign third-party tool. This is different from the target users for Compartments, who are likely already entrenched in the JavaScript ecosystem and who, when confronted with SES and Compartments, will be learning about a standard feature of JavaScript rather than a foreign, third-party tool.

Simplicity, simplicity, simplicity!

Modules are a complicated topic when you start digging into the details of circular dependencies, caching, relative specifiers, resolvers, etc. So it’s been quite a challenge to create an API that abstracts all of that while still remaining true to the specification!

As part of keeping things simple, I want the API to require users to learn as few new concepts as possible when explaining Microvium (e.g. in the documentation). As I go through the design in the rest of this post, I’ll highlight some of the areas where I’ve removed the need for the user to understand various concepts by removing those concepts from the API design completely. My favorite kind of design work is all about removing and simplifying.

Some other objectives:

  • To remain independent of the file system or any particular module specifier scheme
  • To provide complete and perfect isolation and sandboxing of the app running inside the Microvium virtual machine.
  • To have interoperability between node modules and Microvium modules, where desired
  • To remain independent of the particular caching (memoization) policy used for modules, while remaining correct according to ECMAScript specification requirements for module caching.

The best way to lead you through the design is by example. I’ll start with the simplest examples and then build them up in complexity, explaining the thinking and design choices as I go.

Examples

The following are working2 examples, but bear in mind that the Microvium implementation is, in general, probably still less than half complete, so don’t expect to generalize too much.

Example: No Modules (One Module)

I’ll start with an example without any imports or exports. For a user that doesn’t need support for multiple modules, the module system should not add any cognitive load.

A simple hello-world in Microvium is as follows:

import Microvium from 'microvium';

// Let's say we have this source text
const sourceText = `print('Hello, World!')`;

// Create a new virtual machine
const vm = Microvium.create();

// The virtual machine is going to need a "print" function
vm.globalThis.print = console.log;

// Import the source text into the virtual machine
vm.evaluateModule({ sourceText });

(Be aware that the above code is evaluated prior to running on the target device)

There are already a couple of interesting things to note here:

  • Globals are provided by mutating globalThis. Internally this is implemented as a proxy whose properties represent the global variables of the virtual machine. For users who don’t care about dividing code into modules, providing things through globals is quite powerful. I’ve kept the name consistent with the Compartment API.
  • In contrast with the SES Compartment API, the Microvium create function does not accept a set of endowments. This is for the practical reason that many globals in Microvium are likely to refer to objects created inside the VM itself (otherwise they wouldn’t appear in the snapshot), and so can’t be created until the VM exists to create them in.
  • The SES Compartment API has an evaluate function which could be used to evaluate JavaScript “script” text (as opposed to module text). In Microvium, for simplicity, I decided that all source text should be interpreted using module semantics, so there is no evaluate function. This eliminates one point of confusion for Microvium users since there is no need to understand “script” vs “module” (limiting the number of new concepts for non-JS users to absorb). This is only possible because Microvium is a completely new tool, and the language it supports is already a subset of full ECMAScript.
  • Observant readers will notice that Microvium.evaluateModule() actually accepts an object and not a string of source code. The reason for this will become more clear later, but it doesn’t add a significant amount of cognitive overhead in simple cases like this.

I’m not sure what the evaluateModule method should be named — I may change this in future, and I’ve changed it about 10 times since I started doing the design. It accepts source text and returns a module object (i.e. module namespace object) representing the exports of the module source text. This is in contrast to any of the Compartment methods which only accept module specifiers3.

There are multiple reasons I chose not to expose a method that receives a module specifier. I’ll elaborate as we go through more examples, but here are two reasons:

Firstly, it would mean that the user would need to provide multiple implementation points, just to get a single module running in the VM. The Compartment API requires that the user provide an importHook and a resolveHook to the compartment constructor prior to being able to process imports, so that the internal Compartment machinery knows what to do with the specifier. This may be more acceptable in the context of Compartments, since the evaluate function at least provides a starting point for new users to get familiar with Compartments without needing to understand all the concepts like “what is a hook?”, “what is resolving?”, “why is resolving different to importing?” etc. As I say, in Microvium, I wanted to introduce as few new concepts as possible, while still retaining the same generality if possible.

The signature for the evaluateModule method is as follows:

interface Microvium {
  /**
   * Imports the given source text as a module.
   *
   * Returns the module namespace object for the imported 
   * module: an object whose properties are the exports of 
   * the module.
   */
  evaluateModule(moduleSource: ModuleSource): ModuleObject;

  // ...
}

(I’ve used the name ModuleObject because it is an object representing the module. The Node.js documentation just calls the analogous thing “the module” — the thing returned by require — but I wanted to disambiguate the instantiated module from the source text of the module, both of which are in some respects “modules”).

I find this aspect of the design reasonably elegant — it accepts source code as input, and gives you back a module as output. All the intermediate steps are abstracted away, which I think is beautiful. The only concepts that the user needs to understand here are the existence of the source code and that of the resulting module, which are both well-understood concepts. In particular, I’ll note that this example (so far) has no dependence on the following concepts:

  • Hooks
  • Specifiers
  • Resolvers, loaders, etc.
  • Files and file systems

It’s important to me that this is the case, since the demonstration of Microvium with a “Hello, World” example should not overwhelm the user with new concepts.

The name evaluateModule is similar in principle to the Compartment API’s evaluate method, but works on module source text instead of script source text.

Example: Single Export

It follows naturally, and probably without need for explanation, that a module can have exports which are accessible to the host:

const sourceText = `export const x = 5`;
const { x } = vm.evaluateModule({ sourceText });
console.log(x); // 5

Again, this is all very natural, and we haven’t yet needed to add any new concepts into the user’s mental model.

One point I’ll touch on briefly is the fact that these kinds of direct object references between the host and the virtual machine are severed in a snapshot — a native C host has no way of accessing the module object of a module in a VM restored from a snapshot. Exports to a C host, or any host that resumes a VM from a snapshot (e.g. in node.js using Microvium.restore()), are done through a separate mechanism (see the now-slightly-outdated getting-started guide for examples of native exports).

I did attempt at one stage to consolidate the ES module system with the Microvium native FFI system, but came to the conclusion that it wasn’t reasonable to do, since it would require a lot of complexity and overhead with C hosts.

Example: Single Import

Now let’s say that we have a module that imports another module:

const sourceText = `import x from 'y'`;
// The following will throw "Module not found: 'y'"
vm.evaluateModule({ sourceText });

The above value for sourceText is a string of Microvium code which attempts to import the default export from the module identified by the module specifier 'y' and binds it to the name x. It doesn’t do anything with x, but that’s fine for illustration for the moment.

So far in this example, we haven’t yet provided any mechanism for the VM to actually perform the import, so it will throw to say that it can’t find the module4. Remember that Microvium is implemented independently of the file system, or any particular method of understanding module specifiers — they could refer to files, web URLs, or database IDs.

So, how do we provide the module 'y' to the VM so that the script5 can import it?

A challenge with modules is that the specifiers in general can be relative to the importing module. Consider in node.js that a module could import '../m', which is importing the file named m in the directory that is the parent of the directory containing the module performing the import. So, we can’t just give the VM a global resolution that says that the specifier 'y' maps to some particular module, since it may map to different modules depending on the context and the specifier scheme being used. We need to do it in a way that allows the specifier 'y' to change meaning depending on which module in the VM is doing the import.

This is another point where SES compartments and Microvium diverge. With compartments, the Compartment constructor accepts a resolveHook as one of its options. The resolve hook interprets the meaning of a specifier. For example, it may search the file system to find the appropriate module source text (but not load it, yet).

The resolve hook for a compartment is a function which accepts a module specifier and a referrer string and returns a string that absolutely identifies the module6. The referrer string identifies the module doing the importing so that the resolver can produce different results depending on which module is performing the import.

For Microvium, I think it’s unnecessary complexity to have an intermediate “thing” for users of the API to understand — the resolved/full module identifier (and also the “referrer”). It’s not even easy to explain what this is. In the case of node.js, it’s probably a normalized, absolute file path to the module source text file. In the case of the web, it might be a URL that identifies where the script should be downloaded from. I’d prefer not to have to explain this in the Microvium docs, so I’ve deleted the concept from the API design.

Example: importDependency

The solution I came up with in Microvium is to pass a callback into the evaluateModule method itself. Let’s look at another example, here with code that doesn’t throw:

const sourceText = `
  import { x } from 'y';
  print(x); // 10
`;
const moduleY = { x : 10 };

vm.evaluateModule({ 
  sourceText,
  importDependency: specifier => moduleY 
});

I’ll start with the disclaimer that the techniques used in these examples would likely not be used in production software. We are still in the realm of “demonstrating the public API of the virtual machine so that it can be understood“, and not “demonstrating the typical/recommended way to use the API“.

In our example, of course, we’re ignoring the specifier and always returning the module moduleY. This would only be correct behavior if you intended all module specifiers to alias the module y, but hopefully it’s obvious how the example could be extended to use the specifier to discriminate between multiple possible module objects, possibly by looking in a table of available modules, or by iterating the file system (more on that in a moment).

To understand this example, we need to introduce two new concepts to the user7:

  1. Module specifier: This is not a “new” concept for JavaScript programmers, but might be for C programmers, since it is a concept distinct from “file path”. However, it’s easy enough to explain that the module specifier is “the string text that appears after from in an import statement”.
  2. importDependency

I’m quite conflicted about the name “importDependency” here — any suggestions are welcome as to alternative terms to use. The importDependency callback8 is a function that takes a module specifier and returns the corresponding module object, in the context of the importing module.

This feels reasonably elegant and intuitive to me. Whenever we import a module, we are also telling the VM how to import nested dependencies relative to that module. It’s completely abstracted from the location of those nested dependencies, the mapping of module specifiers to the corresponding modules, and the mechanism by which they are retrieved and cached (more on caching later).

I’ll note here that the module object returned in the example by the importDependency callback is actually an object residing in the host, not the VM. The VM will interact with the host object through something like a proxy. This kind of module is analogous to the node.js core modules, which the node documentation says are “compiled into the binary” — i.e. they are baked into the host rather than being implemented as part of the application, but appear to the application as local objects. (Tying the analogy to our example, the host is the node.js code and the application is the Microvium script). It would be fine for the importDependency callback to return a VM object as well (of which a VM module is a special case), and the membrane would handle it the natural way (more on this later).

It might be worth highlighting at this point that the module that importDependency returns is just a normal/ordinary object. In the ES spec, module namespace objects are exotic objects with special internal properties that are inaccessible to user code. I don’t know if I’m violating the spec by allowing the reified module namespace object to be any object, but it certainly makes the API a lot easier to understand and use.

I want to emphasize that, so far, we’ve been able to demonstrate the module system of Microvium without requiring multiple files. Each example has just been a few lines of JavaScript code. I personally think that this is a huge plus when it comes to attracting newcomers to a new tool — being able to demonstrate something as complex as multi-module code without actually requiring multiple files in the examples. This is particularly useful for explaining this API with snippets like is done in this post. Of course, it naturally raises the question in the user’s mind of what to do with actual Microvium source code files on the user’s hard drive, but I like the fact that these concepts can be introduced independently of each other. A user familiar with node.js could quite easily see how to read a source file from disk instead of using a literal string if they so desired (although there are better ways, which I’ll introduce in a moment).

The benefits go beyond communication. Being able to treat arbitrary objects as modules makes it easy to programmatically synthesize modules on demand. An example of this would be an attenuated version of an existing module. Depending on whether the attenuation should exist inside the VM or outside, the attenuator could either produce a new module object or code-generated Microvium source code (which is in turn used to get a new module object).

I really like the fact that the relative nature of specifiers is captured by the callback being per-module. This eliminates the need for the concept of some kind of “full specifier”, e.g. an absolute file path or URL, or something like that. Of course, in most production projects, there may indeed be full specifiers existing under the hood somewhere, but I like the fact that the surface area of the VM API is independent of what this specifier looks like, if it exists at all. Full specifiers need not exist, or, if they exist, they could exist in any form, such as using a composite object as an absolute specifier (e.g. a tuple composing a database name with a record ID, if the module code is stored in a database). This is not to say that any particular method is advised, but rather to say that the VM is abstracted and decoupled from the choice.

Example: Importing Source Text

Importing source text instead of a host module introduces nothing new. It just involves putting together the machinery we’ve already covered so far in the post:

const moduleACode = `
  import { b } from 'moduleB';
  print(b); // 10
`;
const moduleBCode = `export const b = 10`;
// Import moduleACode, which will import module B
vm.evaluateModule({ 
  sourceText: moduleACode,
  importDependency: specifier => {
    if (specifier === 'moduleB') {
      return vm.evaluateModule({
        sourceText: moduleBCode
      });
    }
  }
});

Nothing new here, but the nesting is starting to look ugly. Let’s refactor to clean up the nesting, and we can up the game a bit while we’re at it by adding a circular dependency…

Circular Dependencies and Module Caching

In the following code, I’ve created a modules “table” with two modules — moduleA and moduleB, and an importDependency function which is shared by both the modules and the bootstrap code. So, importDependency here is treated as a universal way of importing modules, with no support for relative paths (after all, we can make up whatever resolution rules we want).

const importDependency = specifier => 
  modules[specifier] && vm.evaluateModule(modules[specifier]);

const modules = {
  'moduleA': {
    sourceText: `
      import { b, printA } from 'moduleB';
      export const a = 5;
      print(b); // 10
      printA(); // 5
    `,
    importDependency
  },
  'moduleB': {
    sourceText: `
      import { a } from 'moduleA';
      export const b = 10;
      export const printA = () => print(a); 
    `,
    importDependency
  },
};

// Import moduleA which will import moduleB
importDependency('moduleA');

You may be surprised to see that this doesn’t result in an infinitely recursive cycle of loading modules. This leads me to the point of this section: module caching.

Node.js talks about this concept of caching modules. The term “caching” may be better described as “memoization”, since it is not an optional cache for performance reasons but rather something that is required in order to implement the described semantics: multiple calls to require will not cause the module code to be executed multiple times, and the same module object will be returned each time9.

Node.js caches the modules based on the full file path, but the ECMAScript standard doesn’t require caching to the same level. The spec says (emphasis added):

Each time this operation is called with a specific referencingScriptOrModule, specifier pair as arguments it must return the same Module Record instance if it completes normally.

Multiple different referencingScriptOrModule, specifier pairs may map to the same Module Record instance. The actual mapping semantic is implementation-defined but typically a normalization process is applied to specifier as part of the mapping process. A typical normalization process would include actions such as alphabetic case folding and expansion of relative and abbreviated path specifiers.

https://tc39.es/ecma262/#sec-hostresolveimportedmodule

To translate this, my understanding is that multiple import statements within a single importing module must resolve to the same module, and multiple imports across the same or different modules are permitted to resolve to the same module, using some host-defined normalization process such as resolving absolute file paths.

Microvium implements the ECMAScript requirement by essentially consolidating multiple imports of the same specifier within a module into a single physical import entry. This does not require a cache at all.

Microvium facilitates node-style module caching (e.g. caching based on file paths) by providing the guarantee that if the same exact ModuleSource object (by reference identity) is passed to evaluateModule, then the same module object will be returned.

This doesn’t feel as elegant to me as the rest of the API, but it does make some level of sense:

  • The evaluateModule method takes a ModuleSource object (the one with the sourceText property) and returns a ModuleObject. Iff the caller passes a distinct ModuleSource, they will receive a distinct ModuleObject. Put another way, iff they instantiate a new ModuleSource, they will get a corresponding new instantiation of the ModuleObject (by having the module source code executed again).
  • This solution is still completely decoupled from the representation that the user may choose for full specifiers, if they choose to use full specifiers at all.
  • The user code is free to control the caching however they choose, by simply caching the corresponding ModuleSource objects according to their caching scheme. Later, I’ll show an example with a node-style importer and its corresponding caching system.
  • A special case of the above point is that the user is able to have multiple module instantiations with the same “full specifier” (e.g. multiple instantiations of a module with the same file path), should they choose to do so. This was not a goal of the design, but is a side effect of the fact that the API is decoupled from the notion of “full specifier”.
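A sketch of how node-style caching layers on top of that identity guarantee. Here fakeEvaluateModule stands in for vm.evaluateModule (the real VM provides the same same-ModuleSource-in, same-module-out behavior), and read is a hypothetical file reader:

```javascript
// Stand-in for vm.evaluateModule: same ModuleSource object in,
// same module object out (the guarantee described above).
const evaluated = new Map();
function fakeEvaluateModule(moduleSource) {
  if (!evaluated.has(moduleSource)) {
    evaluated.set(moduleSource, { exports: {} }); // instantiate once
  }
  return evaluated.get(moduleSource);
}

// User-side cache: full path -> ModuleSource. Because the same path
// yields the same ModuleSource object, the VM returns the same module.
const sourceCache = new Map();
function importByPath(fullPath, readSource) {
  if (!sourceCache.has(fullPath)) {
    sourceCache.set(fullPath, { sourceText: readSource(fullPath) });
  }
  return fakeEvaluateModule(sourceCache.get(fullPath));
}

const read = p => `export const x = 1`; // hypothetical file read
const m1 = importByPath('/proj/m.mvms', read);
const m2 = importByPath('/proj/m.mvms', read);
console.log(m1 === m2); // true
```

The caching policy lives entirely in user code (the sourceCache map), which is exactly the decoupling the design aims for.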

nodeStyleImporter

Writing the dependency importer by hand is fine for advanced users and library writers, and it works well for the above examples that explain the Microvium API. But most Microvium users probably just want to use Microvium scripts the same way they would use node.js scripts. For that, there’s the nodeStyleImporter: a function that creates an initial import hook, given some options and a Microvium VM to import into:

function nodeStyleImporter(vm, options): ImportHook;

An “import hook” in this sense is just a mapping from specifier strings to module objects, something like the require function in node.js:

type ImportHook = specifier => ModuleObject | undefined;

Here’s an example usage:

const moduleOptions: ModuleOptions = {
  // Allow the importer to access the file system, but only for 
  // subdirectories of the specified project directory.
  accessFromFileSystem: 'subdir-only',

  // Specify the root directory of the project, from which
  // initial imports will be resolved
  basedir: 'my/project/directory',

  // A set of "core" modules: those which can be imported from
  // any Microvium module with the exact same absolute specifier
  coreModules: {
    // Example of core module implemented as VM source text
    'a-core-module': './a-core-module.mvms', 
    // Example of core module implemented by the host
    'another-core-module': require('a-module-in-the-host') 
  },

  // Allow Microvium modules to import `fs`, `http`, etc.
  allowNodeCoreModules: true,
};

const vm = Microvium.create();
const importer = nodeStyleImporter(vm, moduleOptions);
importer('./the-entry-module');

This example delegates the complex work of resolving, importing, and memoization policy to the nodeStyleImporter, which is an importer that uses the resolve npm package alongside node’s fs module to find and read the source files for execution.

The full example can be seen as a test case in the GitHub repo. I won’t explain it here because at this point we’re no longer talking about the Microvium API; we’re talking about an optional utility that helps in a specific scenario. I show it here as a demonstration of what can be done using the described Microvium API.

The nodeStyleImporter implements the importing of modules from the file system, as well as allowing Microvium scripts to transparently import node modules by indirectly invoking require, giving the Microvium app access to npm modules, among other things.

Conclusions

I’m pretty pleased with this design. It’s still early in Microvium development, so I’m sure the API will change again before the release, but I’m pleased by where it is at the moment.

I think it achieves all the goals: being simple and easy to use, minimizing the number of unnecessary concepts, and complying with the ES spec. It facilitates the handling of circular dependencies, and it isn’t coupled to the file system. It doesn’t dictate a particular caching/memoization scheme. It can deal transparently with modules loaded inside the VM as Microvium source text alongside those provided and implemented by the host.


  1. In the case of Microvium, only a subset of JavaScript is supported 

  2. The examples hopefully work, but I haven’t actually tested them verbatim. 

  3. A module specifier is the string that comes after from in an import ... from ... statement 

  4. For the purposes of isolation, the error message is the same whether the module is unavailable or whether no import hook is provided at all, since the script should not be able to distinguish these scenarios 

  5. I use the word “script” generically in this post to refer to source text written in the Microvium language 

  6. I don’t think the SES spec yet specifies whether these identifiers are strings — I’ll be interested to see if this is specified 

  7. As you know, I’m very aware of how many concepts need to be introduced and how quickly we need to introduce them 

  8. I’m calling this a callback and not a hook, because it’s not hooking into existing behavior but rather implementing the required behavior 

  9. This is generally true, but there are some caveats that I won’t discuss here 

Microvium


Recently, I started working on a bytecode compiler and virtual machine which I’ve currently called Microvium.

(For those who have been following me, you’ll know I’ve also been working on a full JavaScript compiler called MetalScript for a while. MetalScript is still in progress and probably will be for some time — it’s not a small undertaking. Microvium is not MetalScript)

Quite simply, the objective of Microvium is to provide a way to run small scripts on microcontrollers, particularly for very low-resource devices (for example, those with less than 2 kB of available RAM).

I chose the name “Microvium” because it sounds similar to “Micro VM” (the original name) but is less generic. In particular, there is already an npm module named microvm.

There are already solutions that do this, so why am I creating a new one?

Microvium will have its own unique approach and tradeoffs which will give it advantages and disadvantages compared with other existing solutions, making it suitable in different scenarios. I’ll be discussing these in more detail in upcoming posts, but briefly, the two main focuses for Microvium are:

  1. The ability to run part of my favorite language (JavaScript) on a tiny device. I currently have it running on a 16-bit MCU where the whole firmware and scripts are using 8 kB of ROM and about 500 B of RAM.
  2. Leveraging the MetalScript idea of suspending a VM on your desktop computer to have it resume later on the embedded device. The heavy lifting of parsing and importing can be done on the desktop, while the device can just continue with the “easy” stuff.

Early Prototype Released

Last week we released the first working prototype to npm. Check it out on Github:

https://github.com/coder-mike/microvium

This hardly counts as a “release”, since it doesn’t really run anything except the “Hello, World!” example, so don’t go downloading it just yet. Really, it was just a test run of the release process, and a way to get a sense of what it would look like to use it. The exercise was worthwhile since it resulted in a few changes to Microvium to make it easier to use, and to simplify the concepts.

Subscribe to my blog to get more updates and stay tuned for an actual release, hopefully in the not-too-distant future.

Node’s require doesn’t always return the same value


This is just a curious edge case in node.js I came across while looking at creating the module system for Microvium.

The node.js documentation says:

… every call to require('foo') will get exactly the same object returned, if it would resolve to the same file.

But this doesn’t seem entirely true. In the following code, a module imports itself twice, getting a different object each time:

// script.js 

const a = {};
const b = {};

module.exports = a;
const a2 = require('./script');
module.exports = b;
const a3 = require('./script');
console.log(a2 === a3); // false

The challenge of C/C++ firmware libraries


One of the statements about MetalScript that people seemed to disagree with is the idea that it can take as long as a week to integrate a C/C++ library into a firmware project.

TL;DR I’ve spent many years professionally doing both JavaScript and C/C++ firmware. In my experience, integrating and becoming familiar with a C/C++ firmware library can often take days, or in some cases weeks, while a JavaScript library often takes minutes to start using.

Let me first say that if you are an expert in C and/or C++ firmware and have chosen not to use JavaScript for any major projects, then you may not be the person who will want to use MetalScript. You have spent years mastering a complicated craft, and although you have probably seen or worked with people who use JavaScript, you’ve chosen to stick with C/C++ because you probably believe it’s better. You may even look down on people who program in JavaScript — “real men” know how to do their own memory management, avoid signed integer overflow, and use CRTP to write code that is both well-structured and performant at the same time.

If that describes you, then keep doing what you’re doing. MetalScript is not for you, it’s for people who love JavaScript and who want to write real firmware with it.

Let me also say this:

I am a C/C++ firmware developer, and have been programming firmware for the last 20 or so years.

Up until about 5 years ago, my impression was that JavaScript was a pretty poor choice of language for various reasons, until I actually learned it and used it in real projects. Now that I’ve actually used it, I’m converted to its merits1. I am qualified to compare C/C++ against JavaScript because I am proficient in both and have used both in many real-world projects.

The best thing about JavaScript is not the language itself. The language is good — it used to be pretty bad, but with ES6 and modern features, it is becoming a really good language to work in. But the thing that makes the JavaScript experience great is npm.

The npm package repository contains hundreds of thousands of packages for JavaScript. In addition to the packages themselves, there is a culture that drives useful conventions, such as:

  • Packages generally have their source code in GitHub
  • Packages generally have a readme file in the root written in markdown. And because everyone does it this way, npm and GitHub both display the readme on the main page for the package
  • The readme typically contains a brief description of what the package does, as well as how to install it (even though the installation process is almost always the same)
  • The readme often contains a set of examples to get you started
  • The readme often contains a set of options for advanced usage, or links to proper API documentation.

Not all packages are this well presented, but the vast majority are, and it’s hard to overstate the importance of this conformity. It means that finding and getting started with a completely new library can happen within just a few minutes. The relevant information is all upfront, and the examples are typically self-contained so they can be pasted right into your code and they “just work”.

To try to demonstrate my point, I’m going to compare some examples. It’s difficult to come up with fair examples because firmware libraries are not typically found on npm. So I will do this with two different examples — one of them using actual libraries and one of them a made-up scenario.

Example 1: Calculating a CRC

For the first example, I’m going to try to calculate a CRC in both C++ and JavaScript. This is something that is well suited to a third-party library, and so I expect to find code that already does it in both JavaScript and C++.

This comparison will probably be the best possible case for a C/C++ library. A CRC can be calculated without any platform dependence or customization. It should be as simple as finding a function online that does it and pasting it into the code. Let’s see how we go.

JavaScript

I’ll start with JavaScript, and will then compare the experience in C/C++.

  • At 10:22 AM, I Google “npm crc calculation” (Note: I recorded these timestamps while doing it, but not while blogging, so as to minimize interference)
  • Look at first result – a package called crc on NPM — open the page
  • The first thing on the front page: the list of features for the library. Yes, this looks like what I want
  • The second thing on the front page: the command to install it. At 10:23 AM I run the command npm install crc in my project folder. I have gone from “thinking that a library might exist”, to successfully installing it, in about 1 minute.
  • The third thing on the front page: the example code to use it. At 10:23 AM (still), I create a test script with two lines of example code:
const crc = require('crc');
 
console.log(crc.crc32('hello').toString(16));

At 10:24 AM, I run the code – node test.js . It works.

But actually I didn’t want CRC-32, I wanted CCITT-16. I adjust the test code to crc.crc16ccitt('hello') and it still works.

Finished by 10:25 AM — from imagining some functionality to having it integrated in 3 minutes. This is not unusual IMO, once you are familiar with the workflow and know where to expect everything by common convention.

In C

At 10:37 AM, the first thing I’m going to do is Google “CRC calculation in C”. There is no standard repository that I can search, so I’m open to anything on the internet.

I look at the first page of links. None of them jump out at me as what I’m looking for.

I look at the first link. Scrolling through it, I can see some diagrams, some code, and lots of writing. Perhaps if I want to understand CRCs, this is not a bad place to be. But really I would prefer it if someone else understood CRCs, and I just leverage their expertise.

Should I just copy-paste one of the example pieces of code? I skim-read pieces of the document to try to get an idea of whether this is a bad idea or not. The fact that it says “bit by bit” as one of the headings makes me think that it’s leading the reader through the implementation and starting with a less-than-ideal implementation. Better not use that one.

What about the other code snippet they include in the article? It’s not clear what kind of CRC this is for. Should I read the article? Should I cut my losses and move to the next link? Should I copy-paste and hope this is the right one? Time is ticking, and this is a race.

10:41 AM. I cut my losses on this page and move to the next search result. It’s got code — that’s good. But again, it’s got a “simple example” and an “advanced example” — not examples of usage, but examples of CRC functions. What does simple and advanced mean? Does one do more stuff than the other? Is one more efficient than the other? Do I have to read the code to find out? Time is ticking.

Glancing through the code (now the 3rd and 4th pieces of somebody else’s code that I’ve had to look at), I see that the simple example doesn’t use a lookup table, and the advanced one does. Likely I’m on another educational page that’s trying to teach the reader about how to do CRCs.

Why is this kind of thing the first two search results? Surely people more commonly want to use CRC code than to write and understand their own implementation? Does it say something about the culture of C that the top links on Google for “C CRC calculation” are there to help people write their own implementation from scratch?

Maybe my search terms are poor. Maybe I should have used the term “ccitt16” in the search query. Maybe some other changes would also help? I remember that the next two search results are stack overflow questions — let me have a quick look at them before I go back to try other search terms.

10:45 AM. Third search result. This is a stack overflow question. He says:

I’ve created a function to calculate a CRC16 checksum, but it doesn’t seem to be outputting correct values,

(I’m skim reading because I’m in a race against the JavaScript guy who integrated a working library in a quarter of the time it’s taken me not to get anywhere)

The guy wants to fix his function. The top answer has a bunch of explanation that I don’t have time to read, and then some code that is prefaced with “so, your function might look like”. Those are not words that inspire confidence in me. It sounds like his goal is to help the questioner figure out where he went wrong, rather than writing production-quality code that many other people will depend on.

Should I look at the other answers? Should I abandon these search terms and try something else? Should I look at the other SO question?

Let me have a quick glance at the other SO question before deciding.

10:48 AM. Fourth search result. A stack overflow question. This guy says:

Since CRC is so widely used, I’m surprised by having a hard time finding CRC implementations in C.

Totally agree with ya bro.

The top answer provides a bunch of links. A lot more reading, and a lot more implementations to choose from. But now at least we’re getting somewhere.

I actually landed up picking the implementation in the second answer of the first SO question. Not because it was carefully considered as the best choice, but because I was in a rush and it had a couple of nice properties at a glance:

  • It was short, so I felt less intimidated
  • The answer was only prefaced with 2 sentences, so there wasn’t much reading for me to do
  • In one sentence I see the words crc16 CCITT
  • In the other sentence, I see the word “tested” and a link (the link makes it official!)

10:50 AM. I paste the code into a C file and write a main function to test it. 

#include <stdio.h>
#include <string.h>

int main() {
  const char* str = "hello";
  short crc = crc16(str, strlen(str));
  printf("%04x", crc);
}

10:53 AM. I try to compile it, but GCC is not in my environment path. This has nothing to do with the library, so let’s just pretend I compiled it and it worked.

It took me 15 minutes, as opposed to JavaScript’s 3 minutes.

Postmortem

Speed to find library

Most of the time spent in C was spent finding the library. While it’s true that this will generally be slower in C than in JS (since there is no common convention and central catalog of such libraries), finding the library will never take a whole week, so this small example doesn’t account for the majority of the time that I claimed it takes to get a C firmware library integrated.

Why is that? What’s special about the CRC example that makes it unrepresentative of the norm?

I think the answer is that if you are picking an example from the subset of libraries that work both in today’s JavaScript world (i.e. it will be intended for server or browser) and also the world of C/C++ firmware, you are actually left with a small collection of libraries which do not exhibit most of the complexities that arise in firmware, which biases the comparison. As I mentioned earlier, a CRC-calculating function is much easier to make platform-agnostic, and so should be the best possible candidate for a hassle-free library in C.

Speed to install/integrate

I’d say that both the C and JS versions in this example were pretty similar to integrate into a test script (representative of a larger application). However, from experience, I’d say that almost every npm library is just as easy to integrate as this CRC library, typically only taking a few minutes to get going with the basic examples. In JS, I believe this example is representative of the general experience.

In C however, we’ve picked an example that is trivial to integrate — just copy and paste.

Most C/C++ firmware libraries are integrated at the source code level because of the wide range of possible target architectures, and the source code often requires extensive customization or dependency implementation or port layers in order to get it to work for your particular setup (and reading through documentation to understand how to do that). To compound the issue, the reality is that many firmware compilers don’t support the full C/C++ spec, normally for performance or architecture reasons. These are not criticisms of C/C++ per se, but nevertheless, are part of the typical experience of using C/C++ in a firmware environment.

Confidence

How confident am I in the JS vs C library?

The C “library”, if we can call it that, is some code in a Stack Overflow answer that one guy wrote. His testing involves running it a few times and checking that it matched some online web page. I don’t feel great about that. Maybe it’s okay because other people reading it may have spotted the problem and put in a comment if there was something wrong with it.

On the other hand:

  • The npm crc package has been downloaded over a million times in the last 7 days. All those feet treading on the same path will harden that path. If there are any bugs, they will be found quickly.
  • It comes standard with a suite of unit tests, and both the GitHub and npm pages display that all the tests are passing.

Furthermore, the package manager allows me to quickly update my dependencies, to make sure that I get the latest bug fixes at any time.

Documentation and Ease of Use

I think this answers itself. There is no documentation and no example code with the C version — it is a code snippet in an SO answer, so what do you expect?

The JS version has exactly the documentation that you’d expect from most npm libraries — it is concise and describes the key things you need to know in order to use it. It doesn’t try to tell you the theory behind CRCs, or anything that isn’t directly relevant to being productive as quickly as possible.

Am I talking about the language or the package manager?

I’d like to just clarify something because I know this is going to be brought up. I say that I’m comparing C/C++ vs JavaScript but then go on talking about things that are not part of the language at all (Google, Stack Overflow, npm, culture and ecosystem). Is that valid?

Yes, I think this is valid. When you choose to develop in JavaScript or C/C++, you’re not just adopting a language. You’re adopting all the tools, community, and culture surrounding the language. Productivity is affected by all of these factors, and they come together as a whole. You can say all you want about how you think C++ is a better language if you think that, but at the end of the day, it’s about getting shit done, and the JavaScript “whole” is better for that than the C++ “whole”.

Example 2: A modem driver

Here, I’m picking an example that adds complexity more typical of firmware development, but the tradeoff is that this example isn’t real — it’s merely a vision I have for the future. I don’t think the current state of JavaScript firmware development is mature enough for this to be a reality today.

For this example, I will assume the following hypothetical scenario:

  • We have a product that has an MCU and a u-blox cellular modem
  • Due to a shortage of UARTs on the MCU, the product connects the modem to a UART extender
  • Objective: connect to the internet, send an HTTP POST, receive the JSON response, decode it, and output the message therein to the debug UART

Before I even start, if you are a firmware programmer, take a moment to think about how you would do this. If I contracted you to write firmware for a device that does this, how long would it take you to write it?

The Vision

There are two domain-specific pieces of information that we absolutely need to specify somewhere in any firmware, no matter what language:

  1. Information describing the behavior we require, such as the fact that at startup we want to connect to the internet, POST a message, and print out the response
  2. Information describing the device configuration, such as the fact that we have a u-blox modem, and the fact that it is connected on the multiplexer, etc.

We can summarize the required behavior with the following hypothetical JavaScript code:

// app.js
import * as request from 'request-promise-native'; // third-party library to perform HTTP requests

export async function run(device) {
  await device.modem.connectToInternet();
  const reply = await request({
    url: 'http://my-service.com/test-url',
    method: 'POST',
    json: 'please give me a message to display'
  });
  console.log(reply);
}

We can summarize the required device configuration with the following hypothetical JavaScript code:

// device.js
import { UBloxModem } from 'ublox';
import { Max14830 } from 'max-14830'; // UART extender driver

export const uartExtenderI2C = mcu.i2c('G7');
export const uartExtender = new Max14830(uartExtenderI2C);
export const modemUart = uartExtender.uart(3);
export const modem = new UBloxModem(modemUart, 'LISA-U200');
export const debugUart = mcu.uart(2);
export const debugConsole = new UartConsole(debugUart, { baud: 115200 });

Then we also need some glue code:

// main.js
import * as device from './device';
import { run } from './app';

// The console we want to use for output messages
global.Console = device.debugConsole;

// The device to use for connecting to the internet
global.internet = device.modem.internet();

// Transition from compile time to runtime
mcu.start();

run();

Perhaps the reality won’t be so easy, and I’m oversimplifying it. But I can imagine getting a library like this off npm and being able to get working with it on a firmware device within a few hours.

How long would it take to do the same thing in embedded C? Days, weeks, months?

Conclusion

In between writing this article and publishing it, I ran into another real-world example. I needed a Modbus connection between my C firmware and my Electron JavaScript application, and I was implementing both sides. The JavaScript side was working within an hour, as one would expect. The C side took days of implementing hundreds of lines of porting layer, managing states, and banging my head against the wall.

The reality is that JavaScript is simply a much more productive tool to use, and a large part of that is because of how easy it is to reuse third-party code and to share your own so that others can reuse it. 


  1. Although if you want to use it for a real project these days, please use TypeScript so you can get static type checking 

Can you parse this?
JavaScript Corners


What does the following JavaScript mean:

const x = await / +y; const z = await / +y;

Hint: it’s a trick question.

The answer depends on the context, as is demonstrated by the following snippet:

function foo() {
  const y = 10;
  const await = 5;
  const x = await / +y; const z = await / +y;
  console.log(x);
}
async function bar() {
  const y = 10;
  const x = await / +y; const z = await / +y;
  console.log(x);
}
foo(); // Prints 0.5
bar(); // Prints / +y; const z = await /10

Within the context of an async function, await is like a keyword, and the thing after await is considered to be an expression. In JavaScript, an expression that starts with forward-slash is a Regexp literal, and that literal ends with the next unescaped forward slash. The +y at the end then represents string concatenation, so both the regular expression and y are converted to strings, and the concatenated result string is "/ +y; const z = await /10".

This interpretation is easier to visualize if the syntax highlighting identifies and colorizes the respective parse tokens as follows:

Outside of the context of an async function, await is just a normal identifier and has no special meaning (this is important so that the introduction of the await syntax to the JavaScript language didn’t modify the meaning of existing JavaScript code which might have used await as a variable or parameter name).

If the syntax highlighting were correct, as seen in the above images, the difference would be pretty obvious. Unfortunately, I needed to photoshop the above images, since VS Code highlights both examples the same, and both incorrectly:

MetalScript Progress Update – May 2019


It’s been a while since my last update! Unfortunately, this is because there’s not a lot of observable progress to speak of. But today, here’s another update for those who are interested. I’ll preface this by saying that this update is not particularly interesting.

TL;DR: my personal life has got in the way, and MetalScript has been more difficult than I anticipated.

Personal Life

Since I’m the only person working on MetalScript, and I’m working on it in my spare time, the progress on MetalScript is intrinsically dependent on the state of my personal life. So here’s the status on that. (Or skip to the technical section).

Firstly, I’m going through a divorce. It’s been emotionally taxing and takes time and money (and money = time, and emotional strain = loss of time).

Secondly, I left my traditional job in January. I’m still doing contract work, to lessen the blow on my savings accounts, but overall, working for money is a lesser part of my life now (at least until my savings run out). In theory, this will give me more time to work on MetalScript!

Thirdly, I’ve moved house and country. Kinda. I’ve done half of the move: I’ve moved out of my previous apartment in January, but I haven’t moved into anywhere else yet1. For the last 5 months or so, and for the foreseeable future, I’ve been living a nomadic life, jumping between staying with family and friends and Airbnbs on different continents2. I carry in my suitcases the possessions I really need, like, umm, my laptop, monitor, mechanical keyboard, Homepod, headphones, and VR headset3. Oh, and clothes and stuff. Most of my other possessions have been sold or donated to charity4 (or given away in the divorce).

Technical

My last progress update5 was in October. At that time, I had a working demo showing a small snippet of code compiling (see pages 22-26 of the presentation I made available in my October post). The summary of progress since then is that now MetalScript cannot compile anything at all, as I’m rewriting large parts of the symbolic interpreter. Most of the time I’ve spent on the project over the last 6 months has been in refactoring and progressing on the design concepts.

Limitations of the previous work

Given that it was a work-in-progress in October, there were some limitations to the kind of code that could be compiled. One such limitation is the fact that it only works when the output does not have multiple runtime functions — the symbolic interpreter did not yet have a way of passing runtime data between different runtime scopes (function scopes).

A related limitation is that the symbolic interpreter could not compile a program that required runtime heap allocations.

These are the challenges that I’ve been solving recently and will continue to do so over the next few months.

Design Work

The vast majority of my time on MetalScript over the last 6 months has been spent on design work (or what I call “thought-work”).

My design process is to journal extensively, to hash out different approaches and refine unclear concepts. In the MetalScript project, I’ve accumulated over a million characters across all the markdown files that make up my journaling history.

I used to be able to write code directly from the ideas in my head. Using a fuzzy language like English (or pseudocode, IMO) for describing behavior is for people who have fuzzy thoughts6. If your ideas are clear in your mind, then expressing them directly in code is trivial7.

With MetalScript, and other projects where I push the boundaries on my capabilities, I find that I’m less and less able to write code directly. A lot of the ideas I conceive are so foreign and vague to me, just an abstract notion or gut feeling, that I can’t even express them well in English, let alone in code. Hence why I write pages and pages of journaling (mostly English prose), often rehashing the same idea 10 or 20 times over the course of weeks or months, each time making it slightly clearer, until it reaches the clarity required to express it in code.

The thought-work I’m working through at the moment is related to the symbolic interpreter, and how it handles ambiguity in pointer values.

Moving to POD Types

Previously, a lot of the internal data models of the compiler used direct cross-references in their in-memory representation. For example, a jump instruction may contain a direct reference to the block to which it jumps, or an add operation may contain a direct reference to the operands.

While this is convenient when either looking up the reference (e.g. performing the jump) or in constructing the reference, it makes some things more difficult when it comes to ease of maintenance/programmability:

  • Data in this model cannot simply be dumped to the console or to a file as JSON, because serialization would embed a full copy of every referenced structure, and the references may even be recursive.
  • Cloning structures with internal or cyclic references is more difficult.
  • When debugging, it’s a lot harder to visualize what’s going on without human-readable IDs (direct references have an implicit machine-readable ID which we call the “address”).
  • Related to debugging, if you need to save IL or VM state as a file (for debugging or just persistence), if there are internal references then you need to calculate ephemeral names for everything on the fly, in order to represent the cross-references in a persistable way.

Given the phase of the project, and the need for simplicity and debugability over performance, I decided to overhaul the data structures to be pure POD-types that don’t include internal cross-references. In this model, all cross-references are implemented as string or numeric IDs (depending on the context).
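To illustrate the difference, here’s a small sketch in JavaScript. The names (`blockA`, `targetId`, `unit`) are invented for the example, not MetalScript’s actual data model:

```javascript
// Direct-reference style: a jump instruction points straight at its target
// block. Convenient to follow, but the object graph can be cyclic, so it
// can't be dumped as JSON.
const blockA = { name: 'blockA', instructions: [] };
blockA.instructions.push({ op: 'jump', target: blockA }); // cycle!
// JSON.stringify(blockA) throws: "Converting circular structure to JSON".

// POD style: every cross-reference is a plain string ID, resolved through a
// lookup table. The whole model is acyclic plain data.
const unit = {
  blocks: {
    blockA: { instructions: [{ op: 'jump', targetId: 'blockA' }] },
  },
};

// Dumping, persisting, and cloning become trivial:
const dumped = JSON.stringify(unit);
const cloned = JSON.parse(dumped);

// Following a reference now costs one extra lookup:
const jump = unit.blocks.blockA.instructions[0];
const targetBlock = unit.blocks[jump.targetId];
```

The trade-off is exactly as described above: one extra lookup per dereference, in exchange for a model you can print, diff, clone, and persist without any bookkeeping.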

Simplified IL Model

The symbolic interpreter is a complicated beast, and to give it the best chance of success, it helps if the IL that it processes is as easy to interpret as possible. Earlier this year, I undertook a large sub-project to redesign and simplify the IL model (the set of IL operations and their semantics). The new IL model is able to express JavaScript source code in a form that is much easier to interpret, which makes downstream phases of the compiler simpler. Simplicity is critical when working on such a large and complicated project.

A change in my personal work process

For me, the thought-work (iterative journaling) is incredibly taxing, mentally and emotionally, due to its complexity, inherent uncertainty and lack of clear progress indicators. I find I burn out after a few hours in a day of doing such work.

The solution I’ve adopted recently is to split my work day: I spend a few hours on concept work, followed by the rest of the day on what I would characterize as easy grunt work, such as implementing the ECMAScript specification. For example, yesterday in my “easy time” I implemented object destructuring, so now the following JavaScript is compiling to IL:

const { foo } = require('./module');

For interest’s sake, this is what the IL looks like (the highlighted lines correspond to the destructuring, and the preceding lines are the function call to require):

    t0_0 = sys_op("resolveBinding", "require", false, undefined);
    t0_1 = sys_op("getValue", t0_0);
    t0_2 = sys_op("getBase", t0_0);
    t0_3 = sys_op("withBaseObject", t0_2);
    t0_4 = list_new();
    t0_5 = copy("./module");
    t0_6 = sys_op("getValue", t0_5);
    t0_7 : list_push(t0_4, t0_6);
    t0_8 = sys_op("throwIfNotCallable", t0_1);
    t0_9 = sys_op("call", t0_1, t0_3, t0_4);
    t0_10 = sys_op("getValue", t0_9);
    t0_11 = reg_get("context");
    t0_12 = record_get(t0_11, "lexicalEnvironment");
    t0_13 = sys_op("requireObjectCoercible", t0_10);
    t0_14 = list_new();
    t0_15 = sys_op("resolveBinding", "foo", false, t0_12);
    t0_16 = sys_op("getV", t0_10, "foo");
    t0_17 = sys_op("initializeReferencedBinding", t0_15, t0_16);
    t0_18 : list_push(t0_14, "foo");

This work methodology is going well for me so far, because it provides a balance between the sensation of tangible progress and the more mentally-taxing job of getting the fundamental principles in place.

What’s next?

  • I’m working on a module loader for CommonJS modules, so I can get some basic multi-module code running. There are some interesting challenges here. For example, the imported module is allowed to call MCU.start in its root scope, thus suspending the process mid-require.
  • I need to finalize the new design for the symbolic interpreter and get it working

  1. The world is surprisingly ill-adapted to people who have an email address but not a physical address. 

  2. I’ve even experienced an airport shower for the first time. I was surprised at how unpleasant it wasn’t — there was hot water, and space to put my luggage. But I did empty half a bottle of hand sanitizer on the floor before getting in because waterproof sandals are not something I own at the moment.  

  3. Actually, I only use the VR headset on flights — it’s great to disconnect from the real world and watch Netflix in a massive home theater in the virtual mountain-tops 

  4. “Bare-metal living”, you might say. 

  5. That is, my last published progress update. I’ve written several updates in the interim which I never published in the end because I felt they didn’t say anything substantial. 

  6. Or for those who can’t natively think in terms of the programming language and thus have to mentally translate between their primary (natural) language and their “secondary” (programming) language 

  7. Technical documentation as a form of inter-human communication is a different story. 

Hughes List


Inspired by Eric Lippert’s recent post on a data structure called a Hughes list, I thought I’d play around with it by writing my own equivalent in JavaScript.

Side note before I get started. For my usual readers who are following the progress of my JavaScript-to-native compiler (MetalScript), I’ll hopefully continue blogging about it soon. I’ve just been in the middle of an international move, along with some other projects, and haven’t had much time to work on MetalScript. My flight out of the country leaves in a few days, and then hopefully my schedule will open up and I’ll continue on it. For those who don’t know what I’m talking about but want to know more, also check out my last post where I link to a presentation on what MetalScript is, or any other posts in the category.

Getting back to the point of this post…

The key benefit I see in the Hughes list structure is that it is a persistent (immutable) list that allows you to concatenate, prepend, and append to the list in O(1) time. It does so by lazily accumulating the operations to be performed until the list contents are actually needed, at which point it finally performs a single O(n) pass to build the final “real” list (at least, this is my interpretation of how it works).

For those who want to cut right to the chase, see this GitHub repo for the final code. It’s not a lot of code. test.mjs is the entry point, and hughes-list.mjs is the implementation of the data structure.

I suggest reading Eric’s series first for a full explanation of the underlying principles before comparing it to mine. I’m not here to explain it (which Eric has done perfectly) but rather to provide a twist and an implementation in a different programming language.

I deviated quite a lot from Eric’s implementation, in the following ways:

  • My Hughes list is built up by composing imperative procedures that each mutate a JavaScript array, rather than working with an underlying immutable SimpleList like Eric uses. This produces the same logically-immutable list structure and achieves similar performance characteristics but saves on implementing an additional SimpleList type.
  • I omitted the wrapper class. If you inspect the list in a debugger, the list will actually be the function, rather than containing the function. I wouldn’t do this in production code but it leads to a nice simple implementation here.
  • As a superficial detail, I used the traditional JavaScript function names push and unshift instead of append and push respectively.

Before jumping into the detail of the implementation, it’s probably best to take a look at how this list structure is used. These are the unit tests I have for it (see the full file for more context).

checkList(HL.empty                               , []);
checkList(HL.push(list1, 4)                      , [1, 2, 3,   4]);
checkList(HL.concat(list1, list2)                , [1, 2, 3,   4, 5, 6]);
checkList(HL.unshift(list1, 0)                   , [0,   1, 2, 3]);
checkList(HL.concat(HL.push(list1, 10), list2)   , [1, 2, 3,   10,   4, 5, 6]);
checkList(HL.concat(HL.unshift(list1, 10), list2), [10,   1, 2, 3,   4, 5, 6]);
checkList(HL.concat(list1, HL.push(list2, 10))   , [1, 2, 3,   4, 5, 6,   10]);
checkList(HL.concat(list1, HL.unshift(list2, 10)), [1, 2, 3,   10,   4, 5, 6]);

These should all look pretty obvious. If you’re not familiar with persistent data structures, the most important thing to point out here is that a call like HL.unshift(list1, 0) does not change list1, but rather returns a new list that is like list1 but with 0 at the front. This is also good for me to highlight because I said that my implementation of Hughes list composes imperative procedures that mutate the underlying JavaScript array – so it’s worth getting your head around how this immutable list structure is implemented on top of array mutations.

My implementation of the Hughes list data structure comes out to something like the following:

export const empty = () => {};
export const single = x => xs => xs.push(x);
export const concat = (ls1, ls2) => xs => { ls1(xs); ls2(xs) };
export const push = (ls, x) => xs => { ls(xs); xs.push(x) };
export const unshift = (ls, x) => xs => { xs.push(x); ls(xs) };

The things to highlight above are:

  1. The only operation used on JavaScript arrays here is push, even in the implementation of unshift. This is the key benefit of the Hughes list. Depending on the implementation of arrays in the JavaScript engine, the native unshift on arrays may be O(n) (on an array of length n), while push is typically O(1).
  2. As previously noted, this implementation performs mutations, so rather than ls2(ls1(xs)), we do ls1(xs); ls2(xs). The former might have created a new array/list from ls1 that is passed to ls2, while the latter simply invokes both mutations one after the other on the same mutable array.
  3. You’ll notice that the implementation is backwards from Eric’s implementation, in that the definition of a list is a function that appends its items to the end of a given array, not the beginning. This again is just because appending an item to the end of an array in JavaScript is the most efficient.
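One piece worth spelling out is the final O(n) realization pass mentioned earlier. The checkList helper in the tests presumably does something equivalent; toArray below is my own illustrative name for it, and I’ve repeated the list operations so the snippet is self-contained:

```javascript
// Realize a Hughes list into a plain array: a single O(n) pass that runs
// the composed mutation procedure against a fresh array.
const toArray = ls => {
  const xs = [];
  ls(xs); // apply all accumulated push operations
  return xs;
};

// The list operations from above, repeated for self-containment:
const empty = () => {};
const concat = (ls1, ls2) => xs => { ls1(xs); ls2(xs); };
const push = (ls, x) => xs => { ls(xs); xs.push(x); };
const unshift = (ls, x) => xs => { xs.push(x); ls(xs); };

const list1 = push(push(push(empty, 1), 2), 3); // [1, 2, 3]
const list2 = push(push(push(empty, 4), 5), 6); // [4, 5, 6]

toArray(concat(unshift(list1, 10), list2)); // → [10, 1, 2, 3, 4, 5, 6]
```

Note that the composed list is only a recipe until toArray runs it; building the same result twice just invokes the recipe twice against two fresh arrays, which is what makes the structure logically immutable.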

I personally think this implementation still captures the essence of what makes the Hughes list structure useful, while also interplaying well with the king of JavaScript list-like structures, the array. The key effect is that it allows us to concatenate, prepend, or append items in any order with almost no cost, and let the list methods build the compound procedure such that the operations are performed in the optimal order for performance.

TC53 Meeting


Those following my blog will know that I was in Boston last week for ECMA TC53.

ECMA is a standards group, and TC53 is the 53rd Technical Committee, being established to produce standards for JavaScript on wearables and other resource-constrained devices. Some of you may already be familiar with TC39, who are the creators of modern JavaScript as a standard, of which TC53 is in some sense an offshoot. This meeting in Boston was the inaugural meeting, and I was invited as a guest to participate and to present on MetalScript.

Thanks to Lizzie from Moddable for taking photos.

I’m sitting on the end of the table because the meeting started at 10:00 AM and I arrived at 9:59 AM, while everyone else arrived closer to when they were told to arrive (9:30 AM)1. I hurried into the room with everyone already sitting and no space for me until everyone shuffled up. Great start!

The MetalScript presentation went well, aside from some technical glitches that were out of my control. For anyone who wants to see it, here it is:

(Or here’s a PDF that includes the notes for each slide)

If you look at the presentation, you may notice that I actually have some bare bones code compiling, which I managed to get working just the week before the event. This is an amazing milestone, albeit somewhat hacked together, but I’ll talk more about this in an upcoming post.

In addition to the presentations and discussions, there were also a number of demos. Here’s a representative from Johnny Five giving a demonstration using Tessel 2 (there were also other demonstrations, but I’m including this one because it’s the only photo where I look like I’m paying attention).

Another notable player in the room was Moddable with their XS JavaScript virtual machine for embedded devices, and some really cool demos using JavaScript in home automation, rendering fluidly to graphics displays, and the ability to download “apps” to an embedded device by partially flashing it.

It was amazing to be in a room full of people who are actually in the same technical space as what I’m doing — in the general population of programmers, there isn’t a very high proportion of people in the JavaScript firmware space!

Of course, spending 50 hours in an airplane for 14 hours of meetings is no good unless I also go “touristing” and check out some of the sights in Boston. Take a look at Facebook to see more of that side2. I’m not interested in arts and history, and I spent most of my time after TC53 sleeping, so the photos are sparse. But the squirrels are cute!

 


  1. To be fair, I fully intended to be there at around 9:45, but misjudged the ridiculous line at the Starbucks by my Airbnb and the traffic getting to the location downtown. Also, I was told conflicting start times, and only realized that 9:30 AM was one of them about 5 minutes before I arrived, while trying to find information about the venue. Excuses, excuses… 

  2. I haven’t posted on Facebook in years, but I’m thinking maybe I should get back into it 

MetalScript Progress — September 2018


A MetalScript progress update, for those who are interested. This one is all boring technical details — if this is your first time on the site, perhaps read something more interesting like what MetalScript is and the reason why I’m creating it. Or for those who like technical details, read on…

[Edit: Previously this had “November 2018” in the title, even though I wrote this in September — who knows what was going through my head when I wrote that.]

Stack Unwinding

To recap, last time I said I had my IL compiler compiling the following code file to IL and getting it to run in the virtual machine:

// blinky.js
MCU.start();
setInterval(() => gpio.pin('B4').toggle(), 500);

I said that next up was the “unwinding” phase — taking the suspended virtual machine state and unwinding the virtual stack and register states to create a unit of code that doesn’t require any stack or register information. I’m glad to say, this is complete!

Let me explain how it works. At the point where MCU.start() is invoked, there are actually a bunch of things on the VM stack, as illustrated by the following diagram:

Most of this, unsurprisingly, is not user code. I’ve used blue to represent system functions (aka ECMAScript “abstract operations”) and red to represent user functions. Here’s a brief explanation of some of the system functions:

  • RunJobs – the entry point to an ECMAScript application, containing the main event loop that we all know and love.
  • IHDR (InitializeHostDefinedRealm) – creates all the “realm” objects, which includes things like the Array object and the Object object — the things that are “built in”.
  • EnqueueJob – RunJobs gets the JS source text and calls EnqueueJob to add a script evaluation job to the job queue (one for each entry JavaScript file, although most applications I’ve seen have only one such file, normally called “app.js” or something similar).
  • ScriptEvaluationJob – a job wrapper in the spec that is used to evaluate a previously-enqueued user script.
  • Parse (ParseScript) – parses the script and returns the information required to evaluate it.
  • ScriptEvaluation – given the information from the parsing stage, ScriptEvaluation sets up the context needed to run the main script code, and then runs it.
  • GDI (GlobalDeclarationInstantiation) – initializes the variables and functions for the script (recall that JavaScript variables and functions are hoisted).
  • ScriptBody – then we finally get to executing the body of the script — the actual user code.
  • Call – the user code invokes MCU.start(), which is a JavaScript function call, and there are a number of system-level things that need to happen for calls, such as checking that the target is callable, deciding what this-value to use, etc. All of this work is encapsulated in the “Call” system function1.

Phew! That’s quite a list. I must point out, though, that this isn’t even comprehensive. Initializing the realm is a lot more work than implied by the little IHDR rectangle in the diagram. The same goes for most other operations — I’ve only shown the highlights here.

The main point I want to convey is that even though MCU.start() appears to be the first thing executed when you look at the source code, there is actually a lot of VM work leading up to that point, and by the time the VM is suspended at MCU.start(), the VM stack is already 5 frames deep. It is at this point that the MCU hardware must take over — the entry point for the MCU reset interrupt vector is the continuation of the MCU.start() call2. But ideally we do not want to have to translate the VM stack into a corresponding hardware stack at the point where the program starts. We could, but it would make compilation significantly more difficult.

Instead, the “unwinding” process that I’ve now finished translates the functions in the call stack to continuation-passing style (https://en.wikipedia.org/wiki/Continuation-passing_style). These constructed continuation functions are just normal IL, whose behavior starts where the original function left off.

This isn’t as easy as it sounds. It has to work for all valid user code, including user code that might look like this:

function myFunc() {
  const thingsToDo = [];
  thingsToDo.push(() => MCU.start());
  // (... add more things to do ...)
  while (thingsToDo.length) {
    const thingToDo = thingsToDo.shift();
    thingToDo();
  }
}
myFunc();

The challenge in the above code is that the continuation has to resume in the middle of the loop. It has to restore the state correctly, while resuming the old behavior on the second iteration of the loop (the first iteration is split between compile time and runtime, and the second occurs completely at runtime).

Symbolic Interpreter

The phase I’m working on now is what I call information threading, which is done by something I call a symbolic interpreter. This is the part where the compiler tries to understand the code that’s been written, in the same way that a human might mentally follow through the code.

For example, in the blinky example, a human following through the code will know what the call to setInterval refers to — they know the state of the machine, they know that it contains a setInterval JavaScript function, and they know whether or not they’ve done something nasty to that function, such as replacing it with their own wrapper that does something different to the standard setInterval function. So a human following the code would be able to predict with 100% accuracy what the code should do (and so can a machine).

But there are cases where you can’t predict the behavior exactly. For example, calling toggle: you can predict what function is called when you invoke toggle, but you can’t say whether it will turn the LED on or off — at different times, the same line of code can have different effects, depending on previous runtime state. This just means that you need to record that state somewhere at runtime.

The symbolic interpreter does exactly this. It keeps track of the state of the program over “time”, keeping track of what can and can’t be known at each point in the program. This information will later be used to emit an implementation of the source code in the target language (in this case LLVM IR).

I have a lot of the symbolic interpreter up and running. It’s currently able to fully interpret simple IO-free programs, meaning that if it’s given a script that doesn’t perform any IO (one that doesn’t read or write to GPIO, for example), it can infer the behavior of the program completely.

But an IO-free program has no observable behavior at all (observable behavior requires IO). Interesting things can only happen when you hit an IO boundary, such as turning on an LED, sending commands to a robotic arm, or writing “Hello, World!” to the console. So this is what I’m working on at the moment — IO boundaries for the symbolic interpreter.

What’s Left?

I’m currently working through the “Blinky” example. I want to get the blinky code going all the way through the compilation pipeline, as in the following diagram (blocks in green are complete).

I don’t think the type assignment phase will be difficult. Taking the output from the symbolic interpreter, it should be easy for an algorithm to choose some basic binary types. Likewise, I’m optimistic that the LLVM IR emitter will be easy, given the chosen binary types. To be clear, I’m not talking about complete implementations of any of these phases — I only need to reach a proof of concept for the blinky example first. This will give me a chance to go back and review the good and bad decisions I’ve made before fleshing out each phase to fully support all cases.

There is a little bit of work not implied by the diagram above, such as leveraging LLVM to produce output object files, linking, and developing a minimal runtime (event loop and GC). But I think these will be relatively straightforward for the first iteration of the POC.

The above work is all to get to a minimal POC where I can demonstrate end-to-end compilation of at least one input file. Then, as I’ve said in previous posts, it will take somewhat longer to reach MVP.

News: TC53

I’m very pleased to say that I’m traveling to Boston3 next month to participate in the inaugural meeting of ECMA TC53, a new “technical committee” (TC) being established for JavaScript on “smart wearable systems and sensor-based devices”. I’ve been invited to speak a bit about MetalScript, among other things, and so I’m working hard to reach a good milestone before the meeting. I’m incredibly excited to go and be part of what will certainly be a group of highly intelligent people paving the way forward for a better future4.

  1. There is a separation in the MetalScript compiler between IL functions and JS functions, and likewise between IL function calls and JS function calls. IL does not have or need closures, variadic functions, etc., and is used to represent both user code and ECMAScript “abstract operations”. 

  2. It is exactly the continuation of the MCU.start() call. Unlike in C, there are no operations on the MCU that precede the user code. Operations like initializing the MCU clock source or copying .data into RAM are logically part of MCU.start() and can be parameterized and configured accordingly in future by options passed to the MCU.start() call. I’ve said this before, but I feel it’s important enough to mention again. 

  3. Boston is 17,000 km from here in Melbourne — almost as far away as is possible without going into outer space. I’m going to be spending about 3 times longer on the airplane than in the meetings! 

  4. In a dream I had about it a few nights ago, I could swear that Gandalf was one of the members of the committee.
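As a postscript, here’s a rough sketch of the continuation idea from the unwinding discussion above, in plain JavaScript. This is my own toy illustration, not how MetalScript actually represents continuations (which it does at the IL level); all the names here are invented:

```javascript
// A toy version of the myFunc example: suppose execution is suspended at the
// point where the first queued item calls MCU.start(). The continuation must
// resume in the middle of the loop, on the second iteration.
function makeContinuation(thingsToDo, startIndex) {
  // All loop state (the queue and the loop position) is captured explicitly
  // in the closure, so the continuation needs no stack or register state.
  return function continueMyFunc() {
    for (let i = startIndex; i < thingsToDo.length; i++) {
      thingsToDo[i]();
    }
  };
}

const log = [];
const thingsToDo = [
  () => log.push('MCU.start()'),  // runs "at compile time"
  () => log.push('second thing'), // everything from here on is the continuation
  () => log.push('third thing'),
];

// "Compile time": run the first item, then suspend, capturing the rest.
thingsToDo[0]();
const resumeAtRuntime = makeContinuation(thingsToDo, 1);

// "Runtime" on the MCU: the reset vector simply calls the continuation.
resumeAtRuntime();
// log is now ['MCU.start()', 'second thing', 'third thing']
```

The point of the sketch is only that a suspension point mid-loop becomes an ordinary function whose body starts where the original left off, which is what the unwinding phase constructs in IL.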