Author: Michael Hunter

Microvium


Recently, I started working on a bytecode compiler and virtual machine that I’m currently calling Microvium.

(For those who have been following me, you’ll know I’ve also been working on a full JavaScript compiler called MetalScript for a while. MetalScript is still in progress and probably will be for some time — it’s not a small undertaking. Microvium is not MetalScript.)

Quite simply, the objective of Microvium is to provide a way to run small scripts on microcontrollers, particularly for very low-resource devices (for example, those with less than 2 kB of available RAM).

I chose the name “Microvium” because it sounds similar to “Micro VM” (the original name) but is less generic. In particular, there is already an npm module named microvm.

There are already solutions that do this, so why am I creating a new one?

Microvium will have its own unique approach and tradeoffs which will give it advantages and disadvantages compared with other existing solutions, making it suitable in different scenarios. I’ll be discussing these in more detail in upcoming posts, but briefly, the two main focuses for Microvium are:

  1. The ability to run part of my favorite language (JavaScript) on a tiny device. I currently have it running on a 16-bit MCU where the whole firmware and scripts are using 8 kB of ROM and about 500 B of RAM.
  2. Leveraging the MetalScript idea of suspending a VM on your desktop computer to have it resume later on the embedded device. The heavy lifting of parsing and importing can be done on the desktop, while the device can just continue with the “easy” stuff.
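The suspend-and-resume idea in the second point can be illustrated with a toy simulation (this is just a sketch of the concept, not Microvium’s actual API): do the expensive setup once on the “desktop”, serialize the resulting state, and resume later from that state without redoing the setup.

```javascript
// Stand-in for the heavy desktop-side work (parsing, importing, etc.):
// build some state that would be expensive to reconstruct on the device.
function expensiveSetup() {
  const table = {};
  for (let i = 0; i < 256; i++) table[i] = i * i;
  return { table, counter: 0 };
}

// Desktop side: run the setup and snapshot the resulting state
const snapshot = JSON.stringify(expensiveSetup());

// "Device" side: resume from the snapshot and continue with the easy part
const state = JSON.parse(snapshot);
state.counter++;
console.log(state.table[12], state.counter); // 144 1
```

In the real thing, the snapshot would of course be compiled bytecode plus heap state rather than JSON, but the division of labor is the same.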

Early Prototype Released

Last week we released the first working prototype to npm. Check it out on Github:

https://github.com/coder-mike/microvium#microvium-microvm

This hardly counts as a “release”, since really it doesn’t run anything except the “Hello, World!” example, so don’t go download it just yet. Really, it was just a test run of the release process and a way to get a sense of what it would look like to use it. The exercise was worthwhile since it resulted in a few changes to Microvium to make it easier to use, and to simplify the concepts.

Subscribe to my blog to get more updates, and stay tuned for an actual release, hopefully in the not-too-distant future.

Node’s require doesn’t always return the same value


This is just a curious edge case in node.js I came across while looking at creating the module system for microvium.

The node.js documentation says:

… every call to require('foo') will get exactly the same object returned, if it would resolve to the same file.

But this doesn’t seem entirely true. In the following code, a module imports itself twice, getting a different object each time:

// script.js 

const a = {};
const b = {};

module.exports = a;
const a2 = require('./script');
module.exports = b;
const a3 = require('./script');
console.log(a2 === a3); // false

The challenge of C/C++ firmware libraries


One of the statements about MetalScript that people seemed to disagree with is the idea that it can take as long as a week to integrate a C/C++ library into a firmware project.

TL;DR: I’ve spent many years professionally doing both JavaScript and C/C++ firmware. In my experience, integrating and becoming familiar with a C/C++ firmware library can often take days, or in some cases weeks, while a JavaScript library often takes minutes to use.

Let me first say that if you are an expert in C and/or C++ firmware and have chosen not to use JavaScript for any major projects, then you may not be the person who will want to use MetalScript. You have spent years mastering a complicated craft, and although you have probably seen or worked with people who use JavaScript, you’ve chosen to stick with C/C++ because you probably believe it’s better. You may even look down on people who program in JavaScript — “real men” know how to do their own memory management, avoid signed integer overflow, and use CRTP to write code that is both well-structured and performant at the same time.

If that describes you, then keep doing what you’re doing. MetalScript is not for you, it’s for people who love JavaScript and who want to write real firmware with it.

Let me also say this:

I am a C/C++ firmware developer, and have been programming firmware for the last 20 or so years.

Up until about 5 years ago, my impression was that JavaScript was a pretty poor choice of language for various reasons, until I actually learned it and used it in real projects. Now that I’ve actually used it, I’m converted to its merits1. I am qualified to compare C/C++ against JavaScript because I am proficient in both and have used both in many real-world projects.

The best thing about JavaScript is not the language itself. The language is good — it used to be pretty bad, but with ES6 and modern features, it is becoming a really good language to work in. But the thing that makes the JavaScript experience great is npm.

The npm package repository contains hundreds of thousands of packages for JavaScript. In addition to the packages themselves, there is a culture that drives useful conventions, such as:

  • Packages generally have their source code in GitHub
  • Packages generally have a readme file in the root written in markdown. And because everyone does it this way, npm and GitHub both display the readme on the main page for the package
  • The readme typically contains a brief description of what the package does, as well as how to install it (even though the installation process is almost always the same)
  • The readme often contains a set of examples to get you started
  • The readme often contains a set of options for advanced usage, or links to proper API documentation.

Not all packages will be this well presented, but the vast majority are, and it’s hard to overstate the importance of this conformity. It means that finding and getting started with a completely new library can happen within just a few minutes. The relevant information is all upfront, and the examples are typically self-contained so they can be pasted right into your code and they “just work”.

To try to demonstrate my point, I’m going to compare some examples. It’s difficult to come up with fair examples because firmware libraries are not going to be typically found in npm. So I will do this with two different examples — one of them using actual libraries and one of them a made-up scenario.

Example 1: Calculating a CRC

For the first example, I’m going to try to calculate a CRC in both C++ and JavaScript. This is something that is well suited to a third-party library, and so I expect to find code that already does it in both JavaScript and C++.

This comparison will probably be the best possible case for a C/C++ library. A CRC can be calculated without any platform dependence or customization. It should be as simple as finding a function online that does it and pasting it into the code. Let’s see how we go.

JavaScript

I’ll start with JavaScript, and will then compare the experience in C/C++.

  • At 10:22 AM, I Google “npm crc calculation” (Note: I recorded these timestamps while doing it, but not while blogging, so as to minimize interference)
  • Look at first result – a package called crc on NPM — open the page
  • The first thing on the front page: the list of features for the library. Yes, this looks like what I want
  • The second thing on the front page: the command to install it. At 10:23 AM I run the command npm install crc in my project folder. I have gone from “thinking that a library might exist”, to successfully installing it, in about 1 minute.
  • The third thing on the front page: the example code to use it. At 10:23 AM (still), I create a test script with two lines of example code:
const crc = require('crc');
 
console.log(crc.crc32('hello').toString(16));

At 10:24 AM, I run the code – node test.js . It works.

But actually I didn’t want CRC-32, I wanted CCITT-16. I adjust the test code to crc.crc16ccitt('hello') and it still works.

Finished by 10:25 AM — from imagining some functionality to having it integrated in 3 minutes. This is not unusual IMO, once you are familiar with the workflow and know where to expect everything by common convention.

In C

At 10:37 AM, the first thing I’m going to do is Google “CRC calculation in C”. There is no standard repository that I can search, so I’m open to anything on the internet.

I look at the first page of links. None of them jump out at me as what I’m looking for.

I look at the first link. Scrolling through it, I can see some diagrams, some code, and lots of writing. Perhaps if I want to understand CRCs, this is not a bad place to be. But really I would prefer it if someone else understood CRCs, and I just leverage their expertise.

Should I just copy-paste one of the example pieces of code? I skim-read pieces of the document to try to get an idea of whether this is a bad idea or not. The fact that it says “bit by bit” as one of the headings makes me think that it’s leading the reader through the implementation and starting with a less-than-ideal implementation. Better not use that one.

What about the other code snippet they include in the article? It’s not clear what kind of CRC this is for. Should I read the article? Should I cut my losses and move to the next link? Should I copy-paste and hope this is the right one? Time is ticking, and this is a race.

10:41 AM. I cut my losses on this page and move to the next search result. It’s got code — that’s good. But again, it’s got a “simple example” and an “advanced example” — not examples of usage, but examples of CRC functions. What does simple and advanced mean? Does one do more stuff than the other? Is one more efficient than the other? Do I have to read the code to find out? Time is ticking.

Glancing through the code (now the 3rd and 4th pieces of somebody else’s code that I’ve had to look at), I see that the simple example doesn’t use a lookup table, and the advanced one does. Likely I’m on another educational page that’s trying to teach the reader about how to do CRCs.

Why is this kind of thing the first two search results? Surely people more commonly want to use CRC code than to write and understand their own implementation? Does it say something about the culture of C that the top links on Google for “C CRC calculation” are there to help people write their own implementation from scratch?

Maybe my search terms are poor. Maybe I should have used the term “ccitt16” in the search query. Maybe some other changes would also help? I remember that the next two search results are stack overflow questions — let me have a quick look at them before I go back to try other search terms.

10:45 AM. Third search result. This is a stack overflow question. He says:

I’ve created a function to calculate a CRC16 checksum, but it doesn’t seem to be outputting correct values,

(I’m skim reading because I’m in a race against the JavaScript guy who integrated a working library in a quarter of the time it’s taken me not to get anywhere)

The guy wants to fix his function. The top answer has a bunch of explanation that I don’t have time to read, and then some code that is prefaced with “so, your function might look like”. Those are not words that inspire confidence in me. It sounds like his goal is to help the questioner figure out where he went wrong, rather than writing production-quality code that many other people will depend on.

Should I look at the other answers? Should I abandon these search terms and try something else? Should I look at the other SO question?

Let me have a quick glance at the other SO question before deciding.

10:48 AM. Fourth search result. A stack overflow question. This guy says:

Since CRC is so widely used, I’m surprised by having a hard time finding CRC implementations in C.

Totally agree with ya bro.

The top answer provides a bunch of links. A lot more reading, and a lot more implementations to choose from. But now at least we’re getting somewhere.

I actually landed up picking the implementation in the second answer of the first SO question. Not because it was carefully considered as the best choice, but because I was in a rush and it had a couple of nice properties at a glance:

  • It was short, so I felt less intimidated
  • The answer was only prefaced with 2 sentences, so there wasn’t much reading for me to do
  • In one sentence I see the words crc16 CCITT
  • In the other sentence, I see the word “tested” and a link (the link makes it official!)

10:50 AM. I paste the code into a C file and write a main function to test it. 

#include <stdio.h>
#include <string.h>

/* crc16() is the function pasted in from the Stack Overflow answer */

int main() {
  const char* str = "hello";
  unsigned short crc = crc16(str, strlen(str));
  printf("%04x\n", crc);
}

10:53 AM. I try to compile it, but GCC is not in my environment path. This has nothing to do with the library, so let’s just pretend I compiled it and it worked.

It took me 15 minutes, as opposed to JavaScript’s 3 minutes.

Postmortem

Speed to find library

Most of the time spent in C was spent finding the library. While it’s true that this will generally be slower in C than in JS (since there is no common convention and central catalog of such libraries), finding the library will never take a whole week, so this small example doesn’t account for the majority of the time that I claimed it takes to get a C firmware library integrated.

Why is that? What’s special about the CRC example that makes it unrepresentative of the norm?

I think the answer is that if you are picking an example from the subset of libraries that work both in today’s JavaScript world (i.e. it will be intended for server or browser) and also the world of C/C++ firmware, you are actually left with a small collection of libraries which do not exhibit most of the complexities that arise in firmware, which biases the comparison. As I mentioned earlier, a CRC-calculating function is much easier to make platform-agnostic, and so should be the best possible candidate for a hassle-free library in C.

Speed to install/integrate

I’d say that both the C and JS versions in this example were pretty similar to integrate into a test script (representative of a larger application). However, from experience, I’d say that almost every npm library is just as easy to integrate as this CRC library, typically only taking a few minutes to get going with the basic examples. In JS, I believe this example is representative of the general experience.

In C however, we’ve picked an example that is trivial to integrate — just copy and paste.

Most C/C++ firmware libraries are integrated at the source code level because of the wide range of possible target architectures, and the source code often requires extensive customization or dependency implementation or port layers in order to get it to work for your particular setup (and reading through documentation to understand how to do that). To compound the issue, the reality is that many firmware compilers don’t support the full C/C++ spec, normally for performance or architecture reasons. These are not criticisms of C/C++ per se, but nevertheless, are part of the typical experience of using C/C++ in a firmware environment.

Confidence

How confident am I in the JS vs C library?

The C “library”, if we can call it that, is some code in a Stack Overflow answer that one guy wrote. His testing involves running it a few times and checking that it matched some online web page. I don’t feel great about that. Maybe it’s okay because other people reading it may have spotted the problem and put in a comment if there was something wrong with it.

On the other hand:

  • The npm crc package has been downloaded over a million times in the last 7 days. All those feet treading on the same path will harden that path. If there are any bugs, they will be found quickly.
  • It comes standard with a suite of unit tests, and both the GitHub and npm pages display that all the tests are passing.

Furthermore, the package manager allows me to quickly update my dependencies, to make sure that I get the latest bug fixes at any time.

Documentation and Ease of Use

I think this answers itself. There is no documentation and no example code with the C version — it is a code snippet in a SO answer, so what do you expect?

The JS version has exactly the documentation that you’d expect from most npm libraries — it is concise and describes the key things you need to know in order to use it. It doesn’t try to tell you the theory behind CRCs, or anything that isn’t directly relevant to being productive as quickly as possible.

Am I talking about the language or the package manager?

I’d like to just clarify something because I know this is going to be brought up. I say that I’m comparing C/C++ vs JavaScript but then go on talking about things that are not part of the language at all (Google, Stack Overflow, npm, culture and ecosystem). Is that valid?

Yes, I think this is valid. When you choose to develop in JavaScript or C/C++, you’re not just adopting a language. You’re adopting all the tools, community, and culture surrounding the language. Productivity is affected by all of these factors, and they come together as a whole. You can say all you want about how you think C++ is a better language if you think that, but at the end of the day, it’s about getting shit done, and the JavaScript “whole” is better for that than the C++ “whole”.

Example 2: A modem driver

Here, I’m picking an example that adds complexity more typical of firmware development, but the tradeoff is that this example isn’t real — it’s merely a vision I have for the future. I don’t think the current state of JavaScript firmware development is mature enough for this to be a reality today.

For this example, I will assume the following hypothetical scenario:

  • We have a product that has an MCU and a u-blox cellular modem
  • Due to a shortage of UARTs on the MCU, the product connects the modem to a UART extender
  • Objective: connect to the internet to send an HTTP POST, receive the JSON response, decode it, and output the message therein to the debug UART

Before I even start, if you are a firmware programmer, take a moment to think about how you would do this. If I contracted you to write firmware for a device that does this, how long would it take you to write it?

The Vision

There are two domain-specific pieces of information that we absolutely need to specify somewhere in any firmware, no matter what language:

  1. Information describing the behavior we require, such as the fact that at startup we want to connect to the internet, POST a message, and print out the response
  2. Information describing the device configuration, such as the fact that we have a u-blox modem, and the fact that it is connected on the multiplexer, etc.

We can summarize the required behavior with the following hypothetical JavaScript code:

// app.js
import * as request from 'request-promise-native'; // third-party library to perform HTTP requests

export async function run(device) {
  await device.modem.connectToInternet();
  const reply = await request({
    url: 'http://my-service.com/test-url',
    method: 'POST',
    json: 'please give me a message to display'
  });
  console.log(reply);
}

We can summarize the required device configuration with the following hypothetical JavaScript code:

// device.js
import { UBloxModem } from 'ublox';
import { Max14830 } from 'max-14830'; // UART extender driver

export const uartExtenderI2C = mcu.i2c('G7');
export const uartExtender = new Max14830(uartExtenderI2C);
export const modemUart = uartExtender.uart(3);
export const modem = new UBloxModem(modemUart, 'LISA-U200');
export const debugUart = mcu.uart(2);
export const debugConsole = new UartConsole(debugUart, { baud: 115200 });

Then we also need some glue code:

// main.js
import * as device from './device';
import { run } from './app';

// The console we want to use for output messages
global.Console = device.debugConsole;

// The device to use for connecting to the internet
global.internet = device.modem.internet();

// Transition from compile time to runtime
mcu.start();

run(device);

Perhaps the reality won’t be so easy, and I’m oversimplifying it. But I can imagine getting a library like this off npm and being able to get working with it on a firmware device within a few hours.

How long would it take to do the same thing in embedded C? Days, weeks, months?

Conclusion

In between writing this article and publishing it, I ran into another real-world example. I needed a Modbus connection from my C firmware to my Electron JavaScript application, and I was implementing both sides. The JavaScript side was working within an hour, as one would expect. The C side took days of implementing hundreds of lines of porting layer, managing states, and banging my head against the wall.

The reality is that JavaScript is simply a much more productive tool to use, and a large part of that is because of how easy it is to reuse third-party code and to share your own so that others can reuse it.


  1. Although if you want to use it for a real project these days, please use TypeScript so you can get static type checking 

Can you parse this?
JavaScript Corners


What does the following JavaScript mean:

const x = await / +y; const z = await / +y;

Hint: it’s a trick question.

The answer depends on the context, as is demonstrated by the following snippet:

function foo() {
  const y = 10;
  const await = 5;
  const x = await / +y; const z = await / +y;
  console.log(x);
}
async function bar() {
  const y = 10;
  const x = await / +y; const z = await / +y;
  console.log(x);
}
foo(); // Prints 0.5
bar(); // Prints / +y; const z = await /10

Within the context of an async function, await is like a keyword, and the thing after await is considered to be an expression. In JavaScript, an expression that starts with forward-slash is a Regexp literal, and that literal ends with the next unescaped forward slash. The +y at the end then represents string concatenation, so both the regular expression and y are converted to strings, and the concatenated result string is "/ +y; const z = await /10".

This interpretation is easier to visualize if the syntax highlighting identifies and colorizes the respective parse tokens as follows:

Outside of the context of an async function, await is just a normal identifier and has no special meaning (this is important so that the introduction of the await syntax to the JavaScript language didn’t modify the meaning of existing JavaScript code which might have used await as a variable or parameter name).

If the syntax highlighting were correct, as seen in the above images, the difference would be pretty obvious. Unfortunately, I needed to photoshop the above images, since VS Code highlights both examples the same way, and both incorrectly:

MetalScript Progress Update – May 2019


It’s been a while since my last update! Unfortunately, this is because there’s not a lot of observable progress to speak of. But today, here’s another update for those who are interested. I’ll preface this by saying that this update is not particularly interesting.

TL;DR: my personal life has got in the way, and MetalScript has been more difficult than I anticipated.

Personal Life

Since I’m the only person working on MetalScript, and I’m working on it in my spare time, the progress on MetalScript is intrinsically dependent on the state of my personal life. So here’s the status on that. (Or skip to the technical section).

Firstly, I’m going through a divorce. It’s been emotionally taxing and takes time and money (and money = time, and emotional strain = loss of time).

Secondly, I left my traditional job in January. I’m still doing contract work, to lessen the blow on my savings accounts, but overall, working for money is a lesser part of my life now (at least until my savings run out). In theory, this will give me more time to work on MetalScript!

Thirdly, I’ve moved house and country. Kinda. I’ve done half of the move: I’ve moved out of my previous apartment in January, but I haven’t moved into anywhere else yet1. For the last 5 months or so, and for the foreseeable future, I’ve been living a nomadic life, jumping between staying with family and friends and Airbnbs on different continents2. I carry in my suitcases the possessions I really need, like, umm, my laptop, monitor, mechanical keyboard, Homepod, headphones, and VR headset3. Oh, and clothes and stuff. Most of my other possessions have been sold or donated to charity4 (or given away in the divorce).

Technical

My last progress update5 was in October. At that time, I had a working demo showing a small snippet of code compiling (see pages 22-26 of the presentation I made available in my October post). The summary of progress since then is that now MetalScript cannot compile anything at all, as I’m rewriting large parts of the symbolic interpreter. Most of the time I’ve spent on the project over the last 6 months has been in refactoring and progressing on the design concepts.

Limitations of the previous work

Given that it was a work-in-progress in October, there were some limitations to the kind of code that could be compiled. One such limitation is the fact that it only works when the output does not have multiple runtime functions — the symbolic interpreter did not yet have a way of passing runtime data between different runtime scopes (function scopes).

A related limitation is that the symbolic interpreter could not compile a program that required runtime heap allocations.

These are the challenges that I’ve been solving recently and will continue to do so over the next few months.

Design Work

The vast majority of my time on MetalScript over the last 6 months has been spent on design work (or what I call “thought-work”).

My design process is to journal extensively, to hash out different approaches and refine unclear concepts. In the MetalScript project, I’ve accumulated over a million characters across all the markdown files that make up my journalling history.

I used to be able to write code directly from the ideas in my head. Using a fuzzy language like English (or pseudocode, IMO) for describing behavior is for people who have fuzzy thoughts6. If your ideas are clear in your mind, then expressing them directly in code is trivial7.

With MetalScript, and other projects where I push the boundaries of my capabilities, I find that I’m less and less able to write code directly. A lot of the ideas I conceive are so foreign and vague to me, just an abstract notion or gut feeling, that I can’t even express them well in English, let alone in code. Hence I write pages and pages of journaling (mostly English prose), often rehashing the same idea 10 or 20 times over the course of weeks or months, each time making it slightly clearer, until it reaches the clarity required to express it in code.

The thought-work I’m working through at the moment is related to the symbolic interpreter, and how it handles ambiguity in pointer values.

Moving to POD Types

Previously, a lot of the internal data models of the compiler used direct cross-references in their in-memory representation. For example, a jump instruction may contain a direct reference to the block to which it jumps, or an add operation may contain a direct reference to the operands.

While this is convenient when either looking up the reference (e.g. performing the jump) or in constructing the reference, it makes some things more difficult when it comes to ease of maintenance/programmability:

  • Data in this model cannot simply be dumped to the console or a file as JSON, because serializing it would embed every referenced structure inline, and the references may even be cyclic.
  • Cloning structures with internal or cyclic references is more difficult.
  • When debugging, it’s a lot harder to visualize what’s going on without human-readable IDs (direct references have an implicit machine-readable ID which we call the “address”).
  • Related to debugging, if you need to save IL or VM state to a file (for debugging or just persistence) and there are internal references, then you need to calculate ephemeral names for everything on the fly in order to represent the cross-references in a persistable way.

Given the phase of the project, and the need for simplicity and debuggability over performance, I decided to overhaul the data structures to be pure POD types that don’t include internal cross-references. In this model, all cross-references are implemented as string or numeric IDs (depending on the context).
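A minimal sketch of the two models (hypothetical shapes, not MetalScript’s actual IL). In the direct-reference model, a jump instruction holds the target block object itself, so the structure can be cyclic and JSON serialization fails on it; in the POD model, cross-references are plain string IDs and the whole program is a flat map that serializes and clones cleanly:

```javascript
// Direct-reference model: the jump's target is the block object itself.
const blockA = { id: 'entry', instructions: [] };
blockA.instructions.push({ op: 'jump', target: blockA }); // cyclic reference

// POD model: cross-references are string IDs into a flat map of blocks.
const program = {
  blocks: {
    entry: { instructions: [{ op: 'jump', targetId: 'entry' }] }
  }
};

console.log(JSON.stringify(program)); // serializes fine; no cycles to chase
// JSON.stringify(blockA) would throw: "Converting circular structure to JSON"
```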

Simplified IL Model

The symbolic interpreter is a complicated beast, and to give it the best chance of success, it helps if the IL that it processes is as easy to interpret as possible. Earlier this year, I undertook a large sub-project to redesign and simplify the IL model (the set of IL operations and their semantics). The new IL model is able to express JavaScript source code in a form that is much easier to interpret, which makes downstream phases of the compiler simpler. Simplicity is critical when working on such a large and complicated project.

A change in my personal work process

For me, the thought-work (iterative journaling) is incredibly taxing, mentally and emotionally, due to its complexity, inherent uncertainty and lack of clear progress indicators. I find I burn out after a few hours in a day of doing such work.

The solution I’ve adopted recently is to split my work day: I spend a few hours on concept work, followed by the rest of the day on what I would characterize as easy grunt work, such as implementing the ECMAScript specification. For example, yesterday in my “easy time” I implemented object destructuring, so now the following JavaScript is compiling to IL:

const { foo } = require('./module');

For interest’s sake, this is what the IL looks like (the highlighted lines correspond to the destructuring, and the preceding lines are the function call to require):

    t0_0 = sys_op("resolveBinding", "require", false, undefined);
    t0_1 = sys_op("getValue", t0_0);
    t0_2 = sys_op("getBase", t0_0);
    t0_3 = sys_op("withBaseObject", t0_2);
    t0_4 = list_new();
    t0_5 = copy("./module");
    t0_6 = sys_op("getValue", t0_5);
    t0_7 : list_push(t0_4, t0_6);
    t0_8 = sys_op("throwIfNotCallable", t0_1);
    t0_9 = sys_op("call", t0_1, t0_3, t0_4);
    t0_10 = sys_op("getValue", t0_9);
    t0_11 = reg_get("context");
    t0_12 = record_get(t0_11, "lexicalEnvironment");
    t0_13 = sys_op("requireObjectCoercible", t0_10);
    t0_14 = list_new();
    t0_15 = sys_op("resolveBinding", "foo", false, t0_12);
    t0_16 = sys_op("getV", t0_10, "foo");
    t0_17 = sys_op("initializeReferencedBinding", t0_15, t0_16);
    t0_18 : list_push(t0_14, "foo");

This work methodology is going well for me so far, because it provides a balance between the sensation of tangible progress and the more mentally-taxing job of getting the fundamental principles in place.

What’s next?

  • I’m working on a module loader for CommonJS modules, so I can get some basic multi-module code running. There are some interesting challenges here. For example, the imported module is allowed to call MCU.start in its root scope, thus suspending the process mid-require.
  • I need to finalize the new design for the symbolic interpreter and get it working

  1. The world is surprisingly ill-adapted to people who have an email address but not a physical address. 

  2. I’ve even experienced an airport shower for the first time. I was surprised at how unpleasant it wasn’t — there was hot water, and space to put my luggage. But I did empty half a bottle of hand sanitizer on the floor before getting in because waterproof sandals are not something I own at the moment.  

  3. Actually, I only use the VR headset on flights — it’s great to disconnect from the real world and watch Netflix in a massive home theater in the virtual mountain-tops 

  4. “Bare-metal living”, you might say. 

  5. That is, my last published progress update. I’ve written several updates in the interim which I never published in the end because I felt they didn’t say anything substantial. 

  6. Or for those who can’t natively think in terms of the programming language and thus have to mentally translate between their primary (natural) language and their “secondary” (programming) language 

  7. Technical documentation as a form of inter-human communication is a different story. 

Hughes List


Inspired by Eric Lippert’s recent post on a data structure called a Hughes list, I thought I’d play around with it by writing my own equivalent in JavaScript.

Side note before I get started. For my usual readers who are following the progress of my JavaScript-to-native compiler (MetalScript), I’ll hopefully continue blogging about it soon. I’ve just been in the middle of an international move, along with some other projects, and haven’t had much time to work on MetalScript. My flight out of the country leaves in a few days, and then hopefully my schedule will open up and I’ll continue on it. For those who don’t know what I’m talking about but want to know more, also check out my last post where I link to a presentation on what MetalScript is, or any other posts in the category.

Getting back to the point of this post…

The key benefit I see of the Hughes list structure is that it is a persistent (immutable) list that allows you to concatenate, prepend, and append in O(1) time. It does so by lazily accumulating the operations to be performed until the list contents are actually needed, at which point it performs a single O(n) pass to build the final “real” list (at least, this is my interpretation of how it works).

For those who want to cut right to the chase, see this GitHub repo for the final code. It’s not a lot of code. test.mjs is the entry point, and hughes-list.mjs is the implementation of the data structure.

I suggest reading Eric’s series first for a full explanation of the underlying principles before comparing it to mine. I’m not here to explain it (which Eric has done perfectly) but rather to provide a twist and an implementation in a different programming language.

I deviated quite a lot from Eric’s implementation, in the following ways:

  • My Hughes list is built up by composing imperative procedures that each mutate a JavaScript array, rather than working with an underlying immutable SimpleList like Eric uses. This produces the same logically-immutable list structure and achieves similar performance characteristics but saves on implementing an additional SimpleList type.
  • I omitted the wrapper class. If you inspect the list in a debugger, the list will actually be the function, rather than containing the function. I wouldn’t do this in production code but it leads to a nice simple implementation here.
  • As a superficial detail, I used the traditional JavaScript function names push and unshift instead of append and push respectively.

Before jumping into the detail of the implementation, it’s probably best to take a look at how this list structure is used. These are the unit tests I have for it (see the full file for more context).

checkList(HL.empty                               , []);
checkList(HL.push(list1, 4)                      , [1, 2, 3,   4]);
checkList(HL.concat(list1, list2)                , [1, 2, 3,   4, 5, 6]);
checkList(HL.unshift(list1, 0)                   , [0,   1, 2, 3]);
checkList(HL.concat(HL.push(list1, 10), list2)   , [1, 2, 3,   10,   4, 5, 6]);
checkList(HL.concat(HL.unshift(list1, 10), list2), [10,   1, 2, 3,   4, 5, 6]);
checkList(HL.concat(list1, HL.push(list2, 10))   , [1, 2, 3,   4, 5, 6,   10]);
checkList(HL.concat(list1, HL.unshift(list2, 10)), [1, 2, 3,   10,   4, 5, 6]);

These should all look pretty obvious. If you’re not familiar with persistent data structures, the most important thing to point out here is that a call like HL.unshift(list1, 0) does not change list1, but rather returns a new list that is like list1 but with 0 at the front. This is also worth highlighting because I said that my implementation of the Hughes list composes imperative procedures that mutate the underlying JavaScript array — so it’s important to get your head around how this logically-immutable list structure is implemented on top of array mutations.

My implementation of the Hughes list data structure comes out to something like the following:

export const empty = () => {};
export const single = x => xs => xs.push(x);
export const concat = (ls1, ls2) => xs => { ls1(xs); ls2(xs) };
export const push = (ls, x) => xs => { ls(xs); xs.push(x) };
export const unshift = (ls, x) => xs => { xs.push(x); ls(xs) };

The things to highlight above are:

  1. The only operation used on JavaScript arrays here is push, even in the implementation of unshift. This is the key benefit of the Hughes list. Depending on the implementation of arrays in the JavaScript engine, the native unshift on arrays may be O(n) (on an array of length n), while push is typically O(1).
  2. As previously noted, this implementation performs mutations, so rather than ls2(ls1(xs)), we do ls1(xs); ls2(xs). The former might have created a new array/list from ls1 that is passed to ls2, while the latter simply invokes both mutations one after the other on the same mutable array.
  3. You’ll notice that the implementation is backwards from Eric’s implementation, in that the definition of a list is a function that appends its items to the end of a given array, not the beginning. This again is just because appending an item to the end of an array in JavaScript is the most efficient.

I personally think this implementation still captures the essence of what makes the Hughes list structure useful, while also interplaying well with the king of JavaScript list-like structures, the array. The key effect is that it allows us to concatenate, prepend, or append items in any order with almost no cost, and let the list methods build the compound procedure such that the operations are performed in the optimal order for performance.
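As an aside, the “single O(n) pass” is easy to see in code: since a list is just a function that pushes its items onto a given array, materializing it means calling it on a fresh array. The toArray helper below is my own addition for illustration (the repo’s checkList presumably does something similar):

```javascript
// Sketch: the list operations from above, plus a hypothetical `toArray`
// helper that materializes a Hughes list into a plain JavaScript array.
const empty = () => {};
const concat = (ls1, ls2) => xs => { ls1(xs); ls2(xs); };
const push = (ls, x) => xs => { ls(xs); xs.push(x); };
const unshift = (ls, x) => xs => { xs.push(x); ls(xs); };

// The single O(n) pass: run the accumulated operations on a fresh array.
const toArray = ls => { const xs = []; ls(xs); return xs; };

const list1 = push(push(push(empty, 1), 2), 3);

console.log(toArray(unshift(list1, 0))); // [ 0, 1, 2, 3 ]
console.log(toArray(list1));             // [ 1, 2, 3 ] — list1 itself is unchanged
```

Note that only the array handed to toArray is ever mutated; the lists themselves are never modified after construction, which is exactly why they behave as persistent values.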

TC53 Meeting


Those following my blog will know that I was in Boston last week for ECMA TC53.

ECMA is a standards group, and TC53 is the 53rd Technical Committee, being established to produce standards for JavaScript on wearables and other resource-constrained devices. Some of you may already be familiar with TC39, who are the creators of modern JavaScript as a standard, of which TC53 is in some sense an offshoot. This meeting in Boston was the inaugural meeting, and I was invited as a guest to participate and to present on MetalScript.

Thanks to Lizzie from Moddable for taking photos.

I’m sitting on the end of the table because the meeting started at 10:00 AM and I arrived at 9:59 AM, while everyone else arrived closer to when they were told to arrive (9:30 AM)1. I hurried into the room with everyone already sitting and no space for me until everyone shuffled up. Great start!

The MetalScript presentation went well, aside from some technical glitches that were out of my control. For anyone who wants to see it, here it is:

(Or here’s a PDF that includes the notes for each slide)

If you look at the presentation, you may notice that I actually have some bare bones code compiling, which I managed to get working just the week before the event. This is an amazing milestone, albeit somewhat hacked together, but I’ll talk more about this in an upcoming post.

In addition to the presentations and discussions, there were also a number of demos. Here’s a representative from Johnny Five giving a demonstration using Tessel 2 (there were also other demonstrations, but I’m including this one because it’s the only photo where I look like I’m paying attention).

Another notable player in the room was Moddable with their XS JavaScript virtual machine for embedded devices, and some really cool demos using JavaScript in home automation, rendering fluidly to graphics displays, and the ability to download “apps” to an embedded device by partially flashing it.

It was amazing to be in a room full of people who are actually in the same technical space as what I’m doing — in the general population of programmers, there isn’t a very high proportion of people in the JavaScript firmware space!

Of course, spending 50 hours in an airplane for 14 hours of meetings is no good unless I also go “touristing” and check out some of the sights in Boston. Take a look at Facebook to see more of that side2. I’m not interested in arts and history, and I spent most of my time after TC53 sleeping, so the photos are sparse. But the squirrels are cute!

 


  1. To be fair, I fully intended to be there at around 9:45, but misjudged the ridiculous line at Starbucks by my Airbnb and the traffic getting to the location downtown. Also, I was told conflicting start times, and only realized 9:30 AM was one of them about 5 minutes before I arrived, while trying to find information about the venue. Excuses, excuses…

  2. I haven’t posted on Facebook in years, but I’m thinking maybe I should get back into it 

MetalScript Progress — September 2018


A MetalScript progress update, for those who are interested. This one is all boring technical details — if this is your first time on the site, perhaps read something more interesting like what MetalScript is and the reason why I’m creating it. Or for those who like technical details, read on…

[Edit: Previously this had “November 2018” in the title, even though I wrote this in September — who knows what was going through my head when I wrote that.]

Stack Unwinding

To recap, last time I said I had my IL compiler compiling the following code file to IL and getting it to run in the virtual machine:

// blinky.js
MCU.start();
setInterval(() => gpio.pin('B4').toggle(), 500);

I said that next up was the “unwinding” phase — taking the suspended virtual machine state and unwinding the virtual stack and register states to create a unit of code that doesn’t require any stack or register information. I’m glad to say, this is complete!

Let me explain how it works. At the point where MCU.start() is invoked, there’s actually a bunch of things in the VM stack, as illustrated by the following diagram:

Most of this, unsurprisingly, is not user code. I’ve used blue to represent system functions (aka ECMAScript “abstract operations”), and red to represent user functions. Here’s a brief explanation of some of the system functions:

  • RunJobs – this is the entry point to an ECMA application and has the main event loop that we all know and love.
  • IHDR (InitializeHostDefinedRealm) – this creates all the “realm” objects, which includes things like the “Array” object and the “Object” object — things that are “built in”.
  • EnqueueJob – RunJobs gets the JS source text and calls EnqueueJob to add a script evaluation job to the job queue (one job for each entry JavaScript file, although in most applications I’ve seen there’s only one such file, normally called “app.js” or something).
  • ScriptEvaluationJob – this is a job wrapper in the spec that is used to evaluate a previously-enqueued user script.
  • Parse (ParseScript) – this parses the script and returns the information required to evaluate it.
  • ScriptEvaluation – given information from the parsing stage, ScriptEvaluation sets up the context needed to run the main script code, and then runs it.
  • GDI (GlobalDeclarationInstantiation) – initializes variables and functions for the script (recall that JavaScript variables and functions are hoisted).
  • ScriptBody – then we finally get to executing the body of the script — the actual user code.
  • Call – the user code invokes MCU.start(), which is a JavaScript function call, and there are a number of system-level things that need to happen for calls, such as checking that the target is callable, deciding what this-value to use, etc. All this work is encapsulated in the “Call” system function. (There is a separation in the MetalScript compiler between IL functions and JS functions, and likewise between IL function calls and JS function calls. IL does not have or need closures, variadic functions, etc., and is used to represent both user code and ECMAScript “abstract operations”.)

Phew! That’s quite a list. I must point out, though, that this isn’t even comprehensive. Initializing the realm is a lot more work than implied by the little IHDR rectangle in the diagram. Similarly for most other operations — I’ve only shown the highlights here.

The main point I want to convey is that even though MCU.start() appears to be the first thing executed when you look at the source code, there is actually a lot of VM work leading up to that point, and by the time the VM is suspended at MCU.start(), the VM stack is already 5 frames deep. It is at this point that the MCU hardware must take over — the entry point for the MCU reset interrupt vector is the continuation of the MCU.start() call2. But ideally we do not want to somehow translate the VM stack into the corresponding hardware stack at the point where the program starts. We could, but it would be significantly more difficult to compile this way.

Instead, the “unwinding” process that I’ve now finished translates the functions in the call stack to continuation-passing style (https://en.wikipedia.org/wiki/Continuation-passing_style). These constructed continuation functions are just normal IL, whose behavior starts where the original function left off.

This isn’t as easy as it sounds. It has to work for all valid user code, including user code that might look like this:

function myFunc() {
  const thingsToDo = [];
  thingsToDo.push(() => MCU.start());
  // (... add more things to do ...)
  while (thingsToDo.length) {
    const thingToDo = thingsToDo.shift();
    thingToDo();
  }
}
myFunc();

The challenge in the above code is that the continuation has to continue in the middle of the loop. It has to restore the state correctly, while resuming the old behavior on the second iteration of the loop (the first iteration is split between compile time and runtime, and the second occurs completely at runtime).

Symbolic Interpreter

The phase I’m working on now is what I call information threading, which is done by something I call a symbolic interpreter. This is the part where the compiler tries to understand the code that’s been written, in the same way that a human might mentally follow through the code.

For example, in the blinky example, a human following through the code will know what the call to setInterval refers to — they know the state of the machine, they know that it contains a setInterval JavaScript function, and they know whether or not they’ve done something nasty to that function such as replacing it with their own wrapper that does something different to the standard setInterval function. So a human following the code would be able to predict with 100% accuracy what the code should do (and so can a machine).

But there are cases where you can’t predict the behavior exactly. For example, calling toggle. You can predict what function is called when you invoke toggle, but you can’t say whether it will turn the LED on or off — at different times, the same line of code can have different effects, depending on previous runtime state. This just means that you need to record that state somewhere at runtime.

The symbolic interpreter does exactly this. It tracks the state of the program over “time”, recording what can and can’t be known at each point in the program. This information will later be used to emit an implementation of the source code in the target language (in this case LLVM IR).

I have a lot of the symbolic interpreter up and running. It’s currently able to fully interpret simple IO-free programs, meaning that if it’s given a script that doesn’t perform any IO (e.g. one that doesn’t read or write to GPIO), it can infer the behavior of the program completely.

But in the case of IO-free programs, there’s no observable behavior at all (observable behavior requires IO). Interesting things can only happen when you hit an IO boundary, such as turning on an LED, sending commands to a robotic arm, or writing “Hello, World!” to the console. So this is what I’m working on at the moment — IO boundaries for the symbolic interpreter.

What’s Left?

I’m currently working through the “Blinky” example. I want to get the blinky code going all the way through the compilation pipeline, as in the following diagram (blocks in green are complete).

I don’t think the type assignment phase will be difficult. Taking the output from the symbolic interpreter, it should be easy to have an algorithm choose some basic binary types. Likewise, I’m optimistic that the LLVM IR emitter will be easy, given the chosen binary types. To be clear, I’m not talking about complete implementations of any of these phases — I only need to reach a proof of concept for the blinky example first. This will give me a chance to go back and review the good and bad decisions I’ve made, before fleshing out each phase to fully support all cases.

There is a little bit of work not implied by the diagram above, such as leveraging LLVM to produce output object files, linking, and developing a minimal runtime (event loop and GC). But I think these will be relatively straightforward for the first iteration of the POC.

The above work is all to get to a minimal POC where I can demonstrate end-to-end compilation of at least one input file. Then, as I’ve said in previous posts, it will be somewhat longer to reach MVP.

News: TC53

I’m very pleased to say that I’m traveling to Boston4 next month to participate in the inaugural meeting of ECMA TC53, a new “technical committee” (TC) being established for JavaScript on “smart wearable systems and sensor-based devices”. I’ve been invited to speak a bit about MetalScript, among other things, and so I’m working hard to reach a good milestone before the meeting. I’m incredibly excited to go and be part of what will certainly be a group of highly intelligent people paving the way forward for a better future5.

 

  2. It is exactly the continuation of the MCU.start() call. Unlike in C, there are no operations on the MCU that precede the user code. Operations like initializing the MCU clock source or copying .data into RAM are logically part of MCU.start() and can be parameterized and configured accordingly in future by options passed to the MCU.start() call. I’ve said this before, but I feel it’s important enough to mention again.

  4. Boston is 17,000 km from here in Melbourne — almost as far away as is possible without going into outer space. I’m going to be spending about 3 times longer on the airplane than in the meetings!

  5. In a dream I had about it a few nights ago, I could swear that Gandalf was one of the members of the committee.

MetalScript Progress – July 2018


For those following along, here’s an update on my progress on MetalScript in the last month.

It’s been a bit of a slow month for MetalScript, just because I have a lot going on and MetalScript takes a bit of a backseat to other important things. I expect things to stay like this for a while longer.

Website

I spent a weekend hacking together a stub of a website. See http://metalscript.com. It’s really a placeholder for a website, not a real website. I spent a morning giving myself a crash course in Hugo and Slate, as well as trying to remember how to use Blender to create the somewhat rudimentary graphic depicting “JavaScript on a Microcontroller”. And lots of CSS hacking.

The website is not meant to attract customers, because of course there is no product yet. It’s meant to be a representation of the grand vision of MetalScript. It paints the picture of what I’m trying to get to, and so helps me stay focused on the goal. It’s also a place where I can start collecting user documentation so that when I launch the MVP there will already be documentation in place, and the documentation doubles as a spec of what it needs to do.

I didn’t spend much time on it, and it leaves lots to be desired. The markdown at the bottom of the home page is particularly jarring, and the whole thing is not very mobile friendly. But I’ll improve it over time.

I also couldn’t make up my mind on the color scheme, and landed up changing it a number of times. Originally I was thinking pink/orange (excuse my primitive “engineer’s vocabulary” for colors) because I think it has really awesome energy to it, and paints the picture of something new and exciting and different…

But when I was playing around with the MCU graphic, I started with yellow instead because it seemed to be “the JavaScript color” (if you Google images for JavaScript).

But quite frankly, yellow is ugly. Sorry to those who like it. It’s such a stark, flat color, and doesn’t say anything I want it to for MetalScript.

I went through a number of greenish shades which I dismissed. And while it’s probably not final, the website is currently based around an aqua blue.

Not everything matches, because I kept changing my mind about exactly where on the spectrum between blue and green I wanted to be.

Blue is nice because it’s calming but professional. This is very much what I intend MetalScript to be — a tool for professionals, but one without the discomfort and complexity associated with C and C++.

I also spent embarrassingly long on the front image. I was trying to convey the idea of putting JavaScript “onto” or “into” a microcontroller, but was worried about misrepresenting what I am doing, as if I were selling a microcontroller with a JS brand (I can’t assume that anyone visiting my website has prior expectations about what MetalScript is). My solution in the end was to create a somewhat unrealistic and stylized representation of an MCU, to help get across the idea that the MCU in the picture is not a real thing I’m trying to sell.

Just because it’s unrealistic and stylized doesn’t mean it’s not a challenge for a Blender amateur like me. There are lots of details I tried to get right that a professional may have achieved better results in a lot less time. Take for example this comparison between an earlier draft (left) and a later one (right)…

(Click for a larger view). I wanted to convey the idea of JavaScript “radiating” out of the microcontroller, as if the touch of MetalScript imbued it with superpowers. In the image on the left, the JS logo is actually radiating, but radiation (light emission) is pretty boring-looking in reality. On the right, I decided to just “pretend” to radiate by constructing emissive volumetric rays floating above the “JS”. This was much harder than I expected — for example, providing an alpha (transparency) gradient from the bottom to the top so that the rays appear to “fade out” gracefully as they move further from the apparent source, rather than ending abruptly when the volume ends. Half the time, I landed up with anti-emissive rays, if that’s a real term, that sucked in blue light (looking red-ish) — a side effect of not providing the correct origin point or 3D orientation for the gradient volumetric texture.

(Other differences to note are the corners, the color, and the reflectivity of the surface).

By no means is anything final. I will likely change my mind several more times before MVP, and anyway, as soon as there is money I will probably get this all looked at by someone with better skills than mine.

Technical

But to the technical stuff I’ve been doing in the last month…

IL Compiler

I’ve added the necessary features to my IL compiler (compiles JS to IL) to support the following traditional example:

// blinky.js
MCU.start();
setInterval(() => gpio.pin('B4').toggle(), 500);

This is the hello-world of microcontrollers, to create a blinking LED. Of course, nothing yet blinks since there is a lot of work left to do on the compiler before I have actual machine code running on the device. But nevertheless it demonstrates a number of key features that the IL compiler needs, such as the ability to instantiate closures, call functions, access properties, etc.

Virtual Machine

Probably the biggest piece of work done in the last month is getting the virtual machine up and running.

It’s been tested printing “Hello, world!”, which is an important milestone. You might think that “Hello, world!” is a stupidly simple example, but actually the virtual machine needs to execute hundreds of IL instructions to achieve that simple task, most of which is setting up the realm (i.e. the global builtin “stuff” that JavaScript needs).

Perhaps more interestingly, I have the above blinky example running on the virtual machine. I say “more interesting” because the blinky example demonstrates another key feature: suspending the virtual machine before the next stage of compilation. So what I have currently working is taking a program (blinky) that runs in the VM until the suspension point, and then the suspended machine is serialized as IL, including registers, stack, and heap allocations (the virtual heap includes functions). That’s a pretty cool point to be at!

Re-reading this, I think I should clarify what I mean by “virtual machine”. I’ve previously made it clear that MetalScript programs don’t run on a virtual machine, so why is there a virtual machine? The reason is simply because in the two-phase execution model of MetalScript, where a program is executed at both compile time and runtime — it is the compile-time execution that is done on a virtual machine. The runtime execution is bare-metal.

What’s next?

The next stage is a post-processing step that will be executed on the suspended virtual machine state, that implements the behavior of “resuming” the JavaScript job that was suspended when MCU.start was called. This post-processing step is somewhat like a continuation passing style transformation. All the functions that were suspended in the call stack will be turned “inside out” so that their entry point is where they left off when the virtual machine was suspended.

Believe it or not, the VM stack is 5 levels deep at the point where the blinky example is suspended, even though the call is done from the root scope of the script. This is yet another illustration of the fact that “it’s more complicated than it looks”.

The result of this step will be a body of IL that does not need to contain any definition for the current register states or call stack, which will be much easier for the next phase of the compiler to deal with.
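To give a feel for what “turning a function inside out” means, here is a hand-written sketch in plain JavaScript. It’s purely illustrative — the real transformation operates on IL, not JavaScript source, and all the names here (makeContinuation, doRest) are invented:

```javascript
// Illustrative only (not MetalScript output). Imagine the original program:
//
//   function main() {
//     doSetup();     // executed at compile time, inside the VM
//     MCU.start();   // suspension point
//     doRest();      // must execute at runtime, on the device
//   }
//
// Unwinding conceptually produces a continuation whose entry point is the
// code *after* the suspension point, closing over whatever state was live
// on the VM stack at the moment of suspension.

function doRest(state) {
  // Hypothetical "remainder" of the original function.
  return `resumed with counter=${state.counter}`;
}

function makeContinuation(liveState) {
  // The device's entry point: start where the original function left off.
  return function continuation() {
    return doRest(liveState);
  };
}

const resume = makeContinuation({ counter: 42 }); // state captured at suspension
console.log(resume()); // "resumed with counter=42"
```

The interesting part (glossed over here) is that the captured state can include positions in the middle of loops, which is why the transformation has to work for arbitrary user code.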

I’m also finalizing the design of what I call the “symbolic interpreter” which essentially will step through the IL to figure out what it does (and thus how to implement it in machine code). This is the most complicated and critical piece of the whole project. As I’ve said in previous posts, through the side experiment I called “MiniLanguage” I’ve gone far enough with this idea to feel confident of the direction I’m going, but MetalScript is an order of magnitude more complicated than MiniLanguage so it’s going to take some time to iron out the details.

I’ve also spent some time thinking about how setInterval is going to work. There are any number of possible approaches I could take, but I’ve decided that the quickest path for the moment will be to implement it as “special” behavior that is part of the bare-minimum runtime library. Previously I said that the runtime library would only have the GC and event loop, but now I think that timers are also going to benefit from being built-in in this way. Among other things, this means I can postpone thoughts about implementing interrupts in pure JavaScript (which is completely plausible, but I don’t want to bloat the MVP with features that are not critical to using MetalScript, when there are perfectly reasonable pragmatic alternatives).

Looking further into the future, my plan is to keep pushing the blinky example through each phase of the compiler until I eventually get it out the other end and have an actual blinking LED. This would be a point of massive celebration.

Progress on MetalScript and random thoughts


This has been a good week for MetalScript.

This post is more of an informal ramble. Read it if you have extra time on your hands, but I don’t think I’m going to say anything profound here. Also read this if you’re doing a product competing with MetalScript, since I’m spilling some of the implementation details here ;-)

A few weeks or months ago1, I switched over from MiniLanguage to MetalScript. MiniLanguage is a new language I started about 9 months ago to test-drive a number of the MetalScript principles in a simplified context. It’s in MiniLanguage that I created a working end-to-end pipeline from source text to output binary. But rather than perfect MiniLanguage with all the bells and whistles, I wanted to move back to the real project.

Breakthrough — bridging compile time and runtime

I had a breakthrough this week in terms of structuring code that implements the spec in a way that splits between runtime and compile time. In particular, the ParseScript operation (and some of the surrounding code) has been a bit of a pain in the butt — the behavior of the function according to the spec is to return a Script Record, which contains both a reference to the realm2, and also a reference to the script code.

The script code is purely a compile-time construct, while the realm is purely a runtime construct. So ParseScript is kinda split between runtime and compile time. It’s really awkward to consolidate code that is split in this way, while still trying to make it maintainable. And I’ve been bashing my head against a wall for a while on the right way to do it.

Something finally clicked this week, and I found a way to have both runtime behavior and compile time behavior defined in the same lines of code. The way it works is essentially that my implementation of ParseScript returns a monad, that contains both the compile time component of the return value, as well as the sequence of runtime IL operations required to get the runtime component of the return value. The caller can immediately use the compile time component, and then is obliged to also emit code that invokes the runtime component.

For simplicity, I opted to implement the monad as just a tuple, as described by the MixedPhaseResult type below.

/** Used as the result of an operation that has both a compile time and runtime
 * component. The compile time component of the operation is the function
 * itself, and its result is the first element in the returned tuple. The
 * runtime component is represented as an IL function that is the second part of
 * the tuple. */
type MixedPhaseResult<T> = [T, il.ILFunction];

...

// https://tc39.github.io/ecma262/#sec-parse-script
function parseScript(unit: il.Unit, sourceText: string): MixedPhaseResult<ParseScriptResult> {
  ...
}

...
  const [parseResult, parseScriptRT] = parseScript(unit, sourceText);
  const scriptRecord = code.op('il_call', parseScriptRT, realm, hostDefined);

(Side note: Yes, of course MetalScript is being written in TypeScript, and the plan is to make it self-hosting so I can compile MetalScript with MetalScript in order to distribute an efficient binary executable of the compiler).

The beauty of this approach is that it allows me to have a single parseScript function in my implementation, which as you can see above has an embedded comment that references the exact location in the spec, and the full behavior of the corresponding piece of the spec is fully encapsulated in the body of parseScript. This one-to-one relationship between spec functions and implementation functions is going to be super useful from a maintenance perspective — keeping up to date with the latest spec — which as stated in a previous post matches one of my goals with MetalScript.

I’ve used this technique in a number of other places that bridge the gap between compile time and runtime, and I think the result is beautiful.
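To make the calling convention concrete, here is a self-contained toy sketch of the mixed-phase pattern. The names and types here (parseScriptToy, the shape of ILFunction) are invented stand-ins for illustration only, not MetalScript's actual API:

```typescript
// Stand-in for an IL function: here just a name and a list of opcode strings.
type ILFunction = { name: string; ops: string[] };

// A mixed-phase result is a tuple: the compile-time value, plus the IL
// function the caller must emit a runtime call to.
type MixedPhaseResult<T> = [T, ILFunction];

// A toy mixed-phase operation: at compile time it computes something the
// compiler can use immediately (here, a statement count); the runtime
// component is represented as IL to be invoked later on the target.
function parseScriptToy(sourceText: string): MixedPhaseResult<number> {
  const statementCount = sourceText.split(';').filter(s => s.trim()).length;
  const rt: ILFunction = {
    name: 'parseScriptRT',
    ops: ['alloc_script_record', 'eval_statements', 'return'],
  };
  return [statementCount, rt];
}

// The caller uses the compile-time component immediately...
const [count, parseScriptRT] = parseScriptToy('let x = 1; print(x);');
// ...and is obliged to emit code that invokes the runtime component.
const emitted: string[] = [`il_call ${parseScriptRT.name}`];

console.log(count);      // 2
console.log(emitted[0]); // "il_call parseScriptRT"
```

The key property is that both phases are produced by the same function body, so the compile-time and runtime halves of one spec operation can never drift apart.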

MetalScript Unit Tests

I’ve also started writing unit tests for MetalScript. Previously I avoided unit tests because there was too much uncertainty, and I ended up completely changing my mind on things too often for unit tests to be a useful addition. Now, after spending 9 months on MiniLanguage, I feel I’m reaching a point of stability in the underlying concepts and have started adding unit tests.

The first unit tests I have working use the above parseScript and related functions to translate source text to IL. As of today, this works for two simple test cases: an empty script and a “Hello, World” script.
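For a sense of the shape such tests might take, here is a hypothetical sketch. The translateToIL function is an invented stand-in (MetalScript's real API is not public), and the tests deliberately assert only structural properties of the emitted IL rather than its exact text:

```typescript
// Invented stand-in for "translate source text to IL": one IL line per
// non-empty statement, wrapped in a prologue and epilogue.
function translateToIL(sourceText: string): string[] {
  const stmts = sourceText.split(';').map(s => s.trim()).filter(Boolean);
  return ['entry', ...stmts.map(s => `op ${s}`), 'return'];
}

// Test 1: an empty script still yields a well-formed body
// (prologue + epilogue only).
const emptyIL = translateToIL('');
console.assert(emptyIL.length === 2);

// Test 2: a hello-world script yields strictly more IL than the empty script.
const helloIL = translateToIL(`console.log('Hello, World!')`);
console.assert(helloIL.length > emptyIL.length);
```

Structural assertions like these stay stable even as the IL's concrete syntax evolves, which matters early in a project where representations change often.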

I can’t show you the IL itself, because it would give away too many of the internals. Maybe when I’m further along in the project and there is less risk of having my ideas stolen, I will give up more details. But believe me when I say the IL is a thing of beauty.

The hello-world script translates to 62 lines of IL (including some whitespace). That’s a lot, and it emphasizes both how many operations are actually required to perform simple tasks in JavaScript and how much of an accomplishment it is to get to this point. Bear in mind that this IL language is one I designed with the intention of compiling easily, not of being a compact representation of the program, since the IL never reaches the target device.

Personal Note: Be comfortable with your work

A personal lesson I’m learning with this and other projects, is to do what it takes to feel comfortable with what you’ve done. In MetalScript, it’s a constant battle in my mind as to whether I should cut corners to save time and get to a POC quickly, or whether I should take it slow and make sure that every piece is as simple, understandable, reliable, and maintainable as possible.

There are arguments for both at different stages of a project, but if you plan on the project becoming something big, then I really believe you need to do what it takes to feel emotionally comfortable with what you’ve done. The reason is that when you leave a piece of code that in the back of your mind you think of as hacky, fragile, or overly complicated (code you just had to pray would work when you wrote it), you aren’t going to want to go back to it, and you will become generally demotivated by your work. But if you leave a project or piece of code feeling comfortable about it, then it will be much easier to go back “home” to it in the future.

So when I say that you should spend time on your work until you feel comfortable with it, I’m not talking about spending time making the most advanced piece of code you can be proud of, with a gazillion features that can do backflips and handstands and handle a bunch of different use cases. I’m talking about thinking really carefully about how to remove complexity from your code and distill it down to its bare essence. You want to use your superpowers to remove complexity, not to handle it. Understanding a complicated design is only the first step; reducing it to a simple design is the end goal.

If your code is clean, simple, and has a good readme and guiding comments to help newcomers get into it, then you will feel more comfortable when you are the newcomer getting back into it after some time.

  1. Time is a blur 

  2. The realm is a collection of builtin objects such as Array and Object