Month: September 2018

MetalScript Progress — September 2018

MetalScript Progress — September 2018

A MetalScript progress update, for those who are interested. This one is all boring technical details — if this is your first time on the site, perhaps read something more interesting like what MetalScript is and the reason why I’m creating it. Or for those who like technical details, read on…

[Edit: Previously this had “November 2018” in the title, even though I wrote this in September — who knows what was going through my head when I wrote that.]

Stack Unwinding

To recap, last time I said I had my IL compiler compiling the following code file to IL and getting it to run in the virtual machine:

// blinky.js
MCU.start();
setInterval1

Phew! That’s quite a list. I must point out though, that this isn’t even comprehensive. Initializing the realm is a lot more work than implied by the little IHDR rectangle in diagram. Similarly for most other operations — I’ve only shown the highlights here.

The main point I want to convey is that even though MCU.start() appears to be the first thing executed when you look at the source code, there is actually a lot of VM work leading up to that point, and by the time the VM is suspended at MCU.start(), the VM stack is already 5 frames deep. It is at this point that the MCU hardware must take over — the entry point for the MCU reset interrupt vector is the continuation of the MCU.start() call2. But ideally we do not want to somehow translate the VM stack into the corresponding hardware stack at the point where the program starts. We could, but it would be significantly more difficult to compile this way.

Instead, the “unwinding” process that I’ve finished now translates the functions in the call stack to continuation passing style (https://en.wikipedia.org/wiki/Continuation-passing_style). These constructed continuation functions are just normal IL, whose behavior starts where the original function left off.

This isn’t as easy as it sounds. It has to work for all valid user code, including user code that might look like this:

function myFunc() {
  const thingsToDo = [];
  thingsToDo.push3;
  // (... add more things to do ...)
  while (thingsToDo.length) {
    const thingToDo = thingsToDo.shift();
    thingToDo();
  }
}
myFunc();

The challenge in the above code is that the continuation has to continue in the middle of the loop. It has to restore the state correctly, while resuming the old behavior on the second iteration of the loop (the first iteration is split between compile time and runtime, and the second occurs completely at runtime).

Symbolic Interpreter

The phase I’m working on now is what I call information threading which is done by something I call a symbolic interpreter. This is the part where the compiler tries to understand the code that’s been written, in the same way that a human might mentally follow through the code.

For example, in the blinky example, a human following through the code will know what the call to setInterval refers to — they know the state of the machine, they know that it contains a setInterval JavaScript function, and they know whether or not they’ve done something nasty to that function such as replacing it with their own wrapper that does something different to the standard setInterval function. So a human following the code would be able to predict with 100% accuracy what the code should do (and so can a machine).

But there are cases where you can’t predict the behavior exactly. For example, calling toggle. You can predict what function is called when you invoke toggle, but you don’t say whether it will turn the LED on or off — at different times the same line of code can have different effects, depending on previous runtime state. This just means that you need to record that state somewhere at runtime.

The symbolic interpreter exactly this. It keeps track of the state of the program over “time”, keeping track of what can and can’t be known at each point in the program. This information will later be used to emit an implementation of the source code in the target language (in this case LLVM IR).

I have a lot of the symbolic interpreter up and running. It’s currently able to fully interpret simple IO-free programs, meaning that if it’s given a script that doesn’t perform any IO (e.g. one that doesn’t read or write to GPIO for example), it can infer the behavior of the program completely.

But in the case of IO-free programs, there’s no observable behavior at all (observable behavior requires IO). Interesting things can only happen when you hit an IO boundary, such as turning on an LED, sending commands to a robotic arm, or writing “Hello, World!” to the console. So this is what I’m working on at the moment — IO boundaries for the symbolic interpreter.

What’s Left?

I’m currently working through the “Blinky” example. I want to get the blinky code going all the way through the compilation pipeline, as in the following diagram (blocks in green are complete).

I don’t think the type assignment phase will be difficult. Taking the output from the symbolic interpreter, it should be easy to have an algorithm choose some basic binary types. Likewise, I’m optimistic that the LLVM IR emitter will be easy, given the chosen binary types. I’m not referring to the complete implementations for any of these phases. I only need to reach a proof of concept for the blinky example first. This will give me a chance to go back and review the good and bad decisions I’ve made, before fleshing out each phase to fully support all cases.

There is a little bit of work not implied by the diagram above, such as leveraging LLVM to produce output object files, linking, and developing a minimal runtime (event loop and GC). But I think these will be relatively straight forward for the first iteration of POC.

The above work is all to get to a minimal POC where I can demonstrate end-to-end compilation of at least one input file. Then as I’ve said in previous posts, it will be somewhat longer to reach MVP.

News: TC53

I’m very pleased to say that I’m traveling to Boston4 next month to participate in the inaugural meeting of ECMA TC53, a new “technical committee” (TC) being established for JavaScript on “smart wearable systems and sensor-based devices”. I’ve been invited to speak a bit about MetalScript, among other things, and so I’m working hard to reach a good milestone before the meeting. I’m incredibly excited to go and be part of what will certainly be a group of highly intelligent people paving the way forward for a better future5.

 

 


  1. ) => gpio.pin('B4').toggle(), 500);

I said that next up was the “unwinding” phase — taking the suspended virtual machine state and unwinding the virtual stack and register states to create a unit of code that doesn’t require any stack or register information. I’m glad to say, this is complete!

Let me explain how it works. At the point where MCU.start() is invoked, there’s actually a bunch of things in the VM stack, as illustrated by the following diagram:

 

Most of this, unsurprisingly, is not user code. I’ve used blue to represent system functions (aka ECMAScript “abstract operations”), and red to represent user functions. Here’s a brief explanation of some of the system functions:

  • RunJobs – this is the entry point to an ECMA application and has the main event loop that we all know and love.
  • IHDR (InitializeHostDefinedRealm) – this creates all the “realm” objects, which includes things like the “Array” object and the “Object” object — things that are “built in”
  • EnqueueJob – RunJobs gets the JS source text and calls EnqueueJob to add a script evaluation job to the job queue (for each entry JavaScript file, although in most applications I’ve seen there’s only one such file, normally called “app.js” or something).
  • ScriptEvaluationJob – this is a job wrapper in the spec that is used to evaluate a previously-enqueued user script
  • Parse (ParseScript) – this parses the script and returns information required to evaluate it.
  • ScriptEvaluation – given information from the parsing stage, ScriptEvaluation sets up the context needed to run the main script code, and then runs it.
  • GDI (GlobalDeclarationInstantiation) — initializes variables and functions for the script (recall that JavaScript variables and functions are hoisted)
  • ScriptBody – then we finally get to executing the body of the script — the actual user code.
  • Call – The user code invokes MCU.start(), which is a JavaScript function call, and there are a number of system-level things that need to happen for calls, such as checking that the target is callable, deciding what this-value to use, etc. All this work is encapsulated in the “Call” system function ((There is a separation in the MetalScript compiler between IL functions and JS functions, and likewise between IL function calls and JS function calls. IL does not have or need closures, variadic functions, etc, and is used to represent both user code and ECMAScript “abstract operations”. 

  • It is exactly the continuation of the MCU.start() call. Unlike in C, there are no operations on the MCU that precede the user code. Operations like initializing the MCU clock source or copying .data into RAM are logically part of MCU.start() and can be parameterized and configured accordingly in future by options passed to the MCU.start() call. I’ve said this before, but I feel it’s important enough to mention again 

  • ) => MCU.start( 

  • Boston is 17,000 km from here in Melbourne — almost as far away as is possible without going into outer space. I’m going to be spending about 3 times longer on the airplane than in the meetings! 

  • In a dream I had about it a few nights ago, I could swear that Gandalf was one of the members of the committee.