Month: December 2020

Distributed apps in JavaScript

Distributed apps in JavaScript

This post is different from my usual topic of Microvium. I’ve been recently frustrated with the way that distributed applications are written and I’ve been brainstorming ways it could be improved. In my last post on the topic, I suggested that maybe a new programming language would be useful for solving the problem, but I ended the post thinking about how the Microvium snapshotting paradigm might also be a suitable solution. This time, I’m going to consider a solution that doesn’t require Microvium.

A quick summary of the objectives

I want the experience of developing a distributed application to be basically as seamless as that of developing a single-process desktop application, such as the ability to get type checking between services, to debug-step directly from one service to another during development, to write regression tests that encompass multiple services, etc. And I want this all with minimal boilerplate.

My last post on the topic goes into more detail about the issues I have with the current state of affairs and the improvements that could be made.

A proposed solution

Here’s the summary: I think the Microvium snapshotting paradigm is indeed the answer, and I think we can in a sense “polyfill” the snapshotting ability in a limited context by deterministically replaying IO.

I’ll talk about this solution a few parts:

  1. What is snapshotting and how could we use it to solve the problem?
  2. How can we implement snapshotting on a modern JavaScript engine?
  3. Is it necessary to do this JavaScript?
  4. More details

What is snapshotting and how does it help?

I’m using the term “snapshotting” here to mean the ability to “hibernate” a process to a file and then resume it later on the same or a different machine. This means capturing to a file the full state of the process, such as the global variables, the stack and heap state, and machine registers. And the ability to restore the full state in a completely new context.

Normally applications are deployed as either a compiled executable or as source code depending on whether the programming language is compiled or interpreted. Having the snapshotting capability gives us a third option: run the application code at build-time and deploy a snapshot of its final initialized state.

Doing it the latter way allows application code to have pre-runtime effects, such as defining infrastructure, and for pre-runtime code to pass information seamlessly to runtime code, such as secrets and connection strings for the aforementioned infrastructure.

When I talk about coding in JavaScript, I’m really talking about coding in TypeScript, and so another you get with this paradigm is type-checking across different infrastructural components and type checking between infrastructure and runtime code. There are other solutions like Pulumi that allow you to write IaC code in the same language as your application code, but to my knowledge, there are none that allow you to pass state directly from IaC code to runtime code.

How to implement snapshotting?

Microvium is a JavaScript engine that implements snapshotting at the engine level, but Microvium doesn’t yet implement much of the JavaScript spec and its virtual machines are size-limited to 64kB. It’s in no way practical for writing a distributed cloud application.

Node on the other hand is used for cloud software all the time, and I’m a huge fan of it, but it doesn’t support snapshotting in the way described.

But I think we can emulate the snapshotting feature on Node by having a membrane around the application that records all IO while it runs at build-time, and silently replays the IO history as an initialization step at runtime to recreate the final, initialized state. Although this will be slower than just recovering a direct dump of the memory state (if the engine had supported that), in theory, it should work.

This will only be reliable if JavaScript is deterministic (that its state and behavior deterministically follow from its incoming IO), which it almost is already. Work by Agoric on SES and Compartments makes this even more so, by allowing the creation of a sandboxed environment for JavaScript modules, in which non-deterministic JavaScript library functions can be removed by default (e.g. Math.random and new Date), or replaced with deterministic implementations. They do this because they run JavaScript on the blockchain, where a collection of redundant nodes needs to process the same script and reach the exact same conclusion about the new state of the blockchain — a strong determinism requirement.

Does it need to be JavaScript?

Honestly, I can’t think of a way to do this outside of JavaScript. JavaScript has a number of key features that I think are really useful for this solution:

  • JavaScript is already almost completely deterministic. There are only a handful of non-deterministic operations like Math.random, and JavaScript’s flexibility allows us to easily remove these or replace them with deterministic proxies. Contrast this with C#, where it’s impossible to prevent some block of code from performing non-deterministic IO or having non-deterministic behavior by using threads.
  • JavaScript allows us to create perfect proxies and membranes. You can create proxies for classes and not just instances of the class, and you can create proxies for modules. In C#, there is no way to put a membrane around an assembly, namespace, or class.
  • Related to creating perfect membranes, JavaScript allows you to hook into all IO, including the importing a dependent module. This would allow the framework to trace the depenency tree of each individual service and create the appropriate bundles for each.

I think it would be possible to get some approximations of the idea in other programming languages, but it certainly appears to me that JavaScript is particularly well suited.

More Details

Most of the above has been quite abstract. I think it would be worth explaining the vision in a little more concrete detail.

I imagine there to be a thing, which I might call a “service”, which is the minimal distribution unit (we’ll say a service is “a thing that can be deployed”). A service might be a single microservice/lambda, or a database, or a whole distributed application made up of nested services.

A typically SaaS company using this solution would probably just have one root “service” that represents their whole distributed application, and that service would be composed of subservices1, which could each have their own subservices, etc.

A service is a JavaScript module (file), which runs at build time on the build machine (or dev machine), and then with snapshotting is moved to the runtime machine(s) when it’s finished its initialization (when all the top-level code has run, including its transitive function calls, etc).

While executing at build time, the service code is then able to configure its own runtime “hardware” that it will eventually be moved to. For example, it might call a host function to declare “I’d like to run as a lambda with automatic scaling”, etc. The host can record this information and provision the necessary resources before moving the service to its own desired runtime environment.

A special case of this general rule is that a service can choose a runtime manifestation that doesn’t support code execution at all. For example, a database, queue, or pub-sub system.

A service can instantiate other services at build time. Concretely, this might look something like:

var myService = importService('./my-service-code.js');

I expect this to be roughly analogous to just importing another module into the current service, similar to using node’s require() function. It will import and execute the given JS module (in the same machine process), and return an object representing the script’s exports. But with the distinction that new module is to be loaded inside a membrane, and the return value is therefore a proxy for the actual exports inside the membrane.

The membrane serves a few different purposes:

  • When the running services are “moved” to their runtime environments, services in different membranes will be on different physical runtime hardware. Service code may at build time configure its future runtime hardware.
  • At build time, the membrane records all IO exchanges such that it can deterministically replay the IO at runtime during initialization, to restore the service to its exact “snapshotted” state, as mentioned earlier. It can verify and silently absorb outgoing IO, and deterministically replay incoming IO.
  • At build time, services are allowed to pass around references to other services, as a kind of “dependency injection” phase. At runtime, the membranes around each service need to handle the marshalling of calls between services as they use these injected dependencies.

The importService function can construct a new Compartment which allows it virtualize the environment in which the service code runs. This can have a variety of effects:

  • We can provide a replayable variation of non-deterministic builtins, such as Math.random and new Date. These can be treated as a special case of IO. They can return real dates and random numbers at build time, as long as they return the same dates and random numbers when replayed at runtime.
  • We can provide service-specific APIs. For example, a function akin to pleaseLetMeRunAsALambda() could be injected into the globals for the service, or we could expose additional builtin-modules to the service code.
  • We can intercept module imports so that we can bundle the service with its module dependencies into a single package for deployment, without the overhead of bringing in dependencies used by other services.

Pulumi

I haven’t done much research on this yet, but Pulumi seems to provide an API for defining infrastructure in JavaScript code. I speculate that the Pulumi API could be brought in as a low-level build-time API for services to define things like “I want to be a lambda when I grow up”.

Conclusion

I’m fairly confident that something like this would work, and I think it would make for much cleaner code for distributed applications. I also don’t think it would take that long to implement, given that many of the pieces already available off-the-shelf. But even so, I probably shouldn’t get too distracted from Microvium.


  1. This kind of infinitely recursive hierarchical organization of the architecture is something I’ve seen missing from AWS and Azure