This post is different from my usual topic of Microvium. I’ve been recently frustrated with the way that distributed applications are written and I’ve been brainstorming ways it could be improved. In my last post on the topic, I suggested that maybe a new programming language would be useful for solving the problem, but I ended the post thinking about how the Microvium snapshotting paradigm might also be a suitable solution. This time, I’m going to consider a solution that doesn’t require Microvium.
I want the experience of developing a distributed application to be basically as seamless as that of developing a single-process desktop application, such as the ability to get type checking between services, to debug-step directly from one service to another during development, to write regression tests that encompass multiple services, etc. And I want this all with minimal boilerplate.
My last post on the topic goes into more detail about the issues I have with the current state of affairs and the improvements that could be made.
Here’s the summary: I think the Microvium snapshotting paradigm is indeed the answer, and I think we can in a sense “polyfill” the snapshotting ability in a limited context by deterministically replaying IO.
I’ll talk about this solution a few parts:
- What is snapshotting and how could we use it to solve the problem?
- More details
What is snapshotting and how does it help?
I’m using the term “snapshotting” here to mean the ability to “hibernate” a process to a file and then resume it later on the same or a different machine. This means capturing to a file the full state of the process, such as the global variables, the stack and heap state, and machine registers. And the ability to restore the full state in a completely new context.
Normally applications are deployed as either a compiled executable or as source code depending on whether the programming language is compiled or interpreted. Having the snapshotting capability gives us a third option: run the application code at build-time and deploy a snapshot of its final initialized state.
Doing it the latter way allows application code to have pre-runtime effects, such as defining infrastructure, and for pre-runtime code to pass information seamlessly to runtime code, such as secrets and connection strings for the aforementioned infrastructure.
How to implement snapshotting?
Node on the other hand is used for cloud software all the time, and I’m a huge fan of it, but it doesn’t support snapshotting in the way described.
But I think we can emulate the snapshotting feature on Node by having a membrane around the application that records all IO while it runs at build-time, and silently replays the IO history as an initialization step at runtime to recreate the final, initialized state. Although this will be slower than just recovering a direct dump of the memory state (if the engine had supported that), in theory, it should work.
Most of the above has been quite abstract. I think it would be worth explaining the vision in a little more concrete detail.
I imagine there to be a thing, which I might call a “service”, which is the minimal distribution unit (we’ll say a service is “a thing that can be deployed”). A service might be a single microservice/lambda, or a database, or a whole distributed application made up of nested services.
A typically SaaS company using this solution would probably just have one root “service” that represents their whole distributed application, and that service would be composed of subservices1, which could each have their own subservices, etc.
While executing at build time, the service code is then able to configure its own runtime “hardware” that it will eventually be moved to. For example, it might call a host function to declare “I’d like to run as a lambda with automatic scaling”, etc. The host can record this information and provision the necessary resources before moving the service to its own desired runtime environment.
A special case of this general rule is that a service can choose a runtime manifestation that doesn’t support code execution at all. For example, a database, queue, or pub-sub system.
A service can instantiate other services at build time. Concretely, this might look something like:
var myService = importService('./my-service-code.js');
I expect this to be roughly analogous to just importing another module into the current service, similar to using node’s
require() function. It will import and execute the given JS module (in the same machine process), and return an object representing the script’s exports. But with the distinction that new module is to be loaded inside a membrane, and the return value is therefore a proxy for the actual exports inside the membrane.
The membrane serves a few different purposes:
- When the running services are “moved” to their runtime environments, services in different membranes will be on different physical runtime hardware. Service code may at build time configure its future runtime hardware.
- At build time, the membrane records all IO exchanges such that it can deterministically replay the IO at runtime during initialization, to restore the service to its exact “snapshotted” state, as mentioned earlier. It can verify and silently absorb outgoing IO, and deterministically replay incoming IO.
- At build time, services are allowed to pass around references to other services, as a kind of “dependency injection” phase. At runtime, the membranes around each service need to handle the marshalling of calls between services as they use these injected dependencies.
importService function can construct a new Compartment which allows it virtualize the environment in which the service code runs. This can have a variety of effects:
- We can provide a replayable variation of non-deterministic builtins, such as
new Date. These can be treated as a special case of IO. They can return real dates and random numbers at build time, as long as they return the same dates and random numbers when replayed at runtime.
- We can provide service-specific APIs. For example, a function akin to
pleaseLetMeRunAsALambda()could be injected into the globals for the service, or we could expose additional builtin-modules to the service code.
- We can intercept module imports so that we can bundle the service with its module dependencies into a single package for deployment, without the overhead of bringing in dependencies used by other services.
I’m fairly confident that something like this would work, and I think it would make for much cleaner code for distributed applications. I also don’t think it would take that long to implement, given that many of the pieces already available off-the-shelf. But even so, I probably shouldn’t get too distracted from Microvium.
This kind of infinitely recursive hierarchical organization of the architecture is something I’ve seen missing from AWS and Azure ↩