Snapshotting vs Bundling

TL;DR
Bundling and snapshotting are two different ways of packing a program for deployment. This post is a somewhat-biased overview of why snapshotting is clearly superior in many respects.


I recently wrote up an explanation of Microvium snapshotting, along with a hopefully-helpful animated diagram.

In short: the diagram depicts a running Microvium (JavaScript) application that is suspended and transferred to a new environment where it continues where it left off. For the typical use of Microvium, the original environment might be a development machine or back-end server, and the new environment might be a low-resource microcontroller.

I want to expand on snapshotting a bit more in this post and, in particular, I want to contrast it to an alternative: bundling.

Background

JavaScript doesn’t have a preprocessor as C does. I’ve argued before that this is a great thing: having all behavior represented the same way, in the same language, rather than needing a separate preprocessor language, template meta language, linker script language and makefile language, etc. It also means there is no distinction between compile-time and runtime: everything is runtime. It eliminates needless concepts. I like this ideal and I want to stick to it.

But if you don’t have #include, how do you access behavior declared in other source code files (other modules)?

In JavaScript, unfortunately, there are lots of different ways of importing other modules, and they don’t have consistent semantics. Today, let’s talk about probably the most common one, at least in Node.js applications, which is the use of a function called require(). Importing a module like this means that you are using the CommonJS module system.

Bundling

A bundler is a tool for pre-combining all the JavaScript files in your project (and other files, to some extent) into one (or a few) files for distribution. This is particularly needed in web applications, which could contain hundreds or thousands of files, but you don’t want the user’s web browser to have to download each of these files separately.

Bundling is not a passive operation like creating a zip file. A typical bundler will actually analyze your code to figure out what files in your file system it imports, and repeat this transitively to pull in all the required dependencies.

Bundling sounds an awful lot like preprocessing or compilation — it’s a step that happens before runtime, that needs to statically analyze your code in order to produce a distributable output. Something smells funny here.

JavaScript is not a language designed for such static analysis and I argue that it never should be.

Ok, let’s look at a concrete example of a bundler: Webpack.

In the following code, module a, which we’ll assume is the entry point, imports module b, and prints a value from it.

// a.js
const b = require('./b');
console.log(b.bValue);

// b.js
exports.bValue = 5;

This does what you expect it to do. Now let’s pack it with webpack [1].

webpack --entry ./a

On my machine, this outputs a single js file, which when run, correctly prints the value 5 to the console.

Ok, but, how did it know to bundle module b into the result, so that it actually runs correctly?

The answer appears to be: it makes an educated guess by looking at the source code and seeing that it calls require with an argument of ./b.
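To make that "educated guess" concrete, here is a toy sketch of what detecting imports from source text can look like. It is only an illustration: the function name findLiteralRequires and the regular expression are made up for this example, and this is not how Webpack is actually implemented (real bundlers parse the code rather than pattern-match it). The point is just that a static scan can only see calls whose argument is written out as a literal.

```javascript
// Toy dependency scanner: finds require() calls whose argument is a plain
// string literal. Illustrative only; this is NOT Webpack's implementation.
function findLiteralRequires(source) {
  // Matches require('x') or require("x") with a literal string argument.
  const pattern = /\brequire\(\s*(['"])([^'"]+)\1\s*\)/g;
  const found = [];
  for (const match of source.matchAll(pattern)) {
    found.push(match[2]); // the module specifier between the quotes
  }
  return found;
}

// Hypothetical inputs: one literal call, one call with a computed argument.
const literalCall = "const b = require('./b');";
const computedCall = "const name = './' + 'b'; const b = require(name);";

console.log(findLiteralRequires(literalCall));  // [ './b' ]
console.log(findLiteralRequires(computedCall)); // []
```

The second call imports exactly the same module at runtime, but a purely textual scan has no way of knowing that.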

This sounds so dodgy. But I guess that people had JavaScript that needed bundling, and bundlers seemed like the only way to fix it.

It’s easy to mess this process up. Let’s change a to the following code which does the same thing as before if evaluated directly in node.js, but which causes completely different Webpack behavior:

// a.js
const b = require('./' + 'b'[0]);
console.log(b.bValue);

What does webpack do with this?

Err, well, the resulting bundle still correctly outputs 5 to the console when executed, but now the bundle is 105 kB instead of 5 kB, at least on my machine. Why? Webpack has clearly given up trying to figure out statically what I’m importing and has instead included everything in the folder as part of the bundle, which in my case happens to include both the output files and webpack itself (so Webpack is packing itself into the bundle!).

But, it gets worse. Let’s change module a to the following code, which again does the same thing when run in node.js, but causes completely different webpack behavior:

// a.js
const b = arguments[1]('./b');
console.log(b.bValue);

This source runs in node because require is actually a parameter passed to the commonjs module code and it so happens to be the second parameter.
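As a rough sketch of why that works, the following mimics how a CommonJS loader wraps module code in a function before calling it. The names runAsModule and fakeRequire are hypothetical, and Node's real loader does a great deal more (resolution, caching, and so on); only the parameter order is meant to mirror Node's documented module wrapper.

```javascript
// Simplified sketch of a CommonJS-style module wrapper.
// runAsModule and fakeRequire are made-up names for illustration.
function runAsModule(sourceText, fakeRequire) {
  const module = { exports: {} };
  // Parameter order mirrors Node's wrapper: exports, require, module, ...
  const wrapper = new Function(
    'exports', 'require', 'module', '__filename', '__dirname',
    sourceText
  );
  wrapper(module.exports, fakeRequire, module, '/demo/a.js', '/demo');
  return module.exports;
}

// Stand-in for a real module loader: it only knows about './b'.
const fakeRequire = (name) => (name === './b' ? { bValue: 5 } : undefined);

const result = runAsModule(
  "exports.viaName = require('./b').bValue;" +
  "exports.viaArguments = arguments[1]('./b').bValue;",
  fakeRequire
);

console.log(result.viaName, result.viaArguments); // 5 5
```

Because the module body is just the body of a function, arguments[1] and require refer to the same value.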

Webpack doesn’t see require at all in this source, so it assumes it doesn’t need to include anything else in the bundle. Consequently, the bundle throws an exception when run in node.js.

Sure, this example here is abusive towards Webpack — I’m picking pathological cases to expose Webpack’s Achilles’ heel.

A better solution: snapshotting

It’s my blog; I’m allowed to make bold, opinionated claims about what things are “better” than others. Snapshotting is better than bundling.

A Microvium snapshot fulfills the same purpose as a Webpack bundle, in the sense that the snapshot is a highly-efficient representation of all the resources that the JavaScript application needs in order to run in a future environment. But it’s better than a bundle because:

  1. It does not require a third-party tool to make it
  2. It does not depend on shady guesswork about what’s being imported. Rather, the application code actually runs and performs its imports using the full semantics of the language.
  3. It also does not require any declarative representation of what should or should not be included in the bundle, à la tsconfig.json or webpack.config.js. I really dislike these external declarative files, because they add needless complexity to the application.

Let’s get concrete, translating the same example as earlier but to leverage Microvium snapshotting:

// a.js
import * as b from './b';
import ffi from 'ffi';

ffi.export('restore', whenRestored);
function whenRestored() {
  console.log(b.bValue);
}

// b.js
export const bValue = 5;

When run on a suitable host, the above will print 5 to the console, the same as the Webpack equivalent does. It does this by performing the imports at runtime before the snapshot is taken, and then the snapshot can be deployed to an environment on which importing might not even be supported at all.

Let me see if I can draw another diagram to illustrate this:

[Diagram: the application imports its dependencies (e.g. other modules) and then moves to a new environment]
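The same two-phase idea can be modelled in plain JavaScript. This is only a toy, and it does not use Microvium's API: buildPhase, runPhase and moduleTable are hypothetical names, and JSON text stands in for the snapshot. A real snapshot captures functions and the rest of the VM state as well, which JSON cannot do. What the sketch does show is that the import actually executes in the first phase, and the second phase needs no loader at all.

```javascript
// Toy model of "import at build time, run elsewhere later".
// All names are made up; this is not Microvium's API.

// Build phase: module loading is available here.
function buildPhase() {
  const moduleTable = { './b': { bValue: 5 } }; // stands in for the file system
  const load = (name) => moduleTable[name];
  const b = load('./b'); // the import really runs, with full language semantics
  // Capture the resulting state as text (a stand-in for a snapshot).
  return JSON.stringify({ bValue: b.bValue });
}

// Run phase: no loader exists here; only the snapshot is available.
function runPhase(snapshotText) {
  const state = JSON.parse(snapshotText);
  return function whenRestored() {
    return state.bValue;
  };
}

const snapshot = buildPhase();       // e.g. on the development machine
const restored = runPhase(snapshot); // e.g. on the target device
console.log(restored()); // 5
```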

Please note that the ES6 module import syntax is not the reason why this works. It will work equally well with dynamic import() when that’s been implemented. In fact, there are working test cases that interact with resources directly using node’s fs module.

The pattern isn’t new

Web developers are already familiar with the idea of two distinct phases of runtime execution: there is before the page has loaded, and there is after the page has loaded. It is the same application executing across both phases, but the application has access to different resources after the page has loaded (for example, all the DOM elements and images).

To run part of your application after the page has loaded, you simply subscribe a callback to the load event:

window.addEventListener('load', whenLoaded);
function whenLoaded() {
  console.log('The page is fully loaded');
}

The similarity between this and the Microvium example should be clear. Snapshotting is just the mechanism that is used to transfer the running application from the development/build environment to where it will run on the user’s device.


  1. Assuming you’ve installed webpack globally 

6 Replies to “Snapshotting vs Bundling”

  1. This seems to be linked to turing completeness of code generator:

    https://stackoverflow.com/questions/3136686/is-the-c99-preprocessor-turing-complete

    Also related to templates (it generates/modifies code).

    With ES imports, dependencies are specified statically (in my knowledge).

    With snapshot technique, they can be dynamic.

    const wearable = currentTemp < 15? require('sweater') : require('casual');

    This would also be possible to do with a separate code generator script that only picks up one of the two, based on current temp where build is executed.
