Month: January 2019

Hughes List

Hughes List

Inspired by Eric Lippert’s recent post on a data structure called a Hughes list, I thought I’d play around with it by writing my own equivalent in JavaScript.

Side note before I get started. For my usual readers who are following the progress of my JavaScript-to-native compiler (MetalScript), I’ll hopefully continue blogging about it soon. I’ve just been in the middle of an international move, along with some other projects, and haven’t had much time to work on MetalScript. My flight out of the country leaves in a few days, and then hopefully my schedule will open up and I’ll continue on it. For those who don’t know what I’m talking about but want to know more, also check out my last post where I link to a presentation on what MetalScript is, or any other posts in the category.

Getting back to the point of this post…

The key benefit I see about Hughes list structure is that it is a persistent (immutable) list that allows you to concatenate, prepend, and append to the list in O(1) time. It does so by lazily accumulating the operations to be performed until the list contents are actually needed, when it then finally performs a single O(n) pass to build the final “real” list (at least this is my interpretation of how it works).

For those who want to cut right to the chase, see this GitHub repo for the final code. It’s not a lot of code. test.mjs is the entry point, and hughes-list.mjs is the implementation of the data structure.

I suggest reading Eric’s series first for a full explanation of the underlying principles before comparing it to mine. I’m not here to explain it (which Eric has done perfectly) but rather to provide a twist and an implementation in a different programming language.

I deviated quite a lot from Eric’s implementation, in the following ways:

  • My Hughes list is built up by composing imperative procedures that each mutate a JavaScript array, rather than working with an underlying immutable SimpleList like Eric uses. This produces the same logically-immutable list structure and achieves similar performance characteristics but saves on implementing an additional SimpleList type.
  • I omitted the wrapper class. If you inspect the list in a debugger, the list will actually be the function, rather than containing the function. I wouldn’t do this in production code but it leads to a nice simple implementation here.
  • As a superficial detail, I used the traditional JavaScript function names push and unshift instead of append and push respectively.

Before jumping into the detail of the implementation, it’s probably best to take a look at how this list structure is used. These are the unit tests I have for it (see the full file for more context).

checkList(HL.empty                               , []);
checkList(HL.push(list1, 4)                      , [1, 2, 3,   4]);
checkList(HL.concat(list1, list2)                , [1, 2, 3,   4, 5, 6]);
checkList(HL.unshift(list1, 0)                   , [0,   1, 2, 3]);
checkList(HL.concat(HL.push(list1, 10), list2)   , [1, 2, 3,   10,   4, 5, 6]);
checkList(HL.concat(HL.unshift(list1, 10), list2), [10,   1, 2, 3,   4, 5, 6]);
checkList(HL.concat(list1, HL.push(list2, 10))   , [1, 2, 3,   4, 5, 6,   10]);
checkList(HL.concat(list1, HL.unshift(list2, 10)), [1, 2, 3,   10,   4, 5, 6]);

These should all look pretty obvious. If you’re not familiar with persistent data structures, the most important thing to point out here is that a call like HL.unshift(list1, 0) does not change list1, but rather returns a new list that is the like list1 but with 0 at the front. This is also good for me to highlight because I said that my implementation of Hughes list composes imperative procedures that mutate the underlying JavaScript array – so it’s worth getting your head around how this immutable list structure is implemented on top of array mutations.

My implementation of the Hughes list data structure comes out to something like the following:

export const empty = () => {};
export const single = x => xs => xs.push(x);
export const concat = (ls1, ls2) => xs => { ls1(xs); ls2(xs) };
export const push = (ls, x) => xs => { ls(xs); xs.push(x) };
export const unshift = (ls, x) => xs => { xs.push(x); ls(xs) };

The things to highlight above are:

  1. The only operation used on JavaScript arrays here is push, even in the implementation of unshift. This is the key benefit of the Hughes list. Depending on the implementation of arrays in the JavaScript engine, the native unshift on arrays may be O(n) (on an array of length n), while push is typically O(1).
  2. As previously noted, this implementation performs mutations, so rather than ls2(ls1(xs)), we do ls1(xs); ls2(xs). The former might have created a new array/list from ls1 that is passed to ls2, while the latter simply invokes both mutations one after the other on the same mutable array.
  3. You’ll notice that the implementation is backwards from Eric’s implementation, in that the definition of a list is a function that appends its items to the end of a given array, not the beginning. This again is just because appending an item to the end of an array in JavaScript is the most efficient.

I personally think this implementation still captures the essence of what makes the Hughes list structure useful, while also interplaying well with the king of JavaScript list-like structures, the array. The key effect is that it allows us to concatenate, prepend, or append items in any order with almost no cost, and let the list methods build the compound procedure such that the operations are performed in the optimal order for performance.