Thoughts about Random Access
Recently I’ve been thinking about what it means for a data structure to support random access (aka “direct access”). This may seem so obvious that it’s not worth thinking about: an array supports “random access” because you can access any element with O(1) complexity by just calculating the elements position and going straight there. A linked list doesn’t have this property because the only way to get to an element is through other elements that it’s linked to. But let’s explorer this idea further.
Here’s one way Wikipedia explains random access quite well:
As a rule the assumption is that each element can be accessed roughly as easily and efficiently as any other, no matter how many elements may be in the set, nor how many coordinates may be available for addressing the data
Let me add another note to this which is perhaps obvious but not normally specified: we’re talking about runtime complexity. If we had a linked list completely defined at compile-time then I would still classify it as a “random access”, even though linked lists are normally not considered to be random access1.
So what is it about an array gives us random access? Let’s take a guess: it’s because the elements of the array are stored contiguously in memory.
This seems like a reasonable first guess. Because “element 2” is known to be after “element 1” which is after “element 0”, we can calculate where “element 2” is. To get the position of element 2, we multiply the size of an element by 2, and move to that binary position in the array.2
But what if the elements aren’t all the same size?3
If element 1 in the array might be larger than another then we can’t multiply anymore so we need to somehow sequentially access the elements, right? Well, no. If the size of element 1 is known at compile time then we can simply offset this amount when calculating the positions of any element that comes after it.
Perhaps I deceived you when I said all the elements aren’t the same size in this example: I didn’t say whether or not there was static information about the sizes of elements. Lets say that we have an array, which we’ll define such that every second element is 4 bytes big, and every other element is 8 bytes big. To calculate the binary position of an even element in the array we just use `index / 2 * 12 + (start of array)` and to calculate the position of an odd element in the array we just say `index / 2 * 12 + (start of array + 4)` (assuming some implicit integer truncation). These are both constant time operations.
So to sum up, it’s not about whether the array elements are the same size, or how they are laid out in memory. It’s about what we know at compile time vs what calculations needs to be delayed until runtime, and how complicated those runtime calculations need to be. If we go back to our original definitions then this isn’t anything new – it’s just a restatement of our definition of random access. But it does make me think a bit differently about it.
This would be quite difficult to do in most languages. You could do it in C++ with template meta programming ↩
Actually it’s not the size of the element that counts, but the pitch of the array – there might be padding bytes between elements to help properly align them. But let’s just imagine the “size” includes the padding between the elements ↩
I don’t know any compiler that will generate arrays of elements that don’t all have the same inline value size, but we’re not so much talking about what compilers do as about what could be done ↩