Sequences: Part 4 – Push and Pull
Welcome back! This week in our exploration of sequences we’re going to do something a little different. Up until now we’ve been looking at a single running example of a function that interacts with sequences, and progressively developing its interface and considering the consequences of each choice. But this week I’d like to delve a little deeper into the meaning of these strange concepts of push and pull, as they apply to sequences.
What is push and pull?
Wikipedia doesn’t talk directly about “push” and “pull” as coding patterns, but does talk about the related concept of a push technology, which it says is a
…style of Internet-based communication where the request for a given transaction is initiated by the publisher or central server.
There is also a corresponding Wikipedia entry for pull technology, where requests are initiated by the client.
So here it seems that these terms are used, in a sense, to say which side of a communication interface has control, or initiates an action. If the server has control and is the one to initiate or “cause” the transaction, then the server is pushing to the client.
We can apply the same principles to software design at a smaller scale. When your code calls a function, it could be said to be a client of that function. When your code uses an object, it could be said to be a client of that object. So our server-client relationship applies in these cases as well.
Pull
Consider the following simple piece of C code which might be used to print a sequence of values to the console:
```c
initProducer();
for (int i = 0; i < numValues; i++) {
    int value = produceValue();
    printf("Value %i: %i\n", i, value);
}
```
This code uses the values from the sequence, so it’s a client of the sequence. It accesses items from the sequence by calling produceValue, which pulls the next item from the sequence. The pulled value is available on the very next line of code, where in turn it’s pushed to the console output. Whatever is producing the sequence responds to each pull by providing values as they’re requested. This is the same iterator pattern that we’ve been looking at in past posts[1].
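The loop above only shows the consumer’s side of the interface. Below is a minimal sketch of the other side, assuming a hypothetical Producer struct whose producer_init and producer_next play the roles of initProducer and produceValue. Because the producer is pulled-from, it can’t keep its position in a loop variable; it has to store it explicitly between calls.

```c
#include <stdio.h>

/* Hypothetical pull-style producer: the consumer decides when to call it,
   so it must remember where it got up to between calls. */
typedef struct {
    int next;   /* the value the next pull will return */
} Producer;

void producer_init(Producer *p) {
    p->next = 0;
}

/* Responds to a pull by handing out the next item: 0, 10, 20, ... */
int producer_next(Producer *p) {
    int value = p->next;
    p->next += 10;
    return value;
}

/* The consumer is in control: it pulls whenever it wants a value. */
void print_values(Producer *p, int num_values) {
    for (int i = 0; i < num_values; i++) {
        int value = producer_next(p);
        printf("Value %i: %i\n", i, value);
    }
}
```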
Push
What happens if the producer of the sequence needs to provide these values by pushing them to the consumer? Now our consumer code might look like this:
```c
int i;

void initConsumer(void) {
    i = 0;
}

void consumeValue(int value) {
    printf("Value %i: %i\n", i, value);
    i++;
}
```
All the same elements of the code are here: the declaration of int i, the initialization of i to 0, printing the values to the console with printf, and incrementing i after every value is consumed. The producer just has to call initConsumer before it starts producing values, and call consumeValue every time it produces a new value. So here the producer is in control, and decides when new values are available.
Strangely enough, once the consumer is written so that it doesn’t pull values, the producer code can look much like our original consumer code:
```c
initConsumer();
for (int i = 0; i < numValues; i++) {
    consumeValue(i * 10); // produces values 0, 10, 20, 30, ...
}
```
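Here is a sketch of the same push arrangement with the consumer’s state moved out of the global i and into a struct, so that more than one consumer could exist at once. The names Consumer, consumer_init, consumer_consume and produce are hypothetical stand-ins for the functions above. The index and running total must outlive each call, because the pushed-to consumer never knows when, or whether, the next value will arrive.

```c
#include <stdio.h>

/* Hypothetical push-style consumer: the producer decides when
   consumer_consume runs, so all state must survive between calls. */
typedef struct {
    int index;   /* how many values have been pushed so far */
    int total;   /* running sum of the pushed values */
} Consumer;

void consumer_init(Consumer *c) {
    c->index = 0;
    c->total = 0;
}

void consumer_consume(Consumer *c, int value) {
    printf("Value %i: %i\n", c->index, value);
    c->total += value;
    c->index++;
}

/* The producer is in control: it loops and pushes each value it makes. */
void produce(Consumer *c, int num_values) {
    consumer_init(c);
    for (int i = 0; i < num_values; i++)
        consumer_consume(c, i * 10);   /* pushes 0, 10, 20, ... */
}
```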
It’s clearly much easier to write producer code that pushes, and consumer code that pulls. But these two things seem to be at odds with each other – either the producer or the consumer is “in charge”, and the other one loses out. Many sequence-manipulating functions both produce and consume sequences, since they have an input and an output. It’s much easier to write these kinds of functions if they can pull from a data source, and push to a data sink.
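A minimal sketch of such a function, using hypothetical pull_fn and push_fn callback types. keep_even is easy to write because it pulls from its input and pushes to its output whenever it likes. The price is that its neighbors, array_pull and sum_push, have to be written in the harder passive style, keeping their state in structs between calls.

```c
#include <stdbool.h>

/* Pull interface: stores the next value in *out, or returns false when done. */
typedef bool (*pull_fn)(void *source, int *out);
/* Push interface: hands one value to the sink. */
typedef void (*push_fn)(void *sink, int value);

/* A sequence processor that is "in charge" on both sides. */
void keep_even(pull_fn pull, void *source, push_fn push, void *sink) {
    int value;
    while (pull(source, &value))
        if (value % 2 == 0)
            push(sink, value);
}

/* Hypothetical passive source: an array read from front to back. */
typedef struct {
    const int *items;
    int len;
    int pos;   /* index of the next item to hand out */
} ArraySource;

bool array_pull(void *source, int *out) {
    ArraySource *s = source;
    if (s->pos == s->len)
        return false;
    *out = s->items[s->pos++];
    return true;
}

/* Hypothetical passive sink: keeps a count and sum of what it receives. */
typedef struct {
    int count;
    int total;
} SumSink;

void sum_push(void *sink, int value) {
    SumSink *s = sink;
    s->count++;
    s->total += value;
}
```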
C++ seems to deal with this by dividing “sequence processors” broadly into two categories: on the one hand we have containers (and their corresponding iterators), which are happy to be both pushed-to and pulled-from. On the other hand we have algorithms and application-specific functions, which both push to and pull from containers in order to get their job done. These application-specific functions can be written more simply because now they don’t have to worry about being pushed-to or pulled-from.
It’s as if the containers act as peacemakers, mediating between all these functions that want to be in charge and that push and pull data whenever they find it most convenient. But as I’ve mentioned before, containers almost always come at a cost (at the very least, the memory to hold the values), and it would be better if they weren’t the default solution to the problem of manipulating sequences.
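A rough C analogue of this mediation, assuming a hypothetical growable IntBuffer type rather than any real container library. The producer pushes everything into the buffer and the consumer later pulls it all out, so both get to be in charge, but the buffer ends up holding the entire sequence in memory at once.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical buffer that is happy to be both pushed-to and pulled-from. */
typedef struct {
    int *items;
    size_t len;       /* number of values pushed so far */
    size_t cap;       /* allocated capacity */
    size_t read_pos;  /* index of the next value to pull */
} IntBuffer;

void buffer_init(IntBuffer *b) {
    b->items = NULL;
    b->len = b->cap = b->read_pos = 0;
}

/* Pushed-to side: append a value, growing the storage if needed. */
bool buffer_push(IntBuffer *b, int value) {
    if (b->len == b->cap) {
        size_t new_cap = b->cap ? b->cap * 2 : 4;
        int *grown = realloc(b->items, new_cap * sizeof(int));
        if (!grown)
            return false;
        b->items = grown;
        b->cap = new_cap;
    }
    b->items[b->len++] = value;
    return true;
}

/* Pulled-from side: returns false once every stored value has been read. */
bool buffer_pull(IntBuffer *b, int *out) {
    if (b->read_pos == b->len)
        return false;
    *out = b->items[b->read_pos++];
    return true;
}

void buffer_free(IntBuffer *b) {
    free(b->items);
    buffer_init(b);
}

/* Both loops stay in charge; *peak_len reports how much was buffered. */
int produce_then_consume(int num_values, size_t *peak_len) {
    IntBuffer b;
    buffer_init(&b);
    for (int i = 0; i < num_values; i++)
        buffer_push(&b, i * 10);      /* producer pushes (errors ignored) */
    *peak_len = b.len;
    int total = 0, value;
    while (buffer_pull(&b, &value))   /* consumer pulls */
        total += value;
    buffer_free(&b);
    return total;
}
```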
Harmony between Push and Pull
But consider what happens if we take our first example and replace produceValue with getchar from the stdio library:
```c
for (int i = 0; i < numValues; i++) {
    int value = getchar();
    printf("%i\n", value);
}
```
This code instead reads the values from the console[2]. This is still a pull, since the code appears to be actively causing the value to be acquired, and just like before, the value is available to be used on the very next line of code.
Yet we also know where these values ultimately come from: they come when the user pushes keys on the keyboard (this is a “push” in two senses of the word). How can it be that both sides of the interface think that they initiated the transaction?
In many ways I think it’s equally true that both sides really did initiate the transfer. This example shows that it’s possible for the parties on both sides of the interface to feel like they’re in complete control, calling the shots and deciding when the next value should appear. In this particular case, what’s happening is that the thread is suspended, or blocked, until the value becomes available.
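One way to get this effect with threads is sketched below, assuming POSIX threads and a hypothetical one-slot Channel type. The producer thread is an ordinary loop that pushes, the consumer is an ordinary loop that pulls, and each side is simply suspended inside channel_push or channel_pull until the other side catches up. This illustrates the blocking idea only; it is not how stdio implements getchar.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical one-slot channel guarded by a mutex and condition variable. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t changed;   /* broadcast whenever full or closed changes */
    int value;
    bool full;                /* a pushed value is waiting to be pulled */
    bool closed;              /* the producer has finished */
} Channel;

void channel_init(Channel *ch) {
    pthread_mutex_init(&ch->lock, NULL);
    pthread_cond_init(&ch->changed, NULL);
    ch->full = false;
    ch->closed = false;
}

void channel_destroy(Channel *ch) {
    pthread_cond_destroy(&ch->changed);
    pthread_mutex_destroy(&ch->lock);
}

/* Producer side: suspends the calling thread until the slot is empty. */
void channel_push(Channel *ch, int value) {
    pthread_mutex_lock(&ch->lock);
    while (ch->full)
        pthread_cond_wait(&ch->changed, &ch->lock);
    ch->value = value;
    ch->full = true;
    pthread_cond_broadcast(&ch->changed);
    pthread_mutex_unlock(&ch->lock);
}

void channel_close(Channel *ch) {
    pthread_mutex_lock(&ch->lock);
    ch->closed = true;
    pthread_cond_broadcast(&ch->changed);
    pthread_mutex_unlock(&ch->lock);
}

/* Consumer side: suspends the calling thread until a value arrives.
   Returns false once the channel is closed and drained. */
bool channel_pull(Channel *ch, int *out) {
    pthread_mutex_lock(&ch->lock);
    while (!ch->full && !ch->closed)
        pthread_cond_wait(&ch->changed, &ch->lock);
    bool got = ch->full;
    if (got) {
        *out = ch->value;
        ch->full = false;
        pthread_cond_broadcast(&ch->changed);
    }
    pthread_mutex_unlock(&ch->lock);
    return got;
}

/* The producer is written as an ordinary pushing loop... */
void *producer_main(void *arg) {
    Channel *ch = arg;
    for (int i = 0; i < 5; i++)
        channel_push(ch, i * 10);
    channel_close(ch);
    return NULL;
}

/* ...and the consumer as an ordinary pulling loop. */
int consume_all(void) {
    Channel ch;
    channel_init(&ch);
    pthread_t producer;
    if (pthread_create(&producer, NULL, producer_main, &ch) != 0) {
        channel_destroy(&ch);
        return -1;
    }
    int total = 0, value;
    while (channel_pull(&ch, &value))
        total += value;
    pthread_join(producer, NULL);
    channel_destroy(&ch);
    return total;
}
```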
Let’s step out of reality for a moment, and imagine a strange world with different laws of physics and time. Imagine that I decide I want to talk to my friend, and that after I dial his number, my phone freezes me in time without forwarding the call to him. There I am, suspended in time in my room for days, not knowing that the world is continuing around me. A few weeks later, of his own accord, my friend also spontaneously decides it would be a good time to give me a call. As soon as he dials my number, time unfreezes for me, and I hear his voice as if no time had passed at all. I feel like I called him, but he feels like he called me. In fact both are true: we both initiated the call.
This is very convenient for both my friend and me, since we both waited till the perfect moment to call each other. Neither of us was interrupted by “receiving” the other person’s call (possibly while in an inconvenient state, such as in the shower or on another call).
The same is true in the world of code. Code which pushes and pulls is much easier to write than code which is pulled-from or pushed-to. State management in particular is much easier, because code that is pushed-to has to be prepared to be interrupted at any moment (read: race conditions and re-entrancy problems).
Just like freezing time is difficult in the real world, multithreading in software is not easy either (although it’s obviously much easier to suspend thread-time than to suspend physical time!). But let’s take a look at another example, this time in C#:
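```csharp
using System;
using System.Collections.Generic;

class Program
{
    static void Main()
    {
        // Pull each value from the sequence as it's needed
        foreach (int value in Values())
        {
            Console.WriteLine(value);
        }
    }

    // Generator: pushes the values 42 to 51 to whoever is iterating
    static IEnumerable<int> Values()
    {
        for (int i = 42; i < 52; i++)
            yield return i;
    }
}
```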
The example is simple. In our Main function we have a loop that pulls values from the sequence returned by the Values function. The Values function, which produces the data, also has a loop, and pushes values to the consumer (Main). Don’t let the keyword “return” deceive you into thinking that the Values function is not pushing: the act of returning is initiated by the called function, and the function only progresses to the next line of code when the consumer comes back for the next value. The code would look pretty much the same if yield return i were instead consumeValue(i). Or to put it another way, the code would look horribly different if Values were actually just responding to pulls rather than being structured as a pusher.
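To see just how different, here is a sketch of Values rewritten in C as a function that only responds to pulls. ValuesIter and values_next are hypothetical names. The loop counter has to be hoisted into a struct, and the loop’s initializer, condition and increment end up scattered across separate calls. The C# compiler does something broadly similar behind the scenes, rewriting an iterator method into a state machine, so that we don’t have to.

```c
#include <stdbool.h>

/* Hypothetical hand-written equivalent of the C# Values() iterator. */
typedef struct {
    int state;   /* 0 = not started, 1 = inside the loop, 2 = finished */
    int i;       /* the loop counter, hoisted out of the loop */
} ValuesIter;

void values_init(ValuesIter *it) {
    it->state = 0;
}

/* Responds to a pull: stores the next value in *out and returns true,
   or returns false once the sequence is exhausted. */
bool values_next(ValuesIter *it, int *out) {
    switch (it->state) {
    case 0:
        it->i = 42;      /* loop initializer, run on the first pull */
        break;
    case 1:
        it->i++;         /* loop increment, deferred to the next pull */
        break;
    default:
        return false;    /* already finished */
    }
    if (it->i < 52) {    /* loop condition */
        *out = it->i;
        it->state = 1;
        return true;
    }
    it->state = 2;
    return false;
}
```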
Neither the Main function nor the Values function is more in control than the other: they are like a pair of coroutines, cooperatively yielding control to each other. Both are written as if they’re in control, which makes the code much simpler to write. And best of all, it does this without using any extra threads! This is the power of the generator: the ability to write functions that produce values by pushing them, while their consumers are able to pull them. The best of both worlds.