Sequences: Part 5
Last time, I talked about push and pull regarding sequences. We saw that it’s more convenient to write code that pulls from its inputs and pushes to its outputs. We took a look at C#’s generators, and how they enabled us to write sequence-processing functions in this way, without the need for intermediate buffers.
Let’s quickly recap generators. A generator in C# looks like a traditional function (with a signature that returns a sequence), but it can push values to the caller using the special `yield return` syntax, which essentially puts the generator function “on hold” until the consumer/caller asks for the next value1:
```csharp
static IEnumerable<int> Values()
{
    for (int i = 42; i < 52; i++)
        yield return i;
}
```
The two parties involved here are the generator and the caller (which I’ll call the consumer since the generator is a producer).
When the consumer asks for the next value in the sequence, the generator function is temporarily “resumed”, long enough to produce the next value of the sequence. Last time we drew an analogy with freezing time to explain why it’s easier to write the generator code now that it thinks it’s pushing values to the consumer.
But it’s important here to note who is being paused and who is being resumed. When the compiler is producing IL for the consumer function and the generator function, it is the generator that gets reorganized into a form where it can be paused and resumed (it gets converted into a class which implements the `IEnumerable<T>` “pull” interface).
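To see the pull side of this in action, here is a minimal sketch (the `GeneratorDemo` class name is just for illustration) of a consumer driving that generator through the compiler-generated interface; a `foreach` over `Values()` expands to roughly this `MoveNext`/`Current` loop:

```csharp
using System;
using System.Collections.Generic;

static class GeneratorDemo
{
    static IEnumerable<int> Values()
    {
        for (int i = 42; i < 52; i++)
            yield return i;
    }

    static void Main()
    {
        // foreach (int v in Values()) { ... } expands to roughly this:
        using (IEnumerator<int> e = Values().GetEnumerator())
        {
            while (e.MoveNext())          // resumes the generator until its next yield
            {
                Console.WriteLine(e.Current);
            }
        }
    }
}
```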
But what would happen if the next item in the sequence just wasn’t available yet? If we go back to last week’s C example of reading input from the user by pulling values from `getchar` (or `Console.Read` in C#), you can see that generators wouldn’t fix the conflict between push and pull in that case.
Let’s simplify things a bit to investigate further. Instead of considering a whole sequence of items, let’s say that there’s just one item. We can pull the item from somewhere by calling a function that returns that item:
```csharp
int PullFromProducer()
{
    // ... return object
}

void Consumer()
{
    int obj = PullFromProducer(); // blocks thread until producer returns
    // ... use object
}
```
When the consumer calls `PullFromProducer` to fetch the item (an integer), the caller is blocked until the `PullFromProducer` function returns (synchronously).
The generator syntax in C# still uses this pattern under the covers – the generator function still returns `IEnumerable<T>`, which as we know from our previous exploration is a pull-based iterator interface.
But what if `PullFromProducer` simply doesn’t yet have the value that it needs to return? For example, how do we implement `PullFromProducer` if it’s to pull from a network connection, which may not have received the value yet?
Just as C# generators make it possible to pause the producer, wouldn’t it be nice if there were a way to pause the consumer? Obviously we could do this with threads, but ideally we’d do it without the overhead that threads bring.
It turns out that in C# there is. C# 5 introduced the concept of async functions. You’ve seen async functions before on this blog, so I won’t go into too much detail. If you aren’t too familiar with them, I highly recommend reading up about them (here is the MSDN introduction, and I also recommend Jon Skeet’s Eduasync series for really getting to know what’s going on behind the scenes2).
Using async, we can write code that looks like this:
```csharp
Task<int> PullFromProducer()
{
    // ... return object
}

async void Consumer()
{
    int obj = await PullFromProducer(); // "Pauses" Consumer until producer's done
    // ... use object
}
```
The magic happens in the consumer this time. The consumer function is suspended at the “await” point until the producer pushes the value to the consumer.
To emphasize what’s happening here, let’s look at a slightly different example:
```csharp
using System;
using System.Threading.Tasks;

class Program
{
    static TaskCompletionSource<int> mockDataSource = new TaskCompletionSource<int>();

    static void Main()
    {
        // Run the consumer
        Console.WriteLine("Calling the consumer");
        Consumer();
        Console.WriteLine("...");

        // Push the value
        Console.WriteLine("Value is now available");
        Console.WriteLine("Pushing value to consumer");
        mockDataSource.SetResult(42);
        Console.WriteLine("Value was pushed to consumer");
    }

    static async void Consumer()
    {
        Console.WriteLine("Consumer is about to await value");
        int value = await mockDataSource.Task;
        Console.WriteLine("Consumed: " + value);
    }
}
```
If you run this3 you’ll see the output is something like this:
```
Calling the consumer
Consumer is about to await value
...
Value is now available
Pushing value to consumer
Consumed: 42
Value was pushed to consumer
```
The interesting thing is the order of the messages. The message line “`Consumed: 42`” occurs directly after “`Pushing value to consumer`” rather than after “`Consumer is about to await value`”, which clearly shows that the consumer is suspended during the intermediate time. But just like with generators, it’s important to realize that the above example does not create any additional threads. Just like with generators, the `async` functionality is implemented by the compiler by creating a new class behind the scenes.
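If you want to convince yourself of that, here’s a small variation of the example (again untested, and the `ThreadDemo` class name is just illustrative) that prints the managed thread IDs. In a plain console app there is no SynchronizationContext, so the continuation after the await typically runs inline on the very thread that calls `SetResult`, and both IDs come out the same:

```csharp
using System;
using System.Threading.Tasks;

class ThreadDemo
{
    static TaskCompletionSource<int> mockDataSource = new TaskCompletionSource<int>();

    static void Main()
    {
        Console.WriteLine("Main thread:     " + Environment.CurrentManagedThreadId);
        Consumer();
        // The await continuation typically runs inline, right here, on this thread
        mockDataSource.SetResult(42);
    }

    static async void Consumer()
    {
        int value = await mockDataSource.Task;
        Console.WriteLine("Consumer thread: " + Environment.CurrentManagedThreadId);
        Console.WriteLine("Consumed: " + value);
    }
}
```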
This solves our problem, right?
Nope.
The problem is that async only works with a single value. We can use it to push a once-off item, but not whole sequences of items.
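To see why a single task isn’t enough, consider the two obvious ways of bolting `Task<T>` onto `IEnumerable<T>`. Neither shape gives us a sequence that we can both pull lazily and await item by item (the interface and member names below are purely illustrative, not real APIs):

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical shapes for illustration only; neither solves the problem.
interface INotQuiteAsyncSequences
{
    // 1. Await the whole sequence: the producer has to buffer every item
    //    before the consumer can see even the first one.
    Task<IEnumerable<int>> ReadAllAsync();

    // 2. A sequence of tasks: each item can be awaited individually, but the
    //    shape of the sequence itself (how many items, when it ends) is still
    //    pulled synchronously, so the producer can't asynchronously say
    //    "there are no more items".
    IEnumerable<Task<int>> ReadItems();
}
```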
C# is stuck with two different ways of doing things with sequences. There’s the pull-based approach with `IEnumerable<T>`, and there’s the push-based approach with `IObservable<T>` (I won’t go into `IObservable`, but if you’re interested take a look at Reactive Extensions – they echo many of the great features of `IEnumerable`, such as all the extension methods, but for a push-based interface instead of a pull-based one).
What we need is something more like an `IAsyncEnumerable<T>` interface, which combines task-based asynchrony with a sequential pull-based interface. We also need language support for `IAsyncEnumerable<T>`, including generators and `foreach` statements. The combination of generators and `IAsyncEnumerable` would allow us to have everything we’ve been looking for so far (I’ll sketch what the interface might look like after this list):
- No containers required (sequences don’t have to be in memory before you can work on them)
- Zero buffering overhead (when we can process sequences as fast as they’re produced)
- Completely abstract sequence types (a sequence of user key press events can be as much a sequence as an array of integers)
- Push/pull agnostic (`IAsyncEnumerable` covers both push and pull cases equally)
- No thread blocking
- All functions can be written in a form where they both pull input and push output
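To make that concrete, here is a rough sketch of what such an interface might look like. This is my own illustration modeled on `IEnumerable<T>`/`IEnumerator<T>`, not an official API, and a real design would still have to settle details like cancellation and disposal:

```csharp
using System;
using System.Threading.Tasks;

// A rough sketch (not an official API) of an asynchronous pull-based sequence,
// modeled on IEnumerable<T>/IEnumerator<T> but with an awaitable MoveNext.
public interface IAsyncEnumerable<out T>
{
    IAsyncEnumerator<T> GetAsyncEnumerator();
}

public interface IAsyncEnumerator<out T>
{
    T Current { get; }

    // Completes when the next item is available (true) or the sequence has
    // ended (false), without blocking the consumer's thread.
    Task<bool> MoveNextAsync();
}

static class Example
{
    // How a consumer might drive such a sequence by hand -- roughly what a
    // hypothetical "async foreach" could expand to.
    public static async Task PrintAllAsync(IAsyncEnumerable<int> source)
    {
        IAsyncEnumerator<int> e = source.GetAsyncEnumerator();
        while (await e.MoveNextAsync()) // consumer is suspended, not blocked
        {
            Console.WriteLine(e.Current);
        }
    }
}
```

The key difference from `IEnumerator<T>` is that `MoveNextAsync` returns a task, so an awaiting consumer is suspended rather than blocked, which is exactly the property the `await` example above gave us for a single value.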
I apologize to those who aren’t comfortable in C#: I did originally say that this was going to be a language-agnostic investigation, but we landed up in C# anyway. Unfortunately, this is because C# seems to be the only popular language that has made it this far in providing a solution that fits all these criteria. It just needs to take the last step (although I’ve mentioned in the past that I think async is flawed in a way that only a new language can cure). C and C++ are simply not well suited to this kind of coding at all.
This brings us to the end of our series on sequences. We started with the simplest C example, which required a buffer on both the input and output side, could not be suspended at all, and provided no abstraction over what form the input and output could take. We considered ways to improve it, and in doing so investigated how sequence-processing functions can be composed/layered, and the differences between push and pull. At each stage of improvement we ruled out progressively newer languages, until we were left with only a theoretical, research-stage extension to the latest C#, which seems not to have made it into the mainstream despite having been investigated more than 4 years ago.
1. This is a very limited description. For more detail, take a look at my previous post and read up about it online. ↩
2. Although his series is a bit old now, much of what he said still applies, and it is the most insightful writing I’ve seen on the topic. ↩
3. I admit, I don’t have a C# compiler installed right now, so I can’t actually confirm this. If you see a mistake please let me know. ↩