Today I was reading about an implementation of a MultiDictionary in C# – a dictionary that can have more than one value with the same key, perhaps something like the C++ `std::multimap`. It all looks pretty straight forward: a key can have multiple values, so when you index into the `MultiDictionary` you get an `ICollection<T>` which represents all the values corresponding to the key you specified. This got me wondering, why did the author choose to use an `ICollection<T>` instead of an `IEnumerable<T>`? This wasn’t a question specific to `MultiDictionary`, but more of a philosophical musing about the interfaces between different parts of a program.
What is IEnumerable/ICollection?
In case you’re not familiar with these types, the essence is that `IEnumerable<T>` represents a read-only set of items (of type `T`), which can be sequentially iterated using `foreach`. Through the use of LINQ extension methods we can interact with `IEnumerable<T>` at high level of abstraction. For example we can map an `IEnumerable<T>` `x`using the syntax `x.Select(…)`, and there are methods for many other higher-order functions. Another great thing about `IEnumerable<T>` is that its type parameter `T` is covariant. This means that if a function is asking for an `IEnumerable<Animal>`, you can pass it an `IEnumerable<Cat>` – if it needs a box of animals, a box of cats will do just fine.
An `ICollection<T>` is an `IEnumerable<T>` with some extra functionality for mutation – adding and removing items from the collection. This added mutability comes at the cost of covariance: the type parameter of `ICollection<T>` is invariant. If a function needs an `ICollection<Animal>`, I can’t give it an `ICollection<Cat>` – if it needs a storage for animals, I can’t give it a cat storage because it might need to store an elephant.
What to choose?
`IEnumerable<T>` is a more general type than `ICollection<T>` – more objects can be described by `IEnumerable<T>` than by `ICollection<T>`. The philosophical question is: should you use a more general type or a more specific type when defining the interface to a piece of code?
So, if you’re writing a function that produces a collection of elements, is it better to use `IEnumerable` or `ICollection`1 (assuming that both interfaces validly describe the objects you intend to describe)? If you produce an `IEnumerable` then you have fewer requirements to fulfill, but are free to add more functionality that isn’t covered by `IEnumerable`. That is, even if the code implements all the requirements of `ICollection`, you may still choose to deliberately only expose `IEnumerable` as a public interface to your code. You are committing to less, but free to do more than you commit to. This seems like the better choice: you’re not tied down because you’ve made fewer commitments. In the future if you need to change something, you have more freedom to make more changes without using a different interface.
If you are consuming a collection of elements, then the opposite is true. If you commit to only needing an `IEnumerable` then you can’t decide later you now need more – like deciding you need to mutate it. You can demand more and use less of what you demanded, but you can’t demand less and use more than you asked for. From this perspective, even if your code only relies on the implementation of `IEnumerable` it seems better to expose the need for `ICollection` in its interface because it doesn’t tie it down as much.
This strikes me to be very similar to producers and consumers in the real world. The cost of producing a product benefits from committing to as little as possible, while the “cost” of consuming a product benefits from demanding as much as possible. But there is also another factor: a consumer who demands too much may not be able to find the product their they’re looking for. That is, more general products can be used by more people. In software terms: a function or class with very specific requirements can only be used in places where those very specific requirements are met – it’s less reusable.
Is it Subjective?
Perhaps this all comes down to a matter of taste or circumstance. Code that’s very “needy”2 is less reusable, but easier to refactor without breaking the interface boundaries. Code that’s very “generous”3 is more reusable but more restricted and brittle in the face of change.
I personally prefer to write code that is very generous. That is, given the same block of code, I write my interfaces to the code such that I consume the most general possible types but produce the most specific possible types. In many ways this seals the code – it’s very difficult to change it without breaking the interface. But on the other hand, it makes different parts more compatible and reusable. If I need to make a change, I either change the interface (and face the consequences), or I simply write a new function with a different interface.
I also think that it may be better to write very generous interfaces to code that is unlikely to change and will often be reused, such as in libraries and frameworks. Domain-specific code which is likely to change across different versions of the program may benefit from being more needy.
Although this whole discussion is not specifically about the `MultiDictionary`, the idea was seeded by my encountering of `MultiDictionary`. So lets spend a moment considering how I might have implemented `MultiDictionary` myself.
Firstly, most of this discussion so far has been centered around how you would define the interface, or function signature, of a piece of code that either produces or consumes generic types. I haven’t really gone into what that code should actually look like, but rather what its interface should look like given that the code already has intrinsic demands and guarantees. If we apply this to the `MultiDictionary`, then it probably has its most generous signature already. The code already implements `ICollection`, so it’s generous to expose it in its interface. The dictionary itself doesn’t really consume any interfaces beyond it’s basic insert and delete functions, so we can’t talk about its generosity as a consumer4.
On the other hand, if I had written the code myself I possibly would have only exposed `IEnumerable`. Not because I write needy interfaces5, but because it just seems like a better representation of the underlying principle of a multi-dictionary. As the code stands, there are two different ways of adding a value to the dictionary: one by calling the `add` method on the dictionary, and one by calling the `add` method of the collection associated with a key. This duplication strikes me as flaw in the abstraction: the `MultiDictionary` is probably implemented using a `Dictionary` and some kind of collection type, such as a `List` (I imagine `Dictionary<TKey, List<TValue>>`). It could just as easily, although perhaps not as efficiently, have been implemented as a `List` of key-value pairs (`List<KeyValuePair<TKey,TValue>>`). Would the author still have exposed `ICollection<TValue>`? Probably not. This strikes me as a leaky abstraction in some ways, and I would probably have avoided it. In my mind, exposing `IEnumerable<TValue>` is a better choice in this particular case. The neediness would actually simplify the interface, and make it easier to reason about.
Perhaps this is about not saying more than you mean. You define upfront what you mean by a `MultiDictionary`, and then you describe exactly that in the interface, before implementing it in code. If you mean it to be a set of collections, then `ICollection` is great. If you mean it to be a `Dictionary` which happens to support duplicate keys, then `ICollection` seems to say too much.
For the remainder of the post I will use the term `IEnumerable` to mean `IEnumerable<T>`, and the same for `ICollection` ↩
“Needy” = produces very general types but consumes very specific types ↩
“Generous” = produces very specific types but consumes very general types ↩
Actually, I could argue that since `ICollection` is mutable and accepts data in both directions, the exposing of `ICollection` makes it a consumer in a way that `IEnumerable` wouldn’t, but I won’t go into that ↩
Perhaps a better word here would be stingy ↩
One Reply to “Needy Code”
I liked the pattern, Options- Pros and Cons.