An interesting list structure, part 2

Last time on FAIC I gave a standard implementation of a persistent immutable linked list, with cheap — O(1) — Push, Pop, Peek and IsEmpty operations, and expensive — O(n) — Append and Concatenate operations. In this episode we’re going to implement a Hughes list, which “flips” those costs: Pop, Peek and IsEmpty become O(n), Append and Concatenate become O(1). Push stays O(1) also. This data structure is useful for scenarios where we are accumulating a large list “from both ends” or out of smaller lists, and then want to read the whole thing out as a normal list once we’re done the accumulation.

Before we get into it, I’ll repeat my caveat from last time: the implementations I’m showing today are not intended to be production-ready, heavily tested, performant or in any way bulletproof; the point of this exercise is to learn about the data structure, not to give a tutorial on good industrial design.

Last time I gave the data structure of a Hughes list, which is surprisingly simple:

public struct HughesList<T>
{
private readonly Func<SimpleList<T>, SimpleList<T>> f;

Even if it is not at all clear to you how this is going to work — and it certainly was not clear to me when I learned about these! — this probably reminds you of your experience with LINQ. Constructing a query object with LINQ is very fast, even if the data set is extremely large, because we use the Func and IEnumerable types to defer computations to the future. A Hughes list gets its good performance in the same way; it represents a sequence of operations on simple lists to be executed in the future, when they can be done in the highest-performance order, once.

But I am getting ahead of myself. What is that function? What are its semantics? They are both simple and subtle:

A Hughes list represents a list. Say, 1, 2, 3, 4.
The function is a function which can concatenate a simple list to the represented list; the returned value is the resulting simple list.

So if we have the Hughes list h representing 1, 2, 3, 4, and we call h.f(g) where g is the simple list representing 5, 6, 7, 8, then the result will be a simple list representing 1, 2, 3, 4, 5, 6, 7, 8.

Let’s see where this takes us. The first thing I’m going to do is make a few little one-liner helpers to decrease the size of the code later. (In fact, every method I write is going to be a one-expression method!)

private HughesList(Func<SimpleList<T>, SimpleList<T>> f) =>
this.f = f;
private static HughesList<T> Make(
Func<SimpleList<T>, SimpleList<T>> f) =>
new HughesList<T>(f);

Nothing interesting there. The first interesting question is: what does an empty Hughes list look like? Well, remember our basic principle: if we have the Hughes list that represents an empty list, then the function must be the function that concatenates any list to the empty list. But… that’s the identity function! Appending any list to the empty list is just the list. So this one is easy:

public static readonly HughesList<T> Empty = Make(ts => ts);

We should expect that the empty list case is pretty straightforward; let’s get more complicated. How do we represent a Hughes list that contains a single element? That is, I want a function that takes a single T and gets back a Hughes list which represents the list containing only that value. Let’s suppose the value is 10. Our representation is a function which takes a list, say, 1, 2, 3, and returns that list concatenated to the list that contains only 10, which is 10, 1, 2, 3.

But… that’s just Push!

public static HughesList<T> Single(T t) => Make(ts => ts.Push(t));

Let’s make one more constructor. Suppose we have a simple list and we would like the equivalent Hughes list. Again, the Hughes list that represents a list is the function that concatenates something to that list. But… that’s Concatenate! We don’t even need a lambda for this one; we can turn the method directly into a delegate bound to the receiver.

public static HughesList<T> FromSimple(SimpleList<T> ts) =>
Make(ts.Concatenate);

So far every operation has been O(1) and has been a factory method. Now let’s look at our first O(n) operation, an instance method: how do we turn an instance of a Hughes list back into a simple list?

Well, what have we got? A Hughes list represents a list. It has a function which takes any simple list and returns the simple list that consists of the Hughes list concatenated with the simple list. The empty simple list is a legal argument, and it is the identity of concatenation, so we can simply write:

public SimpleList<T> ToSimple() => this.f(SimpleList<T>.Empty);

This is like doing a foreach over an OrderBy query in C#; the whole thing gets built at this point, and all the deferred work executes. As we’ll see, that work is O(n) in the size of the represented list.

Let’s blow right through the other O(n) operations. Again, the point of a Hughes list is that we want to decrease and defer the cost of Push, Append and Concatenate, making those operations O(1), but the down side is that we make Pop, IsEmpty and Peek all O(n). There is no way to tell from a function that can concatenate a list what the contents of the represented list are, so we’ll have to realize them as simple lists:

public T Peek => this.ToSimple().Peek;
public bool IsEmpty => this.ToSimple().IsEmpty;
public HughesList<T> Pop => FromSimple(this.ToSimple().Pop);
public override string ToString() => this.ToSimple().ToString();

(Exercise: We could make IsEmpty O(1); do you see how?)

(Exercise: I could have made Pop O(1). Can you do so? Why might this be a bad idea?)

All right, that leaves Push, Append and Concatenate. We will take each in turn.

Suppose we have the Hughes list for 1, 2, 3, and we want to push 10.

What have we got? We’ve got a function that takes a list, say, 4, 5, 6, and concatenates it to 1, 2, 3, to produce 1, 2, 3, 4, 5, 6.

What do we want? We want a function that takes a list, say, 4, 5, 6, and concatenates it with 10, 1, 2, 3, to produce 10, 1, 2, 3, 4, 5, 6.

That is the function “call the original concatenation function and then push 10 onto the result”:

private static HughesList<T> Push(HughesList<T> ts1, T t) =>
Make(ts2 => ts1.f(ts2).Push(t));
public HughesList<T> Push(T t) => Push(this, t);

Why do I have this little indirection into a static method? Because this in a value type is logically a ref parameter and cannot be captured in a lambda, but a formal parameter is just a normal local variable. I wish C# had a way to say “treat this as an ordinary value, not a ref, in this lambda”.

But that’s not the interesting part; make sure the meaning of this is clear. When we turn this thing back into a simple list, we’re going to pass the empty list to the new function. That’s going to pass the empty list to the previous function, which will realize the old list. We’ll then push a value onto the old list.

Calling Push on the Hughes list is O(1), because all we’re doing here is allocating a delegate. What cost are we deferring here? Suppose the old list is of length n-1 and the new list is of length n; calling ToSimple on this thing will call the underlying concatenation to construct the tail, which takes n-1 steps, and then adds a single O(1) simple-list Push to that, for a total cost of O(n).

One of the nice things about Hughes lists is that the Append operation is almost exactly the same as the Push operation, in sharp contrast to the SimpleList implementation where they could not be more different.

private static HughesList<T> Append(HughesList<T> ts1, T t) =>
Make(ts2 => ts1.f(ts2.Push(t)));
public HughesList<T> Append(T t) => Append(this, t);

Again, let’s talk through this. Suppose we have a Hughes list that represents 1, 2, 3 and we append 10. We want the Hughes list that represents 1, 2, 3, 10.

What do we have? A function that can take 4, 5, 6 and produce 1, 2, 3, 4, 5, 6.

What do we want? A function that can take 4, 5, 6 and produce 1, 2, 3, 10, 4, 5, 6.

So what do we do? We make a function that pushes 10 onto 4, 5, 6, and we know how to concatenate that to 1, 2, 3, which gives us the desired result!

Again, the Hughes list Append is obviously O(1) because it is just a delegate allocation. Suppose the original list is of length n-1. The call to ToSimple will do the O(1) Push onto the empty simple list, and then the n-1 steps to concatenate 10, 4, 5, 6 to 1, 2, 3, for a total cost of O(n).

We’ve only got one more, and I’m sure you can guess how it goes:

private static HughesList<T> Concatenate(
  HughesList<T> ts1, HughesList<T> ts2) =>
  Make(ts3 => ts1.f(ts2.f(ts3)));
public HughesList<T> Concatenate(HughesList<T> ts2) =>
  Concatenate(this, ts2);

Going through that to verify that it does what it is supposed to is left as an exercise, as is the exercise to determine that doing a ToSimple on the result is O(n) in the size of the this list.

I said last time that it seems bizarre that there is no “list structure” here — no head and tail — but of course there is, it’s just encoded. This becomes more clear when you realize that our three operations, Push, Append, and Concatenate, are all the composition of f with another function. (Recall that the “composition” of two functions f1 and f2 is the function “call f2, and then call f1 with whatever f2 returns”.) Let’s rewrite our final three methods in that form, and we’ll notice something.

First we’ll make a helper method to do the composition for us:

private static HughesList<T> Make2(
Func<SimpleList<T>, SimpleList<T>> f1,
Func<SimpleList<T>, SimpleList<T>> f2) =>
new HughesList<T>(a => f1(f2(a)));

And now we can rewrite Push, Append and Concatenate to use this helper. When we do that we no longer have any lambdas that close over “this”, so we can get rid of our static helper methods too!

public HughesList<T> Push(T t) =>
Make2(ts2 => ts2.Push(t), this.f);
public HughesList<T> Append(T t) =>
Make2(this.f, ts2 => ts2.Push(t));
public HughesList<T> Concatenate(HughesList<T> ts2) =>
Make2(this.f, ts2.f);

And now it becomes considerably more clear what is going on here: if you squint a little, it looks like Make2 has the same structure as though it was a constructor of a binary tree class! Everything in the left argument is going on the left side of the list, and everything in the right argument is going in the right half! That’s why we can get away with such a “thin” data structure in a Hughes list; the real data structures are in the closures generated by the compiler!

Or, another way to look at it is: when we call ToSimple, all the operations in the right argument will be done first, and its result will be the argument to the function in the left side; in this manner we will build up the resulting list from right to left, which is the efficient way to do it.

Next time on FAIC: A relevant story about my Microsoft full-time interviews.

11 thoughts on “An interesting list structure, part 2”

Peter on January 24, 2019 at 1:41 pm said:

This actually reminds me a lot of LINQ…

Peek => Enumerable.First
IsEmpty => !Enumerable.Any
Pop => Enumerable.Skip
Push => Enumerable.Prepend
Append => Enumerable.Append
Concatenate => Enumerable.Concat

Peek, IsEmpty, and Pop are quite efficient (as long as you don’t Pop too many times).

Reply ↓
- ericlippert on January 24, 2019 at 2:15 pm said:
  
  Absolutely yes; I alluded to that briefly but your chart makes it much more clear how similar they are.
  
  Reply ↓
David Cuccia on January 24, 2019 at 8:31 pm said:

Exercise 1: IsEmpty => f == Make(ts => ts);

Reply ↓
- Yair Halberstadt on January 24, 2019 at 11:59 pm said:
  
  You will also have to do a check in concatenate and FromSimple that the passed in Lists are not empty, and if they are, return an EmptyList. You should also make Empty a singleton property on the class.
  
  Reply ↓
Yair Halberstadt on January 25, 2019 at 12:04 am said:

Exercise 2: private static HughesList Pop(HughesList ts1, T t) =>
Make(ts2 => ts1.f(ts2).Pop(t));

Pop can throw an exception, and deferring it might delay this exception. It also means that you can’t return the popped value at the same time, which is what most pop implementations do.

Reply ↓
- ericlippert on January 25, 2019 at 6:26 am said:
  
  Those are some good reasons. Also, your proposed implementation of IsEmpty stops working! There’s one more reason that comes to mind; it has to do with performance.
  
  Reply ↓
Pingback: Hughes List – Coder Mike
Michael Hunter on January 25, 2019 at 12:42 am said:

For fun, I tried to implement this same idea on top of an imperative/mutable list structure (JavaScript is my weapon of choice), while still keeping the list itself immutable. If you’re interested, http://coder-mike.com/2019/01/hughes-list/

Reply ↓
Pingback: Dew Drop – January 25, 2019 (#2885) | Morning Dew
cpitclaudel on January 25, 2019 at 7:32 am said:

The two exercises about O(1) IsEmpty and Pop are quite puzzling 🙂 I’m assuming that since these exercises come before the definition of Concat, they don’t depend on it.

…but I’m not sure how to make either IsEmpty or Pop O(1) without changing the other methods. My intuition is that, since you’re essentially (in the case of Pop) trying to find the leftmost node of a binary tree, the cost of the operation will depend on the shape of the tree. Since your implementation doesn’t do any rebalancing in Concat, it seems surprising that you might be able to get an O(1) Pop with no changes to Concat.

Of course, allowing modifications to Concat makes the problem easier; for example, to make IsEmpty O(1), you change Concat to check both of its arguments for emptyness and skip the lambda allocation if one is empty. This ensures that there is only one (canonical) empty list, making it possible to implement IsEmpty as a pointer-equality check.

Reply ↓
- ericlippert on January 28, 2019 at 9:44 am said:
  
  Right, you can make the empty Hughes list a singleton and then it is just a reference comparison.
  
  You can make Pop O(1) by making a function that defers the Pop operation until later, but there are two main reasons to not do that. The first one is: now IsEmpty is no longer a reference comparison! If you start with an empty list, then push, then pop, you have an empty list again but it is a different reference. The second reason is: we have a nice property that turning a Hughes list into a simple list is O(n) where n is the size of the list, but if you defer pop operations, then the cost of turning the Hughes list into a simple list is O(n) in the total number of push and pop operations, not in the final size of the list.
  
  Reply ↓