Fixing Random, part 1

[Code is here.]

How to say this delicately?

I really, really dislike System.Random. Like, with a passion. It is so… awful.

The C# design team tries hard to make the language a “pit of success”, where the natural way to write programs is also the correct, elegant and performant way. And then System.Random comes along; I cringe every time I see code on StackOverflow that uses it, because it is almost always wrong, and it is seldom easy to see how to make it right.

Let’s start with the obvious problem: the natural way to use it is also the wrong way. The naive developer thinks “I need a new random dice roll, so…”

void M()
  Random r = new Random();
  int x = r.Next(1, 6);

Two lines in and we are already in a world of pain.

First, every time M() is called, we create a new Random, but in most of the implementations of Random available for the last couple decades, by default it seeds itself with the current time, and the granularity of the timer is only a few milliseconds. Computers run on the nanosecond scale, and so the likelihood that we’ll get two Randoms with the same seed is very high, and therefore it is very likely that successive calls to M() produce runs of the same number. You never want to make a new Random if you can avoid it; you want to make one once, and then re-use it. But it’s a bad idea to stash it away in a static, because it’s not thread safe!

Fortunately, I have just learned from an attentive reader that this problem has been fixed in some versions of the CLR; exactly which I am not sure. In those versions, a new Random() now seeds itself randomly, rather than based on the current time. Thanks to the BCL team for fixing this irksome problem; better late than never.

Second, the arguments to Next are the minimum value produced, inclusive, and the maximum value produced, exclusive! This program produces random numbers drawn from 1, 2, 3, 4, 5. The correct way to get numbers from 1 to 6 is of course Next(1, 7).

What we’ve got here is a classic example of Steve Maguire’s “candy machine interface” concept. The candy machines at Microsoft used to have an item number — say, 75, and a price, say, $0.85, you put in your quarters, punch in the item number and, dammit, wait, that’s item 085 and it costs $0.75, I got it backwards, and now I’ve got some weird gum instead of those jalapeno chips. This is a true story! I have actually made that mistake in Microsoft candy machines.

But those are trivial problems. The really fundamental problem here is that we’re working at too low a level of abstraction. It is not the 1970s anymore, when rand() was good enough. We have sophisticated problems in statistical modeling and the attendant probabilistic reasoning to solve in modern programming languages. We need to up our game by writing code in the “business domain” of probabilities, not the “mechanism domain” of calls to methods that return random numbers.

Coming up on FAIC: We will start by simply improving the existing implementation of Random, but from that humble beginning we’ll develop a new class library that makes programming with probabilities much more readable, powerful and efficient in C#.

An interesting list structure, part 3

This is the story of how I was no-hired by Microsoft; surprisingly, it is germane to our discussion of unusual list structures.

I was a Waterloo co-op intern three times at Microsoft, each time working for the Visual Basic team; that’s where I learned how industrial developer tools work, how to build a localizable product, and many other valuable lessons. The VB team told me that on the strength of my internships — which, after all, was in total a 12 month long interview spread over two years — that they’d give me a job when I graduated. However, the VB managers were both smart and kind, and encouraged me to look around the company more broadly to see if there was any other team that I might be a good fit for.

I ended up interviewing with two or maybe three other teams; this was over 20 years ago and my memory is a bit hazy. A few points do stand out though. I had an interview with the ill-fated Blackbird team that I thought went pretty well — the technical question was about rapidly finding the seven letter words in a Scrabble rack, and I gave the standard solution of preprocessing the dictionary into a multi-dictionary that answers the question in O(1) time, and then discussed how to use a trie to solve the more general problem. I’m not sure why they no-hired me, but that team was about to be disbanded so I dodged a bullet there. In retrospect, this was over a year after the “tidal wave email” and I am surprised they lasted as long as they did.

But the thing that really stands out is my interview with the SQL Server team. The day I interviewed there was a snowstorm in the greater Seattle area, which is quite rare and there is not infrastructure to rapidly clear the streets. A grand total of zero of the people scheduled to interview me came into work that day, and so recruiting had to scramble to find interviewers. The technical question I got from the SQL team was “reverse a linked list in C++” — that was the extent of the problem statement, so I asked “Can I assume I already have a linked list class defined?”  Sure. “Is it acceptable if the original list is destroyed?” Sure. “OK,” I said, and wrote on the whiteboard something like:

LinkedList reverse(LinkedList list) 
  LinkedList result;
  while (!list->is_empty()) result.push(list.pop());
  return result;

I’m probably missing some syntactic details there; I haven’t programmed in C++ in fifteen years. But you get the idea; reversing a linked list is logically just popping the old list and pushing onto the new list.

The interviewer just said something like “that’s wrong”, and I said “no, that’s right but would you like me to implement push and pop to show you how they work?” and it went downhill from there.

That was super confusing to me at the time. I realized much later — like, years later, when I started doing interviews myself — that the interviewer was unprepared, was not expecting to do an interview that day, may have been a very inexperienced interviewer, probably picked a problem at random, and worst of all had a specific “correct” solution in mind. Almost certainly the “correct” solution was the solution where you traverse the list and rejigger the “next” pointer “in place” as you go, which of course is how you reverse a linked list in C. (I am often surprised at the number of people in interviews who write C-style solutions in C++ or C#, as though those languages did not have a culture of encapsulation of implementation details.)

So it was a confusing day and a bunch of no-hires, but no harm done. Obviously I ended up having a successful 16 year stint at Microsoft regardless, and I learned from example how not to conduct a technical interview.

All this was on my mind as I was writing this series because the original paper by Hughes that introduced the concept I’ve been discussing is called A novel representation of lists and its application to the function “reverse”.

What on Earth does “reverse a linked list” have to do with this unusual representation of linked lists? I won’t go through the whole argument presented in the Hughes paper because it gets a bit technical, but I’ll give you what was for me the key insight.

Suppose we have our “simple” linked list from last time, where a list is either empty, or a head value and a tail list. How do you reverse that linked list? The problem here is posed in the context of functional languages where we do not have mutable local variables and loops, so a recursive solution is required.

If we have an “append” operation available then the naive recursive way to reverse a simple list is to say “recursively reverse the tail, and then append the old head to the reversed tail”. That is, if we have 1, 2, 3, then we recursively reverse 2, 3 into 3, 2, and then append 1 to get 3, 2, 1. In our implementation that would be:

public SimpleList<T> NaiveReverse() =>
  this.IsEmpty ?
    this :

The problem with that should be clear given our previous discussion. We recurse once per list element, so we recurse O(n) times. But Append is an O(n) operation on simple lists, so the total complexity is O(n2), which is pretty bad.

The standard O(n) solution is to emulate my little loop above, but non-destructively and recursively:

private SimpleList<T> ReverseWithTail(SimpleList<T> tail) =>
  this.IsEmpty ?
    tail :
public SimpleList<T> Reverse() => this.ReverseWithTail(Empty);

But… hold on a minute here. If we said

Func<SimpleList<T>, SimpleList<T>> f = someList.ReverseWithTail;

Then what we’ve got is an object that represents a list — in this case, it represents the list that is someList reversed — where the object is a function such that when you give it a list, it gives you back the represented list concatenated with the given list, in O(n) time; a delegate to ReverseWithTail is exactly the data stored in a Hughes list!

So perhaps it is not that bizarre to represent a list as a concatenation function. Functional programs are chock-full of algorithms like this one, where the “tail end” of a list is being passed in and the algorithm represents the operation of stuffing values onto the left side of it. The genius of the Hughes list is to recognize that this technique can be generalized into an elegant, useful data type that defers expensive work until it is needed.

Next time on FAIC: System.Random is the source of so many bugs; could we come up with a better design? This is going to be a long series, because there’s a lot of improvements we could make. We’re not yet programming at the correct level of abstraction when it comes to dealing with randomness; let’s see how we can fix that.


An interesting list structure, part 2

Last time on FAIC I gave a standard implementation of a persistent immutable linked list, with cheap — O(1) — Push, Pop, Peek and IsEmpty operations, and expensive — O(n) — Append and Concatenate operations. In this episode we’re going to implement a Hughes list, which “flips” those costs: Pop, Peek and IsEmpty become O(n), Append and Concatenate become O(1). Push stays O(1) also. This data structure is useful for scenarios where we are accumulating a large list “from both ends” or out of smaller lists, and then want to read the whole thing out as a normal list once we’re done the accumulation.

Before we get into it, I’ll repeat my caveat from last time: the implementations I’m showing today are not intended to be production-ready, heavily tested, performant or in any way bulletproof; the point of this exercise is to learn about the data structure, not to give a tutorial on good industrial design.

Last time I gave the data structure of a Hughes list, which is surprisingly simple:

public struct HughesList<T>
    private readonly Func<SimpleList<T>, SimpleList<T>> f;

Even if it is not at all clear to you how this is going to work — and it certainly was not clear to me when I learned about these! — this probably reminds you of your experience with LINQ.  Constructing a query object with LINQ is very fast, even if the data set is extremely large, because we use the Func and IEnumerable types to defer computations to the future. A Hughes list gets its good performance in the same way; it represents a sequence of operations on simple lists to be executed in the future, when they can be done in the highest-performance order, once.

But I am getting ahead of myself. What is that function? What are its semantics? They are both simple and subtle:

  • A Hughes list represents a list. Say, 1, 2, 3, 4.
  • The function is a function which can concatenate a simple list to the represented list; the returned value is the resulting simple list.

So if we have the Hughes list h representing 1, 2, 3, 4, and we call h.f(g) where g is the simple list representing 5, 6, 7, 8, then the result will be a simple list representing 1, 2, 3, 4, 5, 6, 7, 8.

Let’s see where this takes us.  The first thing I’m going to do is make a few little one-liner helpers to decrease the size of the code later. (In fact, every method I write is going to be a one-expression method!)

private HughesList(Func<SimpleList<T>, SimpleList<T>> f) =>
  this.f = f;
private static HughesList<T> Make(
  Func<SimpleList<T>, SimpleList<T>> f) =>
  new HughesList<T>(f);

Nothing interesting there. The first interesting question is: what does an empty Hughes list look like? Well, remember our basic principle: if we have the Hughes list that represents an empty list, then the function must be the function that concatenates any list to the empty list. But… that’s the identity function! Appending any list to the empty list is just the list. So this one is easy:

public static readonly HughesList<T> Empty Make(ts => ts);

We should expect that the empty list case is pretty straightforward; let’s get more complicated. How do we represent a Hughes list that contains a single element? That is, I want a function that takes a single T and gets back a Hughes list which represents the list containing only that value. Let’s suppose the value is 10. Our representation is a function which takes a list, say, 1, 2, 3, and returns that list concatenated to the list that contains only 10, which is 10, 1, 2, 3.

But… that’s just Push!

public static HughesList<T> Single(T t) => Make(ts => ts.Push(t));

Let’s make one more constructor. Suppose we have a simple list and we would like the equivalent Hughes list. Again, the Hughes list that represents a list is the function that concatenates something to that list. But… that’s Concatenate! We don’t even need a lambda for this one; we can turn the method directly into a delegate bound to the receiver.

public static HughesList<T> FromSimple(SimpleList<T> ts) =>

So far every operation has been O(1) and has been a factory method. Now let’s look at our first O(n) operation, an instance method: how do we turn an instance of a Hughes list back into a simple list?

Well, what have we got? A Hughes list represents a list. It has a function which takes any simple list and returns the simple list that consists of the Hughes list concatenated with the simple list.  The empty simple list is a legal argument, and it is the identity of concatenation, so we can simply write:

public SimpleList<T> ToSimple() => this.f(SimpleList<T>.Empty);

This is like doing a foreach over an OrderBy query in C#; the whole thing gets built at this point, and all the deferred work executes. As we’ll see, that work is O(n) in the size of the represented list.

Let’s blow right through the other O(n) operations. Again, the point of a Hughes list is that we want to decrease and defer the cost of Push, Append and Concatenate, making those operations O(1), but the down side is that we make Pop, IsEmpty and Peek all O(n). There is no way to tell from a function that can concatenate a list what the contents of the represented list are, so we’ll have to realize them as simple lists:

public T Peek => this.ToSimple().Peek;
public bool IsEmpty => this.ToSimple().IsEmpty;
public HughesList<T> Pop => FromSimple(this.ToSimple().Pop);
public override string ToString() => this.ToSimple().ToString();

(Exercise: We could make IsEmpty O(1); do you see how?)

(Exercise: I could have made Pop O(1). Can you do so? Why might this be a bad idea?)

All right, that leaves Push, Append and Concatenate. We will take each in turn.

Suppose we have the Hughes list for 1, 2, 3, and we want to push 10.

What have we got? We’ve got a function that takes a list, say, 4, 5, 6, and concatenates it to 1, 2, 3, to produce 1, 2, 3, 4, 5, 6.

What do we want? We want a function that takes a list, say, 4, 5, 6, and concatenates it with 10, 1, 2, 3, to produce 10, 1, 2, 3, 4, 5, 6.

That is the function “call the original concatenation function and then push 10 onto the result”:

private static HughesList<T> Push(HughesList<T> ts1, T t) =>
  Make(ts2 => ts1.f(ts2).Push(t));
public HughesList<T> Push(T t) => Push(this, t);

Why do I have this little indirection into a static method? Because this in a value type is logically a ref parameter and cannot be captured in a lambda, but a formal parameter is just a normal local variable. I wish C# had a way to say “treat this as an ordinary value, not a ref, in this lambda”.

But that’s not the interesting part; make sure the meaning of this is clear. When we turn this thing back into a simple list, we’re going to pass the empty list to the new function. That’s going to pass the empty list to the previous function, which will realize the old list. We’ll then push a value onto the old list.

Calling Push on the Hughes list is O(1), because all we’re doing here is allocating a delegate. What cost are we deferring here? Suppose the old list is of length n-1 and the new list is of length n; calling ToSimple on this thing will call the underlying concatenation to construct the tail, which takes n-1 steps, and then adds a single O(1) simple-list Push to that, for a total cost of O(n).

One of the nice things about Hughes lists is that the Append operation is almost exactly the same as the Push operation, in sharp contrast to the SimpleList implementation where they could not be more different.

private static HughesList<T> Append(HughesList<T> ts1, T t) =>
  Make(ts2 => ts1.f(ts2.Push(t)));
public HughesList<T> Append(T t) => Append(this, t);

Again, let’s talk through this. Suppose we have a Hughes list that represents 1, 2, 3 and we append 10. We want the Hughes list that represents 1, 2, 3, 10.

What do we have? A function that can take 4, 5, 6 and produce 1, 2, 3, 4, 5, 6.

What do we want? A function that can take 4, 5, 6 and produce 1, 2, 3, 10, 4, 5, 6.

So what do we do? We make a function that pushes 10 onto 4, 5, 6, and we know how to concatenate that to 1, 2, 3, which gives us the desired result!

Again, the Hughes list Append is obviously O(1) because it is just a delegate allocation. Suppose the original list is of length n-1. The call to ToSimple will do the O(1) Push onto the empty simple list, and then the n-1 steps to concatenate 10, 4, 5, 6 to 1, 2, 3, for a total cost of O(n).

We’ve only got one more, and I’m sure you can guess how it goes:

private static HughesList<T> Concatenate(
  HughesList<T> ts1, HughesList<T> ts2) =>
  Make(ts3 => ts1.f(ts2.f(ts3)));
public HughesList<T> Concatenate(HughesList<T> ts2) =>
  Concatenate(this, ts2);

Going through that to verify that it does what it is supposed to is left as an exercise, as is the exercise to determine that doing a ToSimple on the result is O(n) in the size of the this list.

I said last time that it seems bizarre that there is no “list structure” here — no head and tail — but of course there is, it’s just encoded. This becomes more clear when you realize that our three operations, Push, Append, and Concatenate, are all the composition of f with another function. (Recall that the “composition” of two functions f1 and f2 is the function “call f2, and then call f1 with whatever f2 returns”.) Let’s rewrite our final three methods in that form, and we’ll notice something.

First we’ll make a helper method to do the composition for us:

private static HughesList<T> Make2(
  Func<SimpleList<T>, SimpleList<T>> f1,
  Func<SimpleList<T>, SimpleList<T>> f2) =>
  new HughesList<T>(a => f1(f2(a)));

And now we can rewrite Push, Append and Concatenate to use this helper. When we do that we no longer have any lambdas that close over “this”, so we can get rid of our static helper methods too!

public HughesList<T> Push(T t) =>
  Make2(ts2 => ts2.Push(t), this.f);
public HughesList<T> Append(T t) =>
  Make2(this.f, ts2 => ts2.Push(t));
public HughesList<T> Concatenate(HughesList<T> ts2) =>
  Make2(this.f, ts2.f);

And now it becomes considerably more clear what is going on here: if you squint a little, it looks like Make2 has the same structure as though it was a constructor of a binary tree class! Everything in the left argument is going on the left side of the list, and everything in the right argument is going in the right half! That’s why we can get away with such a “thin” data structure in a Hughes list; the real data structures are in the closures generated by the compiler!

Or, another way to look at it is: when we call ToSimple, all the operations in the right argument will be done first, and its result will be the argument to the function in the left side; in this manner we will build up the resulting list from right to left, which is the efficient way to do it.

Next time on FAIC: A relevant story about my Microsoft full-time interviews.

An interesting list structure, part 1

As long-time readers of my blog know, I’m a big fan of functional persistent immutable list data structures. I recently learned about a somewhat bizarre but very practical implementation of such lists, and naturally I wanted to implement it myself to try it out. The list data structure I’m going to discuss today is called a “Hughes list”, after its creator John Muir Hughes. The Haskell implementation is called “DList”, which stands for “difference list”, and this data structure is also commonly used in Prolog and Scala.

Let’s start by revisiting the standard immutable persistent linked list, which takes the form of a stack. As you probably recall, by “immutable” we mean that the operations like Push and Pop on the list do not change the list; it is immutable. Rather, they return a new list that has the desired content. Since lists are immutable, their contents can be re-used in new lists without worrying that a mutation will wreck things for the code that is using the new version; this is what is meant by “persistent” in this context.

In this implementation I’m going to implement the usual push, pop, peek and “is empty?” operations, but I’m going to add two more: “append”, which adds a single item to the list at the “wrong” end, and “concatenate”, which concatenates two lists together.

That is, if we have a list 1, 2, 3 and we push 4, the result is 4, 1, 2, 3.  If we have a list 1, 2, 3 and we append 4, we have the list 1, 2, 3, 4.  And if we have list 1, 2, 3, and we concatenate list 4, 5, 6, the result is 1, 2, 3, 4, 5, 6.  All clear?

Essentially I’m implementing catenable deques here, but as we’ll see, they are very inefficient in this implementation.

I’m also going to implement two static list constructors: one that makes an empty list, and one that makes a single-item list.

Oh, and before we begin: I’m going to use the standard simple recursive implementations of these operations. In practice, these would blow the stack in C# if the lists got big, because C# is not consistently tail-recursive. Were I designing these types for industrial use, I’d write iterative algorithms instead of recursive algorithms, and I’d also follow good practices like implementing IEnumerable and so on. These types are not intended for use in production; they are for pedagogic purposes only.

public abstract class SimpleList<T>
  public static readonly SimpleList<T> Empty =
    new EmptyList();
  public static SimpleList<T> Single(T t) =>
  private SimpleList() { }
  public abstract bool IsEmpty { get; }
  public abstract T Peek { get; }
  // If this is a, b, c, then this.Pop is b, c
  public abstract SimpleList<T> Pop { get; }
  // If this is a, b, c and s is d, e, f, then
  // this.Concatenate(s) is a, b, c, d, e, f
  public abstract SimpleList<T> Concatenate(SimpleList<T> s);
  // If this is b, c, and t is a, then this.Push(t) is a, b, c
  public SimpleList<T> Push(T t) =>
    new NonEmptyList(t, this);
  // If this is b, c, and t is a, then this.Append(t) is b, c, a

  public SimpleList<T> Append(T t) =>
  private sealed class EmptyList : SimpleList<T>
    public override bool IsEmpty => true;
    public override T Peek =>
      throw new InvalidOperationException();
    public override SimpleList<T> Pop =>
      throw new InvalidOperationException();
    public override SimpleList<T> Concatenate(
      SimpleList<T> s) => s;
public override string ToString() => “”;
  private sealed class NonEmptyList : SimpleList<T>
    private readonly T head;
    private readonly SimpleList<T> tail;
    public NonEmptyList(T head, SimpleList<T> tail)
      this.head = head;
      this.tail = tail;
    public override bool IsEmpty => false;
    public override T Peek => head;
    public override SimpleList<T> Pop => tail;
    public override SimpleList<T> Concatenate(
        SimpleList<T> s) =>
      s.IsEmpty ? this : this.Pop.Concatenate(s).Push(this.Peek);
    public override string ToString() => 
      this.Pop.IsEmpty ?

All right, hopefully that was pretty straightforward. Empty, Single, IsEmpty, Peek, Push and Pop are all O(1). Concatenate is O(n) in the size of “this”, except for the special case of concatenating an empty list, which is O(1) regardless of the size of “this”. Append is implemented as Concatenate, so it is O(n).

The only algorithm which is even slightly tricky is Concatenate, but hopefully you see how it works. Basically, we’re using recursion to reverse “this” and then push each element of the reversed list onto s. As I noted above, a more efficient and safe implementation would be to actually reverse the list in a loop rather than using recursion to put that information onto the call stack, but whatever, this is for pedagogy — however, we will return to a related issue in a future episode, so keep it in mind!

The problem here is that we are often in a situation where Append and Concatenate are frequent operations, but Peek and Pop are not. For example, we might be in a “list building” scenario where we are composing a large list out of many smaller lists, and wish to be able to efficiently concatenate them.

I already did a series on using finger trees to implement an efficient catenable deque, but finger trees are a quite complex and difficult data structure to understand; you may recall that I did not even implement the efficient “break up/concatenate” operations because they were just too much work. The great thing about Hughes lists is that they are a seemingly very simple data structure that efficiently implements list concatenation. In fact, here it is:

public struct HughesList<T>
  private readonly Func<SimpleList<T>, SimpleList<T>> f;


And… that’s it. A Hughes list is a struct that wraps a single delegate; the delegate takes a simple list and returns a simple list. Of course I have left out all the methods and properties, but the delegate is the only data that is stored in a Hughes list.

I know what you’re thinking, because it’s the same thing that I thought when I first heard about this: that sounds completely bonkers. How can a function from lists to lists be a representation of a list? Where’s the head and the tail and all that stuff?

Next time on FAIC: We’ll find out how it works and build an implementation in less than two dozen lines of code.

Indexer error cases

Pop quiz: Consider these little methods. (They could be static, inline, whatever, that doesn’t matter.)

int[] A1()
  return new int[10];
int[] A2()
  return null;
int B1() 
  return 1;
int B2()
  return 11;
int C()
  return 123;

Now I have four questions for you; can you predict the behaviour of each of these program fragments?

try { A1()[B1()] = C(); }
catch { Console.WriteLine(“catch”); }

 { A1()[B2()] = C(); }
catch { Console.WriteLine(“catch”); }

 { A2()[B1()] = C(); }
catch { Console.WriteLine(“catch”); }

 { A2()[B2()] = C(); }
catch { Console.WriteLine(“catch”); }

There’s no trickery going on here; seriously, make a mental model of what each of these does and try to predict their behaviour. Scroll down when you’re done.






The first one prints A, B, C. All the rest print A, B, C, catch.

Back when I was a young and naive compiler developer, I would have predicted the final three produce “A, B, catch“, based on the following reasoning:

  • In C# an assignment produces all the side effects of the left-hand-side expression before all the side effects of the right-hand-side expression.
  • An exception is a side effect.
  • The exceptions produced here — either the null deference when A2() is called, or index out of bounds when A1()[B2()] is called — should therefore happen when the left hand side is evaluated, before C() is evaluated.

This reasoning is incorrect; in C# (and also in Java and many other languages), it is the assignment itself that triggers the exception, even when the exception is the result of an indexing operation.

That is, my wrong naive mental model was that A()[B()]=C() had the semantics of:

int[] a = A();
int b = B();
// Make an alias; crash if its bad
ref int r = ref a[b]; 
int c = C();
r = c;

This is wrong; the correct model is that this is equivalent to:

int[] a = A();
int b = B();
int c = C();
a[b] = c;

I must admit, when I learned this shortly after joining the C# team, I was surprised and I considered it a bug in the design. But there are good reasons for this, and it turns out to have some nice properties.

First off, let’s get the main point out of the way. You should never write a program like the one I wrote, where there is a difference in behaviour depending on when the exception is thrown. Those exceptions should never be thrown! A program should not dereference null or blow past the end of an array and then catch the problem; you shouldn’t cause the problem in the first place. So the choice of what to do here really should only matter to the compiler writer, not to the user. If you write your code correctly, it makes no difference when the exception is thrown because it will not actually be thrown.

So, given that it ought not to matter to the user, what is the language designer to do? Should the language designer say that “evaluate the left” causes the exception, or should the exception be triggered by the assignment, after the right has been evaluated?

One way to think about it is: what if we were calling a user-defined indexer setter instead of an array. Suppose we have a class X with an indexer:

public int this[int i]
  get { … whatever … }
    if (i < 0 || i > 10)
      throw new BlahBlahBlah();
    … whatever …

An indexer is just a fancy way to write a method call. Suppose we have:

X X1()
  return new X();

And now what does X1()[B()]=C(); mean? Plainly it must mean:

X x = X1();
int b = B();
int c = C();
x.indexer_setter(b, c);

And that last line has the semantics of “throw if x is null, otherwise do the call and throw if the index is out of bounds”.  In this scenario we definitely evaluate B() even if X1() returns null, and we definitely evaluate C() even if B() is out of bounds, so it makes sense that we should be consistent between user-defined and built-in indexers.

The decision to make the evaluation of an indexer defer all the checks until the assignment also helps in this scenario; think about how you would make this work if you were the compiler writer:

async Task<intD() {
  await Task.Delay(1000);
  return 123;

A()[B()] = await D();

At the point of the await, the current activation is going to spill all of its local state into a closure, assign the continuation of the current task to complete the assignment, and return control to the caller; when D() completes, the continuation is activated and the assignment happens. But what information is stored in the closure? If the left side is computed as

int[] a = A();
int b = B();
ref int r = ref a[b];

Then we can lose “a” and “b”; we never read them again. But we need to somehow store a ref in a closure, which is not legal in the CLR! There are ways to get around that, but boy are they painful and expensive. By deferring the validation of the collection and index until the assignment, we eliminate the need to spill a reference off the stack into a closures; we just store the ordinary values of “a” and “b”.

Of course, the design team for C# 1.0 did not know that the design team for C# 5.0 was going to need to implement temporary spilling into closures, but it surely is a happy accident that the way it does work is the way that is cheapest and easiest to implement should you need to put temporaries into a closure.

Next time on FAIC: I recently learned about an interesting way to represent immutable lists that has completely different performance characteristics than standard immutable lists. At first glance it seems impossible, but it all works out beautifully. We’ll do a short series on how to make standard linked lists into catenable deques without going to all the trouble of building a finger tree, like we did last time.

Dual numbers, part 4

Last time on FAIC I gave a hand-wavy intuitive explanation for why it is that lifting a function on doubles to duals gives us “automatic differentiation” on that function — provided of course that the original function is well-behaved to begin with. I’m not going to give a more formal proof of that fact; today rather I thought I’d do a brief post about multivariate calculus, which is not as hard as it sounds. (In sharp contrast to the next step, partial differential equations, which are much harder than they sound; I see that Professor Siv, as we called him, is still at UW and I hope is still cheerfully making undergraduates sweat.)

We understand that the derivative of an analytic function f(x) that takes a real and produces a real is the function f'(x) that gives the slope of f(x) at each point. But suppose we have a function that takes two reals and produces a real. Maybe f(x, y) = cos(x/5) sin(y/3). Here, I’ll graph that function from (0, 0) to (20, 20):

Screen Shot 2018-12-18 at 12.31.54 PM.png

We could implement this in C# easily enough:

static  Compute2( x,  y) => Math.Cos(x / 5.0) * Math.Sin(y / 3.0);

That’s a pretty straightforward analytic function of two variables, but we’re programmers; we can think of two-variable functions as a one-variable factory that produces one-variable functions. (This technique is called currying, after the logician Haskell Curry.)

static Func<, > Compute2FromX( x) => ( y) =>
  Math.Cos(x / 5.0) * Math.Sin(y / 3.0);

That is, given a value for x, we get back a function f(y), and now we are back in the world of single-variable real-to-real functions. We could then take the derivative of this thing with respect to y and find the slope “in the y direction” at a given (x, y) point.

Similarly we could go in the other direction:

static Func<, > Compute2FromY( y) => ( x) =>
  Math.Cos(x / 5.0) * Math.Sin(y / 3.0);

And now given a “fixed” y coordinate we can find the corresponding function in x, and compute the slope “in the x direction”.

In wacky calculus notation, we’d say that ∂f(x, y)/∂x is the two-variable function that restricts f(x, y) to a specific y and computes the slope of that one-variable function in the x direction. Similarly ∂f(x, y)/∂y is the function that restricts f(x, y) to a specific x and computes the slope of that one-variable function in the y direction.

You probably see how we could compute the partial derivatives of a two-variable, real-valued analytic function using our Dual type; just curry the two-variable function into two one-variable functions, change all of the “doubles” to “Duals” in the “returned” function, and provide dualized implementations of sine and cosine.

But… do we actually need to do the currying step? It turns out no! We do need implementations of sine and cosine; as we discussed last time, they are:

public static  Cos( x) =>
  new (Math.Cos(x.r), Math.Sin(x.r) * x.ε);
public static  Sin( x) =>
  new (Math.Sin(x.r), Math.Cos(x.r) * x.ε);

Easy peasy. We can now write our computation in non-curried, dualized form:

static  Compute2( x,  y) => .Cos(x / 5.0) * .Sin(y / 3.0);

Now suppose we are interested in the x-direction slope and the y-direction slope at (13, 12), which is part of the way up the slope of the right hand “hill” in the graph:

Console.WriteLine(Compute2(13 + .EpsilonOne, 12));
Console.WriteLine(Compute2(13, 12 + .EpsilonOne));

We do the computation twice, and tell the compute engine to vary one-epsilon in the x direction the first time, and not at all in the y direction. The second time, we vary by one-epsilon in the y direction but not at all in the x direction. The output is:


Which gives us that the slope in the x direction is about 0.078, and the slope in the y direction is about 0.187. So for the cost of computing the function twice, we can get the derivatives in both directions. Moreover, if we wanted to compute the slope in some other direction than “parallel to the x axis” or “parallel to the y axis”, perhaps you can see how we’d do that.

But we’ve actually gotten something even a bit more interesting and useful here: we’ve computed the gradient!

The gradient of a multivariate function like this one is the direction you’d have to walk to gain altitude the fastest. That is, if you are standing at point (13, 12) on that graph, and you walk 0.078 steps in the x direction for every 0.187 steps in the y direction, you’ll be heading in the “steepest” direction, which is the direction likely to get you to the top of the mountain the fastest:


And of course if you walk in the “negative of the gradient” direction, you’re heading “downslope” as fast as possible.

Notice that heading in the “upslope” or “downslope” direction does not necessarily move you directly towards the nearest peak, valley or saddle; it heads you in the direction that is gaining or losing altitude the fastest. If you do not know exactly where the peaks and valleys are, that’s your best guess of the right direction to go.

I hope it is clear that there is no reason why this should be limited to functions of two variables, even though it is hard to visualize higher-dimensional functions! We can have a real-valued function of any number of variables, take the partial derivatives of each, and find the gradient that points us in the “steepest” direction.

Why is this useful? Well, imagine you have some very complicated but smooth multivariate function that represents, say, the cost of a particular operation which you would like to minimize. We might make an informed guess via some heuristic that a particular point is near a “cost valley”, and then compute the gradient at that point; this gives us a hint as to the direction of the nearest valley, and we can then try another, even more informed guess until we find a place where all the gradients are very small.  Or, equivalently, we might have a “benefit function” that we are trying to maximize, and we need a hint as to what direction the nearby peaks are. These techniques are in general called “gradient descent” and “gradient ascent” methods and they are quite useful in practice for finding local minima and maxima.

The point is: as long as the function you are computing is computed only by adding, subtracting, multiplying, and dividing doubles, or using double-valued library functions whose derivatives you know, you can “lift” the function to duals with only a minor code change, and then you get the derivative of that function computed exactly, for a very small additional cost per computation. That covers a lot of functions where it might not be at all obvious how to compute the derivative. Even better, for an n-dimensional function we can compute the gradient for only a cost of O(n) additional operations. (And in fact there are even more efficient ways to use dual numbers do this computation, but I’m not going to discuss them in this series.)

A great example of a high-dimensional computation that only uses real addition and multiplication is the evaluation of a function represented by a neural net in a machine learning application; by automatically differentiating a neural network during the training step, we can very efficiently tweak the neuron weights during the training step to seek a local maximum in the fitness of the network to the given task. But I think I will leave that discussion for another day.

Next time on FAIC: What exactly does A()[B()]=C(); do in C#? It might seem straightforward, but there are some subtle corner cases.

Dual numbers, part 3

Last time on FAIC we discovered that when you “lift” a polynomial on doubles to a polynomial on duals and then evaluate it with x+1ε, the “real” part of the returned dual exactly corresponds to the original polynomial at x; that’s not a surprise. What is bizarre and fascinating is that the “epsilon” part of the dual exactly corresponds to the derivative of the polynomial at x!

First question: is this actually true for all polynomials, or did I trick you by choosing an example that just happened to work? Let’s make a little chart. I’ll make a bunch of power functions, and we’ll list the function, its derivative, and what happens if we lift the function to dual numbers and evaluate it on an arbitrary dual number:

    f(x)      f'(x)      f(a+bε)
     1          0          1+0ε
     x          1          a+bε
     x2         2x         a2+2abε  (The higher power ε terms vanish)
     x3         3x2        a3+3a2bε
     xn         nxn-1      an+nan-1bε

It sure looks like for every power function, f(a+bε) = f(a) + f'(a)bε. (Exercise: prove this by induction on the power.) And moreover, we could pretty easily show that this is true for all finite sums of these things, and all scalar multiplications, and therefore true for all polynomials.

Is it true for other functions? We know that f(x) = ex is defined as the limit of the infinite sum

1 + x + x2/2! + x3/3! ... 

and that moreover, f(x) = f'(x). So what then is f(a+bε) ?

ea+bε = 1 + a+bε + (a2+2abε)/2! + (a3+3a2bε)/3! ... 
= (1 + a + a2/2! + ...) + (1 + a + a2/2! + ...)bε
= ea + eabε
= f(a) + f'(a)bε

It’s nice that this falls right out of the definition; we could also have gotten there faster by using this property of exponentiation:

ea+bε = eae
= ea(1 + bε)   because the higher terms are all zero!
= f(a) + f'(a)bε

I will not labour the point; using similar tricks we can show:

ln(a+bε) = ln(a) + (1/a)bε
cos(a+bε) = cos(a) - sin(a)bε
sin(a+bε) = sin(a) + cos(a)bε

And since y= ex ln y, we can now compute arbitrary powers of dual numbers as well.

In every case we find that for a function f(x) defined on reals, the “lifted” function on duals is f(a+bε) = f(a) + f'(a)bε. Now, to be strictly accurate, we need to put some limitations on the function. For example, its derivative has to exist! In fact we will be more restrictive than that, and limit the functions we consider to the “analytic” functions, which have the properties necessary to be amenable to calculus: they must be smooth, their first, second, third, … derivatives must exist, and so on. We don’t want to consider fractal curves or functions that come to sharp points, or have infinite slope at a point, or whatever. Just normal, everyday curves.

That all said, it still seems a little bit like magic that by lifting an ordinary function on doubles to a function on duals, we get a computation of both the function and its derivative at a point for almost free; there’s only an O(1) additional cost in time and memory to compute the derivative. This stands in contrast to the more traditional way of numerically approximating a derivative by repeatedly evaluating nearby points until the slope converges, which can involve many repeated computations and results in an approximation. Is there some intuitive explanation of why this works?

The intuition that works for me is: every analytic function is approximately a straight line if you restrict its range. Here’s our polynomial again, but this time graphed from -0.8 to -0.7. It’s nigh indistinguishable from a straight line over this range. But what straight line?

Screen Shot 2018-12-18 at 10.32.53 AM.png

Well, the straight line that goes through (-0.75, f(-0.75)) and has slope of f'(-0.75) obviously! Thus it should not be a surprise that f(-0.75 + bε) should be f(-0.75) + f'(-0.75)bε. We’re essentially waving our hands here and saying that ε is so small that we can approximate the function in the neighbourhood of -0.75 as a straight line.

Next time on FAIC: Can we do some slightly more complicated calculus with this technique?