Fixing Random, part 1

[Code is here.]


How to say this delicately?

I really, really dislike System.Random. Like, with a passion. It is so… awful.

The C# design team tries hard to make the language a “pit of success”, where the natural way to write programs is also the correct, elegant and performant way. And then System.Random comes along; I cringe every time I see code on StackOverflow that uses it, because it is almost always wrong, and it is seldom easy to see how to make it right.

Let’s start with the obvious problem: the natural way to use it is also the wrong way. The naive developer thinks “I need a new random dice roll, so…”

void M()
{
  Random r = new Random();
  int x = r.Next(1, 6);
  ...

Two lines in and we are already in a world of pain.

First, every time M() is called, we create a new Random, but in most of the implementations of Random available for the last couple decades, by default it seeds itself with the current time, and the granularity of the timer is only a few milliseconds. Computers run on the nanosecond scale, and so the likelihood that we’ll get two Randoms with the same seed is very high, and therefore it is very likely that successive calls to M() produce runs of the same number. You never want to make a new Random if you can avoid it; you want to make one once, and then re-use it. But it’s a bad idea to stash it away in a static, because it’s not thread safe!
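The seed collision is easy to demonstrate directly. Here is a small sketch of my own: it simulates two Randoms constructed in the same clock tick by giving them the same explicit seed, and shows that they produce identical sequences.

```csharp
using System;

static class SeedDemo
{
    // Simulate two "new Random()" calls landing in the same timer
    // tick by constructing both with the same explicit seed; the
    // sequences they produce are then identical.
    public static bool SameSequence()
    {
        var r1 = new Random(12345);
        var r2 = new Random(12345);
        for (int i = 0; i < 10; i++)
            if (r1.Next(1, 7) != r2.Next(1, 7))
                return false;
        return true;
    }

    static void Main() =>
        Console.WriteLine(SameSequence()
            ? "identical sequences"
            : "different sequences");
}
```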

Fortunately, I have just learned from an attentive reader that this problem has been fixed in some versions of the CLR; exactly which I am not sure. In those versions, a new Random() now seeds itself randomly, rather than based on the current time. Thanks to the BCL team for fixing this irksome problem; better late than never.

Second, the arguments to Next are the minimum value produced, inclusive, and the maximum value produced, exclusive! This program produces random numbers drawn from 1, 2, 3, 4, 5. The correct way to get numbers from 1 to 6 is of course Next(1, 7).
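Putting both fixes together, a sketch of the corrected usage looks like this; the class and method names here are my own, and note that the single shared instance is only safe if used from one thread:

```csharp
using System;

static class Dice
{
    // Created once and reused, rather than once per call.
    // Not thread-safe: only use from a single thread.
    private static readonly Random r = new Random();

    // Next's upper bound is exclusive, so 1..6 requires Next(1, 7).
    public static int Roll() => r.Next(1, 7);

    static void Main()
    {
        for (int i = 0; i < 1000; i++)
        {
            int x = Roll();
            if (x < 1 || x > 6)
                throw new InvalidOperationException("out of range");
        }
        Console.WriteLine("all rolls in 1..6");
    }
}
```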

What we’ve got here is a classic example of Steve Maguire’s “candy machine interface” concept. The candy machines at Microsoft used to have an item number — say, 75, and a price, say, $0.85, you put in your quarters, punch in the item number and, dammit, wait, that’s item 085 and it costs $0.75, I got it backwards, and now I’ve got some weird gum instead of those jalapeno chips. This is a true story! I have actually made that mistake in Microsoft candy machines.

But those are trivial problems. The really fundamental problem here is that we’re working at too low a level of abstraction. It is not the 1970s anymore, when rand() was good enough. We have sophisticated problems in statistical modeling and the attendant probabilistic reasoning to solve in modern programming languages. We need to up our game by writing code in the “business domain” of probabilities, not the “mechanism domain” of calls to methods that return random numbers.

Coming up on FAIC: We will start by simply improving the existing implementation of Random, but from that humble beginning we’ll develop a new class library that makes programming with probabilities much more readable, powerful and efficient in C#.

An interesting list structure, part 3

This is the story of how I was no-hired by Microsoft; surprisingly, it is germane to our discussion of unusual list structures.

I was a Waterloo co-op intern three times at Microsoft, each time working for the Visual Basic team; that’s where I learned how industrial developer tools work, how to build a localizable product, and many other valuable lessons. The VB team told me that on the strength of my internships — which, after all, was in total a 12 month long interview spread over two years — that they’d give me a job when I graduated. However, the VB managers were both smart and kind, and encouraged me to look around the company more broadly to see if there was any other team that I might be a good fit for.

I ended up interviewing with two or maybe three other teams; this was over 20 years ago and my memory is a bit hazy. A few points do stand out though. I had an interview with the ill-fated Blackbird team that I thought went pretty well — the technical question was about rapidly finding the seven letter words in a Scrabble rack, and I gave the standard solution of preprocessing the dictionary into a multi-dictionary that answers the question in O(1) time, and then discussed how to use a trie to solve the more general problem. I’m not sure why they no-hired me, but that team was about to be disbanded so I dodged a bullet there. In retrospect, this was over a year after the “tidal wave email” and I am surprised they lasted as long as they did.

But the thing that really stands out is my interview with the SQL Server team. The day I interviewed there was a snowstorm in the greater Seattle area, which is quite rare and there is not infrastructure to rapidly clear the streets. A grand total of zero of the people scheduled to interview me came into work that day, and so recruiting had to scramble to find interviewers. The technical question I got from the SQL team was “reverse a linked list in C++” — that was the extent of the problem statement, so I asked “Can I assume I already have a linked list class defined?”  Sure. “Is it acceptable if the original list is destroyed?” Sure. “OK,” I said, and wrote on the whiteboard something like:

LinkedList reverse(LinkedList list)
{
  LinkedList result;
  while (!list.is_empty()) result.push(list.pop());
  return result;
}

I’m probably missing some syntactic details there; I haven’t programmed in C++ in fifteen years. But you get the idea; reversing a linked list is logically just popping the old list and pushing onto the new list.
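The same pop-and-push idea, written in C# with the BCL's `Stack<T>` standing in for the interview's linked list class (my own sketch, not the whiteboard code):

```csharp
using System;
using System.Collections.Generic;

static class ListOps
{
    // Reversal is just popping everything off the source and
    // pushing it onto the result; the source list is destroyed.
    public static Stack<int> Reverse(Stack<int> list)
    {
        var result = new Stack<int>();
        while (list.Count > 0)
            result.Push(list.Pop());
        return result;
    }

    static void Main()
    {
        // The constructor pushes in enumeration order, so this
        // stack enumerates top-first as 1, 2, 3.
        var s = new Stack<int>(new[] { 3, 2, 1 });
        var r = Reverse(s);
        Console.WriteLine(string.Join(", ", r));
    }
}
```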

The interviewer just said something like “that’s wrong”, and I said “no, that’s right but would you like me to implement push and pop to show you how they work?” and it went downhill from there.

That was super confusing to me at the time. I realized much later — like, years later, when I started doing interviews myself — that the interviewer was unprepared, was not expecting to do an interview that day, may have been a very inexperienced interviewer, probably picked a problem at random, and worst of all had a specific “correct” solution in mind. Almost certainly the “correct” solution was the solution where you traverse the list and rejigger the “next” pointer “in place” as you go, which of course is how you reverse a linked list in C. (I am often surprised at the number of people in interviews who write C-style solutions in C++ or C#, as though those languages did not have a culture of encapsulation of implementation details.)

So it was a confusing day and a bunch of no-hires, but no harm done. Obviously I ended up having a successful 16 year stint at Microsoft regardless, and I learned from example how not to conduct a technical interview.

All this was on my mind as I was writing this series because the original paper by Hughes that introduced the concept I’ve been discussing is called A novel representation of lists and its application to the function “reverse”.

What on Earth does “reverse a linked list” have to do with this unusual representation of linked lists? I won’t go through the whole argument presented in the Hughes paper because it gets a bit technical, but I’ll give you what was for me the key insight.

Suppose we have our “simple” linked list from last time, where a list is either empty, or a head value and a tail list. How do you reverse that linked list? The problem here is posed in the context of functional languages where we do not have mutable local variables and loops, so a recursive solution is required.

If we have an “append” operation available then the naive recursive way to reverse a simple list is to say “recursively reverse the tail, and then append the old head to the reversed tail”. That is, if we have 1, 2, 3, then we recursively reverse 2, 3 into 3, 2, and then append 1 to get 3, 2, 1. In our implementation that would be:

public SimpleList<T> NaiveReverse() =>
  this.IsEmpty ?
    this :
    this.Pop.NaiveReverse().Append(this.Peek);

The problem with that should be clear given our previous discussion. We recurse once per list element, so we recurse O(n) times. But Append is an O(n) operation on simple lists, so the total complexity is O(n²), which is pretty bad.

The standard O(n) solution is to emulate my little loop above, but non-destructively and recursively:

private SimpleList<T> ReverseWithTail(SimpleList<T> tail) =>
  this.IsEmpty ?
    tail :
    this.Pop.ReverseWithTail(tail.Push(this.Peek));
public SimpleList<T> Reverse() => this.ReverseWithTail(Empty);

But… hold on a minute here. If we said

Func<SimpleList<T>, SimpleList<T>> f = someList.ReverseWithTail;

Then what we’ve got is an object that represents a list — in this case, it represents the list that is someList reversed — where the object is a function such that when you give it a list, it gives you back the represented list concatenated with the given list, in O(n) time; a delegate to ReverseWithTail is exactly the data stored in a Hughes list!

So perhaps it is not that bizarre to represent a list as a concatenation function. Functional programs are chock-full of algorithms like this one, where the “tail end” of a list is being passed in and the algorithm represents the operation of stuffing values onto the left side of it. The genius of the Hughes list is to recognize that this technique can be generalized into an elegant, useful data type that defers expensive work until it is needed.

Next time on FAIC: System.Random is the source of so many bugs; could we come up with a better design? This is going to be a long series, because there are a lot of improvements we could make. We’re not yet programming at the correct level of abstraction when it comes to dealing with randomness; let’s see how we can fix that.

 

An interesting list structure, part 2

Last time on FAIC I gave a standard implementation of a persistent immutable linked list, with cheap — O(1) — Push, Pop, Peek and IsEmpty operations, and expensive — O(n) — Append and Concatenate operations. In this episode we’re going to implement a Hughes list, which “flips” those costs: Pop, Peek and IsEmpty become O(n), Append and Concatenate become O(1). Push stays O(1) also. This data structure is useful for scenarios where we are accumulating a large list “from both ends” or out of smaller lists, and then want to read the whole thing out as a normal list once we’re done the accumulation.

Before we get into it, I’ll repeat my caveat from last time: the implementations I’m showing today are not intended to be production-ready, heavily tested, performant or in any way bulletproof; the point of this exercise is to learn about the data structure, not to give a tutorial on good industrial design.

Last time I gave the data structure of a Hughes list, which is surprisingly simple:

public struct HughesList<T>
{
    private readonly Func<SimpleList<T>, SimpleList<T>> f;

Even if it is not at all clear to you how this is going to work — and it certainly was not clear to me when I learned about these! — this probably reminds you of your experience with LINQ.  Constructing a query object with LINQ is very fast, even if the data set is extremely large, because we use the Func and IEnumerable types to defer computations to the future. A Hughes list gets its good performance in the same way; it represents a sequence of operations on simple lists to be executed in the future, when they can be done in the highest-performance order, once.

But I am getting ahead of myself. What is that function? What are its semantics? They are both simple and subtle:

  • A Hughes list represents a list. Say, 1, 2, 3, 4.
  • The function is a function which can concatenate a simple list to the represented list; the returned value is the resulting simple list.

So if we have the Hughes list h representing 1, 2, 3, 4, and we call h.f(g) where g is the simple list representing 5, 6, 7, 8, then the result will be a simple list representing 1, 2, 3, 4, 5, 6, 7, 8.
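The semantics can be sketched in miniature with strings standing in for lists; this is a toy of my own, not the HughesList code, but it shows the idea: the representation of 1, 2, 3, 4 is a function that prepends those items onto whatever it is given.

```csharp
using System;

static class DiffDemo
{
    // A toy "difference list" over strings: a list is represented
    // by a function that concatenates the represented items onto
    // the front of its argument.
    public static Func<string, string> Represent(string items) =>
        s => items + s;

    static void Main()
    {
        var h = Represent("1, 2, 3, 4");
        Console.WriteLine(h(", 5, 6, 7, 8")); // concatenate with a tail
        Console.WriteLine(h(""));             // realize via the empty list
    }
}
```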

Let’s see where this takes us.  The first thing I’m going to do is make a few little one-liner helpers to decrease the size of the code later. (In fact, every method I write is going to be a one-expression method!)

private HughesList(Func<SimpleList<T>, SimpleList<T>> f) =>
  this.f = f;
private static HughesList<T> Make(
  Func<SimpleList<T>, SimpleList<T>> f) =>
  new HughesList<T>(f);

Nothing interesting there. The first interesting question is: what does an empty Hughes list look like? Well, remember our basic principle: if we have the Hughes list that represents an empty list, then the function must be the function that concatenates any list to the empty list. But… that’s the identity function! Appending any list to the empty list is just the list. So this one is easy:

public static readonly HughesList<T> Empty = Make(ts => ts);

We should expect that the empty list case is pretty straightforward; let’s get more complicated. How do we represent a Hughes list that contains a single element? That is, I want a function that takes a single T and gets back a Hughes list which represents the list containing only that value. Let’s suppose the value is 10. Our representation is a function which takes a list, say, 1, 2, 3, and returns that list concatenated to the list that contains only 10, which is 10, 1, 2, 3.

But… that’s just Push!

public static HughesList<T> Single(T t) => Make(ts => ts.Push(t));

Let’s make one more constructor. Suppose we have a simple list and we would like the equivalent Hughes list. Again, the Hughes list that represents a list is the function that concatenates something to that list. But… that’s Concatenate! We don’t even need a lambda for this one; we can turn the method directly into a delegate bound to the receiver.

public static HughesList<T> FromSimple(SimpleList<T> ts) =>
  Make(ts.Concatenate);

So far every operation has been O(1) and has been a factory method. Now let’s look at our first O(n) operation, an instance method: how do we turn an instance of a Hughes list back into a simple list?

Well, what have we got? A Hughes list represents a list. It has a function which takes any simple list and returns the simple list that consists of the Hughes list concatenated with the simple list.  The empty simple list is a legal argument, and it is the identity of concatenation, so we can simply write:

public SimpleList<T> ToSimple() => this.f(SimpleList<T>.Empty);

This is like doing a foreach over an OrderBy query in C#; the whole thing gets built at this point, and all the deferred work executes. As we’ll see, that work is O(n) in the size of the represented list.
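The analogy with LINQ deferral can be made concrete; here is a sketch of my own that uses a side-effecting key selector to log when the sorting work actually happens:

```csharp
using System;
using System.Linq;

static class DeferDemo
{
    public static int KeyCalls; // incremented each time a sort key is computed

    public static int[] Run()
    {
        KeyCalls = 0;
        // Building the query does no sorting work at all...
        var q = new[] { 3, 1, 2 }.OrderBy(x => { KeyCalls++; return x; });
        if (KeyCalls != 0)
            throw new InvalidOperationException("work was not deferred");
        // ...all of the deferred work happens when we realize the result.
        return q.ToArray();
    }

    static void Main() =>
        Console.WriteLine(string.Join(", ", Run()));
}
```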

Let’s blow right through the other O(n) operations. Again, the point of a Hughes list is that we want to decrease and defer the cost of Push, Append and Concatenate, making those operations O(1), but the down side is that we make Pop, IsEmpty and Peek all O(n). There is no way to tell from a function that can concatenate a list what the contents of the represented list are, so we’ll have to realize them as simple lists:

public T Peek => this.ToSimple().Peek;
public bool IsEmpty => this.ToSimple().IsEmpty;
public HughesList<T> Pop => FromSimple(this.ToSimple().Pop);
public override string ToString() => this.ToSimple().ToString();

(Exercise: We could make IsEmpty O(1); do you see how?)

(Exercise: I could have made Pop O(1). Can you do so? Why might this be a bad idea?)

All right, that leaves Push, Append and Concatenate. We will take each in turn.

Suppose we have the Hughes list for 1, 2, 3, and we want to push 10.

What have we got? We’ve got a function that takes a list, say, 4, 5, 6, and concatenates it to 1, 2, 3, to produce 1, 2, 3, 4, 5, 6.

What do we want? We want a function that takes a list, say, 4, 5, 6, and concatenates it with 10, 1, 2, 3, to produce 10, 1, 2, 3, 4, 5, 6.

That is the function “call the original concatenation function and then push 10 onto the result”:

private static HughesList<T> Push(HughesList<T> ts1, T t) =>
  Make(ts2 => ts1.f(ts2).Push(t));
public HughesList<T> Push(T t) => Push(this, t);

Why do I have this little indirection into a static method? Because this in a value type is logically a ref parameter and cannot be captured in a lambda, but a formal parameter is just a normal local variable. I wish C# had a way to say “treat this as an ordinary value, not a ref, in this lambda”.

But that’s not the interesting part; make sure the meaning of this is clear. When we turn this thing back into a simple list, we’re going to pass the empty list to the new function. That’s going to pass the empty list to the previous function, which will realize the old list. We’ll then push a value onto the old list.

Calling Push on the Hughes list is O(1), because all we’re doing here is allocating a delegate. What cost are we deferring here? Suppose the old list is of length n-1 and the new list is of length n; calling ToSimple on this thing will call the underlying concatenation to construct the tail, which takes n-1 steps, and then adds a single O(1) simple-list Push to that, for a total cost of O(n).

One of the nice things about Hughes lists is that the Append operation is almost exactly the same as the Push operation, in sharp contrast to the SimpleList implementation where they could not be more different.

private static HughesList<T> Append(HughesList<T> ts1, T t) =>
  Make(ts2 => ts1.f(ts2.Push(t)));
public HughesList<T> Append(T t) => Append(this, t);

Again, let’s talk through this. Suppose we have a Hughes list that represents 1, 2, 3 and we append 10. We want the Hughes list that represents 1, 2, 3, 10.

What do we have? A function that can take 4, 5, 6 and produce 1, 2, 3, 4, 5, 6.

What do we want? A function that can take 4, 5, 6 and produce 1, 2, 3, 10, 4, 5, 6.

So what do we do? We make a function that pushes 10 onto 4, 5, 6, and we know how to concatenate that to 1, 2, 3, which gives us the desired result!

Again, the Hughes list Append is obviously O(1) because it is just a delegate allocation. Suppose the original list is of length n-1. The call to ToSimple will do the O(1) Push onto the empty simple list, and then the n-1 steps to concatenate 10, 4, 5, 6 to 1, 2, 3, for a total cost of O(n).

We’ve only got one more, and I’m sure you can guess how it goes:


private static HughesList<T> Concatenate(
  HughesList<T> ts1, HughesList<T> ts2) =>
  Make(ts3 => ts1.f(ts2.f(ts3)));
public HughesList<T> Concatenate(HughesList<T> ts2) =>
  Concatenate(this, ts2);

Going through that to verify that it does what it is supposed to is left as an exercise, as is the exercise to determine that doing a ToSimple on the result is O(n) in the size of the this list.

I said last time that it seems bizarre that there is no “list structure” here — no head and tail — but of course there is, it’s just encoded. This becomes more clear when you realize that our three operations, Push, Append, and Concatenate, are all the composition of f with another function. (Recall that the “composition” of two functions f1 and f2 is the function “call f2, and then call f1 with whatever f2 returns”.) Let’s rewrite our final three methods in that form, and we’ll notice something.

First we’ll make a helper method to do the composition for us:

private static HughesList<T> Make2(
  Func<SimpleList<T>, SimpleList<T>> f1,
  Func<SimpleList<T>, SimpleList<T>> f2) =>
  new HughesList<T>(a => f1(f2(a)));

And now we can rewrite Push, Append and Concatenate to use this helper. When we do that we no longer have any lambdas that close over “this”, so we can get rid of our static helper methods too!

public HughesList<T> Push(T t) =>
  Make2(ts2 => ts2.Push(t), this.f);
public HughesList<T> Append(T t) =>
  Make2(this.f, ts2 => ts2.Push(t));
public HughesList<T> Concatenate(HughesList<T> ts2) =>
  Make2(this.f, ts2.f);
 

And now it becomes considerably more clear what is going on here: if you squint a little, it looks like Make2 has the same structure as though it was a constructor of a binary tree class! Everything in the left argument is going on the left side of the list, and everything in the right argument is going in the right half! That’s why we can get away with such a “thin” data structure in a Hughes list; the real data structures are in the closures generated by the compiler!

Or, another way to look at it is: when we call ToSimple, all the operations in the right argument will be done first, and its result will be the argument to the function in the left side; in this manner we will build up the resulting list from right to left, which is the efficient way to do it.
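The right-to-left evaluation order is easy to observe with a pair of logging functions composed Make2-style; this is my own toy, not the list code:

```csharp
using System;

static class OrderDemo
{
    public static string Log = "";

    // Returns a function that records its name when invoked.
    public static Func<string, string> Trace(string name) =>
        s => { Log += name; return s + name; };

    static void Main()
    {
        Log = "";
        var left = Trace("L");
        var right = Trace("R");
        // Compose as Make2 does: the right function runs first,
        // and its result feeds the left function.
        Func<string, string> both = s => left(right(s));
        both("");
        Console.WriteLine(Log); // the right side ran first
    }
}
```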

Next time on FAIC: A relevant story about my Microsoft full-time interviews.

An interesting list structure, part 1

As long-time readers of my blog know, I’m a big fan of functional persistent immutable list data structures. I recently learned about a somewhat bizarre but very practical implementation of such lists, and naturally I wanted to implement it myself to try it out. The list data structure I’m going to discuss today is called a “Hughes list”, after its creator John Muir Hughes. The Haskell implementation is called “DList”, which stands for “difference list”, and this data structure is also commonly used in Prolog and Scala.

Let’s start by revisiting the standard immutable persistent linked list, which takes the form of a stack. As you probably recall, by “immutable” we mean that the operations like Push and Pop on the list do not change the list; it is immutable. Rather, they return a new list that has the desired content. Since lists are immutable, their contents can be re-used in new lists without worrying that a mutation will wreck things for the code that is using the new version; this is what is meant by “persistent” in this context.

In this implementation I’m going to implement the usual push, pop, peek and “is empty?” operations, but I’m going to add two more: “append”, which adds a single item to the list at the “wrong” end, and “concatenate”, which concatenates two lists together.

That is, if we have a list 1, 2, 3 and we push 4, the result is 4, 1, 2, 3.  If we have a list 1, 2, 3 and we append 4, we have the list 1, 2, 3, 4.  And if we have list 1, 2, 3, and we concatenate list 4, 5, 6, the result is 1, 2, 3, 4, 5, 6.  All clear?

Essentially I’m implementing catenable deques here, but as we’ll see, they are very inefficient in this implementation.

I’m also going to implement two static list constructors: one that makes an empty list, and one that makes a single-item list.

Oh, and before we begin: I’m going to use the standard simple recursive implementations of these operations. In practice, these would blow the stack in C# if the lists got big, because C# is not consistently tail-recursive. Were I designing these types for industrial use, I’d write iterative algorithms instead of recursive algorithms, and I’d also follow good practices like implementing IEnumerable and so on. These types are not intended for use in production; they are for pedagogic purposes only.

public abstract class SimpleList<T>
{
  public static readonly SimpleList<T> Empty =
    new EmptyList();
  public static SimpleList<T> Single(T t) =>
    Empty.Push(t);
  private SimpleList() { }
  public abstract bool IsEmpty { get; }
  public abstract T Peek { get; }
  // If this is a, b, c, then this.Pop is b, c
  public abstract SimpleList<T> Pop { get; }
  // If this is a, b, c and s is d, e, f, then
  // this.Concatenate(s) is a, b, c, d, e, f
  public abstract SimpleList<T> Concatenate(SimpleList<T> s);
  // If this is b, c, and t is a, then this.Push(t) is a, b, c
  public SimpleList<T> Push(T t) =>
    new NonEmptyList(t, this);
  // If this is b, c, and t is a, then this.Append(t) is b, c, a

  public SimpleList<T> Append(T t) =>
    this.Concatenate(Single(t));
  private sealed class EmptyList : SimpleList<T>
  {
    public override bool IsEmpty => true;
    public override T Peek =>
      throw new InvalidOperationException();
    public override SimpleList<T> Pop =>
      throw new InvalidOperationException();
    public override SimpleList<T> Concatenate(
      SimpleList<T> s) => s;
    public override string ToString() => "";
  }
  private sealed class NonEmptyList : SimpleList<T>
  {
    private readonly T head;
    private readonly SimpleList<T> tail;
    public NonEmptyList(T head, SimpleList<T> tail)
    {
      this.head = head;
      this.tail = tail;
    }
    public override bool IsEmpty => false;
    public override T Peek => head;
    public override SimpleList<T> Pop => tail;
    public override SimpleList<T> Concatenate(
        SimpleList<T> s) =>
      s.IsEmpty ? this : this.Pop.Concatenate(s).Push(this.Peek);
    public override string ToString() =>
      this.Pop.IsEmpty ?
        $"{this.Peek}" :
        $"{this.Peek}, {this.Pop}";
  }
}

All right, hopefully that was pretty straightforward. Empty, Single, IsEmpty, Peek, Push and Pop are all O(1). Concatenate is O(n) in the size of “this”, except for the special case of concatenating an empty list, which is O(1) regardless of the size of “this”. Append is implemented as Concatenate, so it is O(n).

The only algorithm which is even slightly tricky is Concatenate, but hopefully you see how it works. Basically, we’re using recursion to reverse “this” and then push each element of the reversed list onto s. As I noted above, a more efficient and safe implementation would be to actually reverse the list in a loop rather than using recursion to put that information onto the call stack, but whatever, this is for pedagogy — however, we will return to a related issue in a future episode, so keep it in mind!
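To see the quadratic behavior concretely, here is a minimal stand-alone cons list of my own, a compressed analogue of SimpleList above with null playing the role of the empty list, instrumented with a step counter on Concatenate:

```csharp
using System;

// A minimal immutable cons list; null plays the role of the empty
// list. (A compressed stand-in for SimpleList<T> above.)
sealed record Node(int Head, Node Tail);

static class ConsOps
{
    public static int Steps; // counts calls to Concatenate

    public static Node Push(Node list, int t) => new Node(t, list);

    public static Node Concatenate(Node left, Node right)
    {
        Steps++;
        return left is null
            ? right
            : Push(Concatenate(left.Tail, right), left.Head);
    }

    public static Node Append(Node list, int t) =>
        Concatenate(list, Push(null, t));

    static void Main()
    {
        Steps = 0;
        Node list = null;
        // Appending n items one at a time walks the whole list each
        // time: 1 + 2 + ... + n = 5050 calls for n = 100, which is
        // O(n^2) overall.
        for (int i = 0; i < 100; i++)
            list = Append(list, i);
        Console.WriteLine(Steps);
    }
}
```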

The problem here is that we are often in a situation where Append and Concatenate are frequent operations, but Peek and Pop are not. For example, we might be in a “list building” scenario where we are composing a large list out of many smaller lists, and wish to be able to efficiently concatenate them.

I already did a series on using finger trees to implement an efficient catenable deque, but finger trees are a quite complex and difficult data structure to understand; you may recall that I did not even implement the efficient “break up/concatenate” operations because they were just too much work. The great thing about Hughes lists is that they are a seemingly very simple data structure that efficiently implements list concatenation. In fact, here it is:

public struct HughesList<T>
{
  private readonly Func<SimpleList<T>, SimpleList<T>> f;

}

And… that’s it. A Hughes list is a struct that wraps a single delegate; the delegate takes a simple list and returns a simple list. Of course I have left out all the methods and properties, but the delegate is the only data that is stored in a Hughes list.

I know what you’re thinking, because it’s the same thing that I thought when I first heard about this: that sounds completely bonkers. How can a function from lists to lists be a representation of a list? Where’s the head and the tail and all that stuff?

Next time on FAIC: We’ll find out how it works and build an implementation in less than two dozen lines of code.

Indexer error cases

Pop quiz: Consider these little methods. (They could be static, inline, whatever, that doesn’t matter.)

int[] A1()
{
  Console.WriteLine("A");
  return new int[10];
}
int[] A2()
{
  Console.WriteLine("A");
  return null;
}
int B1()
{
  Console.WriteLine("B");
  return 1;
}
int B2()
{
  Console.WriteLine("B");
  return 11;
}
int C()
{
  Console.WriteLine("C");
  return 123;
}

Now I have four questions for you; can you predict the behaviour of each of these program fragments?

try { A1()[B1()] = C(); }
catch { Console.WriteLine("catch"); }

try { A1()[B2()] = C(); }
catch { Console.WriteLine("catch"); }

try { A2()[B1()] = C(); }
catch { Console.WriteLine("catch"); }

try { A2()[B2()] = C(); }
catch { Console.WriteLine("catch"); }

There’s no trickery going on here; seriously, make a mental model of what each of these does and try to predict their behaviour. Scroll down when you’re done.

.

.

.

.

.

The first one prints A, B, C. All the rest print A, B, C, catch.

Back when I was a young and naive compiler developer, I would have predicted the final three produce “A, B, catch”, based on the following reasoning:

  • In C# an assignment produces all the side effects of the left-hand-side expression before all the side effects of the right-hand-side expression.
  • An exception is a side effect.
  • The exceptions produced here — either the null deference when A2() is called, or index out of bounds when A1()[B2()] is called — should therefore happen when the left hand side is evaluated, before C() is evaluated.

This reasoning is incorrect; in C# (and also in Java and many other languages), it is the assignment itself that triggers the exception, even when the exception is the result of an indexing operation.

That is, my wrong naive mental model was that A()[B()]=C() had the semantics of:

int[] a = A();
int b = B();
// Make an alias; crash if it's bad
ref int r = ref a[b]; 
int c = C();
r = c;

This is wrong; the correct model is that this is equivalent to:

int[] a = A();
int b = B();
int c = C();
a[b] = c;
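A self-contained version of the third fragment makes the order observable; in this sketch of mine the helpers log to a string instead of the console so the sequence is easy to inspect:

```csharp
using System;

static class IndexerDemo
{
    // Returns the sequence of side effects observed while
    // evaluating A2()[B1()] = C();
    public static string Run()
    {
        string log = "";
        int[] A2() { log += "A"; return null; }
        int B1() { log += "B"; return 1; }
        int C() { log += "C"; return 123; }
        try { A2()[B1()] = C(); }
        catch (NullReferenceException) { log += "catch"; }
        return log;
    }

    // C() runs before the assignment throws, so this prints ABCcatch.
    static void Main() => Console.WriteLine(Run());
}
```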

I must admit, when I learned this shortly after joining the C# team, I was surprised and I considered it a bug in the design. But there are good reasons for this, and it turns out to have some nice properties.

First off, let’s get the main point out of the way. You should never write a program like the one I wrote, where there is a difference in behaviour depending on when the exception is thrown. Those exceptions should never be thrown! A program should not dereference null or blow past the end of an array and then catch the problem; you shouldn’t cause the problem in the first place. So the choice of what to do here really should only matter to the compiler writer, not to the user. If you write your code correctly, it makes no difference when the exception is thrown because it will not actually be thrown.

So, given that it ought not to matter to the user, what is the language designer to do? Should the language designer say that “evaluate the left” causes the exception, or should the exception be triggered by the assignment, after the right has been evaluated?

One way to think about it is: what if we were calling a user-defined indexer setter instead of an array. Suppose we have a class X with an indexer:

public int this[int i]
{
  get { … whatever … }
  set
  {
    if (i < 0 || i > 10)
      throw new BlahBlahBlah();
    … whatever …
  }
}

An indexer is just a fancy way to write a method call. Suppose we have:

X X1()
{
  Console.WriteLine(“X”);
  return new X();
}

And now what does X1()[B()]=C(); mean? Plainly it must mean:

X x = X1();
int b = B();
int c = C();
x.indexer_setter(b, c);

And that last line has the semantics of “throw if x is null, otherwise do the call and throw if the index is out of bounds”.  In this scenario we definitely evaluate B() even if X1() returns null, and we definitely evaluate C() even if B() is out of bounds, so it makes sense that we should be consistent between user-defined and built-in indexers.

The decision to make the evaluation of an indexer defer all the checks until the assignment also helps in this scenario; think about how you would make this work if you were the compiler writer:

async Task<int> D()
{
  await Task.Delay(1000);
  return 123;
}

A()[B()] = await D();

At the point of the await, the current activation is going to spill all of its local state into a closure, assign the continuation of the current task to complete the assignment, and return control to the caller; when D() completes, the continuation is activated and the assignment happens. But what information is stored in the closure? If the left side is computed as

int[] a = A();
int b = B();
ref int r = ref a[b];

Then we can lose “a” and “b”; we never read them again. But we need to somehow store a ref in a closure, which is not legal in the CLR! There are ways to get around that, but boy are they painful and expensive. By deferring the validation of the collection and index until the assignment, we eliminate the need to spill a reference off the stack into a closure; we just store the ordinary values of “a” and “b”.

Of course, the design team for C# 1.0 did not know that the design team for C# 5.0 was going to need to implement temporary spilling into closures, but it surely is a happy accident that the way it does work is the way that is cheapest and easiest to implement should you need to put temporaries into a closure.
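To make that codegen story concrete, here is a hand-written approximation (my own sketch with made-up helper bodies, not actual compiler output) of how the awaited assignment can be lowered using only ordinary values, no refs:

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical helpers standing in for the A(), B(), D() of the example.
int[] A() => new int[5];
int B() => 2;
async Task<int> D() { await Task.Delay(10); return 123; }

// The lowering: stash the ordinary values of the array and the index
// (both are fine to hoist into the state machine), await, then do the
// store, which is where the null and bounds checks finally happen.
async Task<int[]> LoweredAsync()
{
    int[] tempA = A();
    int tempB = B();
    int tempC = await D();
    tempA[tempB] = tempC;
    return tempA;
}

var result = await LoweredAsync();
Console.WriteLine(result[2]); // 123
```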

Next time on FAIC: I recently learned about an interesting way to represent immutable lists that has completely different performance characteristics than standard immutable lists. At first glance it seems impossible, but it all works out beautifully. We’ll do a short series on how to make standard linked lists into catenable deques without going to all the trouble of building a finger tree, like we did last time.

Dual numbers, part 4

Last time on FAIC I gave a hand-wavy intuitive explanation for why it is that lifting a function on doubles to duals gives us “automatic differentiation” on that function — provided of course that the original function is well-behaved to begin with. I’m not going to give a more formal proof of that fact; today rather I thought I’d do a brief post about multivariate calculus, which is not as hard as it sounds. (In sharp contrast to the next step, partial differential equations, which are much harder than they sound; I see that Professor Siv, as we called him, is still at UW and I hope is still cheerfully making undergraduates sweat.)

We understand that the derivative of an analytic function f(x) that takes a real and produces a real is the function f'(x) that gives the slope of f(x) at each point. But suppose we have a function that takes two reals and produces a real. Maybe f(x, y) = cos(x/5) sin(y/3). Here, I’ll graph that function from (0, 0) to (20, 20):

[Figure: graph of f(x, y) = cos(x/5) sin(y/3) from (0, 0) to (20, 20)]

We could implement this in C# easily enough:

static R Compute2(R x, R y) => Math.Cos(x / 5.0) * Math.Sin(y / 3.0);

That’s a pretty straightforward analytic function of two variables, but we’re programmers; we can think of two-variable functions as a one-variable factory that produces one-variable functions. (This technique is called currying, after the logician Haskell Curry.)

static Func<R, R> Compute2FromX(R x) => (R y) =>
  Math.Cos(x / 5.0) * Math.Sin(y / 3.0);

That is, given a value for x, we get back a function f(y), and now we are back in the world of single-variable real-to-real functions. We could then take the derivative of this thing with respect to y and find the slope “in the y direction” at a given (x, y) point.

Similarly we could go in the other direction:

static Func<R, R> Compute2FromY(R y) => (R x) =>
  Math.Cos(x / 5.0) * Math.Sin(y / 3.0);

And now given a “fixed” y coordinate we can find the corresponding function in x, and compute the slope “in the x direction”.

In wacky calculus notation, we’d say that ∂f(x, y)/∂x is the two-variable function that restricts f(x, y) to a specific y and computes the slope of that one-variable function in the x direction. Similarly ∂f(x, y)/∂y is the function that restricts f(x, y) to a specific x and computes the slope of that one-variable function in the y direction.

You probably see how we could compute the partial derivatives of a two-variable, real-valued analytic function using our Dual type; just curry the two-variable function into two one-variable functions, change all of the “doubles” to “Duals” in the “returned” function, and provide dualized implementations of sine and cosine.

But… do we actually need to do the currying step? It turns out no! We do need implementations of sine and cosine; as we discussed last time, they are:

public static D Cos(D x) =>
  new D(Math.Cos(x.r), -Math.Sin(x.r) * x.ε);
public static D Sin(D x) =>
  new D(Math.Sin(x.r), Math.Cos(x.r) * x.ε);

Easy peasy. We can now write our computation in non-curried, dualized form:

static D Compute2(D x, D y) => D.Cos(x / 5.0) * D.Sin(y / 3.0);

Now suppose we are interested in the x-direction slope and the y-direction slope at (13, 12), which is part of the way up the slope of the right hand “hill” in the graph:

Console.WriteLine(Compute2(13 + D.EpsilonOne, 12));
Console.WriteLine(Compute2(13, 12 + D.EpsilonOne));

We do the computation twice, and tell the compute engine to vary one-epsilon in the x direction the first time, and not at all in the y direction. The second time, we vary by one-epsilon in the y direction but not at all in the x direction. The output is:


0.648495546750919+0.0780265449058288ε
0.648495546750919+0.186699955809795ε

Which gives us that the slope in the x direction is about 0.078, and the slope in the y direction is about 0.187. So for the cost of computing the function twice, we can get the derivatives in both directions. Moreover, if we wanted to compute the slope in some other direction than “parallel to the x axis” or “parallel to the y axis”, perhaps you can see how we’d do that.
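In case that last bit is mysterious, here is a sketch of the directional-derivative trick. I am using plain (real, epsilon) tuples and helper names of my own rather than the Dual type, but the arithmetic is the same: vary x by cos θ epsilons and y by sin θ epsilons in a single evaluation, and the epsilon part of the result is the slope in the θ direction:

```csharp
using System;

// Duals as (real, epsilon) pairs; just the operations this sketch needs.
(double r, double e) Mul((double r, double e) a, (double r, double e) b) =>
    (a.r * b.r, a.e * b.r + a.r * b.e);
(double r, double e) DualCos((double r, double e) x) =>
    (Math.Cos(x.r), -Math.Sin(x.r) * x.e);
(double r, double e) DualSin((double r, double e) x) =>
    (Math.Sin(x.r), Math.Cos(x.r) * x.e);
(double r, double e) Compute2((double r, double e) x, (double r, double e) y) =>
    Mul(DualCos((x.r / 5.0, x.e / 5.0)), DualSin((y.r / 3.0, y.e / 3.0)));

// Slope at (13, 12) in the direction 45 degrees from the x axis:
double theta = Math.PI / 4.0;
var d = Compute2((13, Math.Cos(theta)), (12, Math.Sin(theta)));
Console.WriteLine(d.e); // cos θ · (x slope) + sin θ · (y slope)
```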

But we’ve actually gotten something even a bit more interesting and useful here: we’ve computed the gradient!

The gradient of a multivariate function like this one is the direction you’d have to walk to gain altitude the fastest. That is, if you are standing at point (13, 12) on that graph, and you walk 0.078 steps in the x direction for every 0.187 steps in the y direction, you’ll be heading in the “steepest” direction, which is the direction likely to get you to the top of the mountain the fastest:

[Figure: the gradient direction at (13, 12), pointing upslope toward the nearby peak]

And of course if you walk in the “negative of the gradient” direction, you’re heading “downslope” as fast as possible.

Notice that heading in the “upslope” or “downslope” direction does not necessarily move you directly towards the nearest peak, valley or saddle; it heads you in the direction that is gaining or losing altitude the fastest. If you do not know exactly where the peaks and valleys are, that’s your best guess of the right direction to go.

I hope it is clear that there is no reason why this should be limited to functions of two variables, even though it is hard to visualize higher-dimensional functions! We can have a real-valued function of any number of variables, take the partial derivatives of each, and find the gradient that points us in the “steepest” direction.

Why is this useful? Well, imagine you have some very complicated but smooth multivariate function that represents, say, the cost of a particular operation which you would like to minimize. We might make an informed guess via some heuristic that a particular point is near a “cost valley”, and then compute the gradient at that point; this gives us a hint as to the direction of the nearest valley, and we can then try another, even more informed guess until we find a place where all the gradients are very small.  Or, equivalently, we might have a “benefit function” that we are trying to maximize, and we need a hint as to what direction the nearby peaks are. These techniques are in general called “gradient descent” and “gradient ascent” methods and they are quite useful in practice for finding local minima and maxima.
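As a tiny illustration of gradient descent (again a sketch with tuple duals and a made-up step size, not production code), we can walk downhill on our cos-sin surface starting from (13, 12), recomputing the exact gradient at each step:

```csharp
using System;

// f(x, y) = cos(x/5) sin(y/3) lifted to (real, epsilon) dual pairs.
(double r, double e) Mul((double r, double e) a, (double r, double e) b) =>
    (a.r * b.r, a.e * b.r + a.r * b.e);
(double r, double e) F((double r, double e) x, (double r, double e) y) =>
    Mul((Math.Cos(x.r / 5.0), -Math.Sin(x.r / 5.0) * x.e / 5.0),
        (Math.Sin(y.r / 3.0),  Math.Cos(y.r / 3.0) * y.e / 3.0));

double x0 = 13, y0 = 12, rate = 0.5;
for (int i = 0; i < 200; i += 1)
{
    double gx = F((x0, 1), (y0, 0)).e; // ∂f/∂x, exact, one dual evaluation
    double gy = F((x0, 0), (y0, 1)).e; // ∂f/∂y
    x0 -= rate * gx;                   // step against the gradient
    y0 -= rate * gy;
}
Console.WriteLine($"({x0:F3}, {y0:F3}), f = {F((x0, 0), (y0, 0)).r:F3}");
```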

The point is: as long as the function you are computing is computed only by adding, subtracting, multiplying, and dividing doubles, or using double-valued library functions whose derivatives you know, you can “lift” the function to duals with only a minor code change, and then you get the derivative of that function computed exactly, for a very small additional cost per computation. That covers a lot of functions where it might not be at all obvious how to compute the derivative. Even better, for an n-dimensional function we can compute the gradient for only a cost of O(n) additional operations. (And in fact there are even more efficient ways to use dual numbers to do this computation, but I’m not going to discuss them in this series.)

A great example of a high-dimensional computation that only uses real addition and multiplication is the evaluation of a function represented by a neural net in a machine learning application; by automatically differentiating a neural network during the training step, we can very efficiently tweak the neuron weights during the training step to seek a local maximum in the fitness of the network to the given task. But I think I will leave that discussion for another day.

Next time on FAIC: What exactly does A()[B()]=C(); do in C#? It might seem straightforward, but there are some subtle corner cases.

Dual numbers, part 3

Last time on FAIC we discovered that when you “lift” a polynomial on doubles to a polynomial on duals and then evaluate it with x+1ε, the “real” part of the returned dual exactly corresponds to the original polynomial at x; that’s not a surprise. What is bizarre and fascinating is that the “epsilon” part of the dual exactly corresponds to the derivative of the polynomial at x!

First question: is this actually true for all polynomials, or did I trick you by choosing an example that just happened to work? Let’s make a little chart. I’ll make a bunch of power functions, and we’ll list the function, its derivative, and what happens if we lift the function to dual numbers and evaluate it on an arbitrary dual number:

    f(x)      f'(x)      f(a+bε)
     1          0          1+0ε
     x          1          a+bε
     x²         2x         a²+2abε  (The higher power ε terms vanish)
     x³         3x²        a³+3a²bε
     ...
     xⁿ         nxⁿ⁻¹      aⁿ+naⁿ⁻¹bε
     ...

It sure looks like for every power function, f(a+bε) = f(a) + f'(a)bε. (Exercise: prove this by induction on the power.) And moreover, we could pretty easily show that this is true for all finite sums of these things, and all scalar multiplications, and therefore true for all polynomials.
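Before moving on, it is easy to spot-check that chart numerically. This sketch (tuple duals again, names of my own) computes xⁿ by repeated dual multiplication at a = 1.7, b = 2.3 and compares the epsilon part against n·aⁿ⁻¹·b:

```csharp
using System;

// Dual multiplication over (real, epsilon) pairs.
(double r, double e) Mul((double r, double e) p, (double r, double e) q) =>
    (p.r * q.r, p.e * q.r + p.r * q.e);

double a = 1.7, b = 2.3;
for (int n = 1; n <= 6; n += 1)
{
    var pow = (r: 1.0, e: 0.0);           // (a+bε)⁰ = 1+0ε
    for (int i = 0; i < n; i += 1)
        pow = Mul(pow, (a, b));           // multiply up to (a+bε)ⁿ
    double expected = n * Math.Pow(a, n - 1) * b;
    Console.WriteLine($"x^{n}: ε part {pow.e:F6}, n·a^(n-1)·b = {expected:F6}");
}
```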

Is it true for other functions? We know that f(x) = e^x is defined as the limit of the infinite sum

1 + x + x²/2! + x³/3! ...

and that moreover, f(x) = f'(x). So what then is f(a+bε)?

e^(a+bε) = 1 + (a+bε) + (a²+2abε)/2! + (a³+3a²bε)/3! ...
         = (1 + a + a²/2! + ...) + (1 + a + a²/2! + ...)bε
         = e^a + e^a bε
         = f(a) + f'(a)bε

It’s nice that this falls right out of the definition; we could also have gotten there faster by using this property of exponentiation:

e^(a+bε) = e^a e^(bε)
         = e^a (1 + bε)   because the higher terms are all zero!
         = f(a) + f'(a)bε

I will not labour the point; using similar tricks we can show:

ln(a+bε) = ln(a) + (1/a)bε
cos(a+bε) = cos(a) - sin(a)bε
sin(a+bε) = sin(a) + cos(a)bε

And since yˣ = e^(x ln y), we can now compute arbitrary powers of dual numbers as well.
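Concretely, a general power function on duals can be built from the lifted exp and ln rules; here is a sketch (tuple duals, helper names of my own):

```csharp
using System;

// Lifted operations over (real, epsilon) dual pairs.
(double r, double e) Mul((double r, double e) p, (double r, double e) q) =>
    (p.r * q.r, p.e * q.r + p.r * q.e);
(double r, double e) Exp((double r, double e) x) =>
    (Math.Exp(x.r), Math.Exp(x.r) * x.e);
(double r, double e) Log((double r, double e) x) =>
    (Math.Log(x.r), x.e / x.r);
// y^x = e^(x ln y), so:
(double r, double e) Pow((double r, double e) y, (double r, double e) x) =>
    Exp(Mul(x, Log(y)));

// Sanity check: x³ at x = 2+1ε should give 8 plus 12ε, since d(x³)/dx = 3x².
var p = Pow((2, 1), (3, 0));
Console.WriteLine($"{p.r:F6} + {p.e:F6}ε");
```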

In every case we find that for a function f(x) defined on reals, the “lifted” function on duals is f(a+bε) = f(a) + f'(a)bε. Now, to be strictly accurate, we need to put some limitations on the function. For example, its derivative has to exist! In fact we will be more restrictive than that, and limit the functions we consider to the “analytic” functions, which have the properties necessary to be amenable to calculus: they must be smooth, their first, second, third, … derivatives must exist, and so on. We don’t want to consider fractal curves or functions that come to sharp points, or have infinite slope at a point, or whatever. Just normal, everyday curves.

That all said, it still seems a little bit like magic that by lifting an ordinary function on doubles to a function on duals, we get a computation of both the function and its derivative at a point for almost free; there’s only an O(1) additional cost in time and memory to compute the derivative. This stands in contrast to the more traditional way of numerically approximating a derivative by repeatedly evaluating nearby points until the slope converges, which can involve many repeated computations and results in an approximation. Is there some intuitive explanation of why this works?

The intuition that works for me is: every analytic function is approximately a straight line if you restrict its range. Here’s our polynomial again, but this time graphed from -0.8 to -0.7. It’s nigh indistinguishable from a straight line over this range. But what straight line?

[Figure: the polynomial graphed from -0.8 to -0.7; at this scale it is nearly a straight line]

Well, the straight line that goes through (-0.75, f(-0.75)) and has slope of f'(-0.75) obviously! Thus it should not be a surprise that f(-0.75 + bε) should be f(-0.75) + f'(-0.75)bε. We’re essentially waving our hands here and saying that ε is so small that we can approximate the function in the neighbourhood of -0.75 as a straight line.

Next time on FAIC: Can we do some slightly more complicated calculus with this technique?

Dual numbers, part 2

Last time on FAIC I introduced a fun variation on complex numbers called “dual numbers” consisting of a value “plus a tiny bit” notated as ε, where the really tiny part has the interesting property that ε² is zero.

Well, enough chit-chat; let’s implement them.

You know what, just for grins I’m going to make this look a bit more like math, and a bit more concise while we’re at it:

using D = Dual;
using R = System.Double;

(Unfortunately 𝔻 is not a legal identifier in C#.)

A dual is logically a value that consists of two doubles, so it makes sense to represent them as an immutable value type:

public struct Dual : IComparable<Dual>
{
  public R Real { get; }
  public R Epsilon { get; }
  public R r => this.Real;
  public R ε => this.Epsilon;

I got so sick of typing out “Real” and “Epsilon” that I had to make these helper properties, even though they do not meet C# naming guidelines. Guidelines are guidelines, not rules!

Notice also that, fun fact, C# lets you use any non-surrogate-pair Unicode character classified as a letter in an identifier.

The constructor is as you’d expect, and I’m going to make a helper property for 0+1ε:

public Dual(R real, R epsilon)
{
  this.Real = real;
  this.Epsilon = epsilon;
}

public static D EpsilonOne = new D(0, 1);

The math is as we’ve discussed:

public static D operator +(D x) => x;
public static D operator -(D x) => new D(-x.r, -x.ε);
public static D operator +(D x, D y) =>
  new D(x.r + y.r, x.ε + y.ε);
public static D operator -(D x, D y) =>
  new D(x.r - y.r, x.ε - y.ε);
public static D operator *(D x, D y) =>
  new D(x.r * y.r, x.ε * y.r + x.r * y.ε);
public static D operator /(D x, D y) =>
  new D(x.r / y.r, (x.ε * y.r - x.r * y.ε) / (y.r * y.r));

So easy!
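A quick sanity check of those operators against the rules from part 1; I'll use bare (real, epsilon) tuples here so the check stands alone. (1+2ε)(3+4ε) should be 3+10ε, and dividing that back by 1+2ε should recover 3+4ε:

```csharp
using System;

// The multiplication and division rules, applied to (real, epsilon) pairs.
(double r, double e) Mul((double r, double e) x, (double r, double e) y) =>
    (x.r * y.r, x.e * y.r + x.r * y.e);
(double r, double e) Div((double r, double e) x, (double r, double e) y) =>
    (x.r / y.r, (x.e * y.r - x.r * y.e) / (y.r * y.r));

var product = Mul((1, 2), (3, 4));
Console.WriteLine(product);              // (3, 10), that is, 3+10ε
Console.WriteLine(Div(product, (1, 2))); // (3, 4), recovering 3+4ε
```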

UPDATE: I was WRONG WRONG WRONG in the section which follows.

An earlier version of this episode suggested that we implement comparisons on dual numbers as “compare the real parts; if equal, compare the epsilon parts”. But upon reflection, I think that’s not right. One of the characteristics of dual numbers that we like is that “lifting” a computation on reals to dual numbers produces the same “real part” in the result. Suppose we have:

static bool Foo(R x) => x > 2.0;

Then we reasonably expect that

static bool Foo(D x) => x > 2.0;

agrees with it, no matter what value we have for epsilon.

So I’m going to tweak the code so that we compare only real parts.

Resuming now the original episode…

My preference is to implement the logic in one place in a helper, and then call that helper everywhere:

private static int CompareTo(D x, D y) =>
  x.r.CompareTo(y.r);
public int CompareTo(D x) =>
  CompareTo(this, x);
public static bool operator <(D x, D y) =>
  CompareTo(x, y) < 0;
public static bool operator >(D x, D y) =>
  CompareTo(x, y) > 0;
public static bool operator <=(D x, D y) =>
  CompareTo(x, y) <= 0;
public static bool operator >=(D x, D y) =>
  CompareTo(x, y) >= 0;
public static bool operator ==(D x, D y) =>
  CompareTo(x, y) == 0;
public static bool operator !=(D x, D y) =>
  CompareTo(x, y) != 0;
public bool Equals(D x) =>
  CompareTo(this, x) == 0;

public override bool Equals(object obj) =>
  obj is D x && CompareTo(this, x) == 0;

And finally, a few loose ends. It would be nice to be able to convert to Dual from double automatically, and we should also override ToString and GetHashCode just to be good citizens:

public static implicit operator D(R x) => new D(x, 0);
public override string ToString() => $"{r}{(ε < 0.0 ? "" : "+")}{ε}ε";
public override int GetHashCode() => r.GetHashCode();

Super, that was really easy. And now we can take any old method that does math on doubles, and make it do math on Duals. Suppose we’ve got this little guy, that computes x4+2x3-12x2-2x+6:

static R Compute(R x) =>
  x * x * x * x + 2 * x * x * x - 12 * x * x - 2 * x + 6;

If we just turn all the doubles into Duals:

static D Compute(D x) =>
  x * x * x * x + 2 * x * x * x - 12 * x * x - 2 * x + 6;

Then we have the same function, but now implemented in the dual number system. This:

Console.WriteLine(Compute(1.0 + Dual.EpsilonOne));

Produces the output -5-16ε, which agrees with the original method in the real part, and is -16 in the epsilon part. Apparently computing that polynomial with one plus a tiny amount gives us -5, plus -16 “more tiny amounts”.

Hmm.

You know what, just for grins I’m going to compute this function from -4.5+ε to 3.5+ε and then graph the real and epsilon parts of the output of our “dualized” function.

The blue line is the real part, the orange line is the epsilon part. As we expect, the blue line corresponds to the original polynomial. But the orange line looks suspiciously familiar…

[Figure: the real part (blue) and epsilon part (orange) of the dualized polynomial from -4.5 to 3.5]

HOLY GOODNESS THE ORANGE LINE IS ZERO WHERE THE BLUE LINE IS FLAT.

The epsilon part is the derivative of the polynomial!

Next time on FAIC: How’d that happen?

Dual numbers, part 1

I’ve recently been looking into a fascinating corner of mathematics that at first glance appears a little bit silly, but actually has far-reaching applications, from physics to numerical methods to machine learning. I thought I’d share what I’ve learned over the next few episodes.

I assume you recall what a complex number is, but perhaps not all of the details. A complex number is usually introduced as a pair of real numbers (a, b), where a is called the “real part” and b is called the “imaginary part”.

A brief aside: it has always bugged me that these labels are unnecessarily value-laden. There is no particular “reality” that is associated with the real part; it is every bit as “imaginary” as the imaginary part. They might as well be called the “rezrov part” and the “gnusto part”, but we’re stuck with “real” and “imaginary”. Moving on.

It would be perfectly sensible to notate these as a pair (a, b) — and indeed, when we characterize complex numbers as points on the complex plane, we often do. But it is also convenient to notate complex numbers in an algebraic form a+bi, where the i indicates the imaginary part.

Finally, we have well-known rules for adding and multiplying complex numbers. The fundamental rule is that (0+1i)² is equal to -1+0i, and the rest of the rules follow from there. For complex numbers:

(a+bi) + (c+di) = (a+c) + (b+d)i
(a+bi)(c+di)    = (ac-bd) + (ad+bc)i

Division is a little bit tricky and I won’t write it all out here; look it up if you do not remember. We can similarly go on to derive rules for complex exponentiation, trigonometry, and so on.

Let’s switch gears slightly. It is very common in mathematics, and especially calculus, to reason about a quantity “plus a little bit”. That is, any old quantity, plus some positive amount that is not zero, but is much smaller than one, and really, really small compared to the original quantity. What “really, really small” means is of course dependent on context; maybe one-millionth of the original quantity is small. Maybe one-millionth is still pretty big, but one-trillionth, or one part in a googolplex, or whatever, is “really small” in our context. I’m being deliberately vague and hand-wavy here; just run with it.

Let’s call our really small positive quantity ε, which is the lower-case Greek letter epsilon; it is traditionally used in calculus to represent a very small quantity. Let’s suppose our value is a, and we have a very small amount added to it, so that would be a+ε. Hmm. What if we doubled that quantity? Plainly that would be 2a+2ε. So scalar multiplication works as you’d expect. It seems reasonable that if we can have 2ε then we can have 3ε, or for that matter 3.14ε, or bε for any real value of b. So once again we have an (a, b) pair of numbers representing a quantity, which we notate as a+bε. What are the algebraic rules of this thing? It certainly makes sense that addition should work as we expect, the same as complex addition:

(a+bε) + (c+dε) = (a+c) + (b+d)ε

This should match our intuitions. The sum of “a plus a bit” and “c plus a bit” should be “a + c plus a bit” if there is justice in the world. What about multiplication? Let’s just multiply it out using ordinary algebra:

(a+bε)(c+dε) = ac + (ad+bc)ε + bdε²

We seem to have introduced an ε² term there. What are we going to do with that? Remember, ε is not zero but is very, very small indeed, so that thing squared must be super tiny. So tiny that we can just ignore it entirely! (Remember, I am hand-waving egregiously here.) Let’s just say that ε² is by definition zero, just as we said that i² is by definition -1, and see what happens. So we have:

(a+bε)(c+dε) = ac + (ad+bc)ε

Again, this should match our intuitions. The product of something a bit bigger than a with something a bit bigger than c should be a bit bigger than the product of a and c.

What about division? Division by zero is illegal, and division by zero plus a tiny amount I think should also be illegal. Seems reasonable. What then is (a+bε)/(c+dε) when c is not zero? Well, if it is not zero then we can multiply top and bottom by c-dε:

a+bε   (c-dε)(a+bε)   ac + (bc-da)ε    a   bc-da
———— = ——————————— = —————————————— = - + ————— ε
c+dε   (c-dε)(c+dε)        c²          c     c²

Once again our intuition is satisfied: the quotient of “something close to a” divided by “something close to c” is close to the quotient of a and c.

This is a pretty neat number system; it is called the dual number system and it was introduced to the world in 1873 by the English mathematician William Clifford. But so far it might look like it is just a curiosity; however, it is extremely useful.

Next time on FAIC: we will implement the dual number system in C#, and discover a surprising property.

Accessibility of nested classes

Happy New Year all and welcome to another year of having fabulous adventures in coding. I thought I’d start this year off by answering a question I get quite frequently:

I have a public outer class with a public nested class. The nested class represents a concept only meaningful in the context of the outer class, and it accesses private members of the outer class. I want some of the members of the nested class to be public to everyone, but I want some of them to be public only to the outer class; how can I do that?

Typically the members they want to be accessible only to the containing class are things like property setters and constructors:

public class Outer
{
  public class Inner
  {
    // I want the ctor and setter to be accessible in Outer
    // but not to the general public.
    public Inner() { ... }
    public Foo Foo { get; set; } 
    // whatever ... 
  }
  public Inner Frob() { ... }
  // ... and so on ...
}

I can think of four ways to solve this problem offhand, though I’m sure there are more.

Attempt one: my preferred solution is to deny the premise of the question and make the “private” details that need to be used by the container “internal”. It is your assembly! If you don’t want to abuse the internal details outside of a particular class, then don’t do it. If your coworkers do it, educate them in code reviews. Not every authorial intention needs to be expressed in the accessibility system, after all.

Attempt two: again, deny the premise of the question, but this time make the nested type a private implementation detail. The received wisdom is that we want nested classes to be private implementation details of the public outer class, not themselves public classes. In that case you’d simply make an interface to expose the public details:

public interface IFoo
{
  Foo Foo { get; }
}
public class Outer
{
  private class Inner : IFoo
  {
    public Inner() { ... }
    public Foo Foo { get; private set; } 
    // whatever ... 
  }
  public IFoo Frob() { ... }
  // ... and so on ...
}

This has a couple of down sides though. We’ve denied the premise of the question, which is that the nested class makes sense to be public, and so not answered the question. We’ve also “polluted” the public namespace by adding a new IFoo concept unnecessarily. Also, there is now the possibility that just anyone can create an implementation of IFoo. Typically in these sorts of scenarios, the reason we want to restrict the constructor and setter is to ensure that the invariants of the inner type are being guaranteed by the code in the outer type.

Attempt three: did you know that it is legal for a type to implement an interface that is less public than the type? That is not true for base types, but it is true for interfaces!

public class Outer
{
  private interface IFooSetter
  {
    Foo Foo { set; }
  }
  public class Inner : IFooSetter
  {
    public Inner() { ... }
    public Foo Foo { get; private set; } 
    Foo IFooSetter.Foo { set { this.Foo = value; } }
    // whatever ... 
  }
  public Inner Frob() { ... }
  // ... and so on ...
}

Now code in Outer can cast an Inner to IFooSetter and call the setter that way, but code outside of Outer cannot cast to IFooSetter because that type is private to Outer.
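Here is attempt three as a complete program; I’ve shrunk the Foo property down to an int so that the sketch stands alone, but the shape is exactly as above:

```csharp
using System;

// Only code inside Outer can see IFooSetter, so only Outer can call the
// setter; everyone can call the getter.
var inner = new Outer().Frob();
Console.WriteLine(inner.Foo); // 123
// ((Outer.IFooSetter)inner).Foo = 456; // does not compile out here

public class Outer
{
    private interface IFooSetter
    {
        int Foo { set; }
    }
    public class Inner : IFooSetter
    {
        public int Foo { get; private set; }
        int IFooSetter.Foo { set { this.Foo = value; } }
    }
    public Inner Frob()
    {
        var inner = new Inner();
        ((IFooSetter)inner).Foo = 123; // legal here, inaccessible outside
        return inner;
    }
}
```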

This looks pretty good, but again we’ve got a few problems. We still haven’t dealt with the constructor problem. And in both the solutions we’ve seen so far, it works poorly in the case where the inner type is a value type because we end up boxing.  Yes, it is a very bad practice to make a mutable value type, but hey, that’s probably why we’re restricting the mutators to be only accessible from Outer! Regardless though, we’d like a way to avoid the boxing penalty.

Attempt four: avoid interfaces altogether and go straight to delegates:

// This code is wrong!
public class Outer
{
  private static Func<Inner> innerFactory = null;
  private static Action<Inner, Foo> fooSetter = null;
  public class Inner
  {
    static Inner() 
    {
      Outer.innerFactory = ()=>new Inner();
      Outer.fooSetter = (i, f)=>i.Foo = f;
    }
    private Inner() { ... }
    public Foo Foo { get; private set; } 
    // ...
  }
  public Inner Frob() => innerFactory();
  // ... and so on ...
}

Now code in Outer can call innerFactory() any time it needs an Inner, and can call fooSetter(inner, foo) any time it needs to call a setter on an Inner.

I’ll leave you with two exercises:

  • First, the code above is wrong because it can be easily made to crash; do you see how? Can you fix it? (Hint: there is no way using just the rules of C# that I know of; you need to use a special method created for compiler writers.)
  • Second, suppose the inner type is a mutable value type; our setter would mutate a copy. Can you fix that too?