Fixing Random, part 13

Last time on FAIC we discovered the interesting fact that conditional probabilities can be represented as likelihood functions, and that applying a conditional probability to a prior probability looks suspiciously like `SelectMany`, which is usually the bind operation on the sequence monad. We created a new implementation of `SelectMany`that creates an object which samples from the prior, calls the likelihood, and then samples from the resulting distribution. Is that the bind operation on the probability distribution monad?

Aside: If you’re completely confused by the preceding paragraph, you might want to read my gentle introduction to monads for OO programmers. Go ahead and read that over if it is not fresh in your mind.

We need the following things to have a monad in C#:

• We need an “embarrassingly generic” type: some `Foo<T>` where it can sensibly take on any `T` whatsoever. `IDiscreteDistribution<T>` meets that condition.
• The type represents an “amplification of power” of the underlying type. Indeed it does; it allows us to represent a probability distribution of particular values of that type, which is certainly a new power that we did not have before.
• We need a way of taking any specific value of any `T`, and creating an instance of the monadic type that represents that specific value. `Singleton.Distribution(t)` meets that condition.
• There is frequently(but not necessarily) an operation that extracts a value of the underlying type from an instance of the monad. `Sample()` is that operation. Note that sampling a singleton always gives you back the original value.
• There is a way to “bind” a new function onto an existing instance of the monad. That operation has the signature `M<R> SelectMany<A, R>(M<A> m, Func<A, M<R>> f)`.  We traditionally call it `SelectMany` in C# because that’s the bind operation on `IEnumerable<T>`, and it produces a projection on all the elements from a sequence of sequences. As we saw last time, we have this function for probability distributions.
• Binding the “create a new instance” function to an existing monad must produce an equivalent monad. I think it is pretty clear that if we have an `IDiscreteDistribution` in hand, call it `d`, that `SelectMany(d, t => Singleton.Distribution(t))` produces an object that has the same distribution that `d` does. If that’s not clear, play around with the code until it becomes clear to you.
• Going “the other direction” must also work. That is, if we have a `Func<A, IDiscreteDistribution<B>>` called `f`, and a value of `A`, then `SelectMany(Singleton<A>.Distribution(a), f)` and `f(a)` must produce logically the same `IDiscreteDistribution<B>`. Again, if that’s not clearly true in your mind, step through the code mentally or write some sample code and convince yourself that it is true.
• Two bind operations “on top of each other” must produce the same logical result as a single bind that is the composition of the two bound functions. That’s maybe a little vague; see Part 7 of my series on monads for details. Suffice to say, we meet this condition as well.

All our conditions are met; `IDiscreteDistribution<T>` is a monad. So we should be able to use it in a query comprehension, right?

from c in cold
from s in SneezedGivenCold(c)
select s

Unfortunately this gives an error saying that `SelectMany` cannot be found; what’s up with that?

The query comprehension syntax actually requires a slight variation on the traditional “bind” operation; it requires that we also allow a projection on the end, and that moreover, the projection take both the original value and the transformed value. That is, C# requires us to implement it like this:

public sealed class Combined<A, B, C> :
IDiscreteDistribution<C>
{
private readonly List<C> support;
private readonly IDiscreteDistribution<A> prior;
private readonly Func<A, IDiscreteDistribution<B>> likelihood;
private readonly Func<A, B, C> projection;
public static IDiscreteDistribution<C> Distribution(
IDiscreteDistribution<A> prior,
Func<A, IDiscreteDistribution<B>> likelihood,
Func<A, B, C> projection) =>
new Combined<A, B, C>(prior, likelihood, projection);
private Combined(
IDiscreteDistribution<A> prior,
Func<A, IDiscreteDistribution<B>> likelihood,
Func<A, B, C> projection)
{
this.prior = prior;
this.likelihood = likelihood;
this.projection = projection;
var s = from a in prior.Support()
from b in this.likelihood(a).Support()
select projection(a, b);
this.support = s.Distinct().ToList();

}
public IEnumerable<C> Support() =>
this.support.Select(x => x);
public int Weight(C c) => NOT YET!
public C Sample()
{
A a = this.prior.Sample();
B b = this.likelihood(a).Sample();
return this.projection(a, b);
}
}

And now we can implement `SelectMany` as

public static IDiscreteDistribution<C> SelectMany<A, B, C>(
this IDiscreteDistribution<A> prior,
Func<A, IDiscreteDistribution<B>> likelihood,
Func<A, B, C> projection) =>
Combined<A, B, C>.Distribution(prior, likelihood, projection);

and of course if we want a `SelectMany` with the traditional monad bind signature, that’s just

public static IDiscreteDistribution<B> SelectMany<A, B>(
this IDiscreteDistribution<A> prior,
Func<A, IDiscreteDistribution<B>> likelihood) =>
SelectMany(prior, likelihood, (a, b) => b);

Now that we have a`SelectMany`, we can write conditional probabilities in comprehension syntax, as before:

var sneezed = from c in cold
from s in SneezedGivenCold(c)
select s;

or, if we like, we can extract a tuple giving us both values:

public static IDiscreteDistribution<(A, B)> Joint<A, B>(
this IDiscreteDistribution<A> prior,
Func<A, IDiscreteDistribution<B>> likelihood) =>
SelectMany(prior, likelihood, (a, b) => (a, b));

var joint = cold.Joint(SneezedGivenCold);

and if we graph that, we see that we get the distribution we worked out by hand from last episode:

Console.WriteLine(sneezed.Histogram());
Console.WriteLine(sneezed.ShowWeights());
Console
.WriteLine(joint.Histogram());
Console.WriteLine(joint.ShowWeights());

``` No|****************************************
Yes|*****
No:111
Yes:14

(No, No)|****************************************
(No, Yes)|*
(Yes, No)|
(Yes, Yes)|***
(No, No):873
(No, Yes):27
(Yes, No):15
(Yes, Yes):85
```

Aside: Of course I am cheating slightly here because I have not yet implemented the weight function on the combined distribution; we’ll get to that next time!

It might seem slightly vexing that C# requires us to implement a variation on the standard bind operation, but in this case it is actually exactly what we want. Why’s that?

Let’s remind ourselves of how we are notating probability distributions. If we have a collection of possible outcomes of type Cold, we notate that distribution as P(Cold); since Cold has two possibilities, this distribution is made up from two probabilities, P(Cold.Yes) and P(Cold.No) which add up to 100%. We represent this in our type system as `IDiscreteDistribution<Cold>`

A conditional probability distribution P(Sneezed|Cold) is “given a value from type Cold, what is the associated distribution P(Sneezed)“?  In other words, it is `Func<Cold, IDiscreteDistribution<Sneezed>>`.

What then is P(Cold&Sneezed)?  That is our notation for the joint distribution over all possible pairs. This is made up of four possibilities: P(Cold.No & Sneezed.No), P(Cold.No&Sneezed.Yes), P(Cold.Yes&Sneezed.No), and P(Cold.Yes&Sneezed.Yes), which also add up to 100%.

In our type system, this is `IDiscreteDistribution<(Cold, Sneezed)>`

Now, remember the fundamental law of conditional probability is:

P(A) P(B|A) = P(A&B)

That is, the probability of A and B both occurring is the probability of A occurring, multiplied by the probability of B occurring given that A has.

That is, we can pick any values from those types, say:

P(Cold.Yes) P(Sneezed.Yes|Cold.Yes) = P(Cold.Yes&Sneezed.Yes)

That is, the probability of some value of A and some value of B both occurring is the probability of the value of A occurring multiplied by the probability of the value of B given that the value of A has occurred.

Aside: “multiplication” here is assuming that the probabilities are between 0.0 and 1.0, but again, squint a little and you’ll see that it’s all just weights. In the next episode we’ll see how to compute the weights as integers by thinking about how to do the multiplication in fractions.

We’ve implemented P(A) as `IDiscreteDistribution<A>`, we’ve implemented P(B|A) as `Func<A, IDiscreteDistribution<B>>`, and P(A&B) as `IDiscreteDistribution<(A, B)>.`

We have a function `Joint<A, B>`​ that takes the first two and gives you the third, and if you work out the math, you’ll see that the probabilities of each member of the joint distribution that results are the products of the probabilities given from the prior and the likelihood. Multiplication of a prior probability by a likelihood across all members of a type is implemented by `SelectMany.`

Coming up on FAIC: We’ll work out the weights correctly, and that will enable us to build an optimized  `SelectMany` implementation.

8 thoughts on “Fixing Random, part 13”

1. I noticed that you use .Select(x => x) in order to “wrap” a collection object (List) in an IEnumerable, so that a client couldn’t cast it back into a list. Is this a “good” solution to that problem, or in practice would there be a “better” way?

• I think it’s a reasonably pragmatic solution; one of the difficulties that I’ve noted in this series is that you can end up in a situation where a list is copied multiple times before it is stored, but if that were a serious problem then it could be optimized without too much difficulty I suspect.

Of course, it depends on what you are optimizing *for*. Maybe optimizing for fewer copies is not so important, but preventing collection pressure on enumeration is important. In that case, this is a really bad technique, and it would be better to use a read only list wrapper or something like that, which has a custom enumerator that produces less pressure.

• You have mentioned collection pressure a couple times in this series (and in previous posts). In context it has always made sense, but I’ve never seen it actually defined, and a search outside this blog has not turned up anything related to software design.

Can you concretely define it for me?

• Collection is expensive, and so you do not want to do it very often, to avoid the performance problem of stopping the world and doing a collection. In particular, you do not want to collect if nothing will be freed; that’s just a waste of time. But failing to collect often enough can lead to other performance problems; say, if data locality becomes poor or you end up “thrashing” the page file, or whatever.

The GC uses an abstract idea of “collection pressure”; there are activities which increase the “collection pressure”, and when pressure gets too high, a collection happens. That’s the “abstract” explanation. You asked for a “concrete” explanation, but the concrete explanation is simply: the GC has been heavily tuned over a period of many years to take different factors into account when computing pressure, like total amount of memory allocated, rate of new allocations, size of new allocations, and so on. I don’t know what the exact algorithm is; you’d have to look at the GC sources. But lots of small heap allocations will definitely increase pressure.

Therefore we see a lot of places in the implementation of C# and the base class libraries to prevent excess pressure. For example, if you do “foreach” on a generic list in C#, obviously that calls GetEnumerator. But on a list there are actually *two* implementations of GetEnumerator and one of them returns a mutable struct; the foreach loop will use that implementation preferentially over the implementation that returns a reference to IEnumerator because the version that returns a mutable struct does not allocate anything off the heap and therefore does not increase pressure; boxing it to IEnumerator does allocate off the heap, and that does increase pressure.

I’ve often pushed back on the myth “value types always live on the stack”. Here we see the nugget of truth behind that myth: because the value-type version of the list enumerator is small and almost always has a short lifetime, we can safely put it on the stack and avoid the collection pressure caused by boxing it to the heap.

2. Hi Eric,

I’m a relatively junior engineer who just left Microsoft and transitioned into one of the “FANG”s on the Java stack, I haven’t hated Java more in my life. The lambdas are a lie, async programming is a pain in the butt. Any chance you’re working on democratizing .net core at FB? Seems like there’s a bias amongst architects at these companies and they look at C# as some sort of dinosaur language far inferior to java. I wish it were more popular and trendy like Golang is these days.

• I was far more surprised than I should have been when I left Microsoft and went to work on the linux stack that I had not touched in 20 years. “What are you using for your build system?” Make, obviously! Source control? Git, obviously. Debugger? We just log everything to disk and read the log files when something goes wrong. I miss the Microsoft stack so much.

I wish I were trying to get .NET established, but what I’m really doing is re-inventing a bunch of stuff that Microsoft already did twenty years ago.

The notion that C# is inferior to Java is mind-boggling. When C# started, it was unfairly critiqued as “merely Java with the stupid parts removed”, but nowadays Java just feels like C# with the parts that make it productive and performant removed. Fortunately it seems that Java’s long drought of new ideas and features is finally over and they’ve been making good progress in the last few years.

I’ve been working in Scala with IntelliJ IDEA lately and finding it a reasonably pleasant IDE and a very pleasant language, so maybe check that out.