All right, let’s finish this thing off!
First, I want to summarize, second I want to describe a whole lot of interesting stuff that I did not get to, and third, I want to give a selection of papers and further reading that inspired me in this series.
If you come away with nothing else from this series, the key message is: probabilistic programming is important, it is too hard today, and we can do a lot better than we are doing. We need to build better tools that leverage the type system and support line-of-business programmers who need to do probabilistic work, the same way that we built better tools for programmers who needed to use sequences, or databases, or asynchrony.
- We started this series with me complaining about
System.Random, hence the name of the series. Even though some of the implementation details have finally improved after only some decades of misleading developers, we are still dealing with random numbers like it was 1972.
- The abstraction that we’re missing is to make “value drawn from a random distribution” a part of the type system, the same way that “function that returns a value”, or “sequence of values”, or “value that might be null” is part of the type system.
- The type we want is something like a sequence enumeration, but instead of a “move next” operation, we’ve got a “sample” operation.
- If we stick to simple distributions where the support is finite and small and the “weights” are integers, and restrict ourselves to pure functions, we can create new distributions from old ones using the same operations that we use to create new sequences from old ones: the LINQ operators
- Moreover, we can compute distributions exactly at runtime, without doing rejection sampling or other clever techniques.
- And we can even use query comprehension syntax, which makes me happy.
- From this we can see that probability distributions are monads; P(A) is just
- We also see that conditional probabilities P(B given A) are just
Func<A, IDistribution<B>>— they are likelihood functions.
SelectManyoperation on the probability monad lets us combine a likelihood function with a prior probability.
Whereoperation lets us compute a posterior given a prior, a likelihood and an observation.
- This is an extremely useful result; even domain experts like doctors often do not have good intuitions about how an observation should change our opinions about the probability of an event having occurred. A positive result on a test of a rare disease may be only weak evidence if the test failure rate is close to the disease rate.
- Can we put these features in the language as well as the type system? Abusing LINQ is clever but maybe not the best from a usability perspective.
- We could in fact embed
conditionoperators in the C# language, just as we embedded
await. We can then write an imperative workflow, and have the compiler generate a method that returns the correct probability distribution, just as
awaitallows us to write an asynchronous workflow and the compiler generates a method that returns a task!
- Unfortunately, real-world probability problems are seldom discrete, small, and completely analyzable; they’re more often continuous and approximated.
Sampleefficiently on arbitrary distributions turns out to be a hard problem.
- We can use our tools to generate Markov chains.
- We can use Markov chains to implement
Sampleusing the Metropolis Algorithm
- If we have a continuous prior (like a mint that produces coins of a certain quality with a certain probability) and a discrete likelihood (like the probability of a coin flip coming up heads), we can use this technique to compute a continuous posterior given an observation of one or more flips.
- This is a very useful technique in many real-world applications.
- Computing the expected value of a distribution given a function is a tricky problem in of itself, but there are a variety of techniques we can use.
- And if we can do that, we can solve some high-dimensional integral calculus problems!
That was rather a lot, and we still did not get to everything I wanted to.
By far the biggest thing that I did not get to, and that I may return to in another series of posts, is: the connection between
await as an operator and
sampleas an operator is deeper than you might think.
As I noted above, you can put
condition operations in a language and have the compiler build a method that when run, generates a simple, discrete distribution. But it turns out we can actually do a pretty good job of dealing with sample operators on non-discrete distributions as well, by having the compiler be smart about using some of the techniques that we discussed in this series for continuous distributions.
What you really need is the ability to pick a sample, run a little bit of a routine, remember the result, back up a bit and try a different sample, and so on; from this we can build a distribution of program traces, and from that we can build an approximation of the distribution of the output of a probabilistic method!
This kind of control flow is tricky; it’s sort of a generalization of coroutines where you’re allowed to re-run code that you’ve run before, but with different values for the variables to see what happens.
Obviously it is crucially important that the methods be pure! It’s also crucially important that you spend most of your time exploring high-likelihood control flows, because the number of unlikely control flows is nigh infinite. If this sounds a bit like training up an ML model and then using that model in production later, that’s because it is basically the same thing, but applied to programs.
I know what you’re thinking: that sounds bonkers. But — here is the thing that really got me motivated to write this series in the first place — we actually did it.
A while back my colleague Michael and I hacked together (ha ha) an implementation of multi-shot-continuations for Hack, our PHP variant and showed that we could in fact do probabilistic workflows where we have a distribution, and we sample from it some number of times, and trace out what happens in the program as we do so.
I then went on to work on other projects, but in the meanwhile a team of people who understand statistics far, far better than I do actually built an industrial-strength probabilistic control flow language with a sample operator.
You can read all about it at their paper Hack PPL: A Universal Probabilistic Programming Language.
Another point that I very much wanted to get to in this series but did not is: we can do the same thing in C#, and in fact we can do it today.
The C# team added the ability to return types other than
Task<T> from asynchronous workflows; and it turns out you only need to abuse that feature a small amount to convince C# to “go back a bit” in the workflow – back to the previous
await operation, which becomes a stand-in for
sample — and re-run portions of it with different sampled values. The C# team could add probabilistic workflows to C# tomorrow.
The C# team has historically done a great job of finding useful monads and embedding them into the control flow of the language; monadic probabilistic workflows with multi-shot continuations could be the next one. How about it team?
Finally, here’s a very incomplete list of papers and web sites that were inspiring to me in writing this series. I learned a lot, and there is plenty more to learn; I’ve just scratched the surface here.
- The idea that we can treat probability distributions like LINQ queries was given to me by my friend and director Erik Meijer; his fun and accessible paper Making Money Using Math hits many of the same points I did in this series, but I did it with a lot more code.
- The design of coroutines in Kotlin was a great inspiration to me; they’ve done a great job of making features that you would normally think of as being part of the language proper, like
await, into library methods. The first thing I did in my multi-shot coroutine hack was verify that I could simulate those features. (I was also very pleased to discover that much of this work was implemented by my old colleague Nikov from the C# team!)
- An introduction to probabilistic programming is a book-length work that uses a Lisp-like language with sample and observe primitives.
- Church is another Lisp-like language often used in academic studies of PPLs.
The next few papers are more technical.
- Build Your Own Probability Monads is a good overview for the Haskell programmers out there, as are Practical Probabilistic Programming With Monads and Stochastic Lambda Calculus and Monads of Probability Distributions.
- Lightweight Implementations of Probabilistic Programming Languages Via Transformational Compilation is a good overview of how you can use MCMC techniques on program traces. A provably correct sampler for probabilistic programs goes into some of the correctness problems faced by PPL implementations, and Generating Efficient MCMC Kernels goes into some of the performance problems.
- Another fascinating area that I wanted to explore in this series is: we know it is bad enough to try to debug programs that have yields and awaits in them; how on earth do you debug programs that are actually running the same code paths maybe thousands of times when exploring the sample space of a workflow? Here’s an interesting paper on debugging probabilistic workflows.
And these are some good ones for the math:
- These MIT course notes on Bayesian updating of continuous priors was a good primer for my episodes on coin flipping. And the lecture notes MC Methods and Importance Sampling are what it says on the tin.
- Understanding the Metropolis-Hastings Algorithm gives a bunch of the underlying math.
- The enormous book Information Theory, Inference and Learning Algorithms was invaluable in getting me up to speed on math I had not done since my undergrad days.
There are many more that I am forgetting, and I’ll add to this list as I recall them.
All right, that was super fun; I am off on my annual vacation where I have no phone or internet, so I’m going to take a bit of a break from blogging; we’ll see you in a month or so for more fabulous adventures in coding!