Last time in this series we saw that we could compute a continuous posterior distribution when given a continuous prior and a discrete likelihood function; I hope it is clear how that is useful, but I’d like to switch gears for a moment and look at a different (but also extremely useful) computation: the expected value.
I’ll start with a quick refresher on how to compute the expected value of a discrete distribution.
You probably already know what the expected value of a discrete distribution is; we’ve seen it before in this series. But in case you don’t recall, the basic idea is this: suppose we have a distribution of values of a type that we can meaningfully average. The “expected value” is the average value of a set of samples as the number of samples gets very large.
A simple example is: what’s the expected value of rolling a standard, fair six-sided die? You could compute it empirically by rolling 6000d6 and dividing by 6000, but that would take a while.
Aside: Again, recall that in Dungeons and Dragons, XdY is “roll a fair Y-sided die X times and take the sum”.
We could also compute this without doing any rolling; we’d expect that about 1000 of those rolls would be 1, 1000 would be 2, and so on. So we should compute (1000 + 2000 + … + 6000) / 6000, which is just (1 + 2 + 3 + 4 + 5 + 6) / 6 = 3.5. On average when you roll a fair six-sided die, you get 3.5.
We can then do variations on this scenario; what if the die is still fair, but the labels are not 1, 2, 3, 4, 5, 6, but instead -9, -1, 0, 1, 3, 30? As I’m sure you can imagine, once again we can compute the expected value by just taking the average: (-9 - 1 + 0 + 1 + 3 + 30) / 6 = 4.
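To make the arithmetic concrete, here is a minimal self-contained sketch. It deliberately does not use this series’ distribution types; FairDie and its methods are hypothetical names for illustration. It computes the exact expected value of a fair die as the plain average of its labels, and also estimates it the slow way, by simulated rolling:

```csharp
using System;
using System.Linq;

// Hypothetical helper, not part of this series' library.
static class FairDie
{
    // Exact expected value of a fair die: every face has the same
    // probability, so this is just the arithmetic mean of the labels.
    public static double ExpectedValue(int[] faces) => faces.Average();

    // Empirical estimate: roll the die many times and average the results.
    public static double Simulate(int[] faces, int rolls, Random rng)
    {
        long total = 0;
        for (int i = 0; i < rolls; i++)
            total += faces[rng.Next(faces.Length)]; // Next(n) returns 0 .. n-1
        return (double)total / rolls;
    }
}
```

With this sketch, FairDie.ExpectedValue(new[] { 1, 2, 3, 4, 5, 6 }) returns 3.5 and the relabeled die new[] { -9, -1, 0, 1, 3, 30 } returns 4, while Simulate with 6000 rolls lands close to, but usually not exactly on, 3.5.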
Aside: It’s a bit weird that the “expected value” of a distribution is a value that is not even in the support of the distribution; I’ve never once rolled a 3.5 on a d6. Beginners sometimes confuse the expected value with the mode: that is, the value that you’d expect to get more often than any other value. Remember, the expected value is just an average; it only makes sense in distributions where the sampled values can be averaged.
What if the die isn’t fair? In that case we can compute a weighted average: multiply the value of each side by the weight of that side, sum those products, and divide by the total weight. As we saw in a previous episode:
public static double ExpectedValue(
    this IDiscreteDistribution<int> d) =>
  d.Support()
    .Select(s => (double)s * d.Weight(s))
    .Sum() / d.TotalWeight();
And of course we could similarly define methods for discrete distributions of double and so on. Hopefully that is all clear.
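For readers who want to try the weighted average without the rest of this series’ code, here is a standalone sketch. It assumes a plain dictionary from face to integer weight as a stand-in for IDiscreteDistribution<int>; LoadedDie is a hypothetical name:

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a weighted discrete distribution:
// each face maps to a non-negative integer weight.
static class LoadedDie
{
    public static double ExpectedValue(IReadOnlyDictionary<int, int> weights)
    {
        // Sum of (value * weight) over every face...
        double weightedSum = weights.Sum(kv => (double)kv.Key * kv.Value);
        // ...divided by the total weight.
        int totalWeight = weights.Values.Sum();
        return weightedSum / totalWeight;
    }
}
```

For example, a die whose six has weight 5 and whose other faces have weight 1 has a total weight of 10 and a weighted sum of 1 + 2 + 3 + 4 + 5 + 30 = 45, so this returns 4.5; giving every face weight 1 recovers the fair-die answer of 3.5.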
The question I want to explore in the next few episodes requires us to make a small extension of the meaning of “expected value”:
- Suppose we have a distribution of outcomes d, in the form of an IWeightedDistribution<double>.
- Suppose we have a function f from double to double which assigns a value to each outcome.
- We wish to accurately and efficiently estimate the average value of f(d.Sample()) as the number of samples becomes large.
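Written as a formula, the quantity we are after is the continuous analogue of the discrete weighted average: sums become integrals. Here w is my own notation (an assumption, not something defined in this series’ code) for the possibly unnormalized weight function of d. The sample average on the right converges to the left side as the number of samples n grows, provided the expected value exists:

```latex
\mathbb{E}[f(X)]
  = \frac{\int f(x)\,w(x)\,dx}{\int w(x)\,dx}
  \approx \frac{1}{n}\sum_{i=1}^{n} f(x_i),
  \qquad x_i \sim d
```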
Aside: If we had implemented Select on weighted distributions, this would be the same as the expected value of d.Select(f); but we didn’t!
Aside: We could be slightly more general and say that the distribution is on any T, and f is a function from T to double, but for simplicity’s sake we’ll stick to continuous, one-dimensional distributions in this series. At least for now.
There’s an obvious and extremely concise solution; if we want the average as the number of samples gets large, just compute the average of a large number of samples! It’s like one line of code. Since we make no use of any weights, we can take any distribution:
public static double ExpectedValue(
    this IDistribution<double> d,
    Func<double, double> f) =>
  d.Samples().Take(1000).Select(f).Average();
In fact, we could make this even more general; we only need to get a number out of the function:
public static double ExpectedValue<T>(
    this IDistribution<T> d,
    Func<T, double> f) =>
  d.Samples().Take(1000).Select(f).Average();
We could also make it more specific, for the common case where the function is the identity:
public static double ExpectedValue(
    this IDistribution<double> d) =>
  d.ExpectedValue(x => x);
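Here is the same naive estimator as a standalone sketch that runs without the series’ library. The MonteCarlo class, and the use of a plain Func<double> sampler as a stand-in for IDistribution<double>, are assumptions made for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sketch of the naive estimator; not the series' library.
static class MonteCarlo
{
    // Stand-in for IDistribution<double>.Samples(): an endless stream
    // produced by calling the sampler over and over.
    public static IEnumerable<double> Samples(Func<double> sample)
    {
        while (true)
            yield return sample();
    }

    // Average f over a fixed number of samples (1000 by default).
    public static double ExpectedValue(
        Func<double> sample, Func<double, double> f, int count = 1000) =>
        Samples(sample).Take(count).Select(f).Average();
}
```

For instance, MonteCarlo.ExpectedValue(new Random(42).NextDouble, x => x * x) estimates the expected value of x² for x uniform on [0, 1), whose exact value is 1/3; with 1000 samples the estimate should typically be within a few hundredths of that, and the exact digits depend on the seed.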
Let’s look at a couple of examples. (Code for this episode can be found here.) Suppose we have a distribution from 0.0 to 1.0, say the beta distribution from last time, but we’ll skew it a bit:
var distribution = Beta.Distribution(2, 5);
Console.WriteLine(distribution.Histogram(0, 1));
(Histogram output: a text histogram of Beta(2, 5) samples on the interval from 0 to 1, with most of the mass toward the low end.)
It looks like we’ve heavily biased these coins towards flipping tails. Suppose we draw a coin from this mint: what is the average fairness of the coins we draw? We can just draw a thousand of them and take the average to get an estimate of the expected value:
Console.WriteLine(distribution.ExpectedValue());
0.28099740981762
That is, we expect that a coin drawn from this mint will come up heads about 28% of the time and tails 72% of the time, which conforms to our intuition that this mint produces coins that are heavily weighted towards tails.
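Because we know the shape of the beta weight function, we can check this estimate without sampling at all. The mean of a Beta(a, b) distribution is a / (a + b), so for Beta(2, 5) it is exactly 2/7 ≈ 0.2857; the sampled 0.281 is close, but not exact. The following sketch computes the weighted average directly with a simple midpoint-rule quadrature over the unnormalized weight x^(a-1) (1-x)^(b-1). The quadrature technique and the BetaQuadrature name are my assumptions for illustration, not something this series has built:

```csharp
using System;

// Hypothetical sketch: expected value of f under Beta(a, b) by quadrature.
static class BetaQuadrature
{
    // Unnormalized Beta(a, b) weight; the normalizing constant cancels
    // in the ratio below, so we never need to compute it.
    static double Weight(double x, double a, double b) =>
        Math.Pow(x, a - 1) * Math.Pow(1 - x, b - 1);

    // Weighted average of f over [0, 1] using the midpoint rule with n strips.
    public static double ExpectedValue(
        double a, double b, Func<double, double> f, int n = 10000)
    {
        double numerator = 0, denominator = 0;
        for (int i = 0; i < n; i++)
        {
            double x = (i + 0.5) / n; // midpoint of strip i
            double w = Weight(x, a, b);
            numerator += f(x) * w;
            denominator += w;
        }
        // The strip width 1/n multiplies both sums, so it cancels.
        return numerator / denominator;
    }
}
```

BetaQuadrature.ExpectedValue(2, 5, x => x) agrees with 2/7 to many decimal places, using the weights directly instead of a thousand random draws.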
Or, here’s an idea; remember the distribution we determined last time: the posterior distribution of fairness of a coin drawn from a Beta(5, 5) mint, flipped once, that turned up heads. On average, what is the fairness of such a coin? (Remember, this is the average given that we’ve discarded all the coins that came up tails on their first flip.)
var prior = Beta.Distribution(5, 5);
IWeightedDistribution<Result> likelihood(double d) =>
  Flip<Result>.Distribution(Heads, Tails, d);
var posterior = prior.Posterior(likelihood)(Heads);
Console.WriteLine(posterior.ExpectedValue());
0.55313807698807
As we’d expect, if we ran this scenario many times (draw a coin from this mint, flip it once, and keep it only if it comes up heads), the coins we kept would on average be biased to about 55% heads and 45% tails.
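We can sanity-check this number too. The beta distribution is conjugate to a coin flip, so a Beta(5, 5) prior updated by a single head is exactly Beta(6, 5), whose mean is 6/11 ≈ 0.545; the 0.553 above is close, but not exact. The following standalone sketch simulates the scenario directly, discarding coins whose first flip is tails. It samples Beta(a, b) for integer a and b using the fact that the a-th smallest of a + b - 1 independent uniform samples has that distribution; this sampling trick and the PosteriorCheck name are my assumptions, not the series’ implementation:

```csharp
using System;

// Hypothetical sketch: estimate the posterior mean by rejection.
static class PosteriorCheck
{
    // For integer a, b >= 1, the a-th smallest of (a + b - 1) independent
    // uniform samples is Beta(a, b) distributed.
    static double SampleBeta(int a, int b, Random rng)
    {
        double[] u = new double[a + b - 1];
        for (int i = 0; i < u.Length; i++)
            u[i] = rng.NextDouble();
        Array.Sort(u);
        return u[a - 1];
    }

    // Draw a coin from a Beta(a, b) mint, flip it once, keep it only if it
    // comes up heads, and average the fairness of the coins we kept.
    public static double AverageFairnessGivenHeads(
        int a, int b, int trials, Random rng)
    {
        double total = 0;
        int kept = 0;
        for (int i = 0; i < trials; i++)
        {
            double fairness = SampleBeta(a, b, rng);
            if (rng.NextDouble() < fairness) // heads with probability = fairness
            {
                total += fairness;
                kept++;
            }
        }
        return total / kept;
    }
}
```

With a Beta(5, 5) mint, roughly half of the coins survive the first flip, and PosteriorCheck.AverageFairnessGivenHeads(5, 5, 100000, new Random(7)) should come out near 6/11.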
So, once again we’ve implemented a powerful tool in a single line of code! That’s awesome.
Right?
I hope?
Unfortunately, this naive implementation has a number of problems.
Exercise: What are the potential problems with this implementation? List them in the comments!
Next time on FAIC: We’ll start to address some of the problems with this naive implementation.
The most obvious problem to me is that it assumes that 1000 samples is enough to get an accurate approximation of the expected value. When doing coin flips or d6 rolls, that may be a reasonable assumption, but I could certainly expect that if you were rolling 10,000-sided dice with nonlinear labels (say, labeled 1, 2, 3, …, 9998, 9999, a googolplex) you wouldn’t have great odds of having an accurate expected value.
Also, it seems to ruin the entire point of the series, that you can treat random distributions as distributions rather than just getting a bunch of values and guessing how they behave.
Excellent points. If we have a discrete distribution, we can compute the expected value exactly; if we have no information, the best we can do is try a bunch and take an average. But it seems like if we have weights, we ought to be able to do something with that.