Last time in this series I proposed a stripped-down DSL for probabilistic workflows. Today, let’s see how we could “lower” it to ordinary C# 7 code. I’ll assume of course that we have all of the types and extension methods that we’ve developed so far. (Code for this episode is here.)
The first thing I’m going to do is describe a possible but very problematic way to do it.
We know how C# lowers methods that have yield return statements: it generates a class that implements IEnumerable, rewrites the method body as the MoveNext method of that class, and makes the method return an instance of the class. We could do the same thing here.
Recall that last time I gave three examples, the longest of which was:
probabilistic static IDiscreteDistribution<string> Workflow(int z)
{
  int ii = sample TwoDSix();
  if (ii == 2)
    return "two";
  condition ii != z;
  bool b = sample Flip();
  return b ? "heads" : ii.ToString();
}
We could lower this to:
static IDiscreteDistribution<string> Workflow(int z) =>
  new WorkflowDistribution(z);

sealed class WorkflowDistribution :
  IDiscreteDistribution<string>
{
  int z;
  public WorkflowDistribution(int z)
  {
    this.z = z;
  }
  public string Sample()
  {
    start:
    int ii = TwoDSix().Sample();
    if (ii == 2)
      return "two";
    if (!(ii != z))
      goto start;
    bool b = Flip().Sample();
    return b ? "heads" : ii.ToString();
  }
  public IEnumerable<string> Support() => ???
  public int Weight(string t) => ???
}
We’d need to make sure that we did the right thing if z got modified during the workflow and the condition was not met, of course, but that’s a small detail.
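For instance (a sketch of one way to handle it, not part of the proposal itself), Sample could copy the argument into a fresh local on every attempt, so that a retry caused by a failed condition always starts from the original value:

public string Sample()
{
  start:
  int z = this.z; // re-read the original argument on every attempt; this only
                  // matters if the workflow body assigned to z
  int ii = TwoDSix().Sample();
  if (ii == 2)
    return "two";
  if (!(ii != z))
    goto start;
  bool b = Flip().Sample();
  return b ? "heads" : ii.ToString();
}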
We could similarly lower the other two; I won’t labour the point. It’s a lot of boilerplate code.
The good news is: we get the right distribution out of this thing:
Console.WriteLine(Workflow(3).Histogram());
    4|***
    5|****
    6|*****
    7|*******
    8|******
    9|****
   10|***
   11|**
   12|*
  two|**
heads|****************************************
The bad news is:
- We haven’t computed the support or the weights, as required.
- We are once again in a situation where a condition causes a potentially long-running loop to execute; we could do hundreds or thousands of iterations of going back to “start” for every successful sample if the condition was rarely met.
Basically, we’ve lost the great property that we had before: that we automatically get the right discrete distribution object inferred.
So let’s state unequivocally right now what our goal here is. The goal of this feature is to take a method body that describes a probabilistic workflow that includes conditions and produce a semantically equivalent distribution that does not actually contain any loop-causing condition, the same as we did for query-based workflows.
In our exploration of creating a posterior, we’ve seen that we can take a query that contains Where clauses and produce a projected weighted integer distribution that matches the desired distribution. Though there is some cost to producing that distribution object, once we’ve paid that cost there is no additional per-sample cost. We can do the same here, by leveraging the work we’ve already done.
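For example, using the Where extension developed in those episodes (a sketch; the names are as defined earlier in this series):

// Constructing the conditioned distribution does the weight computation once;
// sampling it afterwards never rejects and never loops.
IDiscreteDistribution<int> noTwos = TwoDSix().Where(ii => ii != 2);
Console.WriteLine(noTwos.ShowWeights());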
The technique I’m going to use is as follows:
- Each statement in the method will be numbered starting from zero; statements nested inside if statements are of course statements.
- Each statement will be replaced by a local method named Sn, for n the number of the statement.
- Each local method will take as its arguments all the locals of the method. (If there are any assignments to formal parameters then they will be considered locals for this purpose.)
- Each local method will return a probability distribution.
At this point I could give you the general rule for each kind of statement in my stripped down language, but let’s not bother. Let’s just lower the methods and it will become obvious what we’re doing.
We’ll start with the easy one.
probabilistic IDiscreteDistribution<bool> Flip()
{
  int x = sample Bernoulli.Distribution(1, 1);
  return x == 0;
}
We rewrite this as:
IDiscreteDistribution<bool> Flip()
{
  IDiscreteDistribution<bool> S0(int x) =>
    Bernoulli.Distribution(1, 1)
      .SelectMany(_x => S1(_x));
  IDiscreteDistribution<bool> S1(int x) =>
    Singleton<bool>.Distribution(x == 0);
  return S0(0);
}
Read that over carefully and convince yourself that this implements the correct logic, just in a much more convoluted fashion.
Aside: Notice that we are taking advantage of the monad laws here regarding the relationship between binding and the function that produces a Singleton. It might not have been clear before why this is an important monad law; hopefully it is now clear.
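Concretely, the law in play is this (a small sketch in the notation of this series, using the Select and SelectMany extensions from earlier episodes): binding a distribution to a function that merely wraps its result in a Singleton gives the same distribution as a plain projection.

var d = StandardDiscreteUniform.Distribution(1, 6);
// These two describe the same distribution:
var viaBind = d.SelectMany(x => Singleton<bool>.Distribution(x == 1));
var viaSelect = d.Select(x => x == 1);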
Aside: Recall that in the jargon of monads, the unit operation that creates a new wrapper around a single instance of the underlying type is sometimes called return. It is no coincidence that our lowering turns return statements into singletons!
Now let’s lower the next one:
probabilistic IDiscreteDistribution<int> TwoDSix()
{
  var d = SDU.Distribution(1, 6);
  int x = sample d;
  int y = sample d;
  return x + y;
}
This becomes this mess:
static IDiscreteDistribution<int> TwoDSix()
{
  IDiscreteDistribution<int> S0(
      IDiscreteDistribution<int> d, int x, int y) =>
    S1(StandardDiscreteUniform.Distribution(1, 6), x, y);
  IDiscreteDistribution<int> S1(
      IDiscreteDistribution<int> d, int x, int y) =>
    d.SelectMany(_x => S2(d, _x, y));
  IDiscreteDistribution<int> S2(
      IDiscreteDistribution<int> d, int x, int y) =>
    d.SelectMany(_y => S3(d, x, _y));
  IDiscreteDistribution<int> S3(
      IDiscreteDistribution<int> d, int x, int y) =>
    Singleton<int>.Distribution(x + y);
  return S0(null, 0, 0);
}
Again, walk through that and convince yourself that it’s doing the right thing. This implementation is ridiculously overcomplicated for what it does, but you should have confidence that it is doing the right thing. Remember, we’re trying to prove that the proposed language feature is in theory possible first, and then we’ll think about ways to make it better.
Finally, let’s do one with an if and a condition:
probabilistic IDiscreteDistribution<string> Workflow(int z)
{
  int ii = sample TwoDSix();
  if (ii == 2)
    return "two";
  condition ii != z;
  bool b = sample Flip();
  return b ? "heads" : ii.ToString();
}
This lowers to:
static IDiscreteDistribution<string> Workflow(int z)
{
  IDiscreteDistribution<string> S0(int ii, bool b) =>
    TwoDSix().SelectMany(_ii => S1(_ii, b));
  IDiscreteDistribution<string> S1(int ii, bool b) =>
    ii == 2 ? S2(ii, b) : S3(ii, b);
  IDiscreteDistribution<string> S2(int ii, bool b) =>
    Singleton<string>.Distribution("two");
  IDiscreteDistribution<string> S3(int ii, bool b) =>
    ii != z ? S4(ii, b) : Empty<string>.Distribution;
  IDiscreteDistribution<string> S4(int ii, bool b) =>
    Flip().SelectMany(_b => S5(ii, _b));
  IDiscreteDistribution<string> S5(int ii, bool b) =>
    Singleton<string>.Distribution(b ? "heads" : ii.ToString());
  return S0(0, false);
}
And at last we can find out what the distribution of this workflow is:
Console.WriteLine(Workflow(3).ShowWeights());
gives us:
two:2
heads:33
4:3
5:4
6:5
7:6
8:5
9:4
10:3
11:2
12:1
We can mechanically rewrite our little probabilistic workflow language into a program that automatically computes the desired probability distribution.
It should now be pretty clear what the rewrite rules are for our simple DSL:
- Normal assignment and declaration statements invoke the “next statement” method with new values for the locals.
- Assignment or declaration statements with sample become SelectMany.
- The condition statement evaluates its condition and then either continues to the next statement if it is met, or produces an empty distribution if it is not.
- if statements evaluate their condition and then invoke the consequence or the alternative helper method.
- return statements evaluate an expression and return a Singleton of it.
Exercise: I severely restricted the kinds of statements and expressions that could appear in this language. Can you see how to implement:

- while, do, for, break, continue, switch and goto/labeled statements
- compound assignment, increment and decrement as statements
Exercise: What about assignments, increments and decrements as expressions; any issues there?
Exercise: What about lambdas? Any issues with adding them in? Again, we’re assuming that all functions called in this system are pure, and that includes lambdas.
Exercise: I severely restricted where the sample operator may appear, just to make it easier on me. Can you see how we might ease that restriction? For example, it would be much nicer to write TwoDSix as:
probabilistic IDiscreteDistribution<int> TwoDSix()
{
  var d = SDU.Distribution(1, 6);
  return sample d + sample d;
}
How might you modify the lowering regimen I’ve described to handle that?
We’ll discuss possible answers to all of these in future episodes.
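On that last one, one plausible approach (a sketch, not necessarily the answer a future episode will give) is to hoist each sample expression into a temporary in left-to-right order, so that the return statement becomes two sampled declarations followed by an ordinary return of their sum, and then apply exactly the lowering described above. Inlined, that comes out as something like:

static IDiscreteDistribution<int> TwoDSix()
{
  var d = SDU.Distribution(1, 6);
  return d.SelectMany(t0 =>
    d.SelectMany(t1 =>
      Singleton<int>.Distribution(t0 + t1)));
}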
As I’ve noted repeatedly in this series: of course this is not how we’d actually implement the proposed feature. Problems abound:
- I’ve absurdly over-complicated the code generation to the point where every statement is its own local function.
- We are passing around all the locals as parameters; probably some of them could be realized as actual locals.
- Each method is tail recursive, but C# does not guarantee that tail recursions are optimized away, so the stack depth could get quite large.
- Several of the methods take arguments that they do not ever use.
- We could have made some simple inlining optimizations that would decrease the number of helper methods considerably.
That last point bears further exploration. Imagine we applied an inlining optimization over and over again — that is, the optimization where we replace a function call with the body of the function. Since every function call here is expression-bodied, that’s pretty easy to do!
Exercise: Do so; the answer follows. (Note that I asked how to write this workflow as a query last time; did you get the same answer?)
.
.
.
.
What would come out the other end of repeated inlining is:
static IDiscreteDistribution<string> Workflow(int z) =>
  TwoDSix().SelectMany(ii =>
    ii == 2 ?
      Singleton<string>.Distribution("two") :
      ii != z ?
        Flip().SelectMany(b =>
          Singleton<string>.Distribution(b ?
            "heads" :
            ii.ToString())) :
        Empty<string>.Distribution);
Which is hardly any easier to read, but at least it is not quite so many functions.
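For comparison, essentially the same distribution can be written in query syntax (a sketch, using the SelectMany overload that supports multiple from clauses from earlier episodes; it may not match the exact query given in the previous episode):

static IDiscreteDistribution<string> Workflow(int z) =>
  from ii in TwoDSix()
  from s in ii == 2
    ? Singleton<string>.Distribution("two")
    : ii != z
      ? (from b in Flip() select b ? "heads" : ii.ToString())
      : Empty<string>.Distribution
  select s;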
Is this then how we could implement this feature for real? Lower it to functions, then inline the functions? Though it would work, that’s probably not how I’d do it for real. I might get to discuss a more realistic lowering technique in a future episode, but we’ll see how it goes. This series is getting pretty long and we’re not done yet!
And again, my purpose here was to show that it could be done with a straightforward, rules-based transformation; I’m not looking to find the optimal solution. It can be done, which is great!
Aside: I said a few episodes back that it would become clear why having an explicitly empty distribution is a win; hopefully it is now clear. Having an explicitly empty distribution allows us to implement condition using SelectMany! It is not at all obvious how to rewrite the workflow above into a query with a Where clause, but as we’ve seen, implementing the condition statement as conditionally producing an empty distribution gives us the ability to stick what is logically a Where anywhere in the workflow.
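Boiled down, the trick looks like this; the Condition helper below is hypothetical (it is not part of the series’ library), but it makes the point concrete:

static class ConditionExtensions
{
  // A condition is just a bind whose continuation produces either a
  // one-element distribution or the empty distribution.
  public static IDiscreteDistribution<T> Condition<T>(
      this IDiscreteDistribution<T> d, Func<T, bool> predicate) =>
    d.SelectMany(t => predicate(t)
      ? Singleton<T>.Distribution(t)
      : Empty<T>.Distribution);
}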
Aside: Many episodes back I gave the challenge of implementing a Where that took a probabilistic predicate: Func<T, IDiscreteDistribution<bool>>. That is no challenge at all in our new workflow language:
bool b = sample predicate(whatever);
condition b;
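Lowering that two-statement fragment by the rules above gives, in effect, the combinator itself (a sketch, with names of my own choosing):

static class ProbabilisticWhereExtensions
{
  public static IDiscreteDistribution<T> Where<T>(
      this IDiscreteDistribution<T> items,
      Func<T, IDiscreteDistribution<bool>> predicate) =>
    // "sample predicate(t)" becomes a SelectMany; "condition b" becomes
    // either continuing with t or producing the empty distribution.
    items.SelectMany(t => predicate(t).SelectMany(b => b
      ? Singleton<T>.Distribution(t)
      : Empty<T>.Distribution));
}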
I sincerely hope that you agree that my proposed “imperative style” workflow is an improvement over our use of LINQ to project, condition and combine probability distributions; our Workflow() method is a lot easier to read as an imperative workflow than its query form, and not just because it uses the jargon of probability rather than that of sequences.
Next time on FAIC: It looks like we’ve come up with a reasonable syntactic sugar, but is it just a sugar? Is there anything that our “imperative” workflow can do that our “LINQ” workflow cannot?
I think you typoed the keywords in the first code block.
Good catch, thanks!
Eric, you are a giant of a man when it comes to this stuff. I count myself grateful when I glean even half of what you’re explaining. Thank you for taking the time to do this series. I think it is one of the most interesting you’ve ever done.
I have been playing with the code and enjoying the ease with which I can model randomness.
One question that came up for me was with regard to the method `static IDiscreteDistribution<T> ToWeighted<T>(this IEnumerable<T> items, IEnumerable<int> weights)` in the `static Distribution` class – would it be better to change the signature to be `static IDiscreteDistribution<T> ToWeighted<T>(this IEnumerable<(T item, int weight)> source)` to ensure that the same number of elements are in each list? This pushes the onus on the calling code to create matching lists. What are your thoughts there?
Cheers.
James.
First off thanks for your kind words; I’m glad you’re enjoying it. I’m enjoying writing it. 🙂
While I was prototyping up the code for this series I tried a few different ways of going back and forth between sequences and weighted distributions, and none of them were really great, so I just picked one for this series.
Other attempts at building PPLs have used various techniques: parallel sequences, sequences of (value, weight) tuples, dictionaries from value to weight, and so on.
Were I designing such an API for real-world usage rather than pedagogical blogging, I’d probably just implement all of them and let the users decide which best suited their needs.
As for the keywords, all of them can be implemented using if and goto (the same way they’re implemented in assembly/bytecode)! I can’t test it right now, but I think a naïve implementation of goto would almost always cause infinite recursion when jumping backwards (such as for loops):

Exactly right; “while”, “do”, “for”, “switch”, “break” and “continue” can all be implemented as combinations of “if” and “goto”. We already have “if”, and “goto” is easy. I’ll discuss this in more detail in the next episode.
Now, as for recursion: in languages without any form of “goto” other than “if”, you implement loops with recursion. So yes, you have to be careful; what would be an infinite loop becomes an infinite recursion.
This illustrates clearly that the technique I’m describing is not realistic, because it uses so much stack. Inlining optimizations help, but what we’d really need to do if we were building this thing for real is build a state machine. Which is exactly what iterator blocks and async blocks do.
Actually, I’m pretty sure we could make a function goto-compatible by simply using a while(true) loop and a switch statement, which could be replaced with a chain of if and else statements, to make a state machine. A while(true) loop with enumerably many

That would work also, but the reason we typically lower things to “if” and “goto” is because ultimately we’re driving towards IL, which has “branch if true” and “branch if false” as primitive operations.
True. The main question is how to implement goto in probabilistic workflows.

The main issue is with infinite loops, particularly those that converge, but only after infinitely many iterations. As long as the number of states is finite, it’s possible,
(Damnit! I accidentally tapped the “Post” button early, again!)
… As long as the number of states is finite, excluding the built-up result, it’s possible, as long as it converges, but it’s potentially very slow, as solving a huge system of equations (with a variable and an equation per possible state, including the current line/instruction/function/whatever) may be required.
You have anticipated the next part in this series. The technique that I have laid out here really only works for toy examples that are exactly solvable without a combinatory explosion of states. In short: I’ve spent 22 episodes solving only the most trivial problems in this space!
Maybe you’ll get to this in a later post, but I have given it some thought and can’t figure out how this scheme manages to guarantee that the empty distribution is never sampled.
What happens when you call Workflow(3) and the TwoD6 roll comes up with 3?
By “this scheme” I assume you mean my lowering to nested methods, but if that’s what you mean, then I don’t understand the question. In my nested methods scheme, we never call “Sample” at all, so how could we sample the empty distribution? The output of Workflow(3) is a weighted distribution; sampling it does the usual work of sampling a weighted distribution.
But if that’s not clear, there’s a link to GitHub with a project that you can download and run in Visual Studio. Debug through it and see what happens!