Fixing Random, part 20

Without further ado, here’s my proposed stripped-down C# that could be a DSL for probabilistic workflows; as we’ll see, it is quite similar to both enumerator blocks from C# 2 and async/await from C# 5. (Code for this episode can be found here.)

  • The workflow must be a function with return type of IDiscreteDistribution<T> for some T.
  • Just as methods that use await must be marked async, our DSL methods must be marked probabilistic.
  • The function has an ordinary function body: a block of statements.

Obviously I am using C# 5 asynchronous workflows as an example of how to add a new mechanism to C#. The statements and the expressions in a probabilistic method are restricted as follows:

  • Declaration statements must have an initializer.
  • All the locals must have unique names throughout the method.
  • Assignments can only be in declarations or as statements; no use of assignments in the middle of an expression is allowed.
  • No compound assignments, increments or decrements.
  • No await, yield, try, throw, switch, while, do, break, continue, goto, fixed, lock, using or labeled statements allowed. What’s left? if and return statements are fine, as are blocks { }. I told you I was stripping things down!
  • No lambdas, anonymous functions or query comprehensions.
  • No use of in, out or ref.
  • All the other perfectly normal operations are allowed — function calls, object initializers, member access, indexers, arithmetic, that’s all just fine.
  • However, all function calls and whatnot must be pure; there can be no side effects, and nothing should throw.

Basically what I’m doing here is making a little super-simple subset of C# that doesn’t have any of the weird, hard stuff that keeps compiler writers busy. In this pleasant world locals are always assigned, there are no closures, there are no worries about what happens when an exception is thrown, and all that sort of stuff.

This is pretty draconian, I know. We’ll sketch out how to soften some of those restrictions in later episodes. I want to show that we can describe how to implement a simple core of a language; harder features can come later.

To this little language I’m going to add a new unary operator and a new statement. The new operator is sample, and it may appear only as the right operand of an assignment, or the initializer of a declaration:

int x = sample WeightedInteger.Distribution(8, 2, 7, 3);

The operand must be of type IDiscreteDistribution<T>, and the type of the expression is T. Again, this should feel familiar if you’re used to await.

The new statement is condition, and it has the form

condition expression;

The expression must be convertible to bool. The meaning of this thing is much the same as Where: if the Boolean condition is not met, then a value is filtered out from the distribution by setting its weight to zero. But I’m getting ahead of myself.
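
To tie this to machinery we already have: here is a minimal sketch, assuming the Where extension method on IDiscreteDistribution&lt;T&gt; from earlier episodes in this series.

```csharp
// Sketch only: conditioning is, in spirit, the Where operator from
// earlier episodes. Starting from the weighted distribution over
// { 0, 1, 2, 3 } with weights 8, 2, 7, 3, the condition x != 1 sets
// the weight of 1 to zero and leaves the other weights untouched.
IDiscreteDistribution<int> conditioned =
    WeightedInteger.Distribution(8, 2, 7, 3).Where(x => x != 1);
```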

Aside: Last time I talked a bit about how language designers must choose how general or specific a language element is, and that this must be reflected in syntax; this is a great example of such a choice.

I’ve chosen condition because I think of this operation as creating a conditional distribution; I could have said where instead of condition and had it mean basically the same thing, but that would be moving towards the established C# jargon for sequences, which I’m explicitly trying to avoid.

However, as we’ve seen, a primary use case for a conditional distribution is computing the posterior distribution when given a prior and a likelihood. But why do we wish to compute the posterior at all? Typically because we have observed something. That is, the development cycle of the probabilistic program is:

  • Before the program is written, data scientists and developers compute priors, like “What percentage of emails are spam?”
  • They also compute likelihood functions: “what word combinations are likely, given that an email is spam? What word combinations are likely given that an email is not spam?”
  • A spam detection program must now answer the question: given that we have observed an email with certain word combinations, what is the posterior probability that it is spam? This is where the probabilistic workflow comes in.
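
In the proposed DSL, that spam computation might be sketched like this. SpamPrior, WordsGivenSpam and the Words type are hypothetical stand-ins for illustration, not part of the series’ code:

```csharp
// Hypothetical sketch of computing a posterior by conditioning.
probabilistic IDiscreteDistribution<bool> IsSpam(Words observed)
{
  bool spam = sample SpamPrior();            // prior: how likely is spam at all?
  Words words = sample WordsGivenSpam(spam); // likelihood: words given spam-ness
  condition words == observed;               // keep only worlds matching the observation
  return spam;                               // posterior: spam-ness given the observed words
}
```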

For that reason, we could reasonably choose observation or observe as our keyword here, instead of condition, emphasizing to the developer “the reason you use this feature is to express the computation of a posterior distribution for a given observation”. That would make the feature feel a little bit less general, but might help more clearly express the desired semantics in the mind of the typical programmer designing a workflow that computes a posterior.

I’m going to stick with condition, but I just wanted to point out that this is a choice that has user experience consequences, were we actually to implement this feature in a language like C#.

Let’s look at some examples:

probabilistic IDiscreteDistribution<bool> Flip()
{
  int x = sample Bernoulli.Distribution(1, 1);
  return x == 0;
}

What I would like is for this method to have the exact same semantics as if I had written:

 IDiscreteDistribution<bool> Flip() => 
   Bernoulli.Distribution(1, 1).Select(x => x == 0);

What do you think about these two implementations? I think the former looks a lot more like the straightforward, imperative code that we’re used to.

Let’s look at another:

probabilistic IDiscreteDistribution<int> TwoDSix()
{
  var d = SDU.Distribution(1, 6);
  int x = sample d;
  int y = sample d;
  return x + y;
}

Again, it seems to me that this is more clear than

IDiscreteDistribution<int> TwoDSix()
{
  var d = SDU.Distribution(1, 6);
  return from x in d from y in d select x + y;
}

And it certainly seems more clear, particularly to the novice, than

IDiscreteDistribution<int> TwoDSix()
{
  var d = SDU.Distribution(1, 6);
  return d.SelectMany(x => d, (x, y) => x + y);
}

LINQ is awesome for sequences, but I think the statement-based workflow is much easier to understand for distributions.

Now let’s look at a really complicated workflow, this one with conditioning:

probabilistic IDiscreteDistribution<string> Workflow(int z)
{
  int ii = sample TwoDSix();
  if (ii == 2)
    return "two";
  condition ii != z;
  bool b = sample Flip();
  return b ? "heads" : ii.ToString();
}

For the first two it was easy to see how they corresponded to query syntax, but what even is the distribution represented by this workflow? It depends on the value of the parameter z, for one thing.

What I want here is: when you call this method, you get a distribution back. When you Sample() that distribution, it should logically have the same effect as: all the sample operations Sample() their operand, and the returned value is, well, the returned value. However, if any condition is false then we abandon the current attempt and start over.

The trick is implementing those semantics without actually running the body in a loop!
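
To make “abandon the current attempt and start over” concrete, here is a sketch of the naive rejection-sampling loop that the clever implementation must be equivalent to. This is the specification of the semantics, not the proposed implementation:

```csharp
// Semantics only: each sample operation becomes a Sample() call,
// and a failed condition restarts the whole body from the top.
string SampleWorkflow(int z)
{
  while (true)
  {
    int ii = TwoDSix().Sample();
    if (ii == 2)
      return "two";
    if (!(ii != z))
      continue; // condition ii != z failed; abandon this attempt
    bool b = Flip().Sample();
    return b ? "heads" : ii.ToString();
  }
}
```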

Exercise: Pick a value for z; let’s say 3. See if you can work out what the distribution of strings is that should come out the other end of this thing. I’ll give the answer in the next episode.

Exercise: If you had to represent this workflow as a query, how would you do it?

Next time on FAIC: Suppose we were writing a compiler for this feature. We know that LINQ works by turning query comprehensions into function calls; we know that the compiler turns async workflows and iterator blocks into state-machine coroutines. How might we lower this new kind of code into ordinary C# 7 code? Probably somehow using our existing query operators… but what exactly would work?




16 thoughts on “Fixing Random, part 20”

  1. Other than the requirement that all methods should be pure, and that nothing should throw, this could all quite easily be retrofitted onto C#. However, C# has no concept of purity, so that is the chief sticking point.

    Now I think it’s obvious that this isn’t a mainstream enough usage to warrant adding this as a first class feature. However, within specific domains, this would make for an extremely useful DSL.

    Given that this, yield return and async/await are all essentially transforms of operations on monads into ordinary C# via the use of state machines, do you think that there is some room to explore creating a generalized mechanism for people to build their own DSLs on top of some core primitives surrounding this?

    • Absolutely there is room for that, and Kotlin has done it; they have a generalized coroutine system upon which you can build async, generators, and so on.

      In C#, the ability to return your own types from an async workflow gives the developer a hook into the C# state machine system, but it’s a bit kludgy. I’m hoping to get into that in a later episode.

      • F# has something like that with its computation expressions. With them, it would be possible to implement something like the DSL proposed here as a library, no compiler changes necessary.
        The resulting syntax could look quite similar to what you proposed in this post.

      • Hi Eric, I’m a big fan of your work on C# and a long time reader of your blog.
        I am very interested in your future post on understanding how to hook into the async machinery.
        Basically, as I understand it, the compiler is performing a CPS transformation and then passes the transformed code to a contextual object that manages the scheduling of tasks on various threads. Do you think it is possible to leverage that compiler feature to create powerful combinators while leaving aside the default threading aspect?
        An example would be something that can derecursify and memoize a method like

        async MyTask<BigInteger> Fib(int n) // hypothetical example
        {
          if (n <= 1) return n;
          return await Fib(n - 1) + await Fib(n - 2);
        }

        I managed to do it with something like:

        void Fib(int n, Action<BigInteger> Ret, Action<int, Action<BigInteger>> Rec)
        {
          if (n <= 1) Ret(n);
          else Rec(n - 1, x => Rec(n - 2, y => Ret(x + y)));
        }

        (no async, very kludgy…)

        or using a monad (While = Either<X, While>)

        While Fib(int n) => n <= 1 ?
        While.Return((BigInteger) n) :
        from x in Fib(n-1)
        from y in Fib(n-2)
        select x + y;

        a bit better but not as cute looking as the async syntax 🙂

        The need arose when implementing a ZBDD library: (a special kind of DAG)
        – lots of complex mutually recursive operations
        – stack overflows constantly on real examples
        – only practical if fully memoized
        Manual CPS and derecursification was very tedious and error prone.

        An acid test for stack safety could look like:

        async Task<BigInteger> Fib(int n, BigInteger a, BigInteger b)
        {
          if (n == 0) return b;
          if (n == 1) return a;
          return await Fib(n - 1, a + b, a);
        }

        which produces a stack overflow on Fib(10000, 1, 0) with the default behavior.

        • Absolutely this is possible; not only can we do “stackless” programming as you describe, but we can in theory do other complex control flows such as multi-shot continuations. I *might* discuss this at some point in my “fixing random” series.

          • It is already truly great to hear that it is possible! Thanks a lot for your time 🙂
            Just be assured that someone somewhere will really love it *if and when* you decide some day to speak about a way to use the CPS transform performed by the compiler as a stand-alone feature. 🙂

  2. I don’t get condition at all. What is it applied to? To me it looks like a where clause without a sequence on the left. Is the condition applied to the next sample? To the next distribution? All that’s on that line is two ints, not distributions or anything to apply a condition to.

    Can you please elaborate on how this statement would work or what it would apply to? If that’s a future post that’s fine, I just don’t understand how the language specifies which distribution the condition is applied to.

    • Good question; I should be more clear in the text. You call a probabilistic function and it returns a distribution object. Calling Sample() on that distribution object should be logically the same as running the body, where all “sample” operations just Sample() their distributions, and if a “condition” operation ever has a false operand, we abandon this attempt to sample and start again from the top of the method. The value sampled is the value that is returned.

      The trick is to implement these semantics without ever actually running the body in a loop, because the conditions might be false 99.9% of the time and then it takes 1000x too long to run the method. Ideally we want to call the method, and the method *infers* an accurately weighted distribution and returns that.

      • Ah. So the condition applies to the object you are returning, which is basically the function itself. Which means that the condition applies after the return 2. So your implementation would need to see that we need to remove z from TwoDSix unless it’s 2? If I understand your plan, while it’s making more sense what the condition does, I still think how it would do it is weird, because it sounds like we would be affecting the support of a method (TwoDSix) that should have already run. Logically, looking at the program or debugging it, I would think ii already had a value before we got to condition, but for performance the condition should have taken z out of the support. Is that correct, or are you thinking the compiler would collapse the method into a single distribution with its own calculated support and weights based on the methods it samples? If we do that, would it still be possible to debug and step into this method? Maybe I am showing my own lack of knowledge around async/await and existing ways of solving these issues.

        Also, if the condition is always false I guess we return the empty distribution vs throwing? That is probably the point of including it.

        Happy to wait for a future episode if it will all be explained then, just trying to understand this one. 🙂

        • “condition” is a statement, and so it only applies if it is reachable. If the method has already “returned” then it got its sample successfully.

          Here’s another way to think about it. There are finitely many paths through the method starting from the call and going to a return. Enumerate all of them. Now discard all of the ones where any condition on the path was false. The remaining set of possibly-returned values is the support of the distribution, and the number of paths that has each possible return gives the weight.
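
          That enumeration can be sketched mechanically. As a hypothetical illustration (not the real implementation), take a workflow that samples a single d6 and conditions on the roll being even; each of the six paths has weight 1, three of them fail the condition, and what remains is uniform on { 2, 4, 6 }:

```csharp
// Hypothetical sketch: enumerate each path, discard the ones whose
// condition failed, and tally weights of the surviving return values.
var weights = new Dictionary<int, int>();
for (int x = 1; x <= 6; x += 1)  // the six equally weighted d6 paths
{
    if (!(x % 2 == 0))
        continue;                // condition failed: drop this path
    weights[x] = weights.TryGetValue(x, out int w) ? w + 1 : 1;
}
// weights is now { 2: 1, 4: 1, 6: 1 } — uniform on the even rolls
```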

          You are absolutely right to notice that debugging probabilistic workflows can be nightmarish. Think of all the difficulties of debugging LINQ queries composed of multiple iterator blocks, and multiply that by a big number. I’m not going to discuss that problem much in this series, but it would be a real concern. We had similar real concerns when designing async-await, since many debugger tools rely on things like inspecting the call stack; asynchronous methods don’t have a call stack, because “where the call came from” and “where we’re going to next” are different in asynchronous workflows.

          And yes, you have anticipated why we need the empty distribution; “probabilistic M() { condition false; }” needs to return the empty distribution.

          We’ll cover all this stuff in the next few episodes, but good on you for thinking ahead.

          • Thanks. That makes more sense to me. So logically the statements are applied in order, but the result of the function will be a new distribution where the condition was applied to remove any impossible results from the support, so we can get our results with a single sample call. And since our weights are integers and not decimals that add up to one, throwing out a support point should be pretty simple and won’t affect other weights.

            For debugging I could imagine we show the distribution that would have resulted from each line individually and then the final sample returns the actual number? Or we might have an inner distributions variable that would show the distributions of what we sampled. Might be too tricky to solve but it raises another question in my mind.

            Would these functions be generated at run time or compile time? TwoDSix is the same for every call, but Workflow changes at runtime based on z. Either here or in async/await, is everything always built at runtime, or is there an optimization we can do when compiling these sorts of functions without parameters? Or do we just cache it at runtime and use that distribution for any functions built on top of TwoDSix? A cache of distributions could also work for multiple calls to the same function with the same parameter. I know very little about compilers, so I’m trying to think this through. Thanks for your responses and this series!

          • The C# compiler will simplify expressions based on known values of constants; for example it will compile x * 0 down to just 0 if x is an integer variable. It will remove code that it knows is unreachable. Other than that, there’s not a lot of “whole method” optimization because the C# compiler does not attempt to identify pure functions. If it did, then yes, some of these distributions could be computed at compile time in theory. But it doesn’t in practice.

      • At least for me, the main source of confusion is that `condition` should be a verb. (It is a verb, but it sounds kinda weird in this context.) `observe` would make a lot more sense to me.

  3. Pingback: Fixing Random, part 21 | Fabulous adventures in coding
