Monads, part one

Lots of other bloggers have attempted this, but what the heck, I’ll give it a shot too. In this series I’m going to attempt to answer the question:

I’m a C# programmer with no “functional programming” background whatsoever. What is this “monad” thing I keep hearing about, and what use is it to me?

Bloggers often attempt to tackle this problem by jumping straight into the functional programming usage of the term, and start talking about “bind” and “unit” operations, and higher-order functional programming with higher-order types. Even worse is to go all the way back to the category theory underpinning monads and start talking about “monoids in the category endofunctors” and the like. I want to start from a much more pragmatic, object-oriented, C#-type-system focussed place and move towards the rarefied heights of functional programming as we go. Continue reading

Static constructors, part four

We’ll finish up this series on static constructors by finally getting to the question which motivated me to write the series in the first place: should you use a static constructor like this?

public class Sensitive
{
  static Sensitive()
  {
    VerifyUserHasPermissionToUseThisClass();
  }  
  public static void Dangerous()
  {
    DoSomethingDangerous();
  }
  ...

The intention here is clear. The static constructor is guaranteed to run exactly once, and before any static or instance method. Therefore it will do the authorization check before any dangerous operation in any method. If the user does not have permission to use the class then the class will not even load. If they do, then the expense of the security check is only incurred once per execution, no matter how many methods are called.

If you’ve read the rest of this series, or anything I’ve written on security before, you know what I’m going to say: I strongly recommend that you do not use static constructors “off label” in this manner.

First off, as we’ve seen so far in this series, static constructors are a dangerous place to run fancy code. If they end up delegating any work to other threads then deadlocks can easily result. If they take a long time and are accessed from multiple threads, contention can result. If an exception is thrown out of the static constructor then you have very little ability to recover from the exception and the type will be forever useless in this appdomain.

Second, the security semantics here are deeply troubling. If this code is running on the user’s machine then this appears to be a case of the developer not trusting the user. But the .NET security system was designed with the principle that the user is the source of trust decisions. It is the user who must trust the developer, not the other way around! If the user is hostile towards the developer then nothing is stopping them from decompiling the code to IL, removing the static constructor, recompiling, and running the dangerous method without the security check. (The resulting program will of course not be strong-named or code-signed by the developer anymore, but who cares?)

Moreover, the pattern here assumes that security checks can be performed once and the result is then cached for the lifetime of the appdomain. What if the initial security check fails, but the program was going to impersonate a more trusted user? It might be difficult to ensure that the static constructor does not run until after the impersonation. What if different threads are associated with different users? Now we have a race to see which user’s context is used for the security check. What if a user’s permissions are revoked while the program is running? The check might be performed while permission is granted, and then the dangerous code runs after it has been revoked.

In short: Static constructors should be used to quickly initialize important static data, and that’s pretty much it. The only time that I would use the mechanisms of a static constructor to enforce an invariant would be for a very simple invariant like ensuring that a singleton is lazily initialized, as Jon describes. Complex policy mechanisms like security checks should probably use some other mechanism.


Next time on FAIC: I’m going to join the throngs of tech bloggers who have tried to explain what a monad is.

Static constructors, part three

Earlier in this series I recommended that you brush up on how instance constructors work; if you did, then you’ll recall that instance field initializers are essentially moved into the beginning of an instance constructor at a point before the call to the base class constructor. You might think that static field initializers work the same: a static field initializer is silently inserted at the beginning of the static constructor. And that’s true. Well, it is mostly true.

In the case where there is already a static constructor, even an empty one, that’s what happens: the field initializers become the prologue to the body of the static constructor, and the usual rules for static constructors then apply. (That is, the static constructor is invoked immediately before the first static member access or instance constructor access.) But suppose there is no user-supplied static constructor. Then what happens?

The C# compiler is not bound by the rules of static constructors in this case, and in fact, does not treat your program as though there was an empty static constructor that has static field initializers in it. Rather, it tells the runtime that the runtime may choose when to run static field initializers, entirely at its discretion, just so long as all the fields are initialized before they are used.

The runtime is still responsible for ensuring that fields are initialized in the right order, because one static field might be initialized based on the contents of another. This is a bad idea, but it is legal, so the compiler has to honour that.

In this scenario the runtime is permitted to run static field initializers as late as possible; it could wait until a static field is actually accessed, rather than waiting for any static member or instance constructor to be accessed. The runtime is also permitted to run static field initializers as early as possible; it could decide to run all the field initializers at once at the beginning of the program, even if the class in question was never used. It is up to the runtime implementation to decide.

Pre .NET 4 implementations of the runtime make an interesting choice; they run static field initializers of classes that have no static constructors when the first method that refers to the class is jitted. If we have:

class Alpha 
{
  static int x = 123;
  static void M() { }
}
class Bravo
{
  static int y = 456;
  static void N() { }
}
class Charlie
{
  static void Q(bool b)
  {
    if (b) Alpha.M(); else Bravo.N();
  }
  static void Main() 
  { 
    Q(true); 
  }
}

Then when Q is jitted, the field initializers for both x and y will be executed. If Bravo had a static constructor then it would not be initialized, because Bravo’s static constructor is only triggered by the actual execution of the call.

The runtime makes this rather odd-seeming choice as an optimization; this way it doesn’t have to generate code on every static member access that checks to see if the field initializers have run yet! It knows that if you get as far as accessing a static member, then the method that does so must have been jitted, and therefore the field initializers have been run already.

I originally believed this to be the case in .NET 4 as well, but Jon informs me that current versions of the runtime are even lazier than that. See Jon’s comment below for details. (Thanks Jon!)

I find this to be one of the strangest C#/CLR features and for many years I did not understand it at all well — and, since Jon has corrected my understanding of it, apparently I still don’t! Every time I encountered a question about it, I just referred the questioner to Jon’s excellent page on the subject. For more information and discussion on this rather odd feature, check it out.


Next time on FAIC: We’ll finish up this series by looking at an abuse of static constructors.

Static constructors, part two

Previously on FAIC I gave a quick overview of the basic operation of a static constructor. Today, three unusual corner cases.

The first odd case I want to talk about involves static methods. Take a look at the sample program from last time. Now suppose we edited the Main method to say:

static void Main() 
{
  D.M();
}

First off, is that even legal? Sure! Inheritance means that all inheritable members of B are also members of D. M is an inheritable member of B, so it is a member of D, right?

Unfortunately, this corner case is the one that exposes the leaky abstraction. The compiler generates code as though you had said B.M();, and therefore D’s static constructor is not called even though “a member of D” has been invoked. This actually makes a fair amount of sense. The method B.M is going to be called, and there’s no reason to go to all the work of running D’s static constructor when B.M probably does not depend on any work done by D’s constructor. And it would seem strange if calling the same method by two different syntaxes would result in different static constructor invocations.

Now let’s consider a second case involving static method invocation. Suppose now we edited Main to say:

static void Main() 
{
  D.N();
}

Clearly D’s static constructor must be invoked. What about B? Is its static constructor invoked? No! A static constructor is triggered by a usage of a static member, or by the creation of an instance. Invoking D.N does not use any static member of B and it does not create an instance of B, so B’s static constructor is not invoked. People sometimes expect that static constructors of base classes will always be invoked before static constructors of derived classes, but that’s not the case.

Our third odd case is: what happens when a static constructor throws an exception?

Absolutely nothing good! First off, of course if the exception goes unhandled then all bets are off. The runtime is permitted to do anything it likes if there is an unhandled exception, including such options as starting up a debugger, terminating the appdomain immediately, terminating the application after running finally blocks, and so on. And an exception in a static constructor can easily go unhandled; trying to wrap every possible first usage of a type with a try-catch block is onerous.

And even if by some miracle the exception gets handled the first time, odds are very good that your program is now in such a damaged state that it is going to go down in flames soon. Remember, I said that a static constructor runs once, and by that I meant once; if it throws, you don’t get a second chance. Instead, when a static constructor terminates abnormally, the runtime marks the type as unusable, and every attempt by your program to use that type results in another exception.

An interesting fact about static constructors that throw exceptions is that when the runtime detects that a static constructor has terminated abnormally, it wraps the exception in its own exception and throws that instead. Check out this StackOverflow answer, where Jon demonstrates this in action.


Next time on FAIC: I’ll defer to Jon again when I discuss how the runtime is permitted to optimize some static constructors.

Static constructors, part one

Previously on FAIC we saw how easy it was to deadlock a program by trying to do something interesting in a static constructor.


Jargon note: Static constructors are also called “class constructors”. Since the actual method generated has the name .cctor they are often also called “cctors”. Since “static constructor” is the jargon used in the C# specification, that’s what I’ll stick to.


Static constructors and destructors are the two really weird kinds of methods, and you should do as little as possible in them.

Before I expound further on that topic though, a look at how static constructors work is in order. And before I do that, it’s probably a good idea that you get a refresher on how instance constructors work. My article “Why do initializers run in the opposite order of constructors?” provides a detailed look at constructor semantics, so maybe check that out if you have a few minutes. Part one is here and part two is here.

OK, now that you know how instance constructors work, let’s dig into static constructors. The idea is pretty simple: a static constructor is triggered to run immediately before the first static method on its class is called, or immediately before the first instance of its class is created. As we saw previously, the runtime tracks when a static constructor is “in flight” and uses that mechanism to ensure that each static constructor is invoked no more than once.

Now that you know all of that, you can predict the output of this simple program:

using System;
class B
{
  static B() { Console.WriteLine("B cctor"); }
  public B() { Console.WriteLine("B ctor"); }
  public static void M() { Console.WriteLine("B.M"); }
}
class D : B
{
  static D() { Console.WriteLine("D cctor"); }
  public D() { Console.WriteLine("D ctor"); }
  public static void N() { Console.WriteLine("D.N"); }
}
class P 
{
  static void Main()
  {
    System.Console.WriteLine("Main");
    new D();
  }  
}

We know that B’s instance constructor must be invoked before D’s instance constructor, and we know that D’s static constructor must be invoked before D’s instance constructor. The only interesting question here is “when will B’s static constructor be invoked?” An instance of D is also an instance of B, so B’s static constructor has to be invoked at some point.

As you know from reading my article on instance constructors, what actually happens is that the compiler generates D’s instance constructor so that the first thing it does is call B’s instance constructor; that’s how we get the appearance that B’s instance constructor runs first. Thus, the actual order of events here can be best conceptualized like this:

  • Main starts. It prints out its message and then tries to invoke D’s instance constructor on a new instance of D.
  • The runtime detects that D’s instance constructor is about to be invoked, so it invokes D’s static constructor.
  • D’s instance constructor invokes B’s instance constructor. The runtime detects that, so it invokes B’s static constructor.
  • B’s instance constructor runs and returns control to D’s instance constructor, which finishes normally.

Pretty straightforward. Let’s mix it up a little.


Next time on FAIC: A brief digression for fun on a Friday. Then next week we’ll resume this series and take a look at a few less straightforward cases.

The no-lock deadlock

People sometimes ask me if there is a cheap-and-easy way to guarantee thread safety. For example, “if my method only reads and writes local variables and parameters, can I guarantee that my method is threadsafe?” Questions like that are dangerous because they are predicated on an incorrect assumption: that if every method of a program is “threadsafe”, whatever that means, then the entire program is “threadsafe”. I might not be entirely clear on what “threadsafe” means, but I do know one thing about it: thread safety is a property of entire programs, not of individual methods.

To illustrate why these sorts of questions are non-starters, today I present to you the world’s simplest deadlocking C# program:

class C
{
  static C() 
  {
    // Let's run the initialization on another thread!
    var thread = new System.Threading.Thread(Initialize);
    thread.Start();
    thread.Join();
  }
  static void Initialize() { }
  static void Main() { }
}

(Thanks to my Roslyn compiler team colleague Neal Gafter for this example, which was adapted from his book Java Puzzlers.)

At first glance clearly ever method of this incredibly simple program is “threadsafe”. There is only a single variable anywhere in the program; it is local, is written once, is written before it is read, is read from the same thread it was written on, and is guaranteed to be atomic. There are apparently no locks anywhere in the program, and so there are no lock ordering inversions. Two of the three methods are empty. And yet this program deadlocks with 100% certainty; the program “globally” is clearly not threadsafe, despite all those nice “local” properties. You can build a hollow house out of solid bricks; so too you can build a deadlocking program out of threadsafe methods.

The reason why this deadlocks is a consequence of the rules for static constructors in C#; the important rule is that a static constructor runs exactly zero or one times, and runs before a static method call or instance creation in its type. Therefore the static constructor of C must run to completion before Main starts. The CLR notes that C‘s static constructor is “in flight” on the main thread and calls it. The static constructor then starts up a new thread. When that thread starts, the CLR sees that a static method is about to be called on a type whose static constructor is “in flight” another thread. It immediately blocks the new thread so that the Initialize method will not start until the main thread finishes running the class constructor. The main thread blocks itself waiting for the new thread to complete, and now we have two threads each waiting for the other to complete.


Next time on FAIC: We’re opening up the new Coverity office in Seattle! After which, we’ll take a closer look at the uses and abuses of the static constructor.

Integer division that rounds up

Integer arithmetic is surprisingly difficult to get right. There are usually lots of weird corner cases and the temptation to use too-clever bit-twiddling tricks is high. Today I thought I’d give an example of how I approach solving integer arithmetic problems on those rare occasions when I have to. So let’s consider the problem of writing a C# method that divides two 32 bit integers such that the division rounds up if it is inexact. Continue reading

Nullable micro-optimizations, part eight

Today, the answer to the puzzle I posed last week. The answer is no, that optimization is not legal in general, though it is legal in a few specific cases.

As I have discussed several times before, C# has very strict rules about what order operations happen in. A quick rule of thumb is: operands run left-to-right. In our example we had X() * Y() + Z(), so let’s see what conclusions we reach by applying this rule:

  • X() is an operand of the * to the left of Y(), so the call to X() must happen before the call to Y().
  • X()*Y() is an operand of the + to the left of Z(), so the multiplication must happen before the call to Z().

The compiler must generate code that computes X(), then Y(), then the multiplication, then Z(), and then the addition. But our proposed optimization was

int? r;
int? tempX = X();
int? tempY = Y();
int tempZ = Z();
r = tempX.HasValue & tempY.HasValue ?
  new int?(tempX.GetValueOrDefault() * tempY.GetValueOrDefault() + tempZ) :
  new int?();

which computes Z() before the multiplication.

So what’s the big deal? Multiplication of integers can throw an exception in a checked context. That exception should prevent Z() from being called in the first place should X() * Y() throw on the multiplication. This optimization is only valid in an unchecked context.

And of course it just gets worse if we start to consider lifted arithmetic on types other than int?. It’s a bad practice to write a user-defined operator with a side effect, but it is legal and the C# compiler must ensure that side effects of user-defined operators are observed to happen in the prescribed order.

Rather than try to figure out all the types for which this optimization is legal in various contexts, Roslyn makes no attempt to optimize this kind of binary operator. It generates a temporary int? for the multiplication, and then does the regular lifted arithmetic for the addition. Another lovely optimization spoiled by conflict with an lovely invariant.

But wait!

The problem is that the side effects of the multiplication must be observed to come before the side effects of the right addend. What if the right addend has no side effects? Then we can do this optimization!

The most common situation in which the right addend has no side effects is when it is a constant:

int? r = X() * Y() + 1;

This can legally be optimized to

int? r;
int? tempX = X();
int? tempY = Y();
r = tempX.HasValue & tempY.HasValue ?
  new int?(tempX.GetValueOrDefault() * tempY.GetValueOrDefault() + 1) :
  new int?();

And in fact I did add this optimization to Roslyn; just as unary operations and conversions can be “distributed”, so can binary operations where the right operand is a constant. Moreover, as a pleasant side effect, doing so allowed for an easy implementation of various optimizations that improve the quality of lifted x += 1 and x++ expressions.

Well, that took rather more episodes than I thought it would when I started! I could talk a bit more about how Roslyn optimizes other more exotic nullable arithmetic, like equality, inequality, and logical operations. I could also talk a bit about the stuff I didn’t get to implement before I left; Roslyn does a slightly worse job than the original recipe compiler when optimizing expressions where we know that the result is going to be null, but also must compute a side effect. But rather than go down those lengthy rabbit holes, I think I’ll call this series done at eight episodes.


Next time on FAIC: Some of these optimizations we’ve been talking about are elisions; I’ll talk a bit about how computer programmers have borrowed this term from linguistics, and how we use it in two different senses.

Nullable micro-optimizations, part seven

Today, a puzzle for you.

We’ve been talking about how the Roslyn C# compiler aggressively optimizes nested lifted unary operators and conversions by using a clever technique. The compiler realizes the inner operation as a conditional expression with a non-null nullable value on the consequence branch and a null nullable value on the alternative branch, distributes the outer operation to each branch, and then optimizes the branches independently. That then gives a conditional expression that can itself be the target of further optimizations if the nesting is deeper.

This works great for lifted conversions and unary operators. Does it also work for binary operators? It seems like it would be a lot harder to make this optimization work for a lifted binary operator where both operands are themselves lifted operations. But what if just one of the operands was a lifted operation, and the other operand was guaranteed to be non-null? There might be an opportunity to optimize such an expression. Let’s try it. Suppose X() and Y() are expressions of type int? and that Z() is an expression of type int:

int? r = X() * Y() + Z();

We know from our previous episodes that operator overload resolution is going to choose lifted multiplication for the inner subexpression, and lifted addition for the outer subexpression. We know that the right operand of the lifted addition will be treated as though it was new int?(Z()), but we can optimize away the unnecessary conversion to int?. So the question is can the C# compiler legally code-generate that as though the user had written:

int? r;
int? tempX = X();
int? tempY = Y();
int tempZ = Z();
r = tempX.HasValue & tempY.HasValue ?
  new int?(tempX.GetValueOrDefault() * tempY.GetValueOrDefault() + tempZ) :
  new int?();

If you think the answer is “yes” then the follow-up question is: can the C# compiler legally make such an optimization for all nullable value types that have lifted addition and multiplication operators?

If you think the answer is “no” then the follow-up questions are: why not? and is there any scenario where this sort of optimization is valid?


Next time on FAIC we’ll be kind to our fine feathered friends; after that, we’ll find out the answer to today’s question.


Eric is crazy busy at Coverity’s head office; this posting was pre-recorded.

Nullable micro-optimizations, part six

Previously in this series I said that the fact that the original C# compiler pursues a less aggressive strategy for optimizing away temporaries and branches from nested lifted conversions and unary operators because it suffers from “premature optimization”. That’s a loaded term and I’m not using it in the standard sense, so I want to clarify that a bit.

Donald Knuth, author of the classic four-volume series The Art of Computer Programming, famously said “premature optimization is the root of all evil.“. I think however that it is more instructive to read that quotation with more context:

We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%. A good programmer will not be lulled into complacency by such reasoning, he will be wise to look carefully at the critical code; but only after that code has been identified.

Which is of course what I have echoed in my numerous performance rants over the years: don’t waste your valuable time making risky changes to the 97% of the code that isn’t the slowest thing and that no customer will ever notice. Use a profiler, find the slowest 3%, and spend your optimization budget on that.

That is good advice, but when I say that a compiler suffers from “premature optimization”, that’s not at all what I mean. Rather, I mean that the compiler performs an optimization pass too early in the compilation process.

Back in 2010 I described in considerable detail the various stages, or “passes”, that the original recipe C# compiler performs when going from raw text to IL, so you might want to read that. For the purposes of this discussion we can simplify that all down to four stages:

  1. Lexical and grammatical analysis
  2. Initial “binding” — that is, semantic analysis
  3. “Lowering” — that is, rewriting high-level code into low-level code — and additional error detection
  4. Code generation

You would expect that semantic optimizations like the ones I described back in 2009  such as lifted arithmetic lowering would happen in the third stage. Some optimizations of course happen during the fourth stage, because the code generator itself can identify branches and temporaries that can be eliminated. But most happen in the third stage.

Doing an optimization in the wrong stage can introduce correctness problems; for some examples of how premature optimizations in the initial binding pass led to bugs and breaking changes, see my posts from 2006 on that subject. Part one. Part two.

Today I’m not concerned about correctness; I’m concerned about how complete the optimization is. The implementation decision which is vexing me today is that the original recipe C# compiler’s strategy is that the initial binding pass identifies portions of lifted arithmetic expressions that can be optimized later, and flags them as needing attention during the lowering pass, which is where the optimizations are done.

I am over-simplifying here; it is not as simple as a Boolean flag in most cases. In fact, the amount of information that is stored by the initial binding pass for the use of the optimizer later is quite scary because it is easy to accidentally use the wrong bits when lowering. An example of such a bug is in this StackOverflow question. But we can think of it logically as a flag.

The problem is that the initial binding pass only identifies opportunities for optimization based on the original form of the code. If the optimization pass produces “lowered” code that is itself amenable to further optimization then it is never optimized because there’s no flag left in there by the initial binding pass! Deciding whether something could benefit from optimization was being done too soon.

To make a long story short — and yes, this seems to have gotten rather long, sorry — the practical upshot is that the original recipe compiler is very good at finding “shallow” optimization opportunities on lifted operations, but very bad at making optimizations compose nicely when lifted operations are deeply nested; those tend to generate lots of unnecessary temporaries and branches.

Like I said previously, the compiler is not required to make those optimizations, but it has always vexed me that it does not. Roslyn improves on this situation by deferring all lowering and optimization of lifted arithmetic to the lowering phase; only the bare minimum analysis is performed during the initial binding pass. Roslyn optimizes each lifted arithmetic expression as it lowers it to temporaries and conditionals, and then tries aggressively to “distribute” lifted unary operations and conversions into those generated conditionals in order to skip creation of unnecessary temporaries and branches.


Next time on FAIC: Is it ever possible to apply this optimization technique to a lifted binary operator? I’ll pose that question in the form of a puzzle.


Eric is crazy busy at Coverity’s head office; this posting was pre-recorded.