Nullable micro-optimizations, part eight

Today, the answer to the puzzle I posed last week. The answer is no, that optimization is not legal in general, though it is legal in a few specific cases.

As I have discussed several times before, C# has very strict rules about what order operations happen in. A quick rule of thumb is: operands run left-to-right. In our example we had X() * Y() + Z(), so let's see what conclusions we reach by applying this rule:

  • X() is an operand of the * to the left of Y(), so the call to X() must happen before the call to Y().
  • X()*Y() is an operand of the + to the left of Z(), so the multiplication must happen before the call to Z().

The compiler must generate code that computes X(), then Y(), then the multiplication, then Z(), and then the addition. But our proposed optimization was

int? r;
int? tempX = X();
int? tempY = Y();
int tempZ = Z();
r = tempX.HasValue & tempY.HasValue ?
  new int?(tempX.GetValueOrDefault() * tempY.GetValueOrDefault() + tempZ) :
  new int?();

which computes Z() before the multiplication.

So what's the big deal? Multiplication of integers can throw an exception in a checked context. That exception should prevent Z() from being called in the first place should X() * Y() throw on the multiplication. This optimization is only valid in an unchecked context.

And of course it just gets worse if we start to consider lifted arithmetic on types other than int?. It's a bad practice to write a user-defined operator with a side effect, but it is legal and the C# compiler must ensure that side effects of user-defined operators are observed to happen in the prescribed order.

Rather than try to figure out all the types for which this optimization is legal in various contexts, Roslyn makes no attempt to optimize this kind of binary operator. It generates a temporary int? for the multiplication, and then does the regular lifted arithmetic for the addition. Another lovely optimization spoiled by conflict with an lovely invariant.

But wait!

The problem is that the side effects of the multiplication must be observed to come before the side effects of the right addend. What if the right addend has no side effects? Then we can do this optimization!

The most common situation in which the right addend has no side effects is when it is a constant:

int? r = X() * Y() + 1;

This can legally be optimized to

int? r;
int? tempX = X();
int? tempY = Y();
r = tempX.HasValue & tempY.HasValue ?
  new int?(tempX.GetValueOrDefault() * tempY.GetValueOrDefault() + 1) :
  new int?();

And in fact I did add this optimization to Roslyn; just as unary operations and conversions can be "distributed", so can binary operations where the right operand is a constant. Moreover, as a pleasant side effect, doing so allowed for an easy implementation of various optimizations that improve the quality of lifted x += 1 and x++ expressions.

Well, that took rather more episodes than I thought it would when I started! I could talk a bit more about how Roslyn optimizes other more exotic nullable arithmetic, like equality, inequality, and logical operations. I could also talk a bit about the stuff I didn't get to implement before I left; Roslyn does a slightly worse job than the original recipe compiler when optimizing expressions where we know that the result is going to be null, but also must compute a side effect. But rather than go down those lengthy rabbit holes, I think I'll call this series done at eight episodes.


Next time on FAIC: Some of these optimizations we've been talking about are elisions; I'll talk a bit about how computer programmers have borrowed this term from linguistics, and how we use it in two different senses.

Nullable micro-optimizations, part seven

Today, a puzzle for you.

We've been talking about how the Roslyn C# compiler aggressively optimizes nested lifted unary operators and conversions by using a clever technique. The compiler realizes the inner operation as a conditional expression with a non-null nullable value on the consequence branch and a null nullable value on the alternative branch, distributes the outer operation to each branch, and then optimizes the branches independently. That then gives a conditional expression that can itself be the target of further optimizations if the nesting is deeper.

This works great for lifted conversions and unary operators. Does it also work for binary operators? It seems like it would be a lot harder to make this optimization work for a lifted binary operator where both operands are themselves lifted operations. But what if just one of the operands was a lifted operation, and the other operand was guaranteed to be non-null? There might be an opportunity to optimize such an expression. Let's try it. Suppose X() and Y() are expressions of type int? and that Z() is an expression of type int:

int? r = X() * Y() + Z();

We know from our previous episodes that operator overload resolution is going to choose lifted multiplication for the inner subexpression, and lifted addition for the outer subexpression. We know that the right operand of the lifted addition will be treated as though it was new int?(Z()), but we can optimize away the unnecessary conversion to int?. So the question is can the C# compiler legally code-generate that as though the user had written:

int? r;
int? tempX = X();
int? tempY = Y();
int tempZ = Z();
r = tempX.HasValue & tempY.HasValue ?
  new int?(tempX.GetValueOrDefault() * tempY.GetValueOrDefault() + tempZ) :
  new int?();

If you think the answer is "yes" then the follow-up question is: can the C# compiler legally make such an optimization for all nullable value types that have lifted addition and multiplication operators?

If you think the answer is "no" then the follow-up questions are: why not? and is there any scenario where this sort of optimization is valid?


Next time on FAIC we'll be kind to our fine feathered friends; after that, we'll find out the answer to today's question.


Eric is crazy busy at Coverity's head office; this posting was pre-recorded.

Nullable micro-optimizations, part six

Previously in this series I said that the fact that the original C# compiler pursues a less aggressive strategy for optimizing away temporaries and branches from nested lifted conversions and unary operators because it suffers from "premature optimization". That's a loaded term and I'm not using it in the standard sense, so I want to clarify that a bit.

Donald Knuth, author of the classic four-volume series The Art of Computer Programming, famously said "premature optimization is the root of all evil.". I think however that it is more instructive to read that quotation with more context:

We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%. A good programmer will not be lulled into complacency by such reasoning, he will be wise to look carefully at the critical code; but only after that code has been identified.

Which is of course what I have echoed in my numerous performance rants over the years: don't waste your valuable time making risky changes to the 97% of the code that isn't the slowest thing and that no customer will ever notice. Use a profiler, find the slowest 3%, and spend your optimization budget on that.

That is good advice, but when I say that a compiler suffers from "premature optimization", that's not at all what I mean. Rather, I mean that the compiler performs an optimization pass too early in the compilation process.

Back in 2010 I described in considerable detail the various stages, or "passes", that the original recipe C# compiler performs when going from raw text to IL, so you might want to read that. For the purposes of this discussion we can simplify that all down to four stages:

  1. Lexical and grammatical analysis
  2. Initial "binding" -- that is, semantic analysis
  3. "Lowering" -- that is, rewriting high-level code into low-level code -- and additional error detection
  4. Code generation

You would expect that semantic optimizations1 such as lifted arithmetic lowering would happen in the third stage, 2 and for the most part they do.3 The implementation decision which is vexing me today is that the original recipe C# compiler's strategy is that the initial binding pass identifies portions of lifted arithmetic expressions that can be optimized later, and flags them as needing attention during the lowering pass.4

The problem is that the initial binding pass only identifies opportunities for optimization based on the original form of the code. If the optimization pass produces "lowered" code that is itself amenable to further optimization then it is never optimized because there's no flag left in there by the initial binding pass!

To make a long story short -- and yes, this seems to have gotten rather long, sorry -- the practical upshot is that the original recipe compiler is very good at finding "shallow" optimization opportunities on lifted operations, but very bad at making optimizations compose nicely when lifted operations are deeply nested; those tend to generate lots of unnecessary temporaries and branches.

Like I said previously, the compiler is not required to make those optimizations, but it has always vexed me that it does not. Roslyn improves on this situation by deferring all lowering and optimization of lifted arithmetic to the lowering phase; only the bare minimum analysis is performed during the initial binding pass. Roslyn optimizes each lifted arithmetic expression as it lowers it to temporaries and conditionals, and then tries aggressively to "distribute" lifted unary operations and conversions into those generated conditionals in order to skip creation of unnecessary temporaries and branches.


Next time on FAIC: Is it ever possible to apply this optimization technique to a lifted binary operator? I'll pose that question in the form of a puzzle.

Eric is crazy busy at Coverity's head office; this posting was pre-recorded.

  1. I described many of the optimizations that the C# compiler performs back in 2009.
  2. Some optimizations of course happen during the fourth phase, because the code generator itself can identify branches and temporaries that can be eliminated.
  3. For some examples of how premature optimizations in the initial binding pass led to bugs and breaking changes, see my posts from 2006 on that subject. Part one. Part two.
  4. I am over-simplifying here; it is not as simple as a Boolean flag in most cases. In fact, the amount of information that is stored by the initial binding pass for the use of the optimizer later is quite scary because it is easy to accidentally use the wrong bits when lowering. An example of such a bug is in this StackOverflow question.

Nullable micro-optimization, part five

Last time in this series I described how a lifted implicit conversion could be "distributed" to both branches of the conditional operator that is the realization of a lifted arithmetic expression, and then optimized further on each side. Of course, the thing being converted can be any lifted expression in order to take advantage of this optimization. This means that the optimization "composes" nicely; the optimization could be repeatedly applied when lifted operations are nested.

This is a bit of a silly illustrative example: suppose you have expressions x and y of type A? with a lifted addition operator that produces an A?. There's also a lifted conversion from A? to B?, and similarly from B? to C?.

C? c = (C?)(B?)(x + y);

As we discussed previously in this series, the compiler realizes the lifted addition as a conditional expression. We know that the lifted conversion to B? can be "distributed" to the consequence and alternative branches of the conditional expression. That then results in a different conditional expression, but one such that the conversion to C? can be distributed to each branch of that! That is, the compiler could realize the code above as:

C? c;
A? temp1 = x;
A? temp2 = y;
c = temp1.HasValue & temp2.HasValue ? 
  new C?((C)(B)(temp1.GetValueOrDefault() + temp2.GetValueOrDefault()) :
  new C?();

... by applying the optimization twice, rather than creating a temporary of type A? for the sum and a temporary of type B? for the conversion of the sum, each with its own conditional expression. The aim of the optimization is to reduce the number of temporaries and conditional expressions, and thereby make the code smaller and produce fewer basic blocks.

A lifted conversion is rather like a lifted unary operator, and in fact the compiler could do the analogous optimization for the lifted unary +, -, ~ and ! operators. Continuing our silly example, suppose we have a lifted ~ operator on A? that produces an A?. If you said:

C? c = (C?)(B?)~(x + y);

Then the ~ operation can also be "distributed" to each branch of the conditional just as the conversions can be. The insight here is the same as before: if the consequence and alternative are both of the same type then

~(condition ? consequence : alternative)

is the same as

condition ? ~consequence : ~alternative

When we furthermore know that the consequence is of the form new A?(something) then we know that ~consequence is the same as new A?(~something). When we know that the alternative is of the form new A?(), then we know that ~new A?() is going to be a no-op, and just produce new A?() again. So, to make a long story short, the C# compiler can codegen the code above as:

C? c;
A? temp1 = x;
A? temp2 = y;
c = temp1.HasValue & temp2.HasValue ? 
  new C?((C)(B)(~(temp1.GetValueOrDefault() + temp2.GetValueOrDefault())) :
  new C?();

Again, we save several temporaries and branches by performing this optimization.

Now, I've been saying "the compiler could" a lot because of course a compiler is not required to perform these optimizations, and in fact, the "original recipe" compiler is not very aggressive about performing these optimizations. I examined the original recipe compiler very closely when implementing nullable arithmetic in Roslyn, and discovered that it suffers from a case of "premature optimization".


Next time on FAIC: We'll digress for the next couple of posts. Then I'll pick up this subject again with a discussion of the evils of "premature optimization" of nullable arithmetic, and how I'm using that loaded term in a subtly different way than Knuth did.

Nullable micro-optimization, part four

Last time on FAIC I described how the C# compiler elides the conversion from int to int? when you add an int? to an int, and thereby manages to save unnecessary calls to HasValue and GetValueOrDefault(). Today I want to talk a bit about another kind of nullable conversion that the compiler can optimize. Consider the following, in which w is an expression of type int? :

double? z = w;

There is an implicit conversion from int to double, and so there is a "lifted" conversion from int? to double?. As I'm sure you'd expect, given the previous entries in this series, this would be code-generated the same as:

double? z;
int? temp = w;
z = temp.HasValue ? 
    new double?((double)temp.GetValueOrDefault()) : 
    new double?();

If you don't know anything more about w then that's about as good as it gets. But suppose we did know more. For example, suppose we have:

double? z = new int?();

That might seem crazy, but bear with me. In this case, obviously the compiler need not ever call HasValue in the first place because you and I both know it is going to be false. And we know that there are no side effects of the expression that need to be preserved, so the compiler can simply generate:

double? z = new double?();

Similarly, suppose we have an expression q of type int, and the assignment:

double? z = new int?(q);

Again, clearly we do not need to go through the rigamarole of making a temporary and checking to see if its HasValue property is true. We can skip straight to:

double? z = new double?((double)q);

So this is all well and good. The Roslyn and "original recipe" C# compilers both perform these optimizations. But now let's think about a trickier case. Suppose we have expressions x and y both of type int?, and suppose for the sake of argument that we do not know anything more about the operands:

double? z = x + y;

Now, reason like the compiler. We do not know whether x and y have values or not, so we need to use the un-optimized version of addition. So this is the same as:

double? z;
int? temp1 = x;
int? temp2 = y;
int? sum = temp1.HasValue & temp2.HasValue ? 
  new int?(temp1.GetValueOrDefault() + temp2.GetValueOrDefault()) : 
  new int?();
z = (double?)sum;

We don't know whether sum has a value or not, so we must then generate the full lifted conversion, right? So this is then generated as:

double? z;
int? temp1 = x;
int? temp2 = y;
int? sum = temp1.HasValue & temp2.HasValue ? 
  new int?(temp1.GetValueOrDefault() + temp2.GetValueOrDefault()) : 
  new int?();
z = sum.HasValue ? 
  new double?((double)sum.GetValueOrDefault()) :
  new double?()

Is that the best we can do? No! The key insight here is that the conversion can be distributed into the consequence and alternative of the conditional, and that doing so enables more optimizations. That is to say that:

z = (double?) (temp1.HasValue & temp2.HasValue ? 
  new int?(temp1.GetValueOrDefault()+ temp2.GetValueOrDefault()) : 
  new int?());

Gives the exact same result as:

z = temp1.HasValue & temp2.HasValue ? 
  (double?) new int?(temp1.GetValueOrDefault()+ temp2.GetValueOrDefault()) : 
  (double?) new int?();

But we already know how to optimize those! I said above that only crazy people would convert new int?() to double?, and of course you would not do that in your user-written code. But when the compiler itself generates that code during an optimization, it can optimize it further. The compiler generates a lifted conversion from a lifted arithmetic expression by distributing the conversion into both branches of the conditional, and then optimizes each branch. Therefore, double? z = x + y; is actually generated as:

double? z;
int? temp1 = x;
int? temp2 = y;
z = temp1.HasValue & temp2.HasValue ? 
  new double?((double)(temp1.GetValueOrDefault() + temp2.GetValueOrDefault())) : 
  new double?();

The compiler does not need to generate the sum variable at all, and it certainly does not need to check to see if it has a value. This optimization eliminates one entire temporary and the entire second conditional expression.


Next time on FAIC: We'll digress for some brief news on the publishing front. We'll then continue this series and ask: are there other "chained" lifted operations that can be optimized?

Nullable micro-optimization, part three

Happy New Year all; I hope you had as pleasant a New Year's Eve as I did.

Last time on FAIC I described how the C# compiler first uses overload resolution to find the unique best lifted operator, and then uses a small optimization to safely replace a call to Value with a call to GetValueOrDefault(). The jitter can then generate code that is both smaller and faster. But that's not the only optimization the compiler can perform, not by far. To illustrate, let's take a look at the code you might generate for a binary operator, say, the addition of two expressions of type int?, x and y:

int? z = x + y;

Last time we only talked about unary operators, but binary operators are a straightforward extension. We have to make two temporaries, so as to ensure that side effects are executed only once: 1

int? z;
int? temp1 = x;
int? temp2 = y;
z = temp1.HasValue & temp2.HasValue ? 
  new int?(temp1.GetValueOrDefault() + temp2.GetValueOrDefault()) :
  new int?();

A brief aside: shouldn't that be temp1.HasValue && temp2.HasValue? Both versions give the same result; is the short circuiting one more efficient? Not necessarily! AND-ing together two bools is extremely fast, possibly faster than doing an extra conditional branch to avoid what is going to be an extremely fast property lookup. And the code is certainly smaller. Roslyn uses non-short-circuiting AND, and I seem to recall that the earlier compilers do as well.

Anyway, when you do a lifted addition of two nullable integers, that's the code that the compiler generates when it knows nothing about either operand. Suppose however that you added an expression q of type int? and an expression r of type int 2:

int? s = q + r;

OK, reason like the compiler here. First off, the compiler has to determine what the addition operator means, so it uses overload resolution and discovers that the unique best applicable operator is the lifted integer addition operator. Therefore both operands have to be converted to the operand type expected by the lifted operator, int?. So immediately we have determined that this means:

int? s = q + (int?)r;

Which of course is equivalent to

int? s = q + new int?(r);

And now we have an addition of two nullable integers. We already know how to do that, so the compiler generates:

int? s;
int? temp1 = q;
int? temp2 = new int?(r);
s = temp1.HasValue & temp2.HasValue ? 
  new int?(temp1.GetValueOrDefault() + temp2.GetValueOrDefault()) :
  new int?();

And of course you are saying to yourself well that's stupid. You and I both know that temp2.HasValue is always going to be true, and that temp2.GetValueOrDefault() is always going to be whatever value r had when the temporary was built. The compiler can optimize this to:

int? s;
int? temp1 = q;
int temp2 = r;
s = temp1.HasValue ? 
  new int?(temp1.GetValueOrDefault() + temp2) :
  new int?();

Just because the conversion from int to int? is required by the language specification does not mean that the compiler actually has to generate code that does it; rather, all the compiler has to do is generate code that produces the correct results! 3


Next time on FAIC: What happens when we throw some lifted conversions into the mix?

  1. More specifically, the compiler must ensure that side effects are executed exactly once.
  2. Roslyn will also optimize lifted binary operator expressions where both sides are known to be null, where one side is known to be null, and where both sides are known to be non-null. Since these scenarios are rare in user-written code, I'm not going to discuss them much.
  3. A fun fact is that the Roslyn compiler's nullable arithmetic optimizer actually optimizes it to temp1.HasValue & true ? ..., and then Roslyn's regular Boolean arithmetic optimizer gets rid of the unnecessary operator. It was easier to write the code that way than to be super clever in the nullable optimizer.

Nullable micro-optimization, part two

I hope you've all had a pleasant Christmas; I sure did, though once again I was unable to return to Waterloo region to visit my family. Hopefully I'll make it for Easter this coming year.

Last time on FAIC I described why calling GetValueOrDefault() instead of Value allows the jitter to generate smaller, faster code. Of course this optimization is first, tiny, and second, only a valid optimization in the case where you are certain that the nullable value is not actually null. Over the next few episodes I'll describe how the C# compiler uses that fact to generate better code for you, but in order to do that, I first need to talk a bit about lifted arithmetic.

Back in 2007 I described what mathematicians mean by "lifted arithmetic", and how the C# specification uses this term in a subtly wrong way. It's been a long time, so here's a quick refresher. Mathematically, by "lifted" we mean that if there is a function f : S → S, and we make a new set S' = S ∪ { null }, then the lifted function f' : S' → S' is defined as f'(null) → null, f'(s ∈ S) → f(s). Or, in English, the lifted function gives null when given null, and agrees with the unlifted function otherwise.1

Lifted arithmetic operators in C# work similarly. In C#, if there is an operator, let's say the unary ~ operator that takes an int and produces an int, then there is also a lifted ~ operator that takes an int? and produces an int?. The lifted operator produces null if given null, and otherwise agrees with the unlifted operator.

Some so-called "lifted" operators do not follow this pattern, but for the purposes of this series we'll mostly be talking about the ones that do.


I want to make a brief aside here to discuss how the C# compiler knows to use a lifted operator in the first place. The answer is straightforward: it uses overload resolution. Continuing our example, when you say ~x, the compiler pretends that you did a method call operator~(x) and creates a candidate set that consists of "methods" corresponding to the signatures of the user-defined and built-in ~ operators. If overload resolution produces a unique best applicable operator then it is chosen and the operand is implicitly converted to the "parameter type" of the chosen "operator method", otherwise the compiler produces an error. That's an oversimplification; consult the specification for the exact details.2 The important fact that will come into play later in this series is that if overload resolution chooses a lifted operator then the operand is implicitly converted to the nullable type. Just like how when normal overload resolution chooses a method, the arguments are implicitly converted to the corresponding formal parameter types.


Returning now to the subject at hand: how does the C# compiler generate code for a lifted operator? When you say:

int? y = ~x;

what happens? Let's suppose that x is a legal expression of type int?, just to keep it easy. Overload resolution determines that the lifted ~ operator that takes an int? and produces an int? is the unique best applicable operator. The expression is already of the correct type. Now, you might naively think that the compiler would pretend that you'd typed:

int? y = x.HasValue ? ~x.Value : null;

but of course that code is wrong in two ways. First, it doesn't compile because the type of the conditional operator expression cannot be determined.3 And second, what if the expression x has a side effect? We would not want to generate int? y = ~M(i++); as:

int? y = M(++i).HasValue ? ~M(++i).Value : null;

because then the variable gets incremented twice and the method gets called twice if the result of the first call is not null. And of course the value returned the second time might be different! We can fix these two problems easily enough:

int? y;
int? temp = x;
y = temp.HasValue ? new int?(~temp.Value) : new int?();

And now we're good.

At this point the C# compiler can say "but wait a moment! if we are on the "consequence" branch of the conditional operator then we know for sure that temp.HasValue is true. Therefore the compiler can generate the more optimal code:

int? y;
int? temp = x;
y = temp.HasValue ? new int?(~temp.GetValueOrDefault()) : new int?();

Which is in fact what both the "original recipe" and the "extra crispy Roslyn" compilers do. The savings is tiny, but it is real, and these savings add up as the expressions get more and more complicated, as we'll see.


Next time on FAIC: Is that the only optimization a C# compiler can perform when generating code for lifted arithmetic? Of course not! In the next few episodes we'll look at some ways the compiler can be more clever, and compare the Roslyn compiler's heuristics to the previous compiler's heuristics. Happy New Year all, and we'll see you in 2013 for more fabulous adventures.

  1. We then extend the definition of "lifted" to functions of the form f : S → T in the obvious manner: the lifted function is f' : S' → T'. Similarly for functions of two, three or more parameters: the lifted function is null if any argument is null, and agrees with the unlifted function otherwise.
  2. Unfortunately, the specification sections on operator overload resolution are not strictly speaking entirely accurate: there are some known discrepancies between the compiler and the specification. In some of these cases the compiler is wrong and in some the specification is wrong. The areas with small discrepancies include (1) precisely when a user-defined operator is considered to be "liftable" and what the resulting semantics are, (2) how the candidate set for operators on enumerated and delegate types are determined, and (3) how the "betterness" rules treat lifted operators. Mads and I have a number of times attempted to come up with better spec language but I don't think the proposed changes made it into the latest revision. I might choose to do blog articles on these interesting and difficult corner cases in the future.
  3. Astonishingly, I've never written a blog article about this specific aspect of the conditional operator, though it has certainly come up on StackOverflow a lot. This is probably the blog article that came the closest to describing this common problem.

Nullable micro-optimizations, part one

Which is faster, Nullable<T>.Value or Nullable<T>.GetValueOrDefault()?

Before I answer that question, my standard response to "which horse is faster?" questions applies. Read that first.

.
.
.

Welcome back. But again, before I answer the question I need to point out that the potential performance difference between these two mechanisms for obtaining the non-nullable value of a nullable value type is a consequence of the fact that these two mechanisms are not semantically equivalent. The former may legally only be called if you are sure that the nullable value is non-null;1 the latter may be called on any nullable value. A glance at a simplified version of the source code illustrates the difference.

struct Nullable<T> where T : struct
{
  private bool hasValue;
  private T value;
  public Nullable(T value)
  {
    this.hasValue = true;
    this.value = value;
  }
  public bool HasValue { get { return this.hasValue; } }
  public T Value
  {
    get
    {
      if (!this.HasValue) throw something;
      return this.value;
    }
  }
  public T GetValueOrDefault() 
  {
    return this.value; 
  }
  ... and then all the other conversion gear and so on ...
}

The first thing to notice is that a nullable value type's ability to represent a "null" integer or decimal or whatever is not magical.2 A nullable value type is nothing more than an instance of the value type plus a bool saying whether it's null or not.

If a variable of nullable value type is initialized with the default constructor then the hasValue field will be its default value, false, and the value field will be default(T). If it is initialized with the declared constructor then of course the hasValue field is true and the value field is any legal value, including possibly T's default value. Thus, the implementation of GetValueOrDefault() need not check the flag; if the flag is true then the value field is set correctly, and if it is false, then it is set to the default value of T.

Looking at the code it should be clear that Value is almost certainly not faster than GetValueOrDefault() because obviously the former does exactly the same work as the latter in the success case, plus the additional work of the flag check. Moreover, because GetValueOrDefault() is so brain-dead simple, the jitter is highly likely to perform an inlining optimization.3 How the jitter chooses to inline or not is an implementation detail, but it is reasonable to assume that it is less likely to perform an inlining optimization on code that contains more than one "basic block"4 and explicitly throws.

It should also be clear that though the relative performance difference might be large, the absolute difference is small. A call, field fetch, conditional jump and return in the typical case makes up the difference, and those things are each only nanoseconds.

Now, this is of course not to say that you should willy-nilly change all your calls to Value to GetValueOrDefault() for performance reasons. Read my rant again if you have the urge to do that! Don't go changing working, debugged, tested code in order to obtain a performance benefit that is (1) highly unlikely to be a real bottleneck, and (2) highly unlikely to be your worst performance problem.

And besides, using Value has the nice property that if you have made a mistake and fetched the value of a null, you'll get an exception that informs you of where your bug is! Code that draws attention to its faults is a good thing.5


Next time on FAIC: How does the C# compiler use its knowledge of the facts discussed today to your advantage? Have a great Christmas everyone; we'll pick up this subject again in a week.

  1. Put another way, calling Value without knowing that HasValue is true is a boneheaded exception.
  2. Nullable value types are magical in other ways; for example, there's no way to write your own struct that has the strange boxing behaviour of a nullable value type; an int? boxes to either an int or null, never to a boxed int?.
  3. An inlining optimization is where the jitter eliminates an unnecessary "call" and "return" instruction by simply generating the code of the method body "inline" in the caller. This is a great optimization because doing so can make code both smaller and faster in some cases, though it does make it harder to debug because the debugger has no good way to generate breakpoints inside the inlined method.
  4. A "basic block" is a region of code where you know that the code will execute from the top of the block to the bottom without any "normal" branches in or out of the middle of the block. (A basic block may of course have exceptions thrown out of it.) Many optimizing compilers use "basic blocks" as an abstraction because it abstracts away the unnecessary details of what the block actually does, and treats it solely as a node in a flow control graph.
  5. Note that here we have one of those rare cases where the frameworks design guidelines have been deliberately bent. We have a "Get" method is actually faster than a property getter, and the property getter throws! Normally you expect the opposite: the "Get" method is usually the one that is slow and can throw, and the property is the one that is fast and never throws. Though this is somewhat unfortunate, remember, the design guidelines are our servants, not our masters, and they are guidelines, not rules.

Representation and identity

(Note: not to be confused with Inheritance and Representation.)

I get a fair number of questions about the C# cast operator. The most frequent question I get is:

short sss = 123;
object ooo = sss;            // Box the short.
int iii = (int) sss;         // Perfectly legal.
int jjj = (int) (short) ooo; // Perfectly legal
int kkk = (int) ooo;         // Invalid cast exception?! Why?

Why? Because a boxed T can only be unboxed to T.1 Once it is unboxed, it’s just a value that can be cast as usual, so the double cast works just fine.

Many people find this restriction grating; they expect to be able to cast a boxed thing to anything that the unboxed thing could have been cast to. There are ways to do that, as we’ll see, but there are good reasons why the cast operator does what it does.

To understand why this design works this way it’s necessary to first wrap your head around the contradiction that is the cast operator. There are two2 basic usages of the cast operator in C#:

  • My code has an expression of type B, but I happen to have more information than the compiler does. I claim to know for certain that at runtime, this object of type B will actually always be of derived type D. I will inform the compiler of this claim by inserting a cast to D on the expression. Since the compiler probably cannot verify my claim, the compiler might ensure its veracity by inserting a run-time check at the point where I make the claim. If my claim turns out to be inaccurate, the CLR will throw an exception.
  • I have an expression of some type T which I know for certain is not of type U. However, I have a well-known way of associating some or all values of T with an “equivalent” value of U. I will instruct the compiler to generate code that implements this operation by inserting a cast to U. (And if at runtime there turns out to be no equivalent value of U for the particular T I’ve got, again we throw an exception.)

The attentive reader will have noticed that these are opposites. A neat trick, to have an operator which means two contradictory things, don’t you think?

This dichotomy motivates yet another classification scheme for conversions.3 We can divide conversions intorepresentation-preserving conversions (B to D) and representation-changing conversions (T to U).4 We can think of representation-preserving conversions on reference types as those conversions which preserve the identity of the object. When you cast a B to a D, you’re not doing anything to the existing object; you’re merely verifying that it is actually the type you say it is, and moving on. The identity of the object and the bits which represent the reference stay the same. But when you cast an int to a double, the resulting bits are very different.

All the built-in reference conversions are identity-preserving.5 Obviously trivial “conversions” such as converting from int to int are also representation-preserving conversions. All user-defined conversions6 and non-trivial value type conversions (such as converting from int to double) are representation-changing conversions. Boxing and unboxing conversions are all representation-changing conversions.

The representation-preserving conversions that are known to never fail often result in no codegen at all.7 If a representation-preserving conversion could fail then a castclass instruction is emitted, which does a runtime check and throws if the check fails.

But each representation-changing conversion is handled in its own special way. User-defined conversions are resolved using a special version of the overload resolution algorithm, and generated as a call to the appropriate static method. Boxing and unboxing conversions are generated as box and unbox instructions. All the other built-in conversions (int to double, and so on) are generated as custom sequences of instructions that do the right conversion.

So now that you know that, consider what the compiler would have to do to make this work the way some people expect:

int kkk = (int) ooo;

All that the compiler knows is that ooo is of type object. It could be anything. Suppose it is a boxed int – then the compiler should generate an unboxing instruction. Suppose it is a boxed short. Then the compiler should unbox the short and then generate the custom sequence of instructions that convert a short to an int. Suppose it is a boxed double – same thing, but different instructions. And so on, for all the built-in conversions that go to int.

This would be a huge amount of code to generate, and it would be very slow. The code is of course so large that you would want to put it in its own method and just generate a call to it. Rather than do that by default, and always generate code that is slow, large and fragile, instead we’ve decided that unboxing can only unbox to the exact type. If you want to call the slow method that does all that goo, it’s available – you can always call Convert.ToInt32, which does all that analysis at runtime for you. We give you the choice between “fast and precise” or “slow and lax”, and the sensible default is the former. If you want the latter then call the method.

That’s just the built-in conversions. Let’s continue imagining what would have to happen if we wanted all possible conversions to int to just work out correctly at runtime, instead of just bailing out early if the boxed thing is not an int.

Suppose the object is a Foo where there is a user-defined conversion from Foo (or one of its base classes) to int (or a type that int is explicitly convertible from, like, say, Nullable<int>). Then the compiler would need to generate a call to that conversion method, just as it would if the type had been known at compile time, and then possibly also generate the conversion from the return type of the method to int.

Remember, there could be arbitrarily many such conversion methods on arbitrarily many types. The type Foo and its conversion method might not even be defined in the assembly currently being compiled or any assembly referenced. Therefore the compiler would have to generate code to interrogate Foo at runtime, do the overload resolution analysis, and then dynamically spit the code to do the call.

Which is exactly what the compiler does in C# 4.0 if the argument to the cast is of type dynamic instead of objectThe compiler actually generates code which starts a mini version of the compiler up again at runtime, does all that analysis, and spits fresh code. This is not fast, but it is accurate, if that’s what you really need. (And the spit code is then cached so that the next time this call site is hit, it is much faster.)

I don’t think people really expect the compiler to start up again at runtime every time they cast an object to int; I think they just haven’t thought through carefully exactly how much analysis solving the problem would take. Rather a lot, it turns out.

  1. Or Nullable<T>.
  2. There are others that are not germane to this discussion. For example, a third usage is “Everyone knows that this D is also of base type B; I want the compiler to treat this expression of type D as a B for overload resolution purposes.” That would clearly be an identity-preserving conversion.
  3. There are many ways to classify conversions; we already divide conversions into implicit/explicit, built-in/user-defined, and so on. For the purposes of this discussion we’ll gloss over the details of those other classifications.
  4. I’m glossing over here that certain conversions that the C# compiler thinks of as representation-changing are actually seen by the CLR verifier as representation-preserving. For example, the conversion from int to uint is seen by the CLR as representation-preserving because the 32 bits of a signed integer can be reinterpreted as an unsigned integer without changing the bits. These cases can be subtle and complex, and often have an impact on covariance-related issues; see next footnote. I’m also ignoring conversions involving generic type parameters which are not known at compile time to be reference or value types. There are special rules for classifying those which would be major digressions to get into.
  5. This is why covariant and contravariant conversions of interface and delegate types require that all varying type arguments be of reference types. To ensure that a variant reference conversion is always identity-preserving, all of the conversions involving type arguments must also be identity-preserving. The easiest way to ensure that all the non-trivial conversions on type arguments are identity-preserving is to restrict them to be reference conversions.
  6. The rules of C# prohibit all user-defined conversions that could possibly be identity-preserving coercions. More generally, all user-defined conversions that could possibly be any "standard" conversion are illegal.
  7. Again, I’m ignoring irksome generic issues here. There are situations where humans can prove mathematically that two generic type parameters must be identical at runtime, but the verifier is not smart enough to make those same deductions and requires the compiler to emit type checks.