Nullable micro-optimization, part five

Last time in this series I described how a lifted implicit conversion could be “distributed” to both branches of the conditional operator that is the realization of a lifted arithmetic expression, and then optimized further on each side. Of course, the thing being converted can be any lifted expression in order to take advantage of this optimization. This means that the optimization “composes” nicely; the optimization could be repeatedly applied when lifted operations are nested.

This is a bit of a silly illustrative example: suppose you have expressions x and y of type A? with a lifted addition operator that produces an A?. There’s also a lifted conversion from A? to B?, and similarly from B? to C?.

C? c = (C?)(B?)(x + y);

As we discussed previously in this series, the compiler realizes the lifted addition as a conditional expression. We know that the lifted conversion to B? can be “distributed” to the consequence and alternative branches of the conditional expression. That then results in a different conditional expression, but one such that the conversion to C? can be distributed to each branch of that! That is, the compiler could realize the code above as:

C? c;
A? temp1 = x;
A? temp2 = y;
c = temp1.HasValue & temp2.HasValue ? 
  new C?((C)(B)(temp1.GetValueOrDefault() + temp2.GetValueOrDefault()) :
  new C?();

… by applying the optimization twice, rather than creating a temporary of type A? for the sum and a temporary of type B? for the conversion of the sum, each with its own conditional expression. The aim of the optimization is to reduce the number of temporaries and conditional expressions, and thereby make the code smaller and produce fewer basic blocks.

A lifted conversion is rather like a lifted unary operator, and in fact the compiler could do the analogous optimization for the lifted unary +, -, ~ and ! operators. Continuing our silly example, suppose we have a lifted ~ operator on A? that produces an A?. If you said:

C? c = (C?)(B?)~(x + y);

Then the ~ operation can also be “distributed” to each branch of the conditional just as the conversions can be. The insight here is the same as before: if the consequence and alternative are both of the same type then

~(condition ? consequence : alternative)

is the same as

condition ? ~consequence : ~alternative

When we furthermore know that the consequence is of the form new A?(something) then we know that ~consequence is the same as new A?(~something). When we know that the alternative is of the form new A?(), then we know that ~new A?() is going to be a no-op, and just produce new A?() again. So, to make a long story short, the C# compiler can codegen the code above as:

C? c;
A? temp1 = x;
A? temp2 = y;
c = temp1.HasValue & temp2.HasValue ? 
  new C?((C)(B)(~(temp1.GetValueOrDefault() + temp2.GetValueOrDefault())) :
  new C?();

Again, we save several temporaries and branches by performing this optimization.

Now, I’ve been saying “the compiler could” a lot because of course a compiler is not required to perform these optimizations, and in fact, the “original recipe” compiler is not very aggressive about performing these optimizations. I examined the original recipe compiler very closely when implementing nullable arithmetic in Roslyn, and discovered that it suffers from a case of “premature optimization”.


Next time on FAIC: We’ll digress for the next couple of posts. Then I’ll pick up this subject again with a discussion of the evils of “premature optimization” of nullable arithmetic, and how I’m using that loaded term in a subtly different way than Knuth did.

About these ads

9 thoughts on “Nullable micro-optimization, part five

  1. Eric,

    Since we are on the subject of optimizations: why does the compiler output the logical AND operator (&) instead of the conditional AND operator (&&)? Wouldn’t it be better to avoid the call to HasValue on the second expression?

    • If `HasValue` were a field, the time to read it would generally be less than the time required for a conditional branch (there may be some caching scenarios where that would not be true). If the Just-In-Time compiler decides to in-line the property access and is from there able to recognize that `HasValue` simply reads a field, the same scenario would apply. If for some reason the Just-In-Time compiler does not in-line the property access, then the short-circuit logic might be faster in the case where the `HasValue` call could be skipped.

  2. He mentioned that in his third post (http://ericlippert.com/2013/01/03/nullable-micro-optimization-part-three/):

    A brief aside: shouldn’t that be temp1.HasValue && temp2.HasValue? Both versions give the same result; is the short circuiting one more efficient? Not necessarily! AND-ing together two bools is extremely fast, possibly faster than doing an extra conditional branch to avoid what is going to be an extremely fast property lookup. And the code is certainly smaller. Roslyn uses non-short-circuiting AND, and I seem to recall that the earlier compilers do as well.

  3. Pingback: The Morning Brew - Chris Alcock » The Morning Brew #1271

  4. Erik, you say “we save several temporaries”. How much does it matter when processor actually executes the code? Do temporary variables in your examples mean extra (or slower) instructions?

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s