I’ve been writing this blog for almost ten years now and there are plenty of readers who have quite reasonably never gone back through that archive of over 750 posts. Maybe one Friday a month or so, I’m going to rerun one of my favourite “fun” posts from the last decade. Today, a story I posted on the first anniversary of my blog, in September of 2004. Enjoy!
Last time in this series I described how a lifted implicit conversion could be “distributed” to both branches of the conditional operator that is the realization of a lifted arithmetic expression, and then optimized further on each side. Of course, the thing being converted can be any lifted expression in order to take advantage of this optimization. This means that the optimization “composes” nicely; the optimization could be repeatedly applied when lifted operations are nested.
This is a bit of a silly illustrative example: suppose you have expressions x and y of type A? with a lifted addition operator that produces an A?. There’s also a lifted conversion from A? to B?, and similarly from B? to C?.
C? c = (C?)(B?)(x + y);
As we discussed previously in this series, the compiler realizes the lifted addition as a conditional expression. We know that the lifted conversion to B? can be “distributed” to the consequence and alternative branches of the conditional expression. That then results in a different conditional expression, but one such that the conversion to C? can be distributed to each branch of that! That is, the compiler could realize the code above as:
C? c;
A? temp1 = x;
A? temp2 = y;
c = temp1.HasValue & temp2.HasValue ?
  new C?((C)(B)(temp1.GetValueOrDefault() + temp2.GetValueOrDefault())) :
  new C?();
… by applying the optimization twice, rather than creating a temporary of type A? for the sum and a temporary of type B? for the conversion of the sum, each with its own conditional expression. The aim of the optimization is to reduce the number of temporaries and conditional expressions, and thereby make the code smaller and produce fewer basic blocks.
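To make the silly example concrete, here is a minimal sketch. The types A, B and C, their Payload field, and the class NestedLiftingSketch are all hypothetical stand-ins invented for illustration; the Lowered method is a hand-written rendering of the realization described above, not output captured from any compiler.

```csharp
using System;

// Hypothetical value types standing in for A, B and C.
public struct A
{
    public readonly int Payload;
    public A(int payload) { Payload = payload; }

    // User-defined addition; C# lifts it to operate on A? automatically.
    public static A operator +(A left, A right) => new A(left.Payload + right.Payload);
}

public struct B
{
    public readonly int Payload;
    public B(int payload) { Payload = payload; }

    // User-defined conversion A -> B; C# also provides the lifted A? -> B? form.
    public static implicit operator B(A a) => new B(a.Payload);
}

public struct C
{
    public readonly int Payload;
    public C(int payload) { Payload = payload; }

    // User-defined conversion B -> C; C# also provides the lifted B? -> C? form.
    public static implicit operator C(B b) => new C(b.Payload);
}

public static class NestedLiftingSketch
{
    // What the programmer writes.
    public static C? Source(A? x, A? y) => (C?)(B?)(x + y);

    // Hand-written equivalent with both conversions distributed into the branches:
    // two temporaries and a single conditional expression.
    public static C? Lowered(A? x, A? y)
    {
        A? temp1 = x;
        A? temp2 = y;
        return temp1.HasValue & temp2.HasValue
            ? new C?((C)(B)(temp1.GetValueOrDefault() + temp2.GetValueOrDefault()))
            : new C?();
    }
}
```

Both methods should agree for every combination of null and non-null operands; only the number of temporaries and conditionals differs.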
A lifted conversion is rather like a lifted unary operator, and in fact the compiler could do the analogous optimization for lifted unary operators such as ~ and !. Continuing our silly example, suppose we have a lifted ~ operator on A? that produces an A?. If you said:
C? c = (C?)(B?)~(x + y);
then the ~ operation can also be “distributed” to each branch of the conditional just as the conversions can be. The insight here is the same as before: if the consequence and alternative are both of the same type then
~(condition ? consequence : alternative)
is the same as
condition ? ~consequence : ~alternative
When we furthermore know that the consequence is of the form new A?(something) then we know that ~consequence is the same as new A?(~something). When we know that the alternative is of the form new A?(), then we know that ~new A?() is going to be a no-op, and just produce new A?() again. So, to make a long story short, the C# compiler can codegen the code above as:
C? c;
A? temp1 = x;
A? temp2 = y;
c = temp1.HasValue & temp2.HasValue ?
  new C?((C)(B)(~(temp1.GetValueOrDefault() + temp2.GetValueOrDefault()))) :
  new C?();
Again, we save several temporaries and branches by performing this optimization.
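The same shape can be tried out with built-in types. In this sketch int?, long? and double? stand in for A?, B? and C?; that substitution and the class name LiftedComplementSketch are assumptions made for illustration. The Lowered method is written by hand to mirror the realization above.

```csharp
public static class LiftedComplementSketch
{
    // What the programmer writes: lifted addition, lifted ~, then two lifted conversions.
    public static double? Source(int? x, int? y) => (double?)(long?)~(x + y);

    // Hand-written equivalent: the ~ and both conversions are applied inside
    // the consequence, and the alternative collapses to a plain null.
    public static double? Lowered(int? x, int? y)
    {
        int? temp1 = x;
        int? temp2 = y;
        return temp1.HasValue & temp2.HasValue
            ? new double?((double)(long)(~(temp1.GetValueOrDefault() + temp2.GetValueOrDefault())))
            : new double?();
    }
}
```

For example, with x = 1 and y = 2 both methods compute ~3, which is -4, and return -4.0; if either operand is null, both return null.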
Now, I’ve been saying “the compiler could” a lot because of course a compiler is not required to perform these optimizations, and in fact, the “original recipe” compiler is not very aggressive about performing these optimizations. I examined the original recipe compiler very closely when implementing nullable arithmetic in Roslyn, and discovered that it suffers from a case of “premature optimization”.
Next time on FAIC: We’ll digress for the next couple of posts. Then I’ll pick up this subject again with a discussion of the evils of “premature optimization” of nullable arithmetic, and how I’m using that loaded term in a subtly different way than Knuth did.
My author copies of Essential C# 5.0, by Mark Michaelis and, new for this edition, yours truly, arrived at my house yesterday!
I know, e-books are where it’s at today; they are very convenient. But I am a traditionalist where books are concerned; I like the atoms just as much as the bits. This is the first time I’ve seen a non-electronic copy and I am very pleased with how it turned out. Though it is heavy! When you see a book only as Microsoft Word files for months on end you forget that it’s going to be almost a thousand pages.
As long-time readers of this blog know, I was one of the technical editors for Essential C# 4.0 and Essential C# 3.0. Mark was kind enough to ask me if I would like to take a larger role in the process of updating the text for the new edition, which I gladly agreed to. There is no easier way to get a byline in a book than to assist with an update to a well-written series that you already know inside-out!
Once again, many thanks to Mark and to Joan and everyone else at Addison-Wesley who made this process so smooth; you are all a pleasure to work with. Special thanks also to two of my former coworkers: C# specification guru Mads Torgersen, who wrote a very nice foreword for us, and Stephen Toub, who thoroughly reviewed the chapters dealing with asynchrony.
Last time on FAIC I described how the C# compiler elides the conversion from int to int? when you add an int? to an int, and thereby manages to save unnecessary calls to GetValueOrDefault(). Today I want to talk a bit about another kind of nullable conversion that the compiler can optimize. Consider the following, in which w is an expression of type int?:
double? z = w;
There is an implicit conversion from int to double, and so there is a “lifted” conversion from int? to double?. As I’m sure you’d expect, given the previous entries in this series, this would be code-generated the same as:
double? z;
int? temp = w;
z = temp.HasValue ?
  new double?((double)temp.GetValueOrDefault()) :
  new double?();
If you don’t know anything more about w then that’s about as good as it gets. But suppose we did know more. For example, suppose we have:
double? z = new int?();
That might seem crazy, but bear with me. In this case, obviously the compiler need not ever call HasValue in the first place because you and I both know it is going to be false. And we know that there are no side effects of the expression that need to be preserved, so the compiler can simply generate:
double? z = new double?();
Similarly, suppose we have an expression q of type int, and the assignment:
double? z = new int?(q);
Again, clearly we do not need to go through the rigamarole of making a temporary and checking to see if its HasValue property is true. We can skip straight to:
double? z = new double?((double)q);
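The three cases can be put side by side in a small sketch. The class and method names are hypothetical, and each method is a hand-written version of the code shown above rather than anything emitted by a compiler.

```csharp
public static class LiftedConversionSketch
{
    // General case: nothing is known about w, so HasValue must be checked.
    public static double? General(int? w)
    {
        int? temp = w;
        return temp.HasValue
            ? new double?((double)temp.GetValueOrDefault())
            : new double?();
    }

    // The operand is known to be new int?(): no temporary, no check.
    public static double? KnownNull() => new double?();

    // The operand is known to be new int?(q): convert the underlying value directly.
    public static double? KnownValue(int q) => new double?((double)q);
}
```

Feeding the general version a null or a wrapped value should give the same answer as the matching specialized version, which is what makes the shortcut safe.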
So this is all well and good. The Roslyn and “original recipe” C# compilers both perform these optimizations. But now let’s think about a trickier case. Suppose we have expressions x and y both of type int?, and suppose for the sake of argument that we do not know anything more about the operands:
double? z = x + y;
Now, reason like the compiler. We do not know whether x and y have values or not, so we need to use the un-optimized version of addition. So this is the same as:
double? z;
int? temp1 = x;
int? temp2 = y;
int? sum = temp1.HasValue & temp2.HasValue ?
  new int?(temp1.GetValueOrDefault() + temp2.GetValueOrDefault()) :
  new int?();
z = (double?)sum;
We don’t know whether sum has a value or not, so we must then generate the full lifted conversion, right? So this is then generated as:
double? z;
int? temp1 = x;
int? temp2 = y;
int? sum = temp1.HasValue & temp2.HasValue ?
  new int?(temp1.GetValueOrDefault() + temp2.GetValueOrDefault()) :
  new int?();
z = sum.HasValue ?
  new double?((double)sum.GetValueOrDefault()) :
  new double?();
Is that the best we can do? No! The key insight here is that the conversion can be distributed into the consequence and alternative of the conditional, and that doing so enables more optimizations. That is to say that:
z = (double?) (temp1.HasValue & temp2.HasValue ? new int?(temp1.GetValueOrDefault()+ temp2.GetValueOrDefault()) : new int?());
Gives the exact same result as:
z = temp1.HasValue & temp2.HasValue ? (double?) new int?(temp1.GetValueOrDefault()+ temp2.GetValueOrDefault()) : (double?) new int?();
But we already know how to optimize those! I said above that only crazy people would convert new int?() to double?, and of course you would not do that in your user-written code. But when the compiler itself generates that code during an optimization, it can optimize it further. The compiler generates a lifted conversion from a lifted arithmetic expression by distributing the conversion into both branches of the conditional, and then optimizes each branch. Therefore, double? z = x + y; is actually generated as:
double? z;
int? temp1 = x;
int? temp2 = y;
z = temp1.HasValue & temp2.HasValue ?
  new double?((double)(temp1.GetValueOrDefault() + temp2.GetValueOrDefault())) :
  new double?();
The compiler does not need to generate the sum variable at all, and it certainly does not need to check to see if it has a value. This optimization eliminates one entire temporary and the entire second conditional expression.
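As a rough check on that reasoning, the two realizations can be written out by hand and compared. The class and method names here are hypothetical; TwoStep mirrors the version with the sum temporary and Distributed mirrors the version with the conversion pushed into each branch.

```csharp
public static class DistributedConversionSketch
{
    // Un-optimized shape: a temporary for the sum and a second conditional for the conversion.
    public static double? TwoStep(int? x, int? y)
    {
        int? temp1 = x;
        int? temp2 = y;
        int? sum = temp1.HasValue & temp2.HasValue
            ? new int?(temp1.GetValueOrDefault() + temp2.GetValueOrDefault())
            : new int?();
        return sum.HasValue
            ? new double?((double)sum.GetValueOrDefault())
            : new double?();
    }

    // Optimized shape: the conversion is distributed into both branches,
    // so neither the sum temporary nor the second conditional is needed.
    public static double? Distributed(int? x, int? y)
    {
        int? temp1 = x;
        int? temp2 = y;
        return temp1.HasValue & temp2.HasValue
            ? new double?((double)(temp1.GetValueOrDefault() + temp2.GetValueOrDefault()))
            : new double?();
    }
}
```

Both methods should match the plain expression x + y converted to double? for all four combinations of null and non-null operands.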
Next time on FAIC: We’ll digress for some brief news on the publishing front. We’ll then continue this series and ask: are there other “chained” lifted operations that can be optimized?
Happy New Year all; I hope you had as pleasant a New Year’s Eve as I did.
Last time on FAIC I described how the C# compiler first uses overload resolution to find the unique best lifted operator, and then uses a small optimization to safely replace a call to Value with a call to GetValueOrDefault(). The jitter can then generate code that is both smaller and faster. But that’s not the only optimization the compiler can perform, not by far. To illustrate, let’s take a look at the code you might generate for a binary operator, say, the addition of two expressions of type int?, x and y:
int? z = x + y;
Last time we only talked about unary operators, but binary operators are a straightforward extension. We have to make two temporaries, so as to ensure that side effects are executed exactly once:
int? z;
int? temp1 = x;
int? temp2 = y;
z = temp1.HasValue & temp2.HasValue ?
  new int?(temp1.GetValueOrDefault() + temp2.GetValueOrDefault()) :
  new int?();
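The point about side effects can be observed with a small sketch. The Evaluations counter and the Next helper are hypothetical devices added purely to count how many times each operand expression runs; Lowered is a hand-written copy of the realization above.

```csharp
public static class LiftedAdditionSketch
{
    // Hypothetical counter, used only to observe how often operands are evaluated.
    public static int Evaluations;

    // Hypothetical operand expression with a visible side effect.
    public static int? Next(int? value)
    {
        Evaluations++;
        return value;
    }

    // What the programmer writes.
    public static int? Source(int? a, int? b) => Next(a) + Next(b);

    // Hand-written equivalent: each operand is captured in a temporary once,
    // and only the temporaries are consulted afterwards.
    public static int? Lowered(int? a, int? b)
    {
        int? temp1 = Next(a);
        int? temp2 = Next(b);
        return temp1.HasValue & temp2.HasValue
            ? new int?(temp1.GetValueOrDefault() + temp2.GetValueOrDefault())
            : new int?();
    }
}
```

Without the temporaries, a naive expansion would mention each operand twice (once for HasValue, once for GetValueOrDefault()) and the counter would advance four times per addition instead of two.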
A brief aside: shouldn’t that be temp1.HasValue && temp2.HasValue?
Both versions give the same result; is the short circuiting one more efficient? Not necessarily! AND-ing together two bools is extremely fast, possibly faster than doing an extra conditional branch to avoid what is going to be an extremely fast property lookup. And the code is certainly smaller. Roslyn uses non-short-circuiting AND, and I seem to recall that the earlier compilers do as well.
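A quick sketch of the equivalence, with hypothetical names: because HasValue has no side effects, the eager and short-circuiting forms can only differ in the shape of the generated code, never in the answer. Which one is actually faster is a question for measurement, and this sketch makes no claim about that.

```csharp
public static class AndSketch
{
    // Non-short-circuiting AND: both HasValue reads always happen, one branch total.
    public static int? Eager(int? x, int? y) =>
        x.HasValue & y.HasValue
            ? new int?(x.GetValueOrDefault() + y.GetValueOrDefault())
            : new int?();

    // Short-circuiting AND: the second HasValue read is skipped when the first is false,
    // at the cost of an extra conditional branch.
    public static int? ShortCircuit(int? x, int? y) =>
        x.HasValue && y.HasValue
            ? new int?(x.GetValueOrDefault() + y.GetValueOrDefault())
            : new int?();
}
```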
Anyway, when you do a lifted addition of two nullable integers, that’s the code that the compiler generates when it knows nothing about either operand. Suppose however that you added an expression q of type int? and an expression r of type int:
int? s = q + r;
OK, reason like the compiler here. First off, the compiler has to determine what the addition operator means, so it uses overload resolution and discovers that the unique best applicable operator is the lifted integer addition operator. Therefore both operands have to be converted to the operand type expected by the lifted operator, int?. So immediately we have determined that this means:
int? s = q + (int?)r;
Which of course is equivalent to
int? s = q + new int?(r);
And now we have an addition of two nullable integers. We already know how to do that, so the compiler generates:
int? s;
int? temp1 = q;
int? temp2 = new int?(r);
s = temp1.HasValue & temp2.HasValue ?
  new int?(temp1.GetValueOrDefault() + temp2.GetValueOrDefault()) :
  new int?();
And of course you are saying to yourself well that’s stupid. You and I both know that temp2.HasValue is always going to be true, and that temp2.GetValueOrDefault() is always going to be whatever value r had when the temporary was built. The compiler can optimize this to:
int? s;
int? temp1 = q;
int temp2 = r;
s = temp1.HasValue ?
  new int?(temp1.GetValueOrDefault() + temp2) :
  new int?();
Just because the conversion from int to int? is required by the language specification does not mean that the compiler actually has to generate code that does it; rather, all the compiler has to do is generate code that produces the correct results!
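Here is a minimal sketch of the two realizations written out by hand, with hypothetical class and method names. Naive follows the specification literally by wrapping r in an int?; Optimized skips the wrapper because its HasValue is known to be true.

```csharp
public static class MixedAdditionSketch
{
    // Literal reading of the specification: convert r to int? and do a full lifted addition.
    public static int? Naive(int? q, int r)
    {
        int? temp1 = q;
        int? temp2 = new int?(r);
        return temp1.HasValue & temp2.HasValue
            ? new int?(temp1.GetValueOrDefault() + temp2.GetValueOrDefault())
            : new int?();
    }

    // Same observable result, without wrapping r or checking a flag that is always true.
    public static int? Optimized(int? q, int r)
    {
        int? temp1 = q;
        int temp2 = r;
        return temp1.HasValue
            ? new int?(temp1.GetValueOrDefault() + temp2)
            : new int?();
    }
}
```

The two methods should be indistinguishable to a caller, which is all the specification requires.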
A fun fact is that the Roslyn compiler’s nullable arithmetic optimizer actually optimizes it to temp1.HasValue & true ? ..., and then Roslyn’s regular Boolean arithmetic optimizer gets rid of the unnecessary operator. It was easier to write the code that way than to be super clever in the nullable optimizer.
Roslyn will also optimize lifted binary operator expressions where both sides are known to be null, where one side is known to be null, and where both sides are known to be non-null. Since these scenarios are rare in user-written code, I’m not going to discuss them in this series.
Next time on FAIC: What happens when we throw some lifted conversions into the mix?