I am back from a week in which I visited England, Holland, Belgium, France and Hungary; this is by far the most countries I’ve been to in so little time. It was great to meet so many customers; more on bicycling around Budapest in a future episode. For today though, I’ve posted on the Coverity Development Testing Blog’s continuing series Ask The Bug Guys a follow-up to the previous episode. I’ll discuss why the struct-based list enumerator does not do bounds checking in its implementation of the `Current` property. Check it out! Thanks to reader Xi Chen for the question.

# Category Archives: Performance

# Benchmarking mistakes, part four

# Benchmarking mistakes, part three

# String concatenation behind the scenes, part two

It is, I hope, well-known that naïve string concatenation in a loop is a quadratic “Schlemiel” algorithm. (If you haven’t read Joel’s essay, stop reading this and go read it now.)
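To make the cost concrete, here is a minimal sketch (the helper names are mine, not from any post): the naïve loop copies the entire accumulated string on every append, so n appends copy O(n²) characters in total, while `StringBuilder` appends into a growable buffer at amortized O(1) per append.

```csharp
using System;
using System.Text;

// Naive concatenation: each += allocates a fresh string and copies
// everything accumulated so far, so n appends copy O(n^2) characters.
string NaiveConcat(int n)
{
    string s = "";
    for (int i = 0; i < n; i++)
        s += "x";
    return s;
}

// StringBuilder appends into a growable buffer: amortized O(1) per append.
string BuilderConcat(int n)
{
    var sb = new StringBuilder();
    for (int i = 0; i < n; i++)
        sb.Append('x');
    return sb.ToString();
}

Console.WriteLine(NaiveConcat(1000) == BuilderConcat(1000)); // True
```

Both produce the same string; only the asymptotic cost differs, which is exactly the Schlemiel-the-painter effect.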

# Benchmarking mistakes, part two

# Benchmarking mistakes, part one

# Nullable micro-optimizations, part eight

Today, the answer to the puzzle I posed last week. The answer is no, that optimization is not legal in general, though it is legal in a few specific cases.

As I have discussed several times before, C# has very strict rules about what order operations happen in. A quick rule of thumb is: **operands run left-to-right**. In our example we had `X() * Y() + Z()`, so let’s see what conclusions we reach by applying this rule:

- `X()` is an operand of the `*` to the left of `Y()`, so the call to `X()` must happen before the call to `Y()`.
- `X() * Y()` is an operand of the `+` to the left of `Z()`, so the multiplication must happen before the call to `Z()`.
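The left-to-right rule is easy to observe directly; in this sketch, `X`, `Y` and `Z` are hypothetical stand-ins for the methods in the example, instrumented to record the order in which they run:

```csharp
using System;
using System.Collections.Generic;

var order = new List<string>();

int? X() { order.Add("X"); return 2; }
int? Y() { order.Add("Y"); return 3; }
int  Z() { order.Add("Z"); return 4; }

int? r = X() * Y() + Z();

Console.WriteLine(string.Join(",", order)); // X,Y,Z
Console.WriteLine(r);                       // 10
```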

The compiler must generate code that computes `X()`, then `Y()`, then the multiplication, then `Z()`, and then the addition. But our proposed optimization was

```csharp
int? r;
int? tempX = X();
int? tempY = Y();
int tempZ = Z();
r = tempX.HasValue & tempY.HasValue
    ? new int?(tempX.GetValueOrDefault() * tempY.GetValueOrDefault() + tempZ)
    : new int?();
```

which computes `Z()` *before* the multiplication.

So what’s the big deal? **Multiplication of integers can throw an exception in a checked context.** If `X() * Y()` throws on the multiplication, that exception must prevent `Z()` from being called in the first place. **This optimization is only valid in an unchecked context.**
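Here is a hedged sketch of that scenario; `X`, `Y` and `Z` are hypothetical stand-ins chosen so that the checked multiplication overflows:

```csharp
using System;

bool zWasCalled = false;

int? X() => int.MaxValue;
int? Y() => 2;
int Z() { zWasCalled = true; return 1; }

try
{
    checked
    {
        int? r = X() * Y() + Z(); // the multiplication overflows and throws...
        Console.WriteLine(r);     // never reached
    }
}
catch (OverflowException)
{
    // ...so Z() must never run; its side effect is not observed.
    Console.WriteLine(zWasCalled); // False
}
```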

And of course it just gets worse if we start to consider lifted arithmetic on types other than `int?`. It’s a bad practice to write a user-defined operator with a side effect, but it is legal, and the C# compiler must ensure that side effects of user-defined operators are observed to happen in the prescribed order.

Rather than try to figure out all the types for which this optimization is legal in various contexts, Roslyn makes no attempt to optimize this kind of binary operator. It generates a temporary `int?` for the multiplication, and then does the regular lifted arithmetic for the addition. Another lovely optimization spoiled by conflict with a lovely invariant.

But wait!

The problem is that the side effects of the multiplication must be observed to come before the side effects of the right addend. What if the right addend has no side effects? Then we can do this optimization!

The most common situation in which the right addend has no side effects is when it is a constant:

```csharp
int? r = X() * Y() + 1;
```

This *can* legally be optimized to

```csharp
int? r;
int? tempX = X();
int? tempY = Y();
r = tempX.HasValue & tempY.HasValue
    ? new int?(tempX.GetValueOrDefault() * tempY.GetValueOrDefault() + 1)
    : new int?();
```

And in fact I did add this optimization to Roslyn; just as unary operations and conversions can be “distributed”, so can binary operations where the right operand is a constant. Moreover, as a pleasant side effect, doing so allowed for an easy implementation of various optimizations that improve the quality of lifted `x += 1` and `x++` expressions.
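As a sanity check, here is a hedged sketch comparing the optimized shape against the straightforward lifted computation, including the null-propagation case (`Optimized` and `Straightforward` are hypothetical helper names of mine):

```csharp
using System;

// The shape the compiler can generate when the right addend is the constant 1.
int? Optimized(int? tempX, int? tempY) =>
    tempX.HasValue & tempY.HasValue
        ? new int?(tempX.GetValueOrDefault() * tempY.GetValueOrDefault() + 1)
        : new int?();

// The straightforward lifted arithmetic the user wrote.
int? Straightforward(int? x, int? y) => x * y + 1;

Console.WriteLine(Optimized(3, 4));                                // 13
Console.WriteLine(Optimized(null, 4) == Straightforward(null, 4)); // True (both null)
```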

Well, that took rather more episodes than I thought it would when I started! I could talk a bit more about how Roslyn optimizes other more exotic nullable arithmetic, like equality, inequality, and logical operations. I could also talk a bit about the stuff I didn’t get to implement before I left; Roslyn does a slightly worse job than the original recipe compiler when optimizing expressions where we know that the result is going to be null, but also must compute a side effect. But rather than go down those lengthy rabbit holes, I think I’ll call this series done at eight episodes.

**Next time on FAIC:** Some of these optimizations we’ve been talking about are *elisions*; I’ll talk a bit about how computer programmers have borrowed this term from linguistics, and how we use it in two different senses.

# Nullable micro-optimizations, part seven

Today, a puzzle for you.

We’ve been talking about how the Roslyn C# compiler aggressively optimizes nested lifted unary operators and conversions by using a clever technique. The compiler realizes the inner operation as a conditional expression with a non-null nullable value on the consequence branch and a null nullable value on the alternative branch, distributes the outer operation to each branch, and then optimizes the branches independently. That then gives a conditional expression that can itself be the target of further optimizations if the nesting is deeper.

This works great for lifted conversions and unary operators. Does it also work for binary operators? It seems like it would be a lot harder to make this optimization work for a lifted binary operator where *both* operands are themselves lifted operations. But what if just *one* of the operands was a lifted operation, and the other operand was guaranteed to be non-null? There might be an opportunity to optimize such an expression. Let’s try it. Suppose `X()` and `Y()` are expressions of type `int?` and that `Z()` is an expression of type `int`:

```csharp
int? r = X() * Y() + Z();
```

We know from our previous episodes that operator overload resolution is going to choose lifted multiplication for the inner subexpression, and lifted addition for the outer subexpression. We know that the right operand of the lifted addition will be treated as though it was `new int?(Z())`, but we can optimize away the unnecessary conversion to `int?`. So the question is: **can the C# compiler legally code-generate that as though the user had written:**

```csharp
int? r;
int? tempX = X();
int? tempY = Y();
int tempZ = Z();
r = tempX.HasValue & tempY.HasValue
    ? new int?(tempX.GetValueOrDefault() * tempY.GetValueOrDefault() + tempZ)
    : new int?();
```

If you think the answer is “yes” then the follow-up question is: **can the C# compiler legally make such an optimization for all nullable value types that have lifted addition and multiplication operators?**

If you think the answer is “no” then the follow-up questions are: **why not?** and **is there any scenario where this sort of optimization is valid?**

**Next time on FAIC** we’ll be kind to our fine feathered friends; after that, we’ll find out the answer to today’s question.

Eric is crazy busy at Coverity’s head office; this posting was pre-recorded.

# Nullable micro-optimizations, part six

Previously in this series I said that the original C# compiler pursues a less aggressive strategy for optimizing away temporaries and branches from nested lifted conversions and unary operators because it suffers from “premature optimization”. That’s a loaded term and I’m not using it in the standard sense, so I want to clarify that a bit.

Donald Knuth, author of the classic four-volume series The Art of Computer Programming, famously said “*premature optimization is the root of all evil*.” I think however that it is more instructive to read that quotation with more context:

> We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%. A good programmer will not be lulled into complacency by such reasoning, he will be wise to look carefully at the critical code; but only after that code has been identified.

Which is of course what I have echoed in my numerous performance rants over the years: don’t waste your valuable time making risky changes to the 97% of the code that isn’t the slowest thing and that no customer will ever notice. Use a profiler, find the slowest 3%, and spend your optimization budget on that.

That is good advice, but when I say that a compiler suffers from “premature optimization”, that’s not at all what I mean. Rather, I mean that *the compiler performs an optimization pass too early in the compilation process*.

Back in 2010 I described in considerable detail the various stages, or “passes”, that the original recipe C# compiler performs when going from raw text to IL, so you might want to read that. For the purposes of this discussion we can simplify that all down to four stages:

- Lexical and grammatical analysis
- Initial “binding” — that is, semantic analysis
- “Lowering” — that is, rewriting high-level code into low-level code — and additional error detection
- Code generation

You would expect that semantic optimizations like the ones I described back in 2009 such as lifted arithmetic lowering would happen in the third stage. Some optimizations of course happen during the fourth stage, because the code generator itself can identify branches and temporaries that can be eliminated. But most happen in the third stage.

Doing an optimization in the wrong stage can introduce correctness problems; for some examples of how premature optimizations in the initial binding pass led to bugs and breaking changes, see my posts from 2006 on that subject. Part one. Part two.

Today I’m not concerned about correctness; I’m concerned about how complete the optimization is. The implementation decision which is vexing me today is that the original recipe C# compiler’s strategy is that the initial binding pass identifies portions of lifted arithmetic expressions that can be optimized later, and flags them as needing attention during the lowering pass, which is where the optimizations are done.

I am over-simplifying here; it is not as simple as a Boolean flag in most cases. In fact, the amount of information that is stored by the initial binding pass for the use of the optimizer later is quite scary because it is easy to accidentally use the wrong bits when lowering. An example of such a bug is in this StackOverflow question. But we can think of it logically as a flag.

The problem is that the initial binding pass only identifies opportunities for optimization based on the *original* form of the code. If the optimization pass produces “lowered” code that *is itself amenable to further optimization* then it is never optimized because there’s no flag left in there by the initial binding pass! Deciding whether something could benefit from optimization was being done too soon.

To make a long story short — and yes, this seems to have gotten rather long, sorry — the practical upshot is that the original recipe compiler is very good at finding “shallow” optimization opportunities on lifted operations, but very bad at making optimizations compose nicely when lifted operations are deeply nested; those tend to generate lots of unnecessary temporaries and branches.

Like I said previously, the compiler is not *required* to make those optimizations, but it has always vexed me that it does not. Roslyn improves on this situation by deferring all lowering and optimization of lifted arithmetic to the lowering phase; only the bare minimum analysis is performed during the initial binding pass. Roslyn optimizes each lifted arithmetic expression as it lowers it to temporaries and conditionals, and then tries aggressively to “distribute” lifted unary operations and conversions into those generated conditionals in order to skip creation of unnecessary temporaries and branches.

**Next time on FAIC:** Is it *ever* possible to apply this optimization technique to a lifted *binary* operator? I’ll pose that question in the form of a puzzle.

Eric is crazy busy at Coverity’s head office; this posting was pre-recorded.

# Nullable micro-optimization, part five

Last time in this series I described how a lifted implicit conversion could be “distributed” to both branches of the conditional operator that is the realization of a lifted arithmetic expression, and then optimized further on each side. Of course, the thing being converted can be *any* lifted expression in order to take advantage of this optimization. This means that the optimization “composes” nicely; the optimization could be repeatedly applied when lifted operations are nested.

This is a bit of a silly illustrative example: suppose you have expressions `x` and `y` of type `A?` with a lifted addition operator that produces an `A?`. There’s also a lifted conversion from `A?` to `B?`, and similarly from `B?` to `C?`.

```csharp
C? c = (C?)(B?)(x + y);
```

As we discussed previously in this series, the compiler realizes the lifted addition as a conditional expression. We know that the lifted conversion to `B?` can be “distributed” to the consequence and alternative branches of the conditional expression. That then results in a *different* conditional expression, but one such that the conversion to `C?` can be distributed to each branch of that! That is, the compiler could realize the code above as:

```csharp
C? c;
A? temp1 = x;
A? temp2 = y;
c = temp1.HasValue & temp2.HasValue
    ? new C?((C)(B)(temp1.GetValueOrDefault() + temp2.GetValueOrDefault()))
    : new C?();
```

… by applying the optimization twice, rather than creating a temporary of type `A?` for the sum and a temporary of type `B?` for the conversion of the sum, each with its own conditional expression. The aim of the optimization is to reduce the number of temporaries and conditional expressions, and thereby make the code smaller and produce fewer basic blocks.

A lifted conversion is rather like a lifted unary operator, and in fact the compiler could do the analogous optimization for the lifted unary `+`, `-`, `~` and `!` operators. Continuing our silly example, suppose we have a lifted `~` operator on `A?` that produces an `A?`. If you said:

```csharp
C? c = (C?)(B?)~(x + y);
```

Then the `~` operation can also be “distributed” to each branch of the conditional just as the conversions can be. The insight here is the same as before: if the consequence and alternative are both of the same type then

```csharp
~(condition ? consequence : alternative)
```

is the same as

```csharp
condition ? ~consequence : ~alternative
```

When we furthermore know that the consequence is of the form `new A?(something)` then we know that `~consequence` is the same as `new A?(~something)`. When we know that the alternative is of the form `new A?()`, then we know that `~new A?()` is going to be a no-op, and just produce `new A?()` again. So, to make a long story short, the C# compiler can codegen the code above as:

```csharp
C? c;
A? temp1 = x;
A? temp2 = y;
c = temp1.HasValue & temp2.HasValue
    ? new C?((C)(B)(~(temp1.GetValueOrDefault() + temp2.GetValueOrDefault())))
    : new C?();
```

Again, we save several temporaries and branches by performing this optimization.
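The distribution identity that all of this relies on can be spot-checked for `int?` directly, using the built-in `~` rather than a user-defined operator (the helper names here are mine):

```csharp
using System;

int? Undistributed(bool condition, int? consequence, int? alternative) =>
    ~(condition ? consequence : alternative);

int? Distributed(bool condition, int? consequence, int? alternative) =>
    condition ? ~consequence : ~alternative;

// Same result whichever branch is taken, including the null alternative.
Console.WriteLine(Undistributed(true, 5, null) == Distributed(true, 5, null));   // True (~5 is -6)
Console.WriteLine(Undistributed(false, 5, null) == Distributed(false, 5, null)); // True (both null)
```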

Now, I’ve been saying “the compiler *could*” a lot because of course a compiler is not *required* to perform these optimizations, and in fact, the “original recipe” compiler is not very aggressive about performing these optimizations. I examined the original recipe compiler very closely when implementing nullable arithmetic in Roslyn, and discovered that it suffers from a case of “premature optimization”.

**Next time on FAIC:** We’ll digress for the next couple of posts. Then I’ll pick up this subject again with a discussion of the evils of “premature optimization” of nullable arithmetic, and how I’m using that loaded term in a subtly different way than Knuth did.