Previously in this series I said that the fact that the original C# compiler pursues a less aggressive strategy for optimizing away temporaries and branches from nested lifted conversions and unary operators because it suffers from “premature optimization”. That’s a loaded term and I’m not using it in the standard sense, so I want to clarify that a bit.
Donald Knuth, author of the classic four-volume series The Art of Computer Programming, famously said “premature optimization is the root of all evil.“. I think however that it is more instructive to read that quotation with more context:
We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%. A good programmer will not be lulled into complacency by such reasoning, he will be wise to look carefully at the critical code; but only after that code has been identified.
Which is of course what I have echoed in my numerous performance rants over the years: don’t waste your valuable time making risky changes to the 97% of the code that isn’t the slowest thing and that no customer will ever notice. Use a profiler, find the slowest 3%, and spend your optimization budget on that.
That is good advice, but when I say that a compiler suffers from “premature optimization”, that’s not at all what I mean. Rather, I mean that the compiler performs an optimization pass too early in the compilation process.
Back in 2010 I described in considerable detail the various stages, or “passes”, that the original recipe C# compiler performs when going from raw text to IL, so you might want to read that. For the purposes of this discussion we can simplify that all down to four stages:
- Lexical and grammatical analysis
- Initial “binding” — that is, semantic analysis
- “Lowering” — that is, rewriting high-level code into low-level code — and additional error detection
- Code generation
You would expect that semantic optimizations[1. I described many of the optimizations that the C# compiler performs back in 2009. ] such as lifted arithmetic lowering would happen in the third stage, [2. Some optimizations of course happen during the fourth phase, because the code generator itself can identify branches and temporaries that can be eliminated.] and for the most part they do.[3. For some examples of how premature optimizations in the initial binding pass led to bugs and breaking changes, see my posts from 2006 on that subject. Part one. Part two.] The implementation decision which is vexing me today is that the original recipe C# compiler’s strategy is that the initial binding pass identifies portions of lifted arithmetic expressions that can be optimized later, and flags them as needing attention during the lowering pass.[4. I am over-simplifying here; it is not as simple as a Boolean flag in most cases. In fact, the amount of information that is stored by the initial binding pass for the use of the optimizer later is quite scary because it is easy to accidentally use the wrong bits when lowering. An example of such a bug is in this StackOverflow question.]
The problem is that the initial binding pass only identifies opportunities for optimization based on the original form of the code. If the optimization pass produces “lowered” code that is itself amenable to further optimization then it is never optimized because there’s no flag left in there by the initial binding pass!
To make a long story short — and yes, this seems to have gotten rather long, sorry — the practical upshot is that the original recipe compiler is very good at finding “shallow” optimization opportunities on lifted operations, but very bad at making optimizations compose nicely when lifted operations are deeply nested; those tend to generate lots of unnecessary temporaries and branches.
Like I said previously, the compiler is not required to make those optimizations, but it has always vexed me that it does not. Roslyn improves on this situation by deferring all lowering and optimization of lifted arithmetic to the lowering phase; only the bare minimum analysis is performed during the initial binding pass. Roslyn optimizes each lifted arithmetic expression as it lowers it to temporaries and conditionals, and then tries aggressively to “distribute” lifted unary operations and conversions into those generated conditionals in order to skip creation of unnecessary temporaries and branches.
Next time on FAIC: Is it ever possible to apply this optimization technique to a lifted binary operator? I’ll pose that question in the form of a puzzle.
Eric is crazy busy at Coverity’s head office; this posting was pre-recorded.