Nullable micro-optimizations, part six

Previously in this series I said that the fact that the original C# compiler pursues a less aggressive strategy for optimizing away temporaries and branches from nested lifted conversions and unary operators because it suffers from “premature optimization”. That’s a loaded term and I’m not using it in the standard sense, so I want to clarify that a bit.

Donald Knuth, author of the classic four-volume series The Art of Computer Programming, famously said “premature optimization is the root of all evil.“. I think however that it is more instructive to read that quotation with more context:

We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%. A good programmer will not be lulled into complacency by such reasoning, he will be wise to look carefully at the critical code; but only after that code has been identified.

Which is of course what I have echoed in my numerous performance rants over the years: don’t waste your valuable time making risky changes to the 97% of the code that isn’t the slowest thing and that no customer will ever notice. Use a profiler, find the slowest 3%, and spend your optimization budget on that.

That is good advice, but when I say that a compiler suffers from “premature optimization”, that’s not at all what I mean. Rather, I mean that the compiler performs an optimization pass too early in the compilation process.

Back in 2010 I described in considerable detail the various stages, or “passes”, that the original recipe C# compiler performs when going from raw text to IL, so you might want to read that. For the purposes of this discussion we can simplify that all down to four stages:

Lexical and grammatical analysis
Initial “binding” — that is, semantic analysis
“Lowering” — that is, rewriting high-level code into low-level code — and additional error detection
Code generation

You would expect that semantic optimizations like the ones I described back in 2009 such as lifted arithmetic lowering would happen in the third stage. Some optimizations of course happen during the fourth stage, because the code generator itself can identify branches and temporaries that can be eliminated. But most happen in the third stage.

Doing an optimization in the wrong stage can introduce correctness problems; for some examples of how premature optimizations in the initial binding pass led to bugs and breaking changes, see my posts from 2006 on that subject. Part one. Part two.

Today I’m not concerned about correctness; I’m concerned about how complete the optimization is. The implementation decision which is vexing me today is that the original recipe C# compiler’s strategy is that the initial binding pass identifies portions of lifted arithmetic expressions that can be optimized later, and flags them as needing attention during the lowering pass, which is where the optimizations are done.

I am over-simplifying here; it is not as simple as a Boolean flag in most cases. In fact, the amount of information that is stored by the initial binding pass for the use of the optimizer later is quite scary because it is easy to accidentally use the wrong bits when lowering. An example of such a bug is in this StackOverflow question. But we can think of it logically as a flag.

The problem is that the initial binding pass only identifies opportunities for optimization based on the original form of the code. If the optimization pass produces “lowered” code that is itself amenable to further optimization then it is never optimized because there’s no flag left in there by the initial binding pass! Deciding whether something could benefit from optimization was being done too soon.

To make a long story short — and yes, this seems to have gotten rather long, sorry — the practical upshot is that the original recipe compiler is very good at finding “shallow” optimization opportunities on lifted operations, but very bad at making optimizations compose nicely when lifted operations are deeply nested; those tend to generate lots of unnecessary temporaries and branches.

Like I said previously, the compiler is not required to make those optimizations, but it has always vexed me that it does not. Roslyn improves on this situation by deferring all lowering and optimization of lifted arithmetic to the lowering phase; only the bare minimum analysis is performed during the initial binding pass. Roslyn optimizes each lifted arithmetic expression as it lowers it to temporaries and conditionals, and then tries aggressively to “distribute” lifted unary operations and conversions into those generated conditionals in order to skip creation of unnecessary temporaries and branches.

Next time on FAIC: Is it ever possible to apply this optimization technique to a lifted binary operator? I’ll pose that question in the form of a puzzle.

Eric is crazy busy at Coverity’s head office; this posting was pre-recorded.

7 thoughts on “Nullable micro-optimizations, part six”

Roman on January 16, 2013 at 5:04 am said:

I still find it sad that manual elimination of common subexpressions with provably no side-effects can still lead to significant speed-ups in very tight loops… Microsoft’s C# compiler originally left it all up to the JITter, but then it turned out that the JITter can’t actually spend that much time on detailed code analysis (surprise!) and so the resulting code often has major high-level inefficiencies that other compilers had been optimizing for more than a decade.

I’m glad that this situation is being remedied, albeit slowly. Microsoft’s answer to “it’s too slow” has always been “then use C++ for that particular code”, but we all know how much PITA that is in practice.

Reply ↓
- John Payson on January 16, 2013 at 9:30 am said:
  
  There are some optimizations a compiler can’t perform but a JIT could (if so inclined). For example, if code in Foo says “someArray[5] = new SomeStruct(a,b);`, and SomeStruct is in an outside assembly, the compiler would have no way of knowing whether passing the array element directly to the constructor would yield observably-different behavior from invoking the constructor on a temporary instance and then copying that instance to the array. If the constructor simply does “field1=param1; field2=param2;”, passing the array element rather than a temporary instance would be a useful and safe optimization, but even if a compiler examined the code for “SomeStruct” and determined that the optimization would be safe for that particular version of “SomeStruct” it was examining, it would have no way of knowing whether the code it’s compiling might be run with a different version of “SomeStruct” for which the optimization would not be safe.
  
  I wonder to what extent present or future version of the CLI spec allow let a compiler to share with the JIT an understanding of what optimizations would likely be useful if they’re safe? The C# compiler can likely afford more time for analysis than the JIT, but the JIT can make sure that an outside method has no side-effects, while the C# compiler can’t. I’d be interested in any insights Mr. Lippert or anyone associated with compiler/CLI development would like to share.
  
  Reply ↓
web design ny on February 1, 2014 at 1:22 am said:

Even as we introduced our team, never to waste you time, we will
undergo our products and accompanying solutions. The primary item of our site developer
business is naturally web site layout.

Reply ↓
Witty Quotes on February 1, 2014 at 1:30 am said:

Hello there! This blog post couldn’t be written any better!

Looking at this post reminds me of my previous roommate!
He constantly kept preaching about this. I am going to send this post to him.
Pretty sure he will have a good read. Many thanks for sharing!

Reply ↓
design a business card on February 3, 2014 at 4:33 am said:

hey there and thank you for your information – I’ve definitely
picked up anything new from right here. I did however expertise some technical
points using this website, as I experienced to reload the web site many times previous to I could get it to load properly.

I had been wondering if your web host is OK? Not that I’m complaining, but slow loading instances times will very
frequently affect your placement in google and could damage your high quality score if ads and marketing with Adwords.
Anyway I’m adding this RSS to my e-mail and can look out
for much more of your respective fascinating content.
Make sure you update this again soon.

Reply ↓
www.bvfheating.pt on February 13, 2014 at 7:02 pm said:

qw
Do you have a spam problem on this site; I also am a blogger,
and I was wanting to know your situation; many of us have created
some nice methods and we are looking to trade strategies with other folks, be
sure to shoot me an e-mail if interested.

Reply ↓
nature in kyrgyzstan on March 26, 2014 at 2:15 pm said:

When I originally commented I clicked the “Notify me when new comments are added” checkbox
and now each time a comment is added I get three emails
with the same comment. Is there any way you can remove me from that service?
Thanks!

Reply ↓