Nullable micro-optimizations, part seven

Today, a puzzle for you.

We’ve been talking about how the Roslyn C# compiler aggressively optimizes nested lifted unary operators and conversions by using a clever technique. The compiler realizes the inner operation as a conditional expression with a non-null nullable value on the consequence branch and a null nullable value on the alternative branch, distributes the outer operation to each branch, and then optimizes the branches independently. That then gives a conditional expression that can itself be the target of further optimizations if the nesting is deeper.

This works great for lifted conversions and unary operators. Does it also work for binary operators? It seems like it would be a lot harder to make this optimization work for a lifted binary operator where both operands are themselves lifted operations. But what if just one of the operands was a lifted operation, and the other operand was guaranteed to be non-null? There might be an opportunity to optimize such an expression. Let’s try it. Suppose X() and Y() are expressions of type int? and that Z() is an expression of type int:

int? r = X() * Y() + Z();

We know from our previous episodes that operator overload resolution is going to choose lifted multiplication for the inner subexpression, and lifted addition for the outer subexpression. We know that the right operand of the lifted addition will be treated as though it were new int?(Z()), but we can optimize away the unnecessary conversion to int?. So the question is: can the C# compiler legally code-generate that as though the user had written:

int? r;
int? tempX = X();
int? tempY = Y();
int tempZ = Z();
r = tempX.HasValue & tempY.HasValue ?
  new int?(tempX.GetValueOrDefault() * tempY.GetValueOrDefault() + tempZ) :
  new int?();

If you think the answer is “yes” then the follow-up question is: can the C# compiler legally make such an optimization for all nullable value types that have lifted addition and multiplication operators?

If you think the answer is “no” then the follow-up questions are: why not? and is there any scenario where this sort of optimization is valid?

Next time on FAIC we’ll be kind to our fine feathered friends; after that, we’ll find out the answer to today’s question.

Eric is crazy busy at Coverity’s head office; this posting was pre-recorded.

14 thoughts on “Nullable micro-optimizations, part seven”

The optimization is legal only if the multiplication operator has no side effects and doesn’t throw exceptions.

So in the case of integers, it’s only valid in an unchecked context. In a checked context, the multiplication might throw an OverflowException; so the compiler mustn’t generate code that calls Z() before the exception is thrown.
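The ordering problem described above can be made visible with a small sketch. This is an illustration only: the class name EvaluationOrderDemo, the Log list, and the particular operand values are assumptions made up for the demonstration, not anything taken from the compiler. X() and Y() return values whose product overflows, and each operand records when it runs.

```csharp
using System;
using System.Collections.Generic;

static class EvaluationOrderDemo
{
    // Records the order in which the operands are evaluated (hypothetical helper).
    public static readonly List<string> Log = new List<string>();

    static int? X() { Log.Add("X"); return int.MaxValue; }
    static int? Y() { Log.Add("Y"); return 2; }
    static int Z() { Log.Add("Z"); return 1; }

    // The expression as the user wrote it, in a checked context.
    public static string Original()
    {
        Log.Clear();
        try { _ = checked(X() * Y() + Z()); }
        catch (OverflowException) { Log.Add("overflow"); }
        return string.Join(",", Log);
    }

    // The proposed rewrite: Z() is hoisted into a temporary before the multiplication.
    public static string Rewritten()
    {
        Log.Clear();
        try
        {
            int? tempX = X();
            int? tempY = Y();
            int tempZ = Z();
            _ = tempX.HasValue & tempY.HasValue
                ? new int?(checked(tempX.GetValueOrDefault() * tempY.GetValueOrDefault() + tempZ))
                : new int?();
        }
        catch (OverflowException) { Log.Add("overflow"); }
        return string.Join(",", Log);
    }

    static void Main()
    {
        Console.WriteLine(Original());  // X,Y,overflow
        Console.WriteLine(Rewritten()); // X,Y,Z,overflow
    }
}
```

In the original form the multiplication throws before Z() is ever called; in the rewritten form Z() has already run by the time the exception is thrown, so the two programs are observably different.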

Eric Lippert on January 17, 2013 at 9:07 am said:

A correct answer right out of the gate! Nicely done.

Daniel Grunwald on January 17, 2013 at 9:12 am said:

That should have been “is legal if”, not “only if”. There are other scenarios where this optimization can be valid; e.g. if the compiler can show that Z() has no side effects, doesn’t throw exceptions, and doesn’t depend on state changed by the multiplication operator (easiest case: Z() is a compile-time constant).

- Eric Lippert on January 17, 2013 at 10:04 am said:
  
  You’ve put your finger on it; the “constant on the right hand side” case is the only one that Roslyn optimizes. I’ll discuss that next week.
  
You are nitpicking the details of the optimization implementation. The core implementation idea is still valid. If you don’t use a local:
new int?(tempX.GetValueOrDefault() * tempY.GetValueOrDefault() + Z())
it looks to me that this is a valid transform, the optimization we wanted to apply is there, and I use one less temporary than you do.

You can also emit an if() rather than using the ternary ?: for more flexibility in choosing the point at which you evaluate your locals.

Of course, this goes to show that _any_ transform, as simple as it looks, may be wrong for very subtle reasons. I sometimes wonder how C++ compilers can perform any optimisations at all.

Regarding the optimization of binary operators with nullable on both sides, I guess it works as well:
X() * Y() + Z()
everything is int?, can be translated to:
int? x = X();
int? y = Y();
int? result;
if (x.HasValue && y.HasValue)
{
  int left = x*y;
  int? z = Z();
  result = z.HasValue ? new int?(left+z) : null;
}
else
{
  Z(); // for side-effects (I don’t think lifted + should short-circuit)
  result = null;
}

Conrad on January 17, 2013 at 10:34 am said:

*You are nitpicking the details of the optimization implementation.*

Since this is a series of articles called “Nullable micro-optimizations”, I would expect that.

Also this causes the compiler error
int left = x*y;

“Cannot implicitly convert type ‘int?’ to ‘int’. An explicit conversion exists (are you missing a cast?)”

You could fix that with int left = x.Value * y.Value, but as we learned in part 1 of this series, GetValueOrDefault() is faster, and it’s also legal since you’ve already checked that x and y have values with if (x.HasValue && y.HasValue).

Nullable micro-optimizations, part one

- jods on January 18, 2013 at 10:40 am said:
  
  Good catch, although I noticed my typo and posted about it to prevent such a comment one hour before you did… just look below.
  
  You could have pointed out that z is missing its .GetValueOrDefault call as well, not that it really matters…
  
Sebastian Redl on February 21, 2013 at 8:56 am said:

> I sometimes wonder how C++ compilers can perform any optimisations at all.

There really is no difference between C# and C++ compilers here. As long as your program doesn’t do weird stuff, the compiler has exactly the same knowledge in the two languages: function calls might do pretty much anything to the global state and their by-ref arguments. Everything else is pretty much known.
The only difference is that C++ has larger memory-safety holes and you can do more weird stuff. But that generally leads to undefined behavior, which means the compiler can do whatever it wants *anyway*, and generally will just assume this doesn’t happen.

Instead of calculating Z() and assigning it to a temp, it should be possible to move the calculation of Z() into each branch (you’d just ignore the return value before returning null in the alternate branch).

Eric Lippert on January 17, 2013 at 11:03 am said:

Sure. But that is then duplicating the code. What if it was more complex than just “Z()”? The point of the optimization is to make the code smaller and simpler; duplicating code usually works against that. As I’ll discuss next week, Roslyn uses a very simple heuristic: the expression is only optimized if the right hand side is a constant. We then know that it doesn’t need to be replicated on the alternative branch!
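A minimal sketch of the constant-right-operand case may help here. It is a hand-written illustration, not the code Roslyn actually emits, and the class and method names are invented for the example. Because the right operand is the constant 1, there is nothing with side effects to replicate on the null branch.

```csharp
using System;

static class ConstantRightOperandDemo
{
    // The lifted expression as a user would write it.
    public static int? Lifted(int? x, int? y) => x * y + 1;

    // A hand-written equivalent with a single null check; the constant
    // appears only on the non-null branch and needs no temporary.
    public static int? Rewritten(int? x, int? y) =>
        x.HasValue & y.HasValue
            ? new int?(x.GetValueOrDefault() * y.GetValueOrDefault() + 1)
            : new int?();

    static void Main()
    {
        Console.WriteLine(Lifted(3, 4) == Rewritten(3, 4));                   // True
        Console.WriteLine(Lifted(null, 4).HasValue || Rewritten(null, 4).HasValue); // False
    }
}
```

Both forms produce 13 for (3, 4) and null whenever either operand is null.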

- jods on January 18, 2013 at 10:54 am said:
  
  I was thinking about generalizing the construction I showed above to more operators, which turned out to be quite easy (at least if you use gotos instead of ifs), when that very argument came to my mind.
  
  As is often the case, it is a memory vs. CPU trade-off. My solution above would indeed duplicate each operand expression once, except for the first two. On the other hand, I test only one condition per operand and I create a single int? for the final result. So what do you optimize for? Given that memory is cheap and those expressions are unlikely to be really big anyway, I’d say go for the CPU.
  
  Maybe, if you really want to avoid degenerate cases, use a heuristic that disables the optimization based on expression size?
  
  BTW, here’s how I see the optimization for more than 2 operators (e.g. (x + y) * z / w):
  
  int? op_1 = X();
  int? op_2 = Y();
  if (!op_1.HasValue || !op_2.HasValue) goto sideEffects_3;
  op_1 = op_1.GetValueOrDefault() + op_2.GetValueOrDefault();
  op_2 = Z();
  if (!op_2.HasValue) goto sideEffects_4;
  op_1 = op_1.GetValueOrDefault() * op_2.GetValueOrDefault();
  op_2 = W();
  if (!op_2.HasValue) goto sideEffects_5;
  op_1 = op_1.GetValueOrDefault() / op_2.GetValueOrDefault();
  return op_1;
  sideEffects_3: Z(); // still evaluate the remaining operands for their side effects
  sideEffects_4: W();
  sideEffects_5: return null;
  
  I took the liberty of using return instead of assigning to result; that doesn’t change the flow. Also, I’m reusing the same two locals again and again; obviously, if there are type conversions and not everything is int?, you would need more. Since their lifetimes don’t overlap, it’s likely that they share the same stack space after codegen anyway.
  
Just noticed that I forgot to type the GetValueOrDefault calls but they should be obvious!

Fabulous adventures in coding

Eric Lippert's blog
