Optimizing associative operations

A question I occasionally get is, suppose I have code like this:

const double x = 200.0;
const double y = 0.5;
...
void M(double z)
{
  double r = z * x * y;
  ...

Does the compiler generate code as though you’d written z * 100.0?

Well, if you try compiling this and examine the output of the compiler with ILDASM you’ll see that no, in fact there are two multiplications generated. But if you’d written x * y * z then in fact there would be just a multiplication by 100.0 in the IL. What’s going on?

The issue here is that multiplication is left associative. That is, a * b * c must be evaluated as (a * b) * c, and not a * (b * c). Note that this has nothing to do with the ordering in time of the evaluations of the operands; C# guarantees that the operands are evaluated left to right regardless of how the expression is parenthesized. Rather, the compiler must guarantee that if there is possibly a difference in the values of (a * b) * c and a * (b * c), the former must be chosen.

Could there be such a difference? Sure. Suppose z is so large that multiplying it by 100.0 is finite, but multiplying it by 200.0 results in an infinity. Infinity times 0.5 is still infinity, so the correct answer here would be infinity.

Now we have to throw in a little wrinkle here, which is that the C# specification notes that the runtime is permitted to do floating point operations using extra precision bits if it so chooses, and in fact many implementations of the CLR do. It is entirely possible that even after doing the multiplications by 200.0 and 0.5 on such a value that the result is the same as though it were multiplied by 100.0. The question at hand is not whether you can get that result — you can — but whether or not the compiler is allowed to optimize z * 200.0 * 0.5 by generating code for z * 100.0. Since there could be a difference, the compiler cannot make that optimization.

And similarly if we had something like z * y * x, then z could be so small that multiplying by 0.5 underflows to zero, and then the multiplication by 200.0 results in 0.

Once again we see that floating point arithmetic is weird. What about integer arithmetic? If we are entirely in integers then it is the case that (a * b) * c is equal to a * (b * c), so could this optimization be made if b and c are constant integers? Give it some thought before you read on below….

.

.

.

.

.

… There is one bizarre case. Suppose we have q * 2 * 0 and we are in a checked context. The compiler must generate the multiplication by 2 because that multiplication could overflow before the multiplication by zero, so this cannot be reduced to simply q * 0. Other than that, generally speaking integer multiplication is an associative operation in C#, so the compiler could turn q * 2 * 3 into q * 6. To my knowledge no C# compiler does so. The costs of the work needed to design, implement, debug and test the optimization is not justified by the tiny benefit that accrues to almost no code.

There is however one situation in which the C# compiler does realize that an operation can be optimized by harmlessly parenthesizing on the right instead of the left: string concatenation. Code like:

r = s + 
  "blah" +
  "foo";

is common, particularly in machine-generated code. The C# compiler does realize that this can be optimized to s + "blahfoo". I wrote a bit about that a few years ago, so check that out if you’re interested.

Advertisements

20 thoughts on “Optimizing associative operations

  1. Is there a limit on what context the compiler can use? Is it allowed to optimize the expression in this case?
    if (ActuallyIsSafeToMultiplyBy200(x)) x * 200 * 0.5;

    • Hey, provided that the action of the resulting program is equivalent to that described by the specification, the compiler is permitted to make any optimizations it wants. But there are two problems. First, the kind of analysis that it takes to make the sort of optimization you describe is very complicated. Basically a compiler must make a “method summary” that determines what *logically* a method does, *under all possible control flows of that method*. That is potentially extraordinarily expensive. The very small win does not justify the added time spent compiling, or the complex logic that the compiler team would have to implement and test.

      Second, the compiler must whenever possible generate code which passes the CLR verifier. There are some optimizations which, even if the compiler could detect that the optimization was valid, it could not generate IL for without violating a rule of the verifier. That probably would not apply to your specific case, but the C# compiler has had many bugs over the years where attempts to optimize conditional expressions caused code generation that did not pass verification.

  2. Does a checked context guarantee that all overflows will produce exceptions, or does it only guarantee that any code which would produce arithmetically-incorrect results as a consequence of overflow must throw an exception instead? It would seem like the cost of overflow trapping could be much lower in the latter case, especially on 64-bit machines.

    For example, on a 64-bit machine, if all variables are of type “int”, a statement like “a=b+c+d+e+f;” would require four separate overflow checks if implemented using 32-bit math, but computing a 64-bit sum and then checking whether that was within range could be much faster.

    I’m not quite sure exactly what optimizations the JIT compiler could safely make, given that it would be necessary to ensure that any exception which is going to get thrown, must get thrown in a timely fashion; if code like:

    int total=0;
    for (int i=0; i<10000; i++) total+=foo[i];

    were invoked with an array that only had 5000 elements, but the total would overflow before the 5000th element was reached, deferring the overflow exception shouldn't cause an array-subscript exception to get thrown, but if the JIT could see that the code had no possible side-effect other than the array access it might usefully rewrite the code to compute a 64-bit total of up to 10000 elements or the size of the array, whichever is smaller, throw an overflow exception if the result was out of range, throw an array-subscript exception if the array had less than 10000 elements, or otherwise return the total.

  3. I understand how the math could result in an over/underflow or infinity but I don’t get why the compiler should care in the case of primitive data types. I can’t imagine a scenario where someone would prefer Infinity or an overflow exception over just being fully-associative for primitive data types. It seems like a poor design decision: “Hey! There’s this edge-case we need to worry about!” should have been met with “No one’s going to actually want that behavior. Let’s ignore that edge case until someone complains loud enough.”

    When we’re talking about operator overloading, fine, be left-associative only (or provide a mechanism to implement fully-associative operator overloading — kinda like the classic C++ Matrix class compile-time optimizations).

    • Your forgetting about consistency. It would be really weird if an expression with variables gives a completely different result than a expression with constants or literals. In the former case the compiler can not optimize anything. The left associative rule makes both expressions give consistent results.

      • Fair point. However, based on what Eric’s saying, the runtime is allowed to use extra precision, effectively enabling code to generate a numeric result instead of an overflow exception. What if the runtime only uses the extra precision in certain circumstances since it’s allowed, but not enforced? Seems to me an inconsistency is already possible.

        Eric, is the runtime required to use the same precision for all similar operations? Or is it allowed to pick and choose its precision?

        • My guess would be that this behavior is implementation dependant and you can not rely on it being consistent throughout future or past runtime versions. I’d be surprised if, given a specific runtime version, this behavior were to be inconsistent and subject to the runtime’s whim at any given time.

          • You are correct; floating point arithmetic is allowed to be completely inconsistent so long as it is inconsistent in the *more precision than required* direction. It can be inconsistent from moment to moment. And it can in particular be inconsistent between compile-time and run-time computations, and in practice, it is! This is often reported as a bug.

            Why are we in this mess? Because floating point chips internally use 80 bits or larger precision, and the *registers* on the chips maintain that much precision. So if a local double is enregistered, it is automatically promoted to 80 bits. But if there are not enough registers for all the doubles that need to be alive in the same method, 64 bits of storage are allocated on the stack, and the extra bits of precision are lost every time the double leaves the floating point chip and goes back to the stack.

            C# has an undocumented feature whereby an unnecessary cast to double rounds the double off to 64 bits if it is larger. If you need totally repeatable double arithmetic, you can use that trick. But better to simply do your arithmetic in integers if you need it repeatable.

    • The infinities were perhaps a bad example but they were an easy example. I should have given a better example. Even without infinities and zeros plainly it is the case that changing the order in which subexpressions are computed changes the results. It’s easier to see it with addition. big + small + small + small + small… can round very differently than big + (small + small + small + small…).

      • Thanks for both thorough replies; both make sense. (Josh Olson = olsondev, different computer, different cookies :-P) … Interesting I can’t reply to your other comment. I assume it only allows nesting so deep? Odd limitation that.

  4. A case I often see is

    var radians = degrees * Math.PI / 180.0;

    I always write

    var radians = degrees * (Math.PI / 180.0);

    I agree the compiler is doing the right thing. However, I have considered writing a roslyn diagnostic to detect the case. The optimization can be significant in tight mathematical loops.

    • Or something like:
      int millis = days * 24 * 60 * 60 * 1000;

      It’s interesting that the natural direction to write it is exactly opposite the way the associativity works. Of course, few extra integer multiplications have never caused a performance problem for me.

  5. Pingback: The Morning Brew - Chris Alcock » The Morning Brew #1956

  6. Pingback: Dew Drop – October 28, 2015 (#2121) | Morning Dew

  7. In your last example, of concatenating strings, what if one of the strings was a nameof expression. Are nameof expressions “const” in the sense that they will be optimized in the same way by the compiler?

    I was writing this question, and realized that I’m just being lazy and could answer it myself. The answer is, yes, nameof expression will be optimized as if it were a literal string. Writing nameof(Main) + “Test” is the same as writing “MainTest”. Cool!

    Being const, nameof expressions can also be used as switch case labels.

  8. Pingback: My readings in 2015 week 44 | My path to become awesome dev

  9. Pingback: Visual Studio – Developer Top Ten for Nov 2nd, 2015 - Dmitry Lyalin

  10. Re “one bizarre case” with zero, there seems to be another with -1:
    int x = int.MinValue;
    int y = checked(x * -1 * -1); /* cannot optimize to checked(x * 1) */

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s