Precedence vs associativity vs order

Raymond has written about this, I have written about Raymond writing about it, but I still frequently get questions from people who are unclear on the difference between precedence, associativity and evaluation order.

I suspect that this confusion arises from the difference between how most people are trained to evaluate arithmetical expressions versus how compilers generate code to evaluate arithmetical expressions. As a child it was drilled into me that the way to evaluate an arithmetical expression was to recursively apply the PEDMAS rule. That is, evaluate anything in parentheses first. Then evaluate any exponentiations. Then divisions, multiplications, additions and subtractions, in that order. So if you had 4 x 5 x (20 - 12 / 3) you start by evaluating what’s in parentheses: 20 - 12 / 3. In there, there are no parens or exponents, so start with the division. Replace 12 / 3 with 4 to get 20 - 4. Then evaluate the subtraction to get 16. Now we have the value for the parens, and we are down to 4 x 5 x 16. Evaluate one of the multiplications — but wait, we do not know what order to evaluate the multiplications in. But we can do it in any old order, so lets say 5 x 16 is 80, so we have 4 x 80, which is 320, done.

You’ll notice that the algorithm that I was taught emphasizes that you do the work on whatever is in the deepest set of parentheses first, no matter what. Many people believe that arithmetical expressions in programming languages work the same way. They do not.

Rather, in programming languages, parentheses indicate how the results of evaluations are combined, but not necessarily the order in which the calculations are carried out. In languages with no side effects, the order of evaluation is irrelevant. But in languages where side effects might occur, order becomes relevant.

The evaluation of an arithmetical expression is controlled by three sets of rules: precedence rules, associativity rules, and order rules.

Precedence rules describe how an underparenthesized expression should be parenthesized when the expression mixes different kinds of operators. For example, multiplication is of higher precedence than addition, so 2 + 3 x 4 is equivalent to 2 + (3 x 4), not (2 + 3) x 4.

Associativity rules describe how an underparenthesized expression should be parenthesized when the expression has a bunch of the same kind of operator. For example, addition is associative from left to right, so a + b + c is equivalent to (a + b) + c, not a + (b + c). In ordinary arithmetic, these two expressions always give the same result; in computer arithmetic, they do not necessarily. (As an exercise can you find values for a, b, c such that (a + b) + c is unequal to a + (b + c) in C#?)

Now the confusing one.

Order of evaluation rules describe the order in which each operand in an expression is evaluated. The parentheses just describe how the results are grouped together; “do the parentheses first” is not a rule of C#. Rather, the rule in C# is “evaluate each subexpression strictly left to right”.

The expression F() + G() * H() is equivalent to F() + (G() * H()), but C# does NOT evaluate G() * H() before F(). Rather, this is equivalent to:

temp1 = F(); 
temp2 = G(); 
temp3 = H(); 
temp4 = temp2 * temp3; 
result = temp1 + temp4;

Another way to look at it is that the rule in C# is not “do the parentheses first”, but rather to parenthesize everything then recursively apply the rule “evaluate the left side, then evaluate the right side, then perform the operation”.

This is not a rule of C++. In C and C++, F(), G() and H() can be evaluated in whatever order the compiler chooses, so long as it combines the results in the right way. A legal C++ compiler might do left to right, right to left, parentheses first, whatever the compiler writer felt like.

The way this topic usually comes up is when someone has an expression chock full of side effects — assignments, increments, decrements, pointer stores and so on, which they are attempting to convert from C++ to C#, and report the “bug” in the C# compiler to me that C# does not follow the “rules” of C++. Which is ironic, because since there are not actually any rules in C++ about order of evaluation between two sequence points. Thus the bug actually is that their code was never portable C++, and only worked because the code author happened to know (or guess) what ordering the compiler writer chose.

Advertisements

6 thoughts on “Precedence vs associativity vs order

  1. Well, for the challenge (“a + (b + c) != (a + b) + c”), couldn’t one just write a custom class with an overloaded + operator that depends on the specific instances of the class involved? Or is the exercise intended to be done using integers or floats, or whatnot?

  2. Pingback: Precedence: ordering or grouping? | Jon Skeet's coding blog

  3. Pingback: C# Associativity math: (a + b) + c != a + (b + c) | Solutions for enthusiast and professional programmers

  4. Pingback: Associativity math: (a + b) + c != a + (b + c) - BlogoSfera

  5. Ok
    * a, b and c could be functions (or properties) which when called in the wrong order throw exceptions (won’t happen in C# though). It is possible to do lots of weird things like these but that’s kinda boring so I’ll focus on cases where only variables (locals, to avoid multithreading problems and such) are involved.
    And as Grant mentioned: It’s possible to write custom operators which do all sorts of things. I’ll ignore those too.
    * a, b and c are of the same signed integer types, checked context:
    a = ThatType.MaxValue; b = 1; c = -1;
    (a + b) + c overflows and throws.
    a + (b + c) returns ThatType.MaxValue.
    * a and b are integer types and c is another integer type which is bigger than the type of a and b:
    a = SmallerType.MaxValue; b = c = 1;
    (a + b) + c throws in a checked context and returns SmallerType.MinValue + 1 in an unchecked context because the two smaller types overflow.
    a + (b + c) returns SmallerType.MaxValue + 2 because the smaller types are promoted.
    Works with signed and unsigned integers.
    * I have not tested this: When floating point numbers overflow it should result in NaN. So when
    a, b and c are floats of the same size and
    a = ThatType.MaxValue; b = 1; c = -1;
    (a + b) + c returns NaN.
    a + (b + c) returns ThatType.MaxValue.

    This pattern of “overflow in one case but not the other” and “silently change types in one case but not the other” keeps showing up.
    I guess you could say in the latter situations the expressions are not exactly (a + b) + c and a + (b + c) because when you spell the whole thing out there are casts involved in different places.

    A similar problem:
    For integers a, b and c:
    (a * b) / c != (a / c) * b when a % c != 0

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s