UPDATE: A commenter points out that today is the 200th anniversary of the birth of George Boole; I had no idea when I scheduled this article that it would be so apropos. Happy birthday George Boole!
Here’s a little-known and seldom-used fact about C# operators: you can apply the & and | operators to bools, not just to integers. The & and | operators on bools differ from && and || in only one way: both operators always “eagerly” evaluate both operands. This is in marked contrast to the “lazily” computed evaluation of the && and || operators, which only evaluate their right hand argument if needed. Why on earth would you ever want to evaluate the right hand side if you didn’t need to? Why have this operation at all on bools?
A few reasons come to mind. First, sometimes you want to do two operations, and know whether both of them succeeded:
bool totalSuccess = First() & Second();
If you want both operations to happen regardless of whether the first succeeded, then using && would be wrong. (And similarly, if you want to know if either succeeded, you’d use | instead of ||.)
Though this code is correct, I don’t like it. I don’t like expressions that are useful for their side effects like this; I’d prefer to see one effect per statement:
bool firstSucceeded = First();
bool secondSucceeded = Second();
bool totalSuccess = firstSucceeded & secondSucceeded;
(Also, the original code seems harder to debug; I might want to know when debugging or testing which of the operations succeeded. And of course I am not a super big fan of the “success code” pattern to begin with, but that’s another story.)
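To make the eager/lazy difference concrete, here is a minimal sketch; First and Second are hypothetical stand-ins for the two operations above, and the output comments show what I would expect it to print:

using System;

class EagerVersusLazy
{
    // Hypothetical operations; each reports whether it succeeded.
    static bool First()  { Console.WriteLine("First ran");  return false; }
    static bool Second() { Console.WriteLine("Second ran"); return true;  }

    static void Main()
    {
        // Eager: Second() runs even though First() already returned false.
        bool eager = First() & Second();   // prints "First ran" then "Second ran"

        // Lazy: Second() is skipped because First() determined the result.
        bool lazy = First() && Second();   // prints only "First ran"

        Console.WriteLine($"eager={eager}, lazy={lazy}");
    }
}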
But still, here we have the & operator instead of the && operator. What’s the compelling benefit of using & here instead of &&?
Think about it this way. Suppose you wish to write this code:
bool totalSuccess = firstSucceeded && secondSucceeded;
...
but you don’t get the && operator. In fact, all you get is:
- if statements of the form if (bool) where the body is a goto
- unconditional goto statements
- assignment of literals to variables and variables to variables.
Well, that’s pretty straightforward:
bool totalSuccess;
if (firstSucceeded) goto CONSEQUENCE;
totalSuccess = false;
goto DONE;
CONSEQUENCE: totalSuccess = secondSucceeded;
DONE: ...
But this is the situation that C# is actually in; the C# code must be translated into IL, and IL has no && instruction. It has conditional branches, unconditional branches, and assignments, so C# generates the IL equivalent of that code every time you use &&. (And similarly for ||.)
That’s a lot of code! But there is an IL instruction for & and |, so the code generation there is very straightforward and very small.
What are the consequences of the much larger code generation? First of all, the executable is a few bytes larger. Larger code means that less code fits into the processor cache, which means more cache misses at jit time.
The jitter has an optimizer of course, and many optimizers work by analyzing the “basic blocks” of a method. A “basic block” is a section of IL where control flow always enters at the top and always leaves at the bottom; by knowing where all the basic blocks are, the optimizer can analyze the control flow of the method. The & and | operators introduce no additional basic blocks into a method, but the && operator as you can see above introduces two new basic blocks that were not there before, labeled CONSEQUENCE and DONE. Now the jitter has more work to do.
And remember, the jitter has to work fast; it is jitting code in real time here. As method complexity increases, the number of optimizations that can be successfully performed at runtime at reasonable cost decreases. The jitter is entirely within its rights to say “this method is too long or has too many basic blocks; I’m never going to inline it”, for example. So perhaps the machine code generated is a little worse than it otherwise could have been.
And finally, think about the generated machine code. Again, the code generated from the && version will be larger, which means less program logic fits in the small processor cache, which means more cache evictions. Also, the more branches that are in the code, the more branch prediction the CPU must do, which means more opportunities to predict wrong.
UPDATE: A commenter asks if the C# compiler or jitter can decide to change lazy operators into eager operators if doing so is provably correct and likely faster. Yes, a compiler is allowed to do so; whether the C# or JIT compilers actually do so, I don’t know. I’ll check!
ANOTHER UPDATE: It does! I was unaware of this optimization, and probably should have checked to see if it existed before I wrote this article. 🙂 In C# 6, if the right hand side of an && operation is a local variable then the IL is generated as though it were &. I do not recall having seen this optimization before; perhaps it is new, or perhaps I simply never took a sufficiently close look at the IL generator. (I was aware that if either side of the operator is a compile-time constant true or false then optimizations are performed, but optimizations when operands are known at compile time are a good subject for another day.)
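As a concrete illustration of the case described above (this is a sketch only; the exact IL produced depends on compiler version and settings, and the names are mine):

using System;

class ShortCircuitLowering
{
    static bool Demo(bool firstSucceeded, Func<bool> second)
    {
        bool secondSucceeded = second();

        // Right hand side is a local variable: per the update above, the
        // compiler may emit a plain bitwise "and" here, since reading a
        // local has no side effect and the result is the same either way.
        bool total = firstSucceeded && secondSucceeded;

        // Right hand side is a method call: this must stay lazy, because
        // skipping the call when firstSucceeded is false is observable.
        bool other = firstSucceeded && second();

        return total | other;
    }

    static void Main()
    {
        Console.WriteLine(Demo(true, () => true));
    }
}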
Now, I hasten to point out that these considerations are the very definition of nano-optimizations. No commercial program ever attributed its widespread acceptance and profitability in the marketplace to the fact that a few &s were used judiciously instead of &&s. The road to performance still demands good engineering discipline rather than random applications of tips and tricks. Still, I think it is useful to realize that avoiding the evaluation of the right hand side might, in some cases, be more expensive than simply doing the evaluation. When generating code to lower nullable arithmetic, for example, the C# compiler will generate eager operations instead of lazy operations.
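To see what that looks like, here is a rough sketch of the lowering for lifted addition (not the literal compiler output, which varies by version; GetX and GetY are hypothetical):

class NullableLowering
{
    static int? GetX() => 1;          // hypothetical inputs
    static int? GetY() => null;

    static void Main()
    {
        int? x = GetX(), y = GetY();

        // Source form:
        int? sum = x + y;

        // Roughly the shape of the lowered code: note the eager &, not a
        // lazy &&. Reading y.HasValue is so cheap that a conditional branch
        // would likely cost more than it saves.
        int? lowered = (x.HasValue & y.HasValue)
            ? new int?(x.GetValueOrDefault() + y.GetValueOrDefault())
            : default(int?);

        System.Console.WriteLine($"{sum} {lowered}");
    }
}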
Is there any reason why the C# compiler couldn’t optimize an && into an & when the RHS has no side effects? (Such as when it’s a local bool variable or field, or really any expression that doesn’t involve any method or delegate calls, property gets, or mutating operators (‘++’, ‘--’, or anything that ends in ‘=’ (did I miss any kinds of side-effect-producing expressions?)))
I would leave that optimization up to the jitter, not to the C# compiler. The optimizer must be able to deduce not just that there is no side effect being omitted, but also that it is a clear performance win on whatever chip is ultimately being targeted.
I thought you made a bunch of excellent points *against* leaving it to the JIT in your original post: the increased code size means more IO before the JIT even comes into play and more cache misses during JITing; the additional basic blocks mean the JIT has more to do; and there’s a cost-benefit tradeoff to JIT-time optimizations that doesn’t exist for compile-time ones, because the cost of the JIT is borne at runtime.
The cases where replacing && with & will be most beneficial are also the cases which are easiest for the JIT to recognize. Further, cases where the C# source would suggest the optimization is obviously helpful may turn out otherwise as a consequence of things like inline expansion.
Consider the function “bool And(bool x, bool y) { return x && y; }”. Clearly evaluation of y has no semantic side effects. On the other hand, short-circuit evaluation of an expression involving And(x, y) would be helpful in the same cases as with an expression involving (x & y), but the JITter might be less likely to spot such optimizations if And(x, y) used the “&” operator.
So is there an obvious time to say, “I should use an eagerly evaluated operator here”? It seems like the sort of thing that won’t be easily noticeable in benchmarking, especially if the jitter might sometimes handle it well anyway.
It is important to note that (a & b) can be != to (a && b) in the case where a or b holds a value other than 0 or 1. IL does not have a boolean type; it uses an 8-bit quantity which can be filled with any bits at all. And yes, this is reproducible in practice.
I’m not sure how this fact interacts with the C# language. Does the C# spec acknowledge the existence of bools that are not 0 or 1? And if not, who is at fault when such a thing happens and the C# language fails to provide what it promised? Unclear.
C# formally does not have any knowledge of bools that have been forced to have values behind the scenes other than 0 or 1; that is an implementation detail of the runtime. If you take advantage of implementation details, the behaviour is implementation-defined. If something goes wrong when you do something crazy, the fault lies entirely with the person who did the crazy thing.
I’d have expected that use of any value other than 0 and 1 in a boolean value would be a CLS-compliance violation, but no such rule is mentioned in anything I could find in a few minutes of googling. I wonder if there’s a reason that was left out: maybe it’s somehow outside the scope of what CLS compliance covers, or there’s some other reason, or it was just never thought of by the people writing the rules.
The implementation detail is to assume that bools always have the value 0 or 1, and this (incorrect) assumption is made by the authors of the C# compiler. Specifically:
Partition II of ECMA 335 (the Common Language Infrastructure spec) defines a Boolean as a “4-byte integer value where any non-zero value represents TRUE, and 0 represents FALSE”.
ECMA 334 (the C# spec) defines bools and “Boolean logical operators” in terms of the abstract values true and false.
The C# compiler transforms the boolean logical operator & into a bitwise integer “and” without first coercing the bools into 0/1 (or any other consistent integer representation of bool), even if those bools came from a separate assembly that may use some other representation.
This is a bug.
I guess it usually works because most code is compiled with compilers that insist on 0/1. I wouldn’t be hugely surprised if there is at least one security flaw hiding in this hole.
…where’s the bug? Does the C# standard guarantee anywhere that “a & b” is equal to “a && b” if a and b are bools? Unless it does, all that is required is that the outcome of the operation is consistent with the language spec, which will be the case regardless of the values of a and b (since for both operators, the outcome is zero if and only if either operand is zero, and non-zero if and only if both operands are non-zero, which complies with both the IL and C# interpretations of bool). You would have a point if the C# compiler ever emitted instructions that assumed that a “true” bool must have the value 1, but I’m pretty sure it never does.
Come up with an actual scenario where the C# compiler might do something wrong given, say, a bool that’s true because it has the value 2, and I can probably show you a scenario where someone improperly used the type “bool” in C# when they meant “int” (or suchlike), which is no fault of the compiler.
For example, if a and b are both bools and happen to have the integer values 1 and 2 respectively, then according to the CLR spec they are both true. According to the C# spec “true & true” evaluates to true. But the compiler produces a bitwise “and” so you get false. The outcome of the operation is not consistent with the language spec.
Note that you don’t need any borked C# code to get a boolean containing 2. An assembly written in a different language can legitimately return a bool with the value 2 (or 7 or whatever).
@carlos: Oh, I see what you mean now. Doy. I don’t know where I got the idea that “for both operators, the outcome is non-zero if and only if the operands are non-zero”, which is obviously not true for &. You can replicate this within C# itself through type punning: declare a struct with LayoutKind.Explicit, then overlay an int and a bool field — you can coerce C# into producing S.a == true and S.b == true yet S.a & S.b == false. So forget about “optimizing” this at all, since it’s not necessarily a safe optimization.
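For the curious, here is a sketch of that trick (the names are mine; the output shown assumes the usual one-byte representation of bool and is, as discussed above, implementation-defined behaviour):

using System;
using System.Runtime.InteropServices;

// A bool overlaid on a byte, so we can manufacture "true" values whose
// underlying bit patterns differ. This is exactly the sort of crazy thing
// discussed above, and the results are implementation-defined.
[StructLayout(LayoutKind.Explicit)]
struct PunnedBool
{
    [FieldOffset(0)] public byte Raw;
    [FieldOffset(0)] public bool Value;
}

class Program
{
    static void Main()
    {
        var a = new PunnedBool { Raw = 1 };   // bit pattern 0000_0001
        var b = new PunnedBool { Raw = 2 };   // bit pattern 0000_0010

        Console.WriteLine(a.Value);            // True: non-zero byte
        Console.WriteLine(b.Value);            // True: non-zero byte
        Console.WriteLine(a.Value & b.Value);  // False on the desktop CLR: 1 & 2 == 0
    }
}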
Thanks for another interesting read – and also, happy 200th birthday to George Boole 🙂
Weird, I just found out about this the other week from an ASP.NET book that had a short overview of logic in both VB.net and C#.
I work on a project with a lot of “&&” logic, and yes, probably most of it could be turned into “&” because the operands have no side effects and are just variables being read. But we still use “&&” anyway. The reason is that we don’t actually know what the operands are: at the time we write the expression they are all properties, but what does that property do? It may just read a variable, but it may also look up a value in a database, who knows. So the “&&” operator is safer; it may be a few nanoseconds slower in some cases, but the eager operation may be several microseconds slower in other cases. If the JIT is effective at turning lazy into eager when it’s worth it, then our problem is solved.
Would love to hear more on your “success code” pattern teaser.
The success code pattern is a side effect of the fact that structured exception handling sucks so much that some people prefer not to use it at all. In fact, any* current built-in error handling sucks; some of it is totally brain dead.
*But maybe some day someone will create a decent way to do error handling. Apple’s Swift is very interesting in this regard: it doesn’t solve every problem, and it is neither totally new nor revolutionary, but it is the best collection of simple ideas I have seen on this subject so far.
Should it not be right hand side in “In C# 6, if the left hand side of an && operation is a local variable then the IL is generated as though it was &.”? If the left hand side is a local variable it would mean that the right hand side is evaluated in all cases, even if it has side effects. That would violate the lazy evaluation rule. However, taking the value of a local variable on the right side of && or || will never cause a side effect.
Whoops, you are correct of course. I meant the other left side of course.
A little off topic, but are there any good examples of interface use in the released dotnet source code? I didn’t see much under Roslyn (because that’s the compiler?)
Personally I’ve used the ‘&’ and ‘|’ operators on bools when I wanted to create code that had constant execution time. One could have an operand on the right side that is almost never evaluated but that, when it is evaluated, brings things to a crawl. I encountered this long ago in a physics library and, for testing purposes, changed the ‘&&’ to ‘&’ to cause the right side to always be evaluated, since it was rare that it was actually needed. This removed the need to create a specific test case outside of the code that touched on the general region where the slow method was, and allowed a programmer a bit more experienced than me to solve it.
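A sketch of that testing trick, with hypothetical method names invented for illustration:

using System;

class PhysicsTestingSketch
{
    // Hypothetical stand-ins for the cheap test and the rarely-needed slow one.
    static bool BroadPhase()  { return false; }                 // usually rules things out
    static bool NarrowPhase() { Console.WriteLine("slow path"); return true; }

    static void Main()
    {
        // Production: the slow narrow phase runs only when the broad phase passes.
        bool collided = BroadPhase() && NarrowPhase();

        // Test build: & forces the slow path to run every time, so its cost and
        // bugs surface without having to construct a special test scenario.
        bool collidedForTest = BroadPhase() & NarrowPhase();

        Console.WriteLine($"{collided} {collidedForTest}");
    }
}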
My French-language compiler chokes on “apropos”. It understands “à propos” better. Hello from a picky Frenchman who never used & nor | on booleans 🙂
Regarding these operators when the LHS value means the RHS won’t change the result:
&& and || say that it MUST NOT be evaluated.
& and | say that it MUST be evaluated.
We need (!) a third form to allow the programmer to say “It doesn’t matter if the RHS is run or not”.
That way, the compiler is given the choice to either include a conditional branch or go into the RHS, depending on which is more efficient.
Does Unicode have a half-an-ampersand character?