Nullable micro-optimization, part three

Happy New Year all; I hope you had as pleasant a New Year's Eve as I did.

Last time on FAIC I described how the C# compiler first uses overload resolution to find the unique best lifted operator, and then uses a small optimization to safely replace a call to Value with a call to GetValueOrDefault(). The jitter can then generate code that is both smaller and faster. But that's not the only optimization the compiler can perform, not by far. To illustrate, let's take a look at the code you might generate for a binary operator, say, the addition of two expressions of type int?, x and y:

int? z = x + y;

Last time we only talked about unary operators, but binary operators are a straightforward extension. We have to make two temporaries, so as to ensure that side effects are executed only once: 1

int? z;
int? temp1 = x;
int? temp2 = y;
z = temp1.HasValue & temp2.HasValue ? 
  new int?(temp1.GetValueOrDefault() + temp2.GetValueOrDefault()) :
  new int?();

A brief aside: shouldn't that be temp1.HasValue && temp2.HasValue? Both versions give the same result; is the short circuiting one more efficient? Not necessarily! AND-ing together two bools is extremely fast, possibly faster than doing an extra conditional branch to avoid what is going to be an extremely fast property lookup. And the code is certainly smaller. Roslyn uses non-short-circuiting AND, and I seem to recall that the earlier compilers do as well.
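For comparison, the short-circuiting version of that lowering would look like this; it computes the same value for z, at the cost of an extra conditional branch:

```csharp
// Hypothetical short-circuiting lowering; same result, extra branch:
z = temp1.HasValue && temp2.HasValue ?
  new int?(temp1.GetValueOrDefault() + temp2.GetValueOrDefault()) :
  new int?();
```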

Anyway, when you do a lifted addition of two nullable integers, that's the code that the compiler generates when it knows nothing about either operand. Suppose however that you added an expression q of type int? and an expression r of type int 2:

int? s = q + r;

OK, reason like the compiler here. First off, the compiler has to determine what the addition operator means, so it uses overload resolution and discovers that the unique best applicable operator is the lifted integer addition operator. Therefore both operands have to be converted to the operand type expected by the lifted operator, int?. So immediately we have determined that this means:

int? s = q + (int?)r;

Which of course is equivalent to

int? s = q + new int?(r);

And now we have an addition of two nullable integers. We already know how to do that, so the compiler generates:

int? s;
int? temp1 = q;
int? temp2 = new int?(r);
s = temp1.HasValue & temp2.HasValue ? 
  new int?(temp1.GetValueOrDefault() + temp2.GetValueOrDefault()) :
  new int?();

And of course you are saying to yourself, well, that's stupid. You and I both know that temp2.HasValue is always going to be true, and that temp2.GetValueOrDefault() is always going to be whatever value r had when the temporary was built. The compiler can therefore optimize this to:

int? s;
int? temp1 = q;
int temp2 = r;
s = temp1.HasValue ? 
  new int?(temp1.GetValueOrDefault() + temp2) :
  new int?();

Just because the conversion from int to int? is required by the language specification does not mean that the compiler actually has to generate code that does it; rather, all the compiler has to do is generate code that produces the correct results! 3
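To see that the optimized lowering really does produce the correct results, here it is wrapped up as an ordinary method; the method and its name are mine for illustration, not anything the compiler actually emits:

```csharp
// Sketch of the optimized lowering of q + r, where q is int? and r is int.
static int? AddLifted(int? q, int r)
{
    int? temp1 = q;
    int temp2 = r;   // no int? wrapper needed: r can never be null
    return temp1.HasValue
        ? new int?(temp1.GetValueOrDefault() + temp2)
        : new int?();
}
```

AddLifted(2, 3) produces 5 and AddLifted(null, 3) produces null, exactly as the unoptimized lowering would.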

Next time on FAIC: What happens when we throw some lifted conversions into the mix?

  1. More specifically, the compiler must ensure that side effects are executed exactly once.
  2. Roslyn will also optimize lifted binary operator expressions where both sides are known to be null, where one side is known to be null, and where both sides are known to be non-null. Since these scenarios are rare in user-written code, I'm not going to discuss them much.
  3. A fun fact is that the Roslyn compiler's nullable arithmetic optimizer actually optimizes it to temp1.HasValue & true ? ..., and then Roslyn's regular Boolean arithmetic optimizer gets rid of the unnecessary operator. It was easier to write the code that way than to be super clever in the nullable optimizer.

30 thoughts on “Nullable micro-optimization, part three”

    • Thanks for the suggestion. I am new at running a wordpress blog and the array of available plugins is somewhat bewildering. I use a lot of footnotes, so I'll check that out!

      I'll also probably install a "markdown in comments" plugin at some point.

  1. "AND-ing together two bools is extremely fast"

    But it might give a wrong result! Is HasValue guaranteed to return either (bool)0 or (bool)1? If it sometimes returns (bool)2 the AND might produce a false negative.

    • You can't cast ints to bools like that in C#:
      > (bool)2
      (1,1): error CS0030: Cannot convert type 'int' to 'bool'

      I think it's guaranteed that '&' will work on bools as expected.

      • A bool has an integer representation (I know that from Microsoft Pex - Pex can actually generate bools that are not 0 or 1 and cause the program under test to fail). The CLR allows non-0-or-1 booleans.

        What about this?:
public static unsafe bool ByteToBoolean(byte b)
{
    return *(bool*)&b;
}

        ByteToBoolean(1) & ByteToBoolean(2)

        Will return false!

        • Oh, very interesting, I didn't know that! In your example case, though, you're running unsafe code--I assume everything with the nullable types is safe. When you're in an unsafe context things do get a bit trickier. Perhaps a better way to put it would be "in a safe context, '&' on bools works as expected"? Or are there times even then where you can get this behavior?

          • What this method does on the "inside" is an implementation detail. Unsafe code or not does not matter. The CLR provides the same facility - you can convert any byte losslessly to a boolean. This is perfectly defined and deterministic.

            "Mixed" booleans are a perfectly valid element of the CLR. They are allowed to occur anywhere.

          • (Scratch the above, apparently the IL emitted is the same for integers and booleans. This might actually be a genuine bug; or one could argue C# booleans and CLR booleans aren't the same concept, although I thought this sort of interoperability would be dealt with somewhere.)

          • The C# Reference and Specification both say the only values a bool can have are false and true. If all bools have values of false or true, bitwise AND and non-short-circuiting logical AND are the same thing. I think the spec simply does not address this. It sounds suspiciously like what C calls a trap representation. In C, when reading a trap representation (such as, on some systems, a byte with a value of 2 through an lvalue of type _Bool), the behaviour is undefined, any behaviour is permitted. Even though the value 2 is a valid value for any 8-bit register that the generated machine code might use, the compiler is allowed to assume that the value isn't 2, isn't 3, isn't any larger value than that. So, for example, b > true might evaluate to false at compile time, yet so would b < true or b == true at run time.

            As a practical data point, C#'s & when used with two bools compiles to a bitwise AND on my system, meaning (bool)1 & (bool)2 evaluates to false, and (bool)2 & (bool)2 evaluates to (bool)2, even though your comment suggests the result should be the same for both.

  2. This reminds me of something I've always wondered-
    Why does the C# compiler not do any inlining itself? Why leave it all to the JIT? It seems like you should do as many optimizations at compile time as possible.

    • I imagine Eric could give a better reply; but my guess would be that a fair amount of the information that guides whether to inline or not is only available at JIT time.

      Inlining takes up more code space (possibly lowering performance, due to cache misses), removes the speed penalty of a call/return, and potentially saves pushing function arguments onto the stack.

      Until runtime, you don't know how much extra code space inlining will use (depends on the platform and/or CPU in question). You also don't know how many registers you have available, and hence whether you actually *can* save the time of pushing function arguments onto the stack (if inlining causes you to run out of registers, then you're going to have to spill some onto the stack in any case). Some architectures will take longer for a call/return which could affect your decision on whether to inline.

      JIT time is the point where you have all the information you need on whether to inline. At compile time you don't, so the safest option is to leave it to the JIT.

        • I understand why some inlining can only be done by the JIT, but do those considerations really come into play in these single statement cases, such as GetValueOrDefault()?

To me, it seems like a reasonable assumption that single-statement methods always make more sense to inline. Thus, it is something the C# compiler could do. The compiler is essentially doing a form of inlining when it performs the rewrites you detail in your post, Eric (as opposed to generating a reusable method and calling that each time).

          • Eric: thanks! ;)

            Sam: I can imagine two reasons why the C# compiler might not inline 'simple' methods (single statement might not be the best way to describe them, a single statement could be pretty complex!);

            1) If the statement is really that simple, then the JITter isn't going to spend much time analysing it, so you're not saving much JIT time by analysing it in advance; why bother special casing it? It complicates the compiler to optimise only simple cases that weren't causing speed problems anyway.

            2) It removes the ability to do things like set a breakpoint on the method in question - because it effectively doesn't exist in the IL if it's been inlined everywhere...! I think (not sure?) that the JIT normally avoids inlining code if there's a debugger attached, for that reason.

          • Inlining a method also removes the ability to replace some, but not all, dlls in a compiled application (i.e., makes full compiles necessary). Though if you "know" they won't change (e.g., constants)...

      • Although there are many cases where only the JIT can know whether to inline something, I would think that having the compiler inline things like struct property getters would allow it to eliminate many redundant copy operations, especially when they are invoked in read-only contexts.

        For example, if one is enumerating a Dictionary(of Guid, Rectangle) and has a KeyValuePair(of Guid, Rectangle) called kvp, and if the compiler doesn't analyze property getters, accessing kvp.Value.X will require making a copy of all 32 bytes of kvp, passing a byref to a method which fetches 16 of them, storing those to another temporary stack spot, and passing a byref to a routine that fetches four bytes. Even if the methods are inlined, one is still stuck with the overhead of the copy operations. Even though the purpose of the code is simply to fetch four bytes from kvp, it has to make a slow copy of kvp, then a somewhat faster (but still RAM-based) copy of kvp.Value, before it can finally read the four bytes it was actually interested in.

        I don't think any realistic level of JIT inlining could yield anything close to the obvious optimization of simply reading the four bytes directly from the original struct, but I would think a compiler that could recognize trivial struct property getters could do so.
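Until compilers do that analysis, the usual workaround is to hoist the struct into a local once so the copy happens a single time. A sketch, where dict and Use are placeholders for whatever collection and consumer are in play:

```csharp
foreach (KeyValuePair<Guid, Rectangle> kvp in dict)
{
    Rectangle r = kvp.Value; // one copy of the Rectangle per iteration
    Use(r.X);                // further member accesses hit the local
}
```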

    • Inlining at compile time can't work across assemblies, since you don't know what code that other assembly might contain at runtime.

  3. "so you’re not saving much JIT time by analysing it in advance; why bother special casing it?" - it's a micro-optimization, just like the compiler using GetValueOrDefault() is a micro-optimization. Compilers should micro-optimize anywhere they can, because when you add those micro-optimizations up over millions/billions lines of code across the globe, you're saving real time/energy/karma.

    As with all compiler optimizations, it wouldn't be present if you build in Debug mode. As it stands, the compiler can already do a significant amount of code rewriting when optimizations are enabled (for example, removing unused variables), making some breakpoints in Release mode code impossible.

  4. Pingback: The Morning Brew - Chris Alcock » The Morning Brew #1266

    • That's a great question that I am not going to explore in depth in this series. Briefly, the problem is that determining when an expression either *produces* or *consumes* a side effect can be quite tricky. For example, reading a local variable never produces a side effect, but another expression might *write* to a local variable as a side effect, and therefore the read must not be re-ordered with respect to the write. The Roslyn and original recipe compilers treat constants as expressions that do not need to be stored in temporaries; pretty much everything else is put into some kind of temporary.
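As a concrete illustration of the ordering hazard (my example, not from the post): here the right operand writes to a local that the left operand reads, so the left operand must be captured into a temporary before the write happens:

```csharp
int? x = 1;
int? z = x + (x = null); // the left operand is captured as 1 first;
                         // the right operand is null, so z is null
```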

  5. Pingback: Nullable micro-optimization, part four | Fabulous Adventures In Coding

  6. There really was rather little need to be uncertain about what the classic compiler did. You may be familiar with Joseph Albahari's excellent LINQPad utility. Prior to Roslyn's REPL it was the best way to test a C# expression or fragment without creating a whole Visual Studio project, or calling the compiler by hand.

    One of the nice things it does is provide the disassembly of the method(s) you create. So to test this one, I simply switched to "C# program mode" and typed in a simple method whose body was "return a+b", and whose parameters and return value had type 'int?'. Then I clicked run (which ran the empty Main method) and looked at the disassembly for the method.

    Sure enough, the classic compiler does use bitwise AND, as you predicted. The whole thing took less than a minute to check; it literally took me several times as long to write this post as it did to check that. Hope this helps.
