Nullable micro-optimizations, part one

Which is faster, Nullable<T>.Value or Nullable<T>.GetValueOrDefault()?

Before I answer that question, my standard response to “which horse is faster?” questions applies. Read that first.

.
.
.

Welcome back. But again, before I answer the question I need to point out that the potential performance difference between these two mechanisms for obtaining the non-nullable value of a nullable value type is a consequence of the fact that these two mechanisms are not semantically equivalent. The former may legally only be called if you are sure that the nullable value is non-null;[1. Put another way, calling Value without knowing that HasValue is true is a boneheaded exception.] the latter may be called on any nullable value. A glance at a simplified version of the source code illustrates the difference.

struct Nullable<T> where T : struct
{
  private bool hasValue;
  private T value;
  public Nullable(T value)
  {
    this.hasValue = true;
    this.value = value;
  }
  public bool HasValue { get { return this.hasValue; } }
  public T Value
  {
    get
    {
      // The real type throws InvalidOperationException here.
      if (!this.HasValue) throw new System.InvalidOperationException();
      return this.value;
    }
  }
  public T GetValueOrDefault()
  {
    // No flag check: value is default(T) whenever hasValue is false.
    return this.value;
  }
  // ... and then all the other conversion gear and so on ...
}

The first thing to notice is that a nullable value type’s ability to represent a “null” integer or decimal or whatever is not magical.[2. Nullable value types are magical in other ways; for example, there's no way to write your own struct that has the strange boxing behaviour of a nullable value type; an int? boxes to either an int or null, never to a boxed int?.] A nullable value type is nothing more than an instance of the value type plus a bool saying whether it’s null or not.
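The boxing behaviour described in the footnote can be observed directly. The following is a minimal sketch; the class and method names (BoxingDemo, DescribeBox) are made up for illustration and are not part of the framework.

```csharp
using System;

public static class BoxingDemo
{
    // Boxes a nullable int and reports the runtime type of the box,
    // or "null" when the boxing conversion produced a null reference.
    public static string DescribeBox(int? value)
    {
        object boxed = value;   // boxing conversion from int? to object
        return boxed == null ? "null" : boxed.GetType().Name;
    }

    public static void Main()
    {
        Console.WriteLine(DescribeBox(42));    // prints "Int32", not a boxed int?
        Console.WriteLine(DescribeBox(null));  // prints "null"
    }
}
```

A hand-written struct consisting of a T and a bool would box to an instance of that struct in both cases, which is the part that cannot be reproduced in user code.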

If a variable of nullable value type is initialized with the default constructor then the hasValue field will be its default value, false, and the value field will be default(T). If it is initialized with the declared constructor then of course the hasValue field is true and the value field is any legal value, including possibly T’s default value. Thus, the implementation of GetValueOrDefault() need not check the flag; if the flag is true then the value field is set correctly, and if it is false, then it is set to the default value of T.
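The two initialization paths can be checked with a few lines of code. This is only a sketch; DefaultStateDemo and Describe are hypothetical names chosen for the example.

```csharp
using System;

public static class DefaultStateDemo
{
    // Formats the two observable pieces of state as "HasValue:GetValueOrDefault()".
    public static string Describe(int? n)
    {
        return n.HasValue + ":" + n.GetValueOrDefault();
    }

    public static void Main()
    {
        int? empty = new int?();   // default constructor: flag false, value default(int)
        int? zero = new int?(0);   // declared constructor: flag true, value happens to be default(int)
        int? five = new int?(5);   // declared constructor: flag true, value 5

        Console.WriteLine(Describe(empty)); // False:0
        Console.WriteLine(Describe(zero));  // True:0
        Console.WriteLine(Describe(five));  // True:5
    }
}
```

Note that the first two cases are distinguishable only through HasValue; GetValueOrDefault() returns the same thing for both.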

Looking at the code it should be clear that Value is almost certainly not faster than GetValueOrDefault() because obviously the former does exactly the same work as the latter in the success case, plus the additional work of the flag check. Moreover, because GetValueOrDefault() is so brain-dead simple, the jitter is highly likely to perform an inlining optimization.[3. An inlining optimization is where the jitter eliminates an unnecessary "call" and "return" instruction by simply generating the code of the method body "inline" in the caller. This is a great optimization because doing so can make code both smaller and faster in some cases, though it does make it harder to debug because the debugger has no good way to generate breakpoints inside the inlined method.] How the jitter chooses to inline or not is an implementation detail, but it is reasonable to assume that it is less likely to perform an inlining optimization on code that contains more than one “basic block”[4. A "basic block" is a region of code where you know that the code will execute from the top of the block to the bottom without any "normal" branches in or out of the middle of the block. (A basic block may of course have exceptions thrown out of it.) Many optimizing compilers use "basic blocks" as an abstraction because it abstracts away the unnecessary details of what the block actually does, and treats it solely as a node in a flow control graph.] and explicitly throws.

It should also be clear that though the relative performance difference might be large, the absolute difference is small. A call, field fetch, conditional jump and return in the typical case makes up the difference, and those things are each only nanoseconds.
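If you do want numbers for your own machine, a rough timing harness might look like the following. Treat it as a sketch under several assumptions: the names (NullableTiming, SumViaValue, SumViaGetValueOrDefault) and the array size are invented for the example, both paths pay the same delegate-call overhead, and a Stopwatch loop like this is far less trustworthy than a dedicated benchmarking tool. The results depend entirely on the jitter and hardware in use.

```csharp
using System;
using System.Diagnostics;

public static class NullableTiming
{
    // Reads every element through the Value property (flag check plus field fetch).
    public static long SumViaValue(int?[] items)
    {
        long sum = 0;
        for (int i = 0; i < items.Length; i++)
            sum += items[i].Value;   // throws if any element is null
        return sum;
    }

    // Reads every element through GetValueOrDefault() (field fetch only).
    public static long SumViaGetValueOrDefault(int?[] items)
    {
        long sum = 0;
        for (int i = 0; i < items.Length; i++)
            sum += items[i].GetValueOrDefault();
        return sum;
    }

    private static void Time(string label, Func<int?[], long> sumFunc, int?[] items)
    {
        sumFunc(items);                          // warm-up call so jitting is not timed
        Stopwatch watch = Stopwatch.StartNew();
        long result = sumFunc(items);
        watch.Stop();
        Console.WriteLine("{0}: {1} ms (sum {2})", label, watch.Elapsed.TotalMilliseconds, result);
    }

    public static void Main()
    {
        int?[] items = new int?[10000000];
        for (int i = 0; i < items.Length; i++)
            items[i] = i;                        // all non-null, so Value never throws

        Time("Value", SumViaValue, items);
        Time("GetValueOrDefault", SumViaGetValueOrDefault, items);
    }
}
```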

Now, this is of course not to say that you should willy-nilly change all your calls to Value to GetValueOrDefault() for performance reasons. Read my rant again if you have the urge to do that! Don’t go changing working, debugged, tested code in order to obtain a performance benefit that is (1) highly unlikely to be a real bottleneck, and (2) highly unlikely to be your worst performance problem.

And besides, using Value has the nice property that if you have made a mistake and fetched the value of a null, you’ll get an exception that informs you of where your bug is! Code that draws attention to its faults is a good thing.[5. Note that here we have one of those rare cases where the frameworks design guidelines have been deliberately bent. We have a "Get" method that is actually faster than a property getter, and the property getter throws! Normally you expect the opposite: the "Get" method is usually the one that is slow and can throw, and the property is the one that is fast and never throws. Though this is somewhat unfortunate, remember, the design guidelines are our servants, not our masters, and they are guidelines, not rules.]
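The difference in failure behaviour is easy to see side by side. In this sketch the names FailFastDemo and ValueThrows are hypothetical; the exception type, InvalidOperationException, is what the real Value getter throws.

```csharp
using System;

public static class FailFastDemo
{
    // Returns true if reading Value threw InvalidOperationException.
    public static bool ValueThrows(int? n)
    {
        try
        {
            _ = n.Value;   // the getter checks HasValue first
            return false;
        }
        catch (InvalidOperationException)
        {
            return true;
        }
    }

    public static void Main()
    {
        int? missing = null;

        // GetValueOrDefault() quietly hands back default(int); the bug, if any, stays hidden.
        Console.WriteLine(missing.GetValueOrDefault()); // 0

        // Value fails loudly at the point of the mistake.
        Console.WriteLine(ValueThrows(missing));        // True
        Console.WriteLine(ValueThrows(7));              // False
    }
}
```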


Next time on FAIC: How does the C# compiler use its knowledge of the facts discussed today to your advantage? Have a great Christmas everyone; we’ll pick up this subject again in a week.

21 thoughts on “Nullable micro-optimizations, part one”

  1. When you previewed this the other day, I believe a commentator pointed out why GetValueOrDefault would be faster*, and it was obvious. But prior to that preview, comment, and now this blog post, I still sort of expected the method to do the HasValue check and then return default(T), which (as this blog illustrates) would simply be more work than necessary** to return a value the struct already has.***

    *Maybe it adds up in a loop over a million iterations to something actually noticeable.
    **But maybe it adds clarity?
    ***Asterisks are fun.

    • On older machines, if T was a large struct type, a zero-fill would be significantly faster than a block copy; that difference could outweigh the time required to perform the test, at least in cases where the value would often be null. Today the difference is apt to be negligible even with a large struct (16Kbytes). Still, it’s worth noting that it’s possible for a Nullable<T> to exist (e.g. created by having one thread copy a variable while another thread writes it) such that HasValue is false, but GetValueOrDefault() will consistently return something other than default(T). The fact that simultaneous reads and writes to a large value type may cause partially-updated values to be read is not surprising; what is perhaps surprising is that a method called GetValueOrDefault does not validate HasValue and force its return to default(T) if it’s false.

      BTW, benchmarking an unconditional direct return of a field value compared with a conditional form, using a large structure type, it seems that a conditional version is much slower (almost 2x) than an unconditional version if written as a function; if written as a method with an `out` parameter, the method seems to be almost twice as fast as a function that returns T, and the speed is about the same with or without the if-test.

      • You’re suggesting making the common, normal scenario (no “tearing” caused by incorrect threading) significantly slower so as to hide a threading bug more effectively in a rare scenario? That proposal would not get far if you made it to the BCL team.

        • If I were in charge of .net, I would want GetValueOrDefault to unconditionally *behave as documented*. Given the performance cost of having it test default(T), it would seem reasonable to make documentation and implementation conform by documenting that a corrupted Nullable<T> may return a non-default value even when HasValue is false; any code which for security or system-integrity purposes would require that the “Value” field be ignored on any instance where HasValue is false must perform its own test on HasValue and act accordingly to enforce the desired semantics because the type itself won’t do so.

          BTW, I don’t think I’d advocate using Nullable<T> in any code where performance is a particular concern; using a struct with exposed fields “IsValid” and “Value” would almost certainly allow for better performance (among other things, it would in many cases avoid having to copy an entire T every time one wants to access a member thereof).

          • You know the documentation of Nullable actually says “Any public static (Shared in Visual Basic) members of this type are thread safe. Any instance members are not guaranteed to be thread safe.”?

            AFAIK it’s behaving as documented.

        • Saying “struct1=struct2” doesn’t use any member of the struct type. To be sure, if the size of the struct is something other than 8, 16, or 32 bits, one should expect that copying the struct on one thread while writing to it in another may yield a copy with any combination of old and new data. In most cases, however, if the readable properties of the copy return certain values, that copy will be the same as any other copy whose properties return those same values. I would suggest that any struct type which has state which isn’t exposed by properties should, at minimum, document that fact.

          • The language documentation is already reasonably clear on the point that assignments of all but a few specific types (i.e. integral types 64 bits or smaller) are not thread-safe either.

            There’s nothing untoward about Nullable’s thread-safety here. It works as documented.

          • The spec actually guarantees only that 32 bit integer and float reads and writes are atomic, and then only if they are aligned. An implementation on a 64 bit machine might offer a guarantee of atomic 64 bit integer and float reads, but it is not required to do so.

  2. Perhaps worth pointing out that the reason there is a GetValueOrDefault() method rather than a ValueOrDefault property is because there is an overload that allows you to specify your own value to use as a default. And this overload *does* need to do more work than the parameterless overload.

    But your point about methods versus property stands: it would be awkward to have a ValueOrDefault property along with a GetValueOrDefault(T) method.

  3. “A nullable value type is nothing more than an instance of the value type plus a bool saying whether it’s null or not.” Should this not be ‘a bool saying whether it HAS A VALUE or not’? It’s just that null is a valid value for a reference type and distinct to having not been assigned.

    • You’re right that null is a valid value for a reference type, but Nullable<T> does not allow T to be a nullable type. You cannot have Nullable<object>, for example, nor can you have Nullable<Nullable<int>> and if you could, it would not be clear what the value of ((Nullable<object>)null).HasValue should be. For Nullable<T>, being null and having HasValue as false are the same thing.

      (I wonder how this blog deals with comments that look like HTML. I’ve used the HTML less-than and greater-than entities to be on the safe side.)

  4. So now that we understand how it works enough to know which one SHOULD be faster, has anybody figured out which one actually IS faster?

  5. I… Well… I have a question
    In the last project I worked on there was
    if (nullableVariable.HasValue) nullableVariable.Value.Fun();

    all over the code, including code generated by an in-house tool, and the questions are:
    1) Is the framework able to remove the redundant check?
    2) Should we have replaced .Value with .GetValueOrDefault()? I mean… .Value looks prettier, more readable, and semantically closer to what we want. The performance shouldn’t vary much, since nullable values are usually used before or after a much more expensive operation (a database command), and we were expecting the framework to be able to remove the redundant check; it didn’t look so complicated… Yes, I read your previous article, but there is the code generation tool, and I can’t really guarantee it won’t be used on something where the performance difference will be visible. Also, the auto-generated code isn’t supposed to be read anyway. As it is, there is a redundant check, and I would replace it after asking others’ opinions…

    • If the performance difference is “visible”, then you’ll notice it. At that point, you can establish whether calling GetValueOrDefault() makes a significant improvement.

      Whether the JIT compiler can remove the redundant check depends on the implementation. The main determining factor is probably whether the Value property getter gets inlined or not.

      Even if it can’t remove it now, it might some day in the future. And even if it removes it now, you never know when some more important issue forces it to stop.

      But the likelihood of it mattering at all is very low. You’d probably have to have a program that does nothing other than retrieving values from Nullable instances to even be able to measure the difference, never mind for it to affect throughput or perceived response time.

    • In your case calling GetValueOrDefault looks like a bad idea – because if you call the Fun() method on a null value, you’re going to get a NullReferenceException. By checking HasValue first, you are ensuring you do not try to call a method on a null reference.
      Or I guess you could be asking if you should use the “faster” GetValueOrDefault while keeping the HasValue check, so the program doesn’t check it again within the Value getter. In which case I’d agree that unless you actually have a performance problem that can be traced to this code, I wouldn’t worry about it.

  6. Pingback: The Morning Brew - Chris Alcock » The Morning Brew #1259

  7. ///Replaced all angle brackets with square brackets.
    Sorry, but I really hate the way Nullable[T] is implemented in the compiler.
    Instead of investing some more resources into making real lifted operators possible, the team implemented Nullables as a one-trick compiler hack that has to be handled in many places in the framework.
    Nullable[T] looks like a simple class, yet nobody can ever create something like it in its current implementation. The look-but-don’t-touch magic is secured behind the glass.
    But it would have been possible if the real lifted operators were created.
    There are two problems that prevent developers from implementing their own Nullable[T]-like operators:
    1) You need to be able to tell the framework that you support the operation for all eligible types.
    2) You need to be able to perform the base operation on the eligible types.

    This cannot be done without the help of the compiler (but can be implemented without the CLR modification).

    Before implementing the lifted operators we first need to allow generic parameters in operator definition (https://connect.microsoft.com/VisualStudio/feedback/details/687593/c-compiler-doesnt-see-operators-that-introduce-their-own-generic-type-parameters):
    public static MyClass[TResult] operator +[T1, T2, TResult](MyClass[T1] a, MyClass[T2] b) { … }
    The operator binding rules need to be extended to allow binding to such operators.

    I see two ways to implement the lifted operators:

    a) Introduce the “lifted” keyword which can be specified before the “operator” keyword. Lifted operator functions receive (in addition to the arguments) the reference to the function that handles the base operation.
    Here is an example of a lifted addition operator defined in the MyClass[T] class:

    public static MyClass[TResult] lifted operator +[T1, T2, TResult](MyClass[T1] a, MyClass[T2] b, Func[T1, T2, TResult] op) {
    return new MyClass[TResult](op(a._value, b._value));
    }

    The operator binding rules are modified as follows:
    In addition to the generic operator binding argument type requirements, there must exist an operator with the “T1 op T2 -] TResult” signature. (different operators have different signatures)
    If the requirements are fulfilled, the operator invocation is compiled as an op_Addition(a, b, op) call where the “op” is the reference to the base operation.
    So, the
    MyClass[Vector2] a;
    MyClass[Vector2] b;
    MyClass[Vector2] c = a + b;
    is compiled as
    c = MyClass[].op_Addition(a, b, Vector2.op_Addition);

    I guess this would require some non-trivial work to optimize this code to the level of an explicit implementation.
    But this would open wonderful new possibilities and allow developers to create their own monads.

    b) Extend the list of type constraints (which are nothing more than crude static interfaces) to allow requiring the presence of specific operators (They are called TYPE constraints, not INSTANCE constraints for a reason).

    public static MyClass[TResult] operator +[T1, T2, TResult](MyClass[T1] a, MyClass[T2] b) where {T1 + T2 -] TResult}
    {
    return new MyClass[TResult](a._value + b._value);
    }

    Sadly, I understand that this will never be done (which is obvious given how many times it was suggested, rationed, asked and begged for).
