Nullable comparisons are weird

One of the C# oddities I noted in my recent article was that I find it odd that creating a numeric type with less-than, greater-than, and similar operators requires implementing a lot of redundant methods, methods whose values could be deduced by simply implementing a comparator. For an example of such, see my previous series of articles on implementing math from scratch.

A number of people commented that my scheme does not work in a world with nullable arithmetic (or, similarly, NaN semantics, which are similar enough that I’m not going to call out the subtle differences here.) That reminded me that I’d been intending for some time to point out that when it comes to comparison operators, nullable arithmetic is deeply weird. Check out this little program:

using System;
using System.Collections.Generic;
class P
{
  static void Main()
  {
    int? one = 1;
    int? nll = null;
    var comparer = Comparer<int?>.Default;

    Console.WriteLine(one < nll);  // false
    Console.WriteLine(one > nll);  // false
    Console.WriteLine(one <= nll); // false
    Console.WriteLine(one >= nll); // false
    Console.WriteLine(one == nll); // false
    Console.WriteLine(one != nll); // true

It is possible for two things to be unequal, but neither is greater than or less than the other. Nulls are unordered with respect to non-nulls.


    Console.WriteLine(nll < nll);  // false
    Console.WriteLine(nll > nll);  // false
    Console.WriteLine(nll <= nll); // false
    Console.WriteLine(nll >= nll); // false
    Console.WriteLine(nll == nll); // true
    Console.WriteLine(nll != nll); // false

Two things can be equal to each other, but not greater-than-or-equal to each other! Apparently greater-than-or-equal does not actually have “or” semantics.


    Console.WriteLine(comparer.Compare(one, nll)); // one is greater
  } 
}

And finally, the default comparer contradicts the greater-than and less-than operators. It says that one is greater than null, but the operators say that neither is the greater.

Basically the rule here is that nulls compare for equality normally, but any other comparison results in false. This is a bit weird, but it does make some sense. The language designers have a number of possible choices to make.

  • Make comparison operators produce nullable bool. If there is a null operand, the result is null, not false. Note that this implies that x == null should always produce null, contrary to the expectation of users familiar with reference null comparisons.
  • Make comparison operators produce bool, and say that greater-than-or-equal comparisons to null have the same semantics as “or-ing” together the greater-than and equals operations. Now we must consider whether the user who is doing a comparison involving nullables is really asking the question “is it the case that these two things (a) have equal values, or (b) have value x greater than value y, or (c) neither has a value?”. That does not seem like the semantics that a user has in their head when comparing two potentially null integers. If the question is “are the sales for March greater than or equal to the sales for April? and both figures are unknown, it seems wrong to say “yes, they are equally unknown.”
  • Make comparison operators produce a bool and apply a total ordering; null is smaller than everything other than itself, and greater than nothing.

Good heavens, what a mess. The latter is what you want for sorting; as we have discussed before, a sort comparison must produce a consistent total ordering, and that is what the default comparer does. The built-in operators do not provide a total ordering. Rather, they try to make a best-effort attempt at a reasonable compromise. The operators produce bools, equality semantics work as people expect, and non-equality comparisons involving null produce false, indicating that we do not have information about how the quantities are ordered.

What is the practical upshot here? First, be careful when comparing nullable numbers. Second, if you need to sort a bunch of possibly-null values, you cannot simply use the comparison operators. Instead you should use the default comparison object, as it produces a consistent total order with null at the bottom.

Advertisements

39 thoughts on “Nullable comparisons are weird

  1. I have found the built-in comparison operators to be perfectly designed. Exactly the right choices.

    I guess when SQL was designed they did not have the experience and wisdom to make equally sane choices.

    • The SQL (actually, relational algebra) choice is perfectly sane, if you think about how a relational database is used.

      Databases store facts. Like, for example, that Alice Smith was born on 1 Jan 1970, lives in New York, has 1 pet, and doesn’t have a personal motto. In a database, NULL means that the data is unknown. This is why NULL in a database is fundamentally different from 0 or the empty string. NULL would mean we don’t know how many pets she has, or what her motto is, which is different from knowing that she has 0 pets, or that she has no motto.

      Querying a database is asking questions about the facts in the database. Does Alice have fewer than 3 pets? In this case, yes. However, if the number of pets was NULL, the answer to that question is “we don’t know” – i.e. NULL.

      If Alice and Bob both have NULL pets, do they have the same number of pets? We don’t know, so comparing two NULLs for equality returns NULL. Do they have a different number of pets? Again, we don’t know – NULL.

      You may not like the way relational algebra works, but to label its design as lacking in wisdom or sanity seems to indicate a lack of appreciation for the problems it was invented to address.

      • I understand the rules and the reason they were invented. It’s just that nobody wants this behavior in practice. People want NULL to be just another value. That’s what the SQL designers did not foresee.

        It sometimes is super convenient to be able to put just one more special value into a variable than the normal value domain would allow for. That’s really all that NULL should be. One more value.

        • Ah. It’s just been my experience that most of the people who want this behaviour in practice are those who have never been taught the rules or the reason they were invented. Rather, they’ve seen the word “null”, assumed that it meant the same thing as the “null” from their favourite other programming language, and then got annoyed and defensive when it doesn’t work like it “should”. They then look up what the rules are, but without actually try to learn *why* they are what they are, simply consider them stupid (or insane).

          Looks like I unfairly pigeonholed you then – my apologies.

        • I understand ternary logic, I find it perfect. Unlike most of my colleagues who understand it less. That is why its use is in our company banned. It’s a pity.

      • I know the 3-way logic used in SQL (Server), but really, it is garbage, it is not what people expect and is inconsistent with itself.

        And you forgot one use for NULL, EMPTY, yes, NULL is also used to represent EMPTY on 1 to 0-or-1 relationships.

        And now, the inconsistencies, NULL = NULL returns false, but EXISTS(SELECT NULL INTERSECT SELECT NULL) returns true, GROUP BY, ORDER BY assumes NULL is comparable, etc. It is just the (in)equality to NULL that is useless.

        • I’m OK with NULL representing EMPTY (“there is no value here”), and the behaviour of ORDER BY (what else could you do, randomly interleave the NULLs throughout the other results?), but yes, the behaviour of INTERSECT and GROUP BY is clearly inconsistent with the intended semantics of NULL. However, IMHO it should be the behaviour of INTERSECT and GROUP BY that should change for consistency’s sake, rather than the behaviour of (in)equality. *shrug*

          • No, if NULL = NULL where true on SQL JOINs would still work, NULLs work this way in SQL because of a theory says this is the most correct behavior, the problem is that behavior hardly ever matches real life requirements, this mismatch is so severe that a new operator is being added, C# nullable behavior is much more desirable to solve real life problems (how many times you fell pissed of by comparing two int?s null and that returning true? Let me guess, never?), it is more or less like the value of 0 power 0, there is (or better, there isn’t) a correct response, and there is a response that solves you problem.

    • I’m glad that you agree that the C# designers made good choices, but I cannot conclude that the designers of SQL were inexperienced, foolish or insane to make the choices they did.

      I am not a SQL expert and would not care to make an exact characterization of the differences between C# and SQL null semantics, but I trust that the designers of SQL went through a similar process as the designers of C# did: carefully weighing many possibilities against a mix of mutually incompatible design goals, and trying to come up with a reasonable compromise position.

  2. > Make comparison operators produce nullable bool. If there is a null operand, the result is null, not false. Note that this implies that x == null should always produce null, contrary to the expectation of users familiar with reference null comparisons.

    I don’t think it would have been too unreasonable for nulls to propagate through general comparison instructions (same as arithmetic), but to have an exception for == and != to make comparing against null work as expected. The falsy implementation also has == and != acting inconsistently with the comparison operators, so that’s not really a point against the null-propagating version.

  3. > Note that this implies that x == null should always produce null,
    > contrary to the expectation of users familiar with reference null
    > comparisons.

    Well, maybe I’m missing something but I believe we would then need a new construct for “if” (ifnull? if/else/null?) and maybe other keywords.
    I mean, if all comparisons to null produce null (“x==null”, “x!=null”, etc.), how could we test whether a reference is null (and then invoke/dispose, etc.)?

      • Wow, indeed, shame on me!
        I don’t know why I was so sure “((T)null is T) == true”.
        Good thing I haven’t coded in C# for a long time!

    • > if all comparisons to null produce null, how could we test whether a reference is null?

      Object.ReferenceEquals(x, null) would fit the bill. I sometimes do that anyway, if someone didn’t do proper null checking when implementing == or Equals.

  4. And yet another possibility, weasel out, make nullables not comparable; if you want to compare them let the user explicitly handle the null case. This leads to more effort, but fewer surprises.

  5. IEEE NaN semantics are even weirder (but almost 100% sensible):

    x NaN == false //or an exception
    (x == NaN) == false //or an exception
    (x != NaN) == false //or an exception
    (NaN == NaN) == false //or an exception
    (NaN != NaN) == false //or an exception
    minNum(x, NaN) == x //sic!
    maxNum(x, NaN) == x //sic!

    And of course there’s a separate totalOrder() function which is similar to CompareTo(), except it distinguishes between signed zeros and non-canonical representations of the same number.

    However, Prof. W. Kahan, one of the authors of the standard, now thinks the specification of minNum and maxNum is not the best one, and should be NaN-propagating unless the other operand is a positive/negative infinity for maxNum/minNum respectively.

    • Whoops, made a mistake. Shows how complex 754-2008 is.

      (x != NaN) == true //or an exception
      (NaN != NaN) == true //or an exception

      compareNotEqual is true if its operands are unordered (and NaN is unordered with anything, including itself).

  6. Pingback: Dew Drop – September 1, 2015 (#2081) | Morning Dew

  7. Pingback: Visual Studio – Developer Top Ten for September 4th, 2015 - Dmitry Lyalin

  8. Pingback: Szumma #006 – 2015 36. hét | d/fuel

  9. Pingback: Automate the Planet

  10. Pingback: The Morning Brew - Chris Alcock » The Morning Brew #1936 – September Catchup

  11. Pingback: Automate the Planet

  12. Eric ! Help !
    Not related to this article, but I don’t know where to ask.
    I need to get your remarkable ‘Benchmarking mistakes’ series, but all links lead to http://tech.pro/ site which is deactivated now 😦
    Where to get ‘Benchmarking mistakes’ series ?

  13. I disagree with this point: “If the question is “are the sales for March greater than or equal to the sales for April? and both figures are unknown, it seems wrong to say “yes, they are equally unknown.”” when you’re arguing against complete-OR semantics for because that argument applies just as well to equality (==), not just comparisons. For example: it’s just as nonsense to say “are the sales for March equal to the sales for April?” when both are null. I think that, if your language has (null == null) == true then (null >= null) must equal true as well.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s