What is the type of the null literal?

The C# 2.0 specification says

The null literal evaluates to the null value, which is used to denote a reference not pointing at any object or array, or the absence of a value. The null type has a single value, which is the null value.

But no version of the specification since then contains this language. So what, then, is the type of the null literal expression?

It doesn’t have one; the specification never says what the type of a null literal is. It says that a null literal can be converted to any reference type, pointer type, or nullable value type, but on its own, considered outside of the context which performs that conversion, it has no type.
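As a small sketch of what that means in practice, the same literal is converted to whatever the context asks for, and fails to compile where the context supplies nothing. The class and method names here are made up for illustration, and the pointer-type case is left out because it needs an unsafe context.

```csharp
using System;

public static class NullConversions
{
    // In each method the return type is the context that drives the conversion.
    public static string AsString() { return null; }     // null converted to a reference type
    public static int? AsNullableInt() { return null; }  // null converted to a nullable value type
    public static object AsObject() { return null; }     // null converted to object

    public static void Main()
    {
        Console.WriteLine(AsString() == null);        // True
        Console.WriteLine(AsNullableInt().HasValue);  // False
        Console.WriteLine(AsObject() == null);        // True

        // With no target type there is nothing to convert to, so this
        // line would not compile if uncommented:
        // var x = null;
    }
}
```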

When Mads and I were sorting out the exact wording of various parts of the specification for C# 3.0 we realized that the null type was bizarre. It is a type with only one value — or is it? Is the value of a null nullable int really the same as the value of a null string? And don’t values of nullable value type already have a type, namely, the nullable value type?  (The reader who critically notes that it is question-begging to ask whether values of a given type have a type ought to instead applaud my consistency. Tautologies are by definition consistent.) So already this is very confusing.

Worse, the null type is a type that Reflection knows nothing about; there’s a Type object associated with void which has no values at all, but none associated with the null type. It is a type that doesn’t have a proper name, is in no namespace, that GetType() never returns, that you can’t specify as the type of a local variable or field or method return type or anything.
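The contrast with void can be seen from Reflection itself. The following is only a sketch; the helper name RuntimeTypeOrNull is hypothetical.

```csharp
using System;

public static class VoidVersusNull
{
    // void has a Type object even though it has no values.
    public static string VoidTypeName()
    {
        return typeof(void).FullName;
    }

    // There is no Type object to hand back for a null reference;
    // the best a helper like this can do is return null itself.
    public static Type RuntimeTypeOrNull(object value)
    {
        return value == null ? null : value.GetType();
    }

    public static void Main()
    {
        Console.WriteLine(VoidTypeName());                  // System.Void
        Console.WriteLine(RuntimeTypeOrNull("hello"));      // System.String
        Console.WriteLine(RuntimeTypeOrNull(null) == null); // True
    }
}
```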

In short, it really is a type that is there for completionists: it ensures that every compile-time expression can be said to have a type. Except that C# already had expressions that had no type: method groups in C# 1.0, anonymous methods in C# 2.0 and lambdas in C# 3.0 all also have no type. If all those things can have no type, clearly the null literal need not have a type either. Therefore we removed references to the useless “null type” in the C# 3.0 specification.
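To illustrate the other typeless expressions, here is a minimal sketch in which a method group, a lambda and an anonymous method each become usable only once a delegate type is supplied by the context. The names are invented for the example. Note that later language versions (C# 10, as far as I know) give some lambdas and method groups a "natural type", but the commented-out lines below should still be rejected there.

```csharp
using System;

public static class TypelessExpressions
{
    // Two overloads, so the method group "Twice" names no single signature.
    static int Twice(int x) { return x * 2; }
    static long Twice(long x) { return x * 2; }

    public static int ApplyMethodGroup(int x)
    {
        Func<int, int> f = Twice;            // the target type picks the int overload
        return f(x);
    }

    public static int ApplyLambda(int x)
    {
        Func<int, int> g = y => y + 1;       // the target type gives y its type
        return g(x);
    }

    public static int ApplyAnonymousMethod(int x)
    {
        Func<int, int> h = delegate(int y) { return y - 1; };
        return h(x);
    }

    public static void Main()
    {
        Console.WriteLine(ApplyMethodGroup(21));      // 42
        Console.WriteLine(ApplyLambda(41));           // 42
        Console.WriteLine(ApplyAnonymousMethod(43));  // 42

        // Neither of these compiles, because the right-hand side has no type:
        // var f = Twice;
        // var g = y => y + 1;
    }
}
```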

As an implementation detail, the Microsoft implementations of C# 1.0 through 5.0 all do have an internal object to represent the “null type”. They also have objects to represent the non-existing types of lambdas, anonymous methods and method groups. This implementation choice has a number of pros and cons. On the pro side, the compiler can ask for the type of any expression and get an answer. On the con side, it means that sometimes bugs in the type analysis that really ought to have crashed the compiler, and hence been found by testing early, instead cause semantic changes in programs! My favourite example of that is that it is possible in C# 2.0 to use the illegal expression null ?? null. A careful reading of the specification shows that this expression should fail to compile. But due to a bug, the compiler fails to flag it as an erroneous usage of the ?? operator, and goes on to infer that the type of this expression is the null type, even though that expression is not a null literal. That error then goes on to cause many other downstream bugs as the type analyzer tries to make sense of the expression.
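For comparison, here is a sketch of the well-typed form. Once one operand is given a type with a cast, the ?? operator has something to work with and the result has that type. The class and method names are hypothetical.

```csharp
using System;

public static class TypedCoalesce
{
    // The left operand has type string, so the null literal on the right
    // is converted to string and the whole expression has type string.
    public static string BothNull()
    {
        string x = (string)null ?? null;
        return x;
    }

    public static string WithFallback()
    {
        return (string)null ?? "fallback";
    }

    public static void Main()
    {
        Console.WriteLine(BothNull() == null);  // True
        Console.WriteLine(WithFallback());      // fallback
    }
}
```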

In Roslyn we debated what to do about this; if I recall correctly the final decision was to make two APIs, one which asks “what is the type of this expression?”, and one which asks “what is the type of this expression given a certain context?”. In the first case, the null literal expression has no type and so null is returned; in the second, the type that the null literal is being converted to can be returned.
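In the Roslyn API as it is published today, both answers appear to be available from a single call: SemanticModel.GetTypeInfo returns a TypeInfo whose Type property is the expression's own type and whose ConvertedType property is the type after conversion. The sketch below assumes the Microsoft.CodeAnalysis.CSharp NuGet package is referenced; the class and method names are invented for the example.

```csharp
using System;
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

public static class NullTypeInfoDemo
{
    public static TypeInfo GetNullLiteralTypeInfo()
    {
        // A tiny program in which null is converted to string.
        SyntaxTree tree = CSharpSyntaxTree.ParseText("class C { string s = null; }");

        CSharpCompilation compilation = CSharpCompilation.Create(
            "Demo",
            new[] { tree },
            new[] { MetadataReference.CreateFromFile(typeof(object).Assembly.Location) });

        SemanticModel model = compilation.GetSemanticModel(tree);

        // Find the null literal in the field initializer.
        LiteralExpressionSyntax nullLiteral = tree.GetRoot()
            .DescendantNodes()
            .OfType<LiteralExpressionSyntax>()
            .First(e => e.IsKind(SyntaxKind.NullLiteralExpression));

        return model.GetTypeInfo(nullLiteral);
    }

    public static void Main()
    {
        TypeInfo info = GetNullLiteralTypeInfo();

        // The literal on its own has no type...
        Console.WriteLine(info.Type == null);
        // ...but in context it is converted to the field's type.
        Console.WriteLine(info.ConvertedType?.ToDisplayString());
    }
}
```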

38 thoughts on “What is the type of the null literal?”

  1. This seems somewhat like something you blogged about before, with regard to “Literal Zero”, which is regarded differently from an integral-type constant whose value happens to be zero. Would I be correct in guessing that a Literal Zero also has no Type object associated with it, even though one can do things with a Literal Zero that cannot be done with any integral-type constant?

    • Good question. The situations are similar but not identical. The literal zero clearly has a type; it’s an int. However there are special rules that say that this particular expression has different conversion rules than the rules that normally apply to ints.

      The difference between conversions that are justified because of the type of an expression and conversions that are justified because of some special lexical format of the expression is tricky, and the spec has historically done a poor job of carefully noting the distinction. Mads and I made a lot of improvements in this area in the C# 3 and 4 specifications and my spies tell me that there are similar tweaks to the wording in the works for the C# 5 specification, which has yet to be released.

      • If C# had defined a family of types CompilerIntLiteralZero (along with perhaps CompilerUnsignedLiteralZero, CompilerLongLiteralZero, etc.) which had implicit widening conversions to integer types, but also sported a few overloaded operators (e.g. CILZ+CILZ yields CILZ, etc.), how would the semantics of those compare with the present rules? I guess one would have to do something about “var foo = 0;” (lest “foo” become a CILZ), but that might be handled by defining an “InferType()” attribute and having CILZ marked as inferring type “int” [such an attribute could also be useful in things like Fluent interfaces, where it should be possible to access any single member of a returned object, but not persist it]. Would there be any other semantic differences?

        As for the null literal, not only is it universally convertible to any reference type; it also is regarded by the compiler as a legitimate value for any nullable type. Personally, I’m not sure I like the way that nullable types pretend to be reference-comparable to null [I think it’s more confusing than “HasValue”], but it is what it is. I wonder if there were ever plans for a “Nullable” constraint that would have forced something to be either an object reference or a nullable struct. That would have been really sweet.
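The special conversion rule for the literal zero discussed in this thread can be sketched with an enum: the literal 0 converts implicitly to any enum type, while an int variable that happens to hold zero needs an explicit cast. The class and method names are made up for illustration.

```csharp
using System;

public static class LiteralZero
{
    public static DayOfWeek FromLiteralZero()
    {
        DayOfWeek d = 0;       // legal: the literal zero converts implicitly to an enum
        return d;
    }

    public static DayOfWeek FromIntVariable(int i)
    {
        // DayOfWeek d = i;    // would not compile: no implicit conversion from int
        return (DayOfWeek)i;   // an explicit cast is required
    }

    public static void Main()
    {
        Console.WriteLine(FromLiteralZero());   // Sunday
        Console.WriteLine(FromIntVariable(0));  // Sunday
    }
}
```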

  2. What about those wondering why we have a null value in the first place? Was it considered too exotic to leave out?

    • C# was designed to be familiar to users of C and C++, and to interoperate cleanly with existing COM and Win32 API code; all of that suggests that null references are a reasonable feature.

  3. Strangely enough, my VS 2012 has just successfully compiled the following code:
    class Program
    {
        static void Main(string[] args)
        {
            string s = null ?? null;
        }
    }

    • Well you can always target older versions of C# using new IDEs, so check what version the application is targeting, although my app is currently targeting 4.5 and I’m still getting this compiled. Roslyn however does not compile this statement.

      • This is not true. You cannot use older C# compilers in newer Visual Studios. You can only target different versions of **.NET**, which is a completely separate matter and has nothing to do with the C# language.

      • Well, you said “it is possible in C# 2.0 to use the illegal expression null ?? null”, which provoked the implicit “… but not in C# 3.0 and higher”: exception proves the rule. By the way, I wonder what was the logic behind specs not allowing `null ?? null`? Is it a part of some more general consideration?

        PS: I wish C# had non-nullable [class] references out of the box.

        • The problem is precisely that it is hard to figure out what the type of the expression should be. Suppose you have “string x = (null??null)” — ok, is that legal or not? Since the right hand side is not the literal null, we cannot use the rule that says that the literal null is convertible to string. So what is its type? If it is object then this assignment should be illegal because you can’t assign object to string. It’s a mess, and the expression is useless, so it should be illegal.

          • Why should the expression be illegal? Regard “null ?? anything” as equivalent to “anything”, and the expression becomes “string x = null;” which is perfectly legal. Not sure such a thing would be produced in hand-typed code, but it’s not implausible that it might be easier for a code generator to output such a thing in some scenarios than to add code to replace the expression with “null”.

  4. Well… I don’t like null; this “type” has some weird behaviours, for example:
    string s = null;
    char[] c = null;
    object f = s;
    object o = c;
    Console.WriteLine(f == o); // prints True

    Leaving the theory aside, we (can I say we?) know this happens because in a typical implementation null is represented by the zero address, and 0 is equal to 0. But what if each type had its own null (or a default, so that the empty string and the empty array are the respective defaults), and null compared equal only to nulls of the same type?

    • From the implementation perspective, you’d have to have a special “null reference” value that isn’t 0 for each type, which means that initialization of reference fields and locals becomes more tricky and less efficient (but this is probably irrelevant in practice).

      However, I think it is the way it is because C# was trying to have semantics reminiscent of C++, and having different null values for different types would be a big surprise for everyone who has worked with nullable references before.

      Plus, there’s always the question of how beneficial this is. Is it really that useful to have null references of the same type compare equal, but null references of different types to compare different? Is this really something that comes up in real code? Or is this just for pure aesthetics?

    • Two of the *fundamental* rules of the type system in .NET (and also Java, btw) are that the default value for *any* type which can be stored in an array or structure field (including structures of any complexity) will be all-bytes zero, and for any pair of reference types T and U such that T:U, a cast from T to U is *always* representation-preserving. Since all reference types are derived from Object, this implies that even if class types T and V are unrelated, (Object)default(T) will be indistinguishable from (Object)default(V).

      Those two rules considerably simplify some aspects of the .NET runtime (and the JVM), since they mean that the only information necessary to initialize a type at runtime is a single number (its size), and compilers don’t have to generate any code for up-casts of reference types. Consequently, I expect them to apply to all future versions of both .NET and Java from now until the end of the universe.

      BTW, it might have been possible to change the behavior of String if that had been defined as a structure which contained a single field of type StringObject [which would behave as String does now] or perhaps Char[]. Had that been done, the default value of String could have behaved like an empty string (as it did in COM) rather than a null reference. The only ‘problem’ with such a scheme would be that passing strings as “object” would have added another layer of boxing unless the runtime special-cased such conversions (as it started doing in .NET 2.0 with Nullable[T]). If String were a value type as described, casting a default-valued instance of it to Object could yield a reference to a zero-character string.

      • You just did what I have done several times in the past, and I was bitten so often that I fear it now: you take the implementation into account when discussing the logic of a feature. That solves a lot of problems, but none of them is the “root problem”.

        What do you mean by “representation-preserving”? That the value of the pointer doesn’t change? IIRC this isn’t true if U is an interface or if T is a value type. Now, if by “representation-preserving” you mean that the default value of T must always compare equal to that of V, then I disagree (overload the operator if this is required for a particular case).

        Now that there are interfaces, which complicate what that “feature” would simplify, what problem remains?

        In the case of String, yes, it could be a struct and the runtime could special-case it (as it does for every value type), but the double indirection problem could also be solved in a different way. Also, why do structs exist to start with? Eric is always telling us that mutable value types are a bad thing; on the other hand, any IEquatable non-nullable immutable reference type can be copied without changing the result, so why not let the runtime choose when to handle today’s structs as value types or as reference types?

        BTW, I understand the argument that C# was designed as it was to be compatible with other languages, which doesn’t mean it didn’t inherit some bad features.

        • I specifically limited my discussion of representation-preserving casts to *reference* types, since every value-type definition actually describes two related but different kinds of things: a type of storage location and a type of heap object. A value-type heap object is an instance of System.Object, but a value-type storage location holds neither an instance of System.Object nor a reference to one. A widening conversion exists from the storage location type to a heap reference, and narrowing conversions exist from some heap reference types to the storage-locations type, but the heap type and storage location type effectively exist in different universes.

          As for the significance of “representation preserving”, the runtime assumes that if class type T derives from U, a variable of type T may be copied into one of type U simply by copying all the bits. This means that because for any class type T, default(T) must be all-bits zero and (Object)default(T) must also be all-bits zero, (Object)default(T) must equal default(Object).

          • Thanks for clarifying “representation preserving”, but how about interfaces? Or (if ever implemented in C#, but I guess some of the same reasons prevent it) multiple inheritance? While it is convenient for the CLR (although there are already exceptions), doesn’t representation preserving stand in the way of implementing many other features?

        • A major part of the reason C# does not support multiple inheritance of anything but interfaces is that doing so would either require that upcasts not be identity-preserving or that virtual methods behave oddly. Assume X:W, Y:W, and Z:X,Y. W has a virtual method Foo which is overridden by X and Y but not Z, and zz is of type Z. It would be odd for (W)(X)zz.Foo to invoke Y’s override, or for (W)(Y)zz.Foo to invoke X’s override, but one of those things would have to happen if (W)(X)zz and (W)(Y)zz are identical.

          Interfaces avoid this problem by requiring that every class which implements an interface provide its own implementation of all methods and properties defined therein (as well as its own backing fields for holding any state implied thereby). Note that if Z had provided its own override of Foo, there would have been no problem, since zz.Foo, (X)zz.Foo, (Y)zz.Foo, (W)(X)zz.Foo, (W)(Y)zz.Foo, (W)zz.Foo, and even (W)(Object)zz.Foo and (Z)(Object)zz.Foo would all refer to the method defined in Z.

          • Let the identity-preserving problem rest for a while. Is the other reason for not supporting multiple inheritance the diamond problem? OK, let’s say interfaces come with a “default implementation”, being like classes except that they have no state. Yes, an interface A may define a method X, interfaces B and C inherit from A and each provide a default implementation for X, and interface D inherits from B and C. Yes, the diamond problem may happen, and the C++ solution may result in some weird behaviour, but is this problem that common in real life? I ask because I know of a problem which also causes weird behaviour and is pretty common in the current C# version. For example, the Enumerable class has the Skip extension method, which takes an IEnumerable as a parameter and works in a pretty inefficient way. If there were an efficient implementation of Skip for IList (or any custom list), a method which takes an IEnumerable as a parameter and calls Skip would still call the inefficient version, since no dynamic dispatch is possible in this case.

            I guess this specific problem can be solved by means other than interfaces with implementations or multiple inheritance, but those other means are even worse.
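The static-dispatch point about extension methods made in this thread can be sketched as follows. Describe is a hypothetical pair of extension methods standing in for a general and a specialized Skip; the overload is chosen from the compile-time type of the receiver, not from the run-time type of the object.

```csharp
using System;
using System.Collections.Generic;

public static class Describer
{
    // Stand-in for the general (slow) implementation.
    public static string Describe(this IEnumerable<int> source)
    {
        return "IEnumerable overload";
    }

    // Stand-in for a specialized (fast) implementation.
    public static string Describe(this IList<int> source)
    {
        return "IList overload";
    }
}

public static class ExtensionDispatchDemo
{
    // Inside this method the compiler only knows about IEnumerable<int>.
    public static string ViaEnumerable(IEnumerable<int> source)
    {
        return source.Describe();
    }

    public static string ViaList(IList<int> source)
    {
        return source.Describe();
    }

    public static void Main()
    {
        var list = new List<int> { 1, 2, 3 };

        // The same object is passed both times; only the static type differs.
        Console.WriteLine(ViaEnumerable(list));  // IEnumerable overload
        Console.WriteLine(ViaList(list));        // IList overload
    }
}
```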

  5. Pingback: The Morning Brew - Chris Alcock » The Morning Brew #1407

  6. Hi Eric. While reading this entry I initially violently disagreed with you, because in my thinking it wasn’t possible to have a value without a type. Then I discussed this with a friend and he opened up a new way of looking at it to me: namely, the compiler turns the expression (whether it be a null literal or a lambda expression or anything else that you say has “no type”) into a value that exists in the universe of the program that is being compiled. Before this step, there is no such value; the running program has no concept of the lambda expression before this conversion. Therefore, I think I now understand what you’re saying and I think the term “conversion” is misleading. The “conversion” that you refer to here is no different from turning the parse-tree node “5” into a value of type “int” (which is not called a “conversion”), and it is very different from the operation denoted by the code “(object) value” (which *is* called a “conversion”, and it does not matter whether it’s a boxing conversion, a reference conversion, or any other conversion). Therefore, I think calling this a “conversion” is a category error. You are not converting from anything that exists in the program’s universe.

    • I take your point, but the compile time analysis is by its nature often about things that have no existence at runtime. When you say class C<T> { T t; } the type of t at compile time is T, but it surely will not be “T” at runtime; there is no such beast. The compile-time type system is a proof system; it’s a bunch of logical deductions that are made according to formal rules. The run-time type system is a tag on each object. Obviously they are designed to be strongly related to one another, but they are not identical by any means.
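A short sketch of that difference between the compile-time and run-time views, using the C<T> class from the reply above. The helper name FieldTypeOf is hypothetical. In the source the field is declared as T, but on a constructed type Reflection reports the concrete type argument.

```csharp
using System;

public class C<T>
{
    public T t;   // at compile time, the type of t is simply "T"
}

public static class CompileTimeVersusRunTime
{
    // Asks Reflection for the type of field t on the constructed type C<T>.
    public static Type FieldTypeOf<T>()
    {
        return typeof(C<T>).GetField("t").FieldType;
    }

    public static void Main()
    {
        Console.WriteLine(FieldTypeOf<int>());     // System.Int32
        Console.WriteLine(FieldTypeOf<string>());  // System.String
    }
}
```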

  7. Eric, you say there are downstream bugs resulting from the null ?? null one, and yet the null ?? null still works. Does this mean that the downstream bugs are also still there? If so, what are they? Would be interesting to know.

    • The bugs we knew about we fixed. I don’t recall the exact details but it was something like “var x = null ?? null;” would infer x to have the null type, and then crash during code generation. That wasn’t the bug, but it was something like that, where type inference would infer “the null type” as the type of something, and then things would go bad from there.

      • What would you think of the idea of allowing types to indicate (via attribute) that the compiler should exclude them from type inference (perhaps substituting some other type)? It would seem like a simple thing to implement, with a pretty big payoff both for internal situations you describe, but also many other situations as well. There are a number of situations in which “Foo.This.Bar()” would make sense, but either the return value of “Foo.This” shouldn’t be used for any purpose except for one member-access operation which should be performed before anything else, or else the Bar() method could be simplified if it knew that it was acting upon an object to which no other reference existed anywhere in the universe (a situation which would apply in “Foo.This.Bar()” but would not apply if the value of “Foo.This” were persisted).

  8. Pingback: F# Weekly #30 2013 | Sergey Tihon's Blog

  9. You didn’t explicitly say it, but Roslyn seems to disallow var a = null ?? null from compiling (without a cast on one of the operands). Tested using the C# Interactive tool.

      • I also wrote the same thing in the VS 2012 IDE, both as var x = null; and as var x = null ?? null;. Both showed a red underline below x. The error message was “Cannot assign <null> to an implicitly-typed local variable”. Is the compiler trying to create a type from the null value assigned to x? Or is something else happening?

  10. Pingback: NULL: Direnmenin Otopsisi | Senselogi©

  11. Pingback: Building a C# compiler in F# | Neil Danson's Blog
