ATBG: non-nullable reference types

Today on the Coverity Development Testing Blog’s continuing series Ask The Bug Guys, a question I’ve gotten many times in person over the years but never blogged about: just how difficult would it be to retrofit non-nullability into the .NET type system? Pretty hard, it turns out.

As always, if you have questions about a bug you’ve found in a C, C++, C# or Java program that you think would make a good episode of ATBG, please send your question along with a small reproducer of the problem to TheBugGuys@Coverity.com. We cannot promise to answer every question or solve every problem, but we’ll take a selection of the best questions that we can answer and address them on the dev testing blog every couple of weeks.

53 thoughts on “ATBG: non-nullable reference types”

  1. I am thoroughly unconvinced by Objection #2. We know that its problems are solvable, because C# already contains non-nullable types, i.e. value types.

    • Value types by design have the property that the “zero” of a value type is always a legal instance of the type. The whole point of non-nullable reference types is to deny that principle, so saying “it works for value types” is a non-starter. It works for value types precisely because the “zero” of a value type does not have the semantics of null.

      • The integer zero is always a legal integer, but that doesn’t mean it’s always semantically valid as a field of an object.

    • The equivalent problem in value types is to invent a new type int! which has all the values of int, is initialized to zero by the memory allocator, but is impossible to *observe* to be zero. That problem is NOT solved today, and that’s the problem that needs to be solved to make non-nullable reference types work.

      • I personally would be entirely satisfied by a solution that doesn’t do these really complicated things and instead does for “string!” what is currently done for “int”. Namely, define “default(string!)” to be equal to the empty string, and require the CLR to initialize fields accordingly.

        Granted, this is not free; one must follow up the memset with a few actual initializations. But it’s already done for structs (right?) and I think this is exactly what we want for “string!” and non-nullable classes. It’s not as cool as what you’re discussing, but it gets us 90% of the benefit for 10% of the effort…
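        A rough sketch of that idea in today’s C#: a wrapper struct whose zeroed-out default already reads back as the empty string. (The NonNullString name and its members are invented purely for illustration.)

        struct NonNullString
        {
            private readonly string value;   // null when the allocator zeroes the struct

            public NonNullString(string s)
            {
                if (s == null) throw new System.ArgumentNullException("s");
                value = s;
            }

            // default(NonNullString) is observed as "", never as null.
            public string Value { get { return value ?? ""; } }
        }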

        • Yeah, but instead of null string bugs, won’t you just get empty string bugs?

          And therefore you put checks all over the place that throw on an empty string. Or at least, that’s all I can think of to do. Which might as well just be called an NRE, for all the difference it makes?

          • Again, does this cause a problem for ints? Do you need to put checks that throw all over the place to make sure they aren’t 0?

          • If you need a sentinel to flag that a field doesn’t have a valid value, back in the day it was common to see -1 or something used. If that is your approach, then yes, you do need to check for it all over the place.

            These days in C#, I use int? if I have a need for it. On those occasions, I do have to check for null.

            I’m not saying this is the most common use of integers, a lot of the time 0 may be a perfectly valid value for it to have, and anyway, flow analysis is good at preventing unintentional use of uninitialised variables. I am saying that the need for a sentinel value of some sort does come up regularly enough, and a language would be hampered without one.

            And, specifically, that whether it is a large or a small problem for ints, Roman’s suggestion is no more convenient or less prone to error than what we have now.
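            To make the contrast concrete:

            int legacyIndex = -1;    // old-style sentinel: the caller must remember that -1 means "not set"
            int? index = null;       // the absence of a value is part of the type
            if (index.HasValue)
                System.Console.WriteLine(index.Value);
            else
                System.Console.WriteLine("not set");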

  2. Objection 1 seems a little contrived- obviously the BCL couldn’t be changed wholesale. A feature doesn’t need to be used in the BCL to be enormously valuable. Named and optional parameters being the perfect example; they’re not used anywhere in the BCL, but provide a huge amount of self-documentation and clarity in user code.

    I agree with Alex on Objection 2 as well. If it has already been done in Spec#, then how did they do it? Most features in C# (or any language, for that matter) have corner cases where they break down. A great C# example of this is the readonly keyword. As we know, a readonly field can actually be modified via reflection. This objection strikes me as not letting a chef use a knife because they might cut themselves. You have to understand and be careful with your tools, regardless of the context.

  3. About objection 1: I think C# has already made worse changes. Change the BCL, make the conversion implicit, and check for null and throw an error; that’s an automatable change: find each “ArgumentNullException” and replace it with a “!” on that argument. Far from ideal, but a helpful first step.

    And about objection 2, I can imagine two solutions:
    1) A default instance for reference types. I know, some will try to kill me just for suggesting this, and it’s a bit hard to implement, but it is an option.

    2) A “UseOfUninitializedNonNullableReferenceException”; maybe the name needs some work, but it would be more informative than the generic “ArgumentNullException” or “NullReferenceException”, since it clearly means I forgot to initialize the field rather than assigned a null reference to it somewhere in the program.

    • OK, suppose you have a Foo! which contains a field of type Bar! and oh by the way, Bar! contains a field of type Foo!. Describe the sequence of allocations by which the default instance of Foo! is created, where the default instance of Foo! refers to the default instance of Bar!, and the default instance of Bar! refers to the default instance of Foo!.

      This might sound silly, but the compiler team has to consider what to do in all cases, not just the sensible ones.
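      As a sketch in plain C#, with readonly standing in for the proposed non-nullability, the cycle looks like this; neither constructor can run first without some placeholder:

      class Foo { public readonly Bar B; public Foo(Bar b) { B = b; } }
      class Bar { public readonly Foo F; public Bar(Foo f) { F = f; } }

      // To build the pair, one constructor must be handed a reference that cannot exist yet:
      // var foo = new Foo(new Bar(/* needs the Foo we have not created */ null));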

      • You have an incredible ability to think in corner cases!

        Yes, it’s not possible to cover every corner case. This case could either be disallowed, or allowed with lazy initialization of the default value, which could lead to deadlock (or an exception if the condition is detected). In the disallowing case I can think of some use cases that would be shut out of the non-nullable semantics; in the lazy-initialization case I can’t think of any valid use case.

      • If objects could declare pre-init methods which were verifiably forbidden from exposing a reference to the object under construction, and forbidden from doing *anything* with certain parameters other than storing them in that object, it should be possible to safely construct objects such as you describe without deadlock and without leaking partially-constructed objects. The system would generate blank instances of `Foo` and `Bar` and then call the pre-init method for each, passing it a reference to the initially-blank instance of the other. By the time a reference to either object would be exposed to any code which could examine the non-nullable fields, they would be initialized.
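        A rough approximation in today’s C# is a factory that wires up both objects before publishing either reference (the names and the Tuple shape are just for the sketch; a real “pre-init” feature would have the runtime enforce the non-escape rules):

        sealed class Foo { internal Bar Partner; }
        sealed class Bar { internal Foo Partner; }

        static class TwoPhase
        {
            public static System.Tuple<Foo, Bar> CreatePair()
            {
                // "Pre-init" phase: both objects exist, but no outside code can see them yet.
                var f = new Foo();
                var b = new Bar();
                f.Partner = b;
                b.Partner = f;
                // Only now are the fully wired objects published.
                return System.Tuple.Create(f, b);
            }
        }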

        Incidentally, this sort of thing hints at another thing which could be useful: a means of declaring storage locations as ephemeral or returnable. The system would, with one exception, forbid any reference stored in a returnable storage location from being copied to any storage location which was not ephemeral or returnable, and would forbid references stored in ephemeral locations from being copied anywhere that wasn’t ephemeral, except as returnable arguments to a method call. The storage class of the return value from a method call would be constrained to the most restrictive of its arguments. Many bugs in .NET code result from passing references to mutable objects to code which is expected merely to copy data from those objects, but instead persists the reference (perhaps so it can process the data later). Being able to declare parameters as ephemeral would prevent such bugs.

    • I’d argue that we have those now – the default instance of a reference type is null, and the exception you want is NullReferenceException.

      • No, it is not the same:

        void Foo(string s)
        {
            string x = s;    // no exception if s is null
            string! y = s;   // conversion exception if s is null
        }

        void Bar(string! s)
        {
            s = null;        // compile error
            s = "";          // works fine
        }

        static string! f;

        static void Main()
        {
            Foo("123");
            Bar(f);          // uninitialized-field exception
        }

        I am using the string class because it is the same one Eric used in his post, but I think the string class (and arrays, for that matter) was designed wrong from the beginning: it should have been a value type whose default value means an empty string, with a null string available only when the “?” operator explicitly calls for it. Pretty much impossible to fix now; non-nullable may be a workaround.

        • If string were a value type (encapsulating a field of reference type), either its boxing behavior would have to differ from that of any other type, or every conversion from `String` to a reference type would require an extra boxing step. Personally, I would have liked to see Framework support for structure types other than Nullable(Of T) to have custom boxing methods; if it did so, there could be two compile-time string types (nullable and not), the default value of the non-nullable string type would be an empty string, and that of the nullable type would be null.

        • In the general case, I disagree with your comment on this line:

          s = ""; // works fine

          Empty string won’t usually work fine, it just won’t throw an NRE. If the program semantically requires the string to have a non-empty value, like for instance it’s a file name on disk or something, then you have to give it a real value at some later point.

          If you forget to do that, your program has a bug that will manifest itself when you try to read the file with that name.

          This is exactly the situation you’re in now with a null string, so you’ve changed nothing.

          Now if it is the case that an empty string is semantically valid in your program, like it’s the middle name field for a person class, and not everyone has a middle name, then I agree that initialising it to empty string in the first place may be the right thing to do, in that case. But you can do that already.

          • Null and empty strings are two special cases, as if one weren’t bad enough; and to make things worse, their behaviour differs a lot, and they compare as not equal. But null is the worse of the two. The empty string has, let’s say, a “linear behaviour”: an empty string’s Length returns 0, while a null throws an NRE; the empty string works with most string functions, while the null string fails with most of them. The empty string is to string what 0 is to int, and 0 is illegal in some cases too, but those cases are few and obvious enough not to be problematic.

            Yes, there are cases where the empty string is illegal, and there are cases where the “!” string would be illegal as well, but there are many cases where the empty string is legal. In fact, there are many more cases where the empty string is legal than where null is, and the empty string has predictable behaviour with most string functions. And let’s not stop there: in the cases I can remember (maybe a couple in the library) where both the empty string and null are legal, those special cases actually stand in for a special predefined string, a case of the magic-number anti-pattern applied to the string class, saving the library programmer from adding two const fields to the class.
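            For instance:

            string empty = "";
            string missing = null;
            System.Console.WriteLine(empty.Length);        // 0
            System.Console.WriteLine(empty == missing);    // False
            System.Console.WriteLine(missing.Length);      // throws NullReferenceException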

  4. Objection 1 sounds a lot like some of the const pains you can run into in C++. However, while you’d need to have non-nullable references all the way down, it does seem like that’s something that could be built up over time. A new system could use non-nullable references as much as possible, with some kind of “string to string! upcast” as needed. It wouldn’t solve the problem, but it seems like it might be very useful.

    > Is there also an implicit reference conversion from string to string! ?

    This made me laugh — perhaps we need an additional syntax: “string!?” With a nullable non-nullable reference, Objection 1 should be solved! 🙂

    • I like it! The “interrobang” operator. I actually thought of something similar: Say you had a nullable type that you wanted to ensure was non-nullable. It should be a legal statement, I guess, to say: int?!, but what does it even mean? Is ! the logical negation of ??

      Lots of bizarreness when you consider the ! all over a codebase.

  5. Hmm. About the initialization corner cases in Objection #2, I think a similar problem is already solved in C#. The problem seems roughly equivalent to the fact that there is a gap between allocating the memory for an object, and initializing the memory to zero (and setting up its vtable pointer and SyncBlock and whatever else) to give it a valid initial state for its type. If the thread is aborted during this process then the destructor must not run since the blob is not yet a valid CLR object.

    The same could perhaps be done for non-nullable field initialization. The evaluation of field initializer expressions could conceptually occur before the object exists as a fully-fledged CLR object (perhaps even before memory allocation, although not necessarily). Once all the initializer expressions are evaluated successfully and their values assigned to the relevant positions in the object, then the object is alive and ready to execute its first constructor.

    For convenience, I could imagine a C++-like syntax for being able to initialize fields using constructor parameters.

    class Foo
    {
        object! obj1;
        object! obj2 = new object();

        Foo(object! someValue) : obj1 = someFunction(someValue)
        {
        }
    }

    In this example, the expressions `someFunction(someValue)` and `new object()` are calculated before `this` exists, and so if they fail then there is no problem with destructors or anything. They can’t access `this` or `base` so there are no ordering problems with an inheritance hierarchy.

    Regarding Objection #1, I would almost definitely go with having two versions of each relevant function, similar to the philosophy with `async` (except in most cases I think the name of the function can remain the same and the compiler can disambiguate based on whether the arguments are nullable). This library code could perhaps even be generated automatically: some static analysis on the existing library code might show whether a particular function would be able to return a more specific result (non-nullable result) if a copy of the function were made with more specific arguments (non-nullable arguments). If `T!` is treated as a more specific type than `T`, then covariance will ensure backwards compatibility for cases where none of the arguments are reference types but where the implementation is proven to never return null (eg `Int32.ToString()` can be statically proven to never return null and so can be modified seamlessly to return `string!` instead of `string`).

    • The inability to use constructor parameters in field initializations is IMHO a major and needless design weakness in C# and VB.NET, though it can be worked around more easily in VB.NET [C# runs field initializers early, while VB.NET runs them late; ideally a language should allow the programmer to specify which initializers should run before the base class and which ones after, since each style is advantageous in different cases]. Otherwise, the “problem” of types holding default instances of each other could be solved by recognizing a category of field which may not be legally read until after a constructor completes. If instances of two types are each supposed to hold a reference to a default instance of the other, the system would generate a blank instance of each type, store the references into those fields, and then run the constructor for each.

      I think my biggest complaint with the idea of allowing the declaration of “non-nullable” variables is that since nullity can easily be detected at run-time, the only problems they’d alleviate are those which would likely be caught anyway. There are many other statically-verifiable characteristics variables could have which would be much more useful. For example, many programs rely for correctness upon the fact that a field of some class holds the only reference that will ever exist to some object outside the execution context of the class’s methods. They may also rely upon the fact that certain objects of mutable types will never be exposed to things that might mutate them. It would be helpful if method parameters, variables, and return types could be decorated with attributes that would allow such invariants to be validated.

    • In JVMs the problem you mention is generally solved by the GC zeroing out memory before it can be used by the application.

      That approach is nicely simple, efficient (you can zero out a large memory range at once and not small objects piecewise) and avoids problems with thread-safety (no worries about reorderings or necessary barriers).

      That’s also the reason why the default value has to be all 0s for any type (well some constant value).

      Now the CLR may do things differently, but I wouldn’t count on it.

  6. I dunno about the value of the whole enterprise, actually.

    Programmers often use sentinel values for a variety of purposes, sometimes maybe they’re overused but I think sometimes not. You want to make a call, but leave an argument unspecified so that a well-known default will be used. You have a GUI form that is only half-filled-out by the user, so you need the properties of the object that backs the form to be marked as uninitialised.

    People use null for this sentinel value now, but if it was taken away, that just means they would use something else, MyCall.MyArgument.Default or FormBean.AddressField.InvalidValue or something, but the logic would still be the same and thus the bugs would be the same.

    In the worst case, programmers would just pick an existing valid value of the type to use as the sentinel, a string called “!invalid value” or an int of -1 or something, and then forget to check for the sentinel and cause a bug when the user actually enters -1 or the “invalid” value is used without being checked.

    I guess the best way would be something like Nullable, where the type is decorated with a boolean and an exception is thrown if the boolean is not checked. That would at least help programmers not to forget, but does it really buy you much over null and NRE?

    I don’t disagree with the assessment of the cost of null pointers over the years, but I blame the lack of modern compiler tools – now we have an NRE with a readable stack trace, we have annotations and exceptions to make null checking easy. The cost of screwing up is less than it was.

    I guess I could see a place for weaker and simpler version of the ! type decoration. Maybe something that only works on method parameters, and just does a runtime check during a method call, so the programmer doesn’t have to write if(foo==null) { throw something; } by hand at the start of the method. It could even be the default, and ? used as now to mean the nullable version of a reference type, although that would obviously be a big breaking change. It would be a lot like a C++ reference, I guess.
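    A minimal sketch of the hand-written check such a decoration would replace (the Check helper is invented here, and “string! path” is the hypothetical sugar for it):

    static class Check
    {
        // Fail at the call boundary instead of with an NRE deep inside the callee.
        public static T NotNull<T>(T value, string paramName) where T : class
        {
            if (value == null) throw new System.ArgumentNullException(paramName);
            return value;
        }
    }

    class FileLoader
    {
        public string Load(string path)
        {
            path = Check.NotNull(path, "path");   // roughly what a "string! path" parameter might expand to
            return System.IO.File.ReadAllText(path);
        }
    }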

    I think, in a modern language with modern support tools, the cost of NREs is no greater than the cost of any alternative I can think of.

    • The biggest problems with null pointers stem, I think, not from Hoare’s “mistake”, but rather from the general reluctance of C compiler writers to validate pointers which are being either dereferenced or indexed. Many systems avoid putting anything at address zero, so trying to dereference a byte at address zero will be pretty harmless. The problem is that if one starts with a null pointer and advances it by some amount, it won’t be a valid pointer *but it won’t be recognizable as null either*. That problem has caused far more grief than would have null pointers that were immediately flagged as such.

      • Hell yes. I don’t think I’ve ever written pointer arithmetic, and I did years of C/C++. It’s only necessary in some special niche cases in low-level programming, IMHO. It ought to be rare even if you’re writing a device driver. Really, I think pointer arithmetic is for people writing kernels.

        I don’t think it’s a sensible feature for an application programming language, or libraries in support of applications.

        Those rare occasions where it might bring something to writing an application – I can’t think of a case, but I imagine someone out there could – must be weighed against the numerous times someone displays bad judgement in using the feature just because they can.

        • One of the major design goals of C was to make program development practical on computers with relatively modest amounts of memory. Early versions of the language had an extremely simple type system which didn’t really “understand” arrays or structure types; instead, pointer arithmetic lay at the heart of everything. Although C allowed one to write code that looked like array indexing, x[y] has never really been anything other than a shorthand for (*((x)+(y))), so many constructs like -5[ptr] which wouldn’t make sense with arrays were and remain valid.

          It’s interesting to contrast C with Pascal in some ways; although “official” Pascal has never really been suitable for systems programming, the addition of a few extensions improves it immensely. Turbo Pascal was developed by Borland Software to run on either an 8088-based PC or a Z80-based CP/M system, using an amazingly small memory footprint; on a 48K machine, it’s possible to edit and run a program entirely in RAM, something C compilers were never designed to do. It thus allowed a faster pace of program development on machines without fast disk storage than was possible in C.

          It’s too bad that Pascal lost its footing, since many aspects of its syntax are clearer than C, and more significantly leave little doubt of the programmer’s intent. For example, it uses distinct operators for “divide two Integers, yielding an Integer result” and “divide two numbers, yielding a Real result”. To clearly express intent, the C statements equivalent to “Real1 := Int1 div Int2; Real2 := Int3 / Int4;” would both require typecasts: “double1 = (int)(int1/int2); double2 = (double)int3/int4;”. Even though the typecast in the first statement isn’t required by the language, in its absence a reader wouldn’t know whether the truncation of the result was accidental or intentional.

          In any case, pointer arithmetic is an indelible part of the C language as a result of its roots as a language which should be cheap to compile. The existence of Turbo Pascal, which was for many years a dominant development platform on the PC (Tetris was first written in Turbo Pascal, btw) suggests that a less-stripped language could still have been practical, but I don’t think K&R had any intention of designing a language which would live on for decades, or whose syntax would spawn imitators (among them Java and C#). If they’d had such intention, I suspect they would have done things rather differently.

          • Yeah, I realise all the array code I wrote was pointer arithmetic under the hood, but abstractions matter. At least for clear programming.

            I don’t know how K&R thought about the longevity of their language, but they were pushing the bounds of systems programming at the time. There are always loose threads when you’re innovating.

            For example, I do a LINQ Select to transform the thing being enumerated from one form to another several times a day. I don’t think I’ve ever used it in a way analogous to a SQL SELECT. I think it’s totally misnamed, but there you go.

          • Misnamed, and way too verbose; I would prefer an operator like “->” for this pretty common operation, so instead of:
            from x in collectionOfInts
            select x.ToString()
            just:
            collectionOfInts->ToString()
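            For what it’s worth, the method-call form that already exists (with System.Linq in scope) is nearly as terse:

            collectionOfInts.Select(x => x.ToString())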

          • Regarding LINQ Select, in what way do you suggest it’s misnamed?
            It fills exactly the same function as in SQL, that is, to perform a projection of the source data to whatever representation you prefer in the result.
            Am I missing something?

          • The point is that “select” suggests that you are selecting one or more columns from a table. Which of course “select” can do. But as you correctly note, you can do so much more with “select” than merely select a column from a table; you can perform an arbitrary transformation on any data whatsoever. I think these guys are exaggerating how terrible it is. It’s not a bad name. The name allows newcomers to the syntax to understand and write the most common query, which is to select a column from a table. You can then learn later that “selection” is actually the more general operation of “projection”.

  7. For class types which are used to encapsulate values, defining a method/property attribute which will specify that the compiler should generate a non-virtual call for the indicated method or property would achieve much of the advantage of having a meaningful default value, with minimal effect on anything else. The only major problems would be with `ToString()` and `Equals(Object)`; that difficulty could be handled by defining a marker interface for types that should behave as non-nullable, and having the compiler replace calls to Equals on a `T` which is constrained to that interface with a call to `SafeEquals(T obj, Object other);` [and do likewise with ToString] If `obj` is not null, invoke its `equals` override; otherwise dispatch to a helper method which uses a non-virtual call to `T.Equals(Object)`.

    An alternative would be to add virtual members to ValueType for boxing and unboxing structures. Using this approach, one could define non-nullable wrapper types for class types which would encapsulate a reference to the underlying class type. Boxing a structure of such a type could yield either the encapsulated reference (if non-null) or a reference to a default instance (controlled by the structure’s override of the boxing method). Such a facility would also make it possible for a type like `System.Double` to support things like `Double d = (Double)(Object)4;`. If Double’s unboxing override was coded to look for type `Int32`, it could recognize such a value and convert to Double automatically.
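    A rough shape of the SafeEquals dispatch from the first paragraph; the non-virtual call on a null receiver isn’t expressible in ordinary C#, so a placeholder stands in for it:

    static class NonNullableDispatch
    {
        // T would be constrained to the marker interface described above.
        public static bool SafeEquals<T>(T obj, object other) where T : class
        {
            if (obj != null)
                return obj.Equals(other);   // ordinary virtual dispatch
            // Here the compiler-generated code would make a non-virtual call to
            // T's Equals(Object), treating null as the type's default value.
            // That call cannot be written in plain C#; this is only a placeholder:
            return other == null;
        }
    }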

  8. Hello!

    What if non-nullables were an optional compile-time check rather than a strong runtime contract? There is a similar feature already: the /checked compiler switch.

    If we do so, we could leave the CLR untouched; the only things we have to patch are the BCL (adding [NotNull] attributes on all non-nullable parameters and return values) and the compiler (adding syntactic sugar that treats “string!” as “[NotNull] string”, automatic annotations based on control-flow analysis, and explicit checks when casting from nullables to non-nullables).
    If we do NOT specify the compile-time option “no nullables”, all client code just ignores the [NotNull] annotations: no warnings, no errors, etc., so there are no breaking changes.

    If we turn the checks ON, the annotations are verified at compile time, and we get shiny new null-less code.

    Now the most interesting part (here and below we are assuming the checks are turned ON): the annotations should be deduced implicitly and automatically.

    As an example, the return value of

    public string SayHello() { return "Hello!"; }

    should be annotated as [NotNull], so we can safely write something like

    var myVar = SayHello().Length;

    Now, if SayHello() is rewritten as

    public string SayHello() { return null; }

    the compiler should emit an error at SayHello().Length. Of course, we could explicitly annotate SayHello as

    public string! SayHello() { return null; } // or public [NotNull] string SayHello() …

    and get a compile-time error right there, without the need to fix every place where SayHello() is called. Complex cases and casts from nullable to non-nullable should be explicit:

    public string! SayHello() { return (string!)GetNullableString(); }

    The compiler should inject an explicit check here, so an NRE is thrown at the return of the method if GetNullableString() returns null.
    If we follow these restrictions, the flow analysis can be restricted to a single method boundary.
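    The attribute half of this is easy to sketch. (NotNullAttribute here is user-defined and purely illustrative; the actual enforcement would live in the static verifier, not the runtime.)

    [System.AttributeUsage(System.AttributeTargets.Parameter | System.AttributeTargets.ReturnValue)]
    sealed class NotNullAttribute : System.Attribute { }

    static class Greeter
    {
        // "string!" would just be sugar for this annotation.
        [return: NotNull]
        public static string SayHello() { return "Hello!"; }
    }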

    And one last thing: we already have at least two working implementations, the JetBrains ReSharper annotations and the not-null checks from Code Contracts.
    Code Contracts is especially interesting: the static verifier does the control-flow analysis, deduces null/not-null values, and allows supplying the annotations together with the assembly. And, of course, it fires errors on contract failure. :)
    The only thing we’re missing is the syntactic-sugar part.

    So, is it really that hard to bring [NotNull] into C#?

  9. I think the biggest problem with nullable references is that a reference is nullable “by default”. Similarly, a column in SQL is nullable if you say nothing about its nullability.

    This is wrong! Nullable should be explicitly requested. In a hypothetical “let’s go back in time” language, “string” and “Foo” should be non-nullable references, and using “Foo” should require the class to define a default constructor. The nullable references then become “string?” and “Foo?”.

    Objection #1 basically explains why we all end up having to migrate to an entirely new platform every decade. It’s impossible to extend the old one indefinitely, at least not without severe breaking changes that are comparable to migrating to a new platform.

    Objection #3 is typically valid for every major wart of a platform individually, once the platform is many years old. Yet taken together, these warts add up, and a decade later we _need_ to replace the whole lot with something new and “fixed”.

    In the case of .NET and C#, I’d argue that they’ve got an incredible number of subtleties exactly right, so the best way to go forward would be to absolutely avoid starting from scratch, and instead make major and incompatible changes to the existing framework/language, remove cruft and basically create a .NET-reborn. Incompatible, yet similar. This needs to be done. Maybe not just yet, but in another 5 years for sure.

    • It’s generally impossible to have arrays of any type which does not have a default value, unless one requires that *all* elements of an array be written before *any* can be read. While one could have an “array of non-nullable items” type whose constructor accepts a generating function, requiring that code generate some useless but seemingly-valid values to go in an array before it can know what useful values should go there is apt to not only reduce performance, but also cause obscure bugs (if code reads something from an array that isn’t supposed to be null, but is, odds are very good that the problem will be detected; if instead it read something that wasn’t null but was simply wrong, the likelihood of the problem going undetected would increase).

      • if code reads something from an array that isn’t supposed to be null, but is, odds are very good that the problem will be detected; if instead it read something that wasn’t null but was simply wrong, the likelihood of the problem going undetected would increase

        Hear hear.

      • A solution then is not to allow arrays of non-nullable types, but for arrays always to have a nullable member type that requires explicit dereferencing, like a maybe type. This isn’t as cumbersome as it may sound if there is auto-boxing for assignment. If you’re using arrays you’re probably close to the metal, and always explicitly checking null before delivering back a higher abstraction. If you like the array accessors, maybe create an Indexed[T] interface exposing only T this[int] { get; set; } and return something that implements that, which may either return some sort of maybe type or throw on an underlying null.
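        A sketch of that wrapper (all names invented for the example):

        interface IIndexed<T>
        {
            T this[int index] { get; set; }
        }

        // The backing store stays an ordinary (nullable) array; reading a slot that
        // was never assigned fails loudly instead of leaking a null out.
        sealed class CheckedArray<T> : IIndexed<T> where T : class
        {
            private readonly T[] items;
            public CheckedArray(int length) { items = new T[length]; }

            public T this[int index]
            {
                get
                {
                    T value = items[index];
                    if (value == null)
                        throw new System.InvalidOperationException("Element " + index + " was never initialized.");
                    return value;
                }
                set { items[index] = value; }
            }
        }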

        • As you note, code using arrays is often close to the metal. The usefulness of a GenericCollection(Of T) would be degraded if T couldn’t be a non-nullable type, but allowing T to be a non-nullable type would require that any backing store `T[]` be either a non-nullable type or a “maybe type”, or else would require a means of declaring an array of type “T unless it’s a non-nullable reference type, in which case use a nullable reference”, and having the compiler generate run-time validation code in the latter scenario when elements are read. The latter approach might not be too bad, but normal “maybe type” implementations would dog the performance of code which is likely using arrays because it’s supposed to be fast.

    • “This is wrong! Nullable should be explicitly requested. In a hypothetical “let’s go back in time” language, “string” and “Foo” should be non-nullable references, and using “Foo” should require the class to define a default constructor. The nullable references then become “string?” and “Foo?”.”

      I fully agree with Roman, and it is IMHO a major design flaw that strings can be nullable (and are a reference type rather than a value type).
      A string is a bag/container/enumerable of chars, and like any bag/container/enumerable it should be able to be empty rather than null.
      When I ask a char container (a string) how many chars it has, I do not expect to get an NRE. I have worked with a lot of languages that handle strings as empty strings, and I felt more comfortable with that than in C#. E.g. when I ask a database field how many chars it has, it says 0, not NRE, because databases work data-centrically.

  10. P.S. Not-null checking is not an “all-or-nothing” thing.

    The main reason NREs differ from other common exceptions is that they are hard to find and reproduce.
    A null reference poisons the code: you get the null in one place, but it fails unpredictably much later in another.

    Compile-time verification helps prevent this type of error. Yes, it does not _guarantee_ that everything is OK, and it has false positives, as almost all checks do. Even strong typing does not guarantee there will be no InvalidCastExceptions, but at least it localizes the place where things may fail.

  11. I really wish the NullReferenceException had a custom field with the runtime type of the instance that turned out to be null. For example, if an attempt to access the Length property of a null string reference threw, the NullReferenceException would contain a property set to typeof(System.String). Doable, or more trouble than it’s worth?

    • You can find out the compile-time type of the reference through static analysis, but the only way I could imagine to find the run-time type would be to ask it.

      Tricky to do with the null reference. And by tricky I mean logically impossible.

      The compile-time type would be useful debugging info though.

      • The address at which an NRE occurs, when compared with addresses in the debug information and other type-related metadata, should make it possible to determine the compile-time type of the thing being dereferenced, any applicable generic type parameters, and the particular member being accessed. The “run-time” type would be `null`, of course, but the other information would have usefulness far beyond “Object reference not set to an instance of an object”.

  12. Your objection #2 is obviously extremely hard to overcome in the given constraints, but I can’t really follow #1:

    Clearly Foo! is a subtype of Foo, so the usual co/contravariance rules apply. From those it follows that the BCL couldn’t take a Foo! parameter without introducing a breaking change, but it’d be no problem to return a Foo! instead of Foo.

    The BCL would still have to do all its input checks, but it’d be a major improvement for users and perfectly backwards compatible. This would work fine, after all (assuming covariant return types, which certainly wouldn’t be impossible to implement; Java has had them since version 5):
    string! x = "lala";
    string! y = SomeBclFuncTakingString(x);

    Yes, library writers who want backward compatibility would be limited in what they can do, but users could adopt non-nullable types and interoperate them with nullable types just fine!

  13. In the past you have said that you consider having null in the language in the first place was a good idea for COM interop. My 20/20 hindsight tends to disagree with that notion, and I consider it a (completely reasonable, especially at the time) mistake; on first glance it seems that it is feasible to wrap/unwrap all interop with a Maybe/Option/Nullable datatype. Are there problems with that approach I might be overlooking? Do you still feel it was the right choice at the time?

  14. I would think the way to go is adding support for true initializers, and allowing non-nullable members only on locals and on fields of objects that have true initializers.

    [An object has a true initializer if it lacks a finalizer; its constructor does not refer to $this$ (including by calling non-static methods) except to initialize fields, or after all fields have been initialized; the type is sealed; and its superclass also has a true initializer.]

    NB:
    However, it seems that we may get some weird problems with reconstruction-via-refs. It seems we would need a `strict' attribute for classes that forces initializers to be true and disallows constructing instances of such classes into ref parameters.

    • The concept of a “true initializer” sounds something like what I called a “pre-init” method, except that the latter concept if properly supported could allow for the construction of mutually-referential immutable class objects [each object’s pre-init method would be given a reference to the other object, but would be unable to store that reference anywhere that could be seen before its own reference was exposed, and none of the objects’ references would be exposed to anything other than pre-init methods until all such methods had been run]. The biggest difficulty I see with my concept or yours is that for it to really be workable there would have to be a means of designating method parameters as “ephemeral”. Otherwise an inability to use even methods like `Object.ReferenceEquals` on the objects under construction would make many potentially-useful concepts unworkable.

      • Why would one like to use ReferenceEquals on an object under construction? The only reference to the object should be $this$ in the initializer.

        I’m not sure that putting mutual recursion in an initializer is a smart thing to do (note you can still do it after field initialization); generally mutual recursion is added during the object’s lifetime rather than during initialization.

        • I was giving ReferenceEquals as an example of a pre-existing method which most people would recognize as “harmless”. As for why one would construct recursive data structures, consider the state necessary to iterate the nodes of an immutable binary tree where each node has a `parent` link, versus the state necessary to iterate nodes without such a link. In the former structure, a single node reference suffices to encapsulate the iteration state. In the latter, efficient enumeration requires a mutable array that identifies all the nodes between the root and the node being visited. Taking a snapshot of the former enumeration state merely requires taking a snapshot of the reference; by contrast, the latter enumeration state requires a mutable entity, and taking a snapshot of its state thus requires constructing a new entity.
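          To illustrate the point about iteration state, a sketch (constructing such a structure immutably is of course exactly the mutual-reference problem under discussion, so the fields here are left mutable):

          sealed class Node
          {
              public Node Parent, Left, Right;
              public int Value;
          }

          static class TreeWalk
          {
              // The next node in order can be computed from the current node alone;
              // the node reference itself is the whole enumeration state.
              public static Node Successor(Node n)
              {
                  if (n.Right != null)
                  {
                      n = n.Right;
                      while (n.Left != null) n = n.Left;
                      return n;
                  }
                  while (n.Parent != null && n == n.Parent.Right) n = n.Parent;
                  return n.Parent;
              }
          }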

  15. Pingback: C# Wishlist: Non-nullable Reference Types | devioblog
