ATBG: Warnings vs errors

NOTE FROM 2021: This article was originally posted on the Coverity Development Testing blog, as part of the “Ask The Bug Guys” series. Those blogs have been taken down; I’ve copied the original content here.


Today on Ask the Bug Guys, reader Raghavendra has a question about this C# bug:

if (someValue != null)
{
  ...

If someValue is of a non-nullable value type that defines an equality operator, such as int or Guid, then this code is legal but almost certainly wrong, since the condition will always be true. Raghavendra’s question is: why does the compiler give a warning here rather than an error?

Good question. Before getting into that though I want to clear up a few related points.

First off, let me say that some releases of the compiler fail to give a warning for some cases where a non-nullable value type is compared to null. This was an accident: a bug introduced, I believe, in C# 3. Apologies for the mistake.

Second, it might not be clear why this code is legal at all. Operators are analyzed using a process almost identical to method overload resolution; shouldn’t operator overload resolution be failing here? No; if there exists an equality operator T==T for non-nullable value type T then you get a “lifted” operator T?==T? for free, which has the semantics “true if both null, false if exactly one null, otherwise defer to the original operator”. Overload resolution is choosing the lifted operator here.
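As a sketch, the lifted operator’s semantics can be written out as an ordinary method. LiftedEquals is a hypothetical name used only for illustration; the compiler generates equivalent logic inline, and the real lifted operator defers to the type’s own == operator, for which Equals stands in here because == cannot be invoked on an arbitrary type parameter T.

```csharp
using System;

static class LiftedDemo
{
    // Hypothetical helper sketching the semantics of the compiler's lifted
    // equality operator T? == T?: true if both operands are null, false if
    // exactly one is null, otherwise defer to the underlying comparison.
    public static bool LiftedEquals<T>(T? x, T? y) where T : struct
    {
        if (x.HasValue != y.HasValue) return false; // exactly one is null
        if (!x.HasValue) return true;               // both are null
        return x.Value.Equals(y.Value);             // defer to the underlying comparison
    }
}
```

So someValue != null compiles because it is really a comparison of (int?)someValue with (int?)null; the left side is never null and the right side always is, so the result is fixed, which is exactly what the warning reports.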

So now to come to the actual question. The code is clearly wrong. Shouldn’t clearly wrong code be an error, not a warning? Let’s look at some analogous situations.

  • Pro error: C# already makes many “impossible” uses of equality comparisons illegal. Comparison of a value type to a reference type, or of two unrelated reference types, is illegal.
  • Pro warning: Similar expressions that always produce a given result are warnings, not errors, like null is string.
  • Pro neither: C# does not make illegal many other impossible equality comparisons. Comparing an integer to the double 12.3 can’t possibly be true either but that is neither a warning nor an error. (Though maybe it should be.) In general, determining whether a comparison is always true or always false is equivalent to the halting problem, so we can’t always produce a warning or error.

We don’t seem to have learned much there. Some other impossible comparisons are errors, some are warnings. What are some other points for and against making such a comparison an error?

  • Pro error: This was an error in C# 1.0, before nullable value types were added to the language. Existing code would not have been broken by keeping it an error in C# 2.0.
  • Pro warning: Making expressions that always produce the same value into errors rather than warnings increases the difficulty of making computer-generated code correct. Sometimes it is convenient for an automatic code generator to generate redundant code.
  • Pro warning: Consistency with generics.

That last one needs some explanation. Suppose we have:

void M<T>(T t)
{
  if(t == null) ...

Clearly what is best is if M<int> has the same semantics as

void M_int(int t)
{
  if(t == null) ...

So then the question is: suppose the latter is an error. Should the former be an error as well, provided that T is not constrained to be a reference type?

The language designers decided that this should be legal; if T is constructed with a type that cannot be compared to null then the comparison to null is simply false, not an error. For consistency it then seems reasonable that the non-generic version of the same code should have the same behaviour.
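A minimal sketch of the behaviour the designers chose (IsNull is a hypothetical name):

```csharp
using System;

static class GenericNullDemo
{
    // Comparing an unconstrained type parameter to null is legal; when the
    // constructed T is a non-nullable value type, the comparison simply
    // evaluates to false rather than failing to compile.
    public static bool IsNull<T>(T t)
    {
        return t == null;
    }
}
```

IsNull(42) is simply false, while IsNull&lt;string&gt;(null) is true; the non-generic int version of the same comparison produces the warning discussed above rather than an error, which is the consistency argument.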

Summing up: there are reasonable arguments on both sides, but on balance I think the decision to make this a warning rather than an error was the right one.

19 thoughts on “ATBG: Warnings vs errors”

  1. I like stuff like that to be warnings.

    On the one hand, I certainly want the IDE to tell me about it, because code like that shouldn’t be in production.

    On the other, not all code is production code. I may, for smoke testing on my own machine, want to force execution of a particular code path to verify its correctness. One easy way to do that is to temporarily hack a few if-statements here and there to be always true or always false. I want the compiler to let me do that – errors just make me work around them with more impactful code changes, which may increase the chance I don’t put it back together correctly afterwards, or introduce changes that affect the correctness of the test.

    We don’t have a rule in our shop that warnings are treated as errors, but we have a standard that checked-in code should compile without warnings. This strikes me as the correct balance. Allowing warned code into production is bad practice and frequently a bug waiting to happen. But warnings as errors is unduly restrictive when hacking code around on a temporary basis.

    Professionals, IME, give following good practices just as much weight as writing code that compiles.

  2. I didn’t know this was legal code; I feel more pissed off now when the compiler complains about int? variable = condition ? 9 : null;

    Looks inconsistent

    • As I understand it, ?: in general makes no attempt to infer a type to which both operands could be converted. Mammal m = cond ? new Sheep() : new Horse() won’t compile either.

      It would be kind of handy, though. I’ve run into that one a number of times, and the code that resulted wasn’t as clean as I would like.
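For what it’s worth, casting one arm to the common base type is the usual workaround; Mammal, Sheep, and Horse here are hypothetical types taken from the example above:

```csharp
using System;

class Mammal { }
class Sheep : Mammal { }
class Horse : Mammal { }

static class ConditionalDemo
{
    public static Mammal Pick(bool cond)
    {
        // Mammal m = cond ? new Sheep() : new Horse();  // does not compile:
        // no implicit conversion exists between Sheep and Horse.
        return cond ? (Mammal)new Sheep() : new Horse(); // casting one arm fixes it
    }
}
```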

      • Could the C# language be changed to include an implicit cast when the two alternatives are of different types?

        As with most of these annoyances, there’s probably a very good reason why not that’s only apparent when you’ve thought through all of the implications of a change. That’s why people like me should not be allowed near a language specification.

        • The listed code snippet was a common complaint in Java <5 (4? well, something) too. In Java it was solved by basically applying overload resolution to it (whatever type T it’d resolve to for foo(T a, T b)).

          Considering the similarity of C#’s and Java’s grammar in general, I’d think that’s a strong argument for the general feasibility of such an implementation.

  3. I’m not sure why “foo==null” should be expected to behave the same when foo is a particular value type as when it’s generic, given that “foo==anythingElse” will compile for many specific value types, but not for a generic type parameter that such types could satisfy.

    Despite sharing the same “==” token as the overloadable equality operator and the default non-overloaded reference-equality operator, I would regard “nullableValueType == null” and “genericType == null” [and the commuted forms] as binding to different operators, and as such would expect them to be evaluated and bound differently. I see no semantic reason those operators should have to accept non-nullable types; if operator hoisting forces the issue, one could avoid that by having the compiler define “nonNullableType == null” as an operator which returns false but is tagged “Obsolete()” [btw, if I had my druthers, a few other equality tests like “(double,float)” and “(float,double)” would also have explicit “obsolete” implementations].

    • And on the other side of the spectrum there are people who argue that if “x” and “y” are values of types that cannot be implicitly cast to a single type, then “x == y” must not be a compiler error, but valid code. And the value of such an expression should be “false”, obviously: since those two values cannot possibly be equal (they’re incompatible), they’re unequal.

      • I would suggest that the legality of `x==y` should depend upon why the variables cannot identify the same non-null object. If `x` is a sealed class and `y` is an interface which it does not implement, it would not be possible for an instance of a type derived from the *present* version of `x` to implement `y`, but the question of “whether a class should be sealed” is actually two questions, with slightly-orthogonal answers:

        -1- Is anyone allowed to derive from the class?

        -2- Are consumers entitled to assume that any non-null reference of this class type will identify an instance of this *exact* class?

        In the absence of any benefit to answering “yes” to either question, the answer to *both* questions should be no. If the answer to question #2 is no, a compiler would be out of line in assuming that a type which doesn’t presently allow the construction of any derived-type instances will never do so, and in disallowing code from armoring itself against such a possibility. Unfortunately, since the only way for a class to answer “no” to the second question would be to answer “yes” to the first, and many classes should answer “no” to both, the fact that a class answers “no” to the first does not imply that the second should be a “yes”.

        [Incidentally, I think the proper solution to this would have been to have separate access controls for who can chain to a constructor versus who can invoke it via a `new` expression. If that existed, a type which didn’t want to allow free creation of derivatives, but didn’t want to promise that it wouldn’t be inherited, could restrict constructor chaining to the same assembly even if the non-chained constructor was public.]

  4. I still remember when I was learning Java and discovered that “unreachable code” was a compiler error. This infuriated me and likely permanently skewed my view of the language (unfairly, I’m sure).

    If a compiler can successfully compile some input, then it seems like it should be impossible to raise an error. It can throw all the warnings it likes, but errors should be left for irreconcilable conditions. I’m sure there are exceptions to this approach, and that part of what defines an “error” is what the language says is an error.

    The C# compiler seems to do a very good job walking the line between being a nagging old hag and a tool for professionals.

    • The goal of a compiler should be to either produce code that does what the programmer intended, or nudge the programmer into stating the intended behavior more clearly. In cases where code’s behavior is probably as intended, but there is still a significant likelihood that it isn’t, and the code could be written just as clearly while eliminating the ambiguity, a warning would likely be appropriate. In cases where the behavior of the code, if desired, would almost certainly be something other than what was written, an error would be appropriate. For example, passing any value type to the `Equals(Object)` virtual method of any type which defines any other overloads (e.g. calling `someFloat.Equals(someDouble)`) is almost always a mistake. While there might be some cases where one would actually want the resulting behavior, in such cases adding a cast to `Object` would make the code clearer.

      If a programmer writes `someFloat.Equals(someDouble)`, would a compiler have any “difficulty” writing code to box `someDouble` and pass the resulting object to the virtual `Equals(Object)` of `someFloat`? Does that mean that the compiler’s acceptance of that construct should be considered a good thing, or does it mean that there should be a means by which `Object` parameters can indicate that values which would require boxing should be refused?
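A concrete illustration of the float/double pitfall described above; the value 0.5 is chosen because it is exactly representable in both types, so the operator comparison is genuinely true:

```csharp
using System;

static class EqualsBoxingDemo
{
    public static (bool viaOperator, bool viaEquals) Compare()
    {
        float f = 0.5f;
        double d = 0.5;
        bool viaOperator = f == d;    // f is promoted to double: true
        bool viaEquals = f.Equals(d); // binds to Equals(object); the boxed
                                      // double is not a float, so this is
                                      // false despite equal numeric values
        return (viaOperator, viaEquals);
    }
}
```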

        • Java is willing to silently eliminate unreachable code in the scenario where an “if” condition controls reachability, even if the condition evaluates as a constant true or false. A `while(constantFalse)` is unacceptable, however (if the constant may vary from build to build, one should use `if (constantFalse) while(true)` to indicate that the loop condition won’t change during the loop). I wonder if `if (true) return;` would be an acceptable way to unconditionally early-exit a method during development without having to put the code to be excluded in an `if(false)` block.

          • No, it yells at you for unreachable code after the return. I use if (1+1 == 2) return;

    • Unreachable code is usually a sign of a bug in your code – I mean, why exactly are you writing code that will never be used? If you use if(false) to comment out some code, you could just comment it out and get the same result! I can’t think of a single good argument why unreachable code shouldn’t be an error (apart from the usual halting-problem limits, obviously).

      Heck, we just had an extremely serious security bug (the Apple SSL thing) a few weeks ago because C compilers were not capable of producing a warning, or better yet an error, for unreachable code!

      • In my experience, unreachable code is either a sign that I’m trying to debug something or that a code generator was used. Code generators are usually simple template engines that have no means to elide code that is unreachable, let alone determine that in the first place!

        Of course it could also be an indication of a bug, which is why the compiler emits a warning about it.

        • I don’t agree that writing if (false) { at the start and closing the bracket at the end is noticeably less work than selecting the whole code and pressing ctrl-7 (well, or the horrible VS default shortcuts), so I find that a weak argument.

          Code generators are a fair point, but it’s not such a big problem really to take care of that, and people who write code generators probably had to solve much hairier problems beforehand (I certainly can’t remember taking care of this special case being the biggest problem when I wrote some).

      • There are times when it’s useful for code to include “configuration options”. Such abilities are more useful in languages which either have #include-style features or allow class names and object file names to be independent of source file names (so one piece of source can produce different object files or different classes with slightly-different behaviors) but can be somewhat useful even in Java. Something analogous to #ifdef might be better (since it would be able to selectively disable field declarations as well as executable code) but even with its restrictions, being able to say `static final boolean someOption = false;` and later say `if (someOption) doSomething();` can be useful, especially if multiple `if`-guarded pieces of code need to all be enabled or all be disabled.

        • Mhm can’t say I’m a fan of mixing code and data so I’d definitely consider making that either a compile/runtime option (and providing some default) or putting it into extra config files.

          I mean if you hardcode those values in the code, how do you test the different codepaths?

            • In languages with #include or similar functionality, I’d exercise such build options either by having a C file containing the two lines `#define OTHER_MODE` and `#include "otherfile.C"`, or by having multiple link scripts which pre-define different macros at the command line and direct object files to different directories. The result will be that I end up with separate object files that have somewhat different functionality, which can be built into different executable files; the two files would then be tested separately. Perhaps not the nicest approach, but if e.g. code is supposed to run on a $0.73 microcontroller which is big enough to handle various combinations of desired features, but too small to handle them all, compile-time selection is probably better than trying to combine link-time selection with run-time dispatch.

            I don’t think such a pattern would work as well with Java’s compilation model, but it might still be useful in cases where one wants to build e.g. “demo” and “real” versions of an application, and minimize the amount of “real-version-only” code in the released demo version.
