About ericlippert

http://ericlippert.com

Null is not false, part two

Posted on April 12, 2012 by ericlippert

In Raymond Smullyan’s delightful books about the Island of Knights and Knaves (where, you’ll recall, knights make only true statements and knaves make only false statements), the knights and knaves are of course clever literary devices to explore problems in deductive logic.

(Aside: Smullyan’s book of combinatory logic puzzles, To Mock a Mockingbird, is equally delightful and I recommend it for anyone who wants a playful introduction to the subject.)

Smullyan, to my recollection, never explores what happens when knights and knaves make statements which are disingenuous half-truths, authorial license in pursuit of a larger truth, or other forms of truthiness. A nullable Boolean in C# gives us, if not quite the notion of truthiness, at least the notion that true and false are not the only possible values of a predicate: there is also “null”, whatever that means.

What does that mean? A null Boolean can mean “there is a truth state, but I just don’t know what it is”: for example, if you queried a database on December 1st to ask “were the sales figures for November higher than they were in October?” the answer is either true or false, but the database might not know the answer because not all the figures are in yet. The right answer in that case would be to say “null”, meaning “there is an answer but I do not know what it is.”

Or, a null Boolean can mean “the question has no answer at all, not even true or false”. True or false: the present king of France is bald. The number of currently existing kings of France — zero — is equal to the number of currently existing bald kings of France, but it seems off-putting to say that a statement is “vacuously true” in this manner when we could more sensibly deny the validity of the question. There are certainly analogous situations in computer programming where we want to express the notion that the query is so malformed as to not have a truth value at all, and “null” seems like a sensible value in those cases.

Because null can mean “I don’t know”, almost every “lifted to nullable” operator in C# results in null if any operand is null. The sum of 123 and null is null because of course the answer to the question “what is the sum of 123 and something I don’t know” is “I don’t know!” The notable exceptions to this rule are equality, which says that two null values are equal, and the logical “and” and “or” operators, which have some very interesting behaviour. When you say x & y for nullable Booleans, the rule is not “if either is null then the result is null”. Rather, the rule is “if either is false then the result is false, otherwise, if either is null then the result is null, otherwise, the result is true”.

Similarly for x | y, the rule is “if either is true then the result is true, otherwise if either is null then the result is null, otherwise the result is false”. These rules obey our intuition about what “and” and “or” mean logically provided that “null” means “I don’t know”. That is, the truth value of “(something true) or (something I don’t know)” is clearly true regardless of whether the thing you don’t know is true or false. But if “null” means “the question has no answer at all” then the truth value of “(something true) or (something that makes no sense)” probably should be “something that makes no sense”.
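To make the rules for the lifted operators concrete, here is a small sketch that prints the full truth tables for & and | on nullable Booleans, and then shows lifted addition and lifted equality. The class name LiftedLogicDemo and the helpers Show, And and Or are names I made up for illustration; only the operators themselves come from the language.

```csharp
using System;

public static class LiftedLogicDemo
{
    // Hypothetical helper: formats a nullable Boolean, with "null" standing for "I don't know".
    public static string Show(bool? b) => b.HasValue ? (b.Value ? "true" : "false") : "null";

    // Thin wrappers around the built-in lifted operators, so they are easy to call and check.
    public static bool? And(bool? x, bool? y) => x & y;
    public static bool? Or(bool? x, bool? y) => x | y;

    public static void Main()
    {
        bool?[] values = { true, false, null };

        // Print every combination of operands for both operators.
        foreach (bool? x in values)
        {
            foreach (bool? y in values)
            {
                Console.WriteLine(
                    $"{Show(x),-5} & {Show(y),-5} = {Show(And(x, y)),-5}    " +
                    $"{Show(x),-5} | {Show(y),-5} = {Show(Or(x, y))}");
            }
        }

        int? unknown = null;
        int? sum = 123 + unknown;              // lifted addition: a null operand gives a null result
        Console.WriteLine(sum.HasValue);
        Console.WriteLine(unknown == null);    // lifted equality: null is equal to null
    }
}
```

The rows worth looking at are the ones with a null operand: null & false comes out false and null | true comes out true, while every other combination involving null comes out null.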

Things get weirder though when you start to consider the “short circuiting” operators, && and ||. As you probably know, the && and || operators on Booleans are just like the & and | operators, except that the && operator does not even bother to evaluate the right hand side if the left hand side is false, and the || operator does not evaluate the right hand side if the left hand side is true. After we’ve evaluated the left hand side of either operator, we might have enough information to know the final answer. We can therefore (1) save the expense of computing the other side, and (2) allow the evaluation of the right hand side to depend on a precondition established by the truth or falsity of the left hand side. The most common example of (2) is of course if (s == null || s.Length == 0) because the right hand side would have crashed and burned if evaluated when the left hand side is true.
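As a minimal sketch of point (2), the following compares the short-circuiting || with the non-short-circuiting | on ordinary Booleans. The class and method names are my own, and the sketch assumes a project without nullable reference type annotations, so that passing null for a string parameter is unremarkable.

```csharp
using System;

public static class ShortCircuitDemo
{
    // Uses ||, so s.Length is evaluated only when s is not null.
    public static bool IsNullOrEmptyShortCircuit(string s)
    {
        return s == null || s.Length == 0;
    }

    // Uses |, so both operands are always evaluated.
    // When s is null, evaluating s.Length throws NullReferenceException.
    public static bool IsNullOrEmptyEager(string s)
    {
        return s == null | s.Length == 0;
    }

    public static void Main()
    {
        Console.WriteLine(IsNullOrEmptyShortCircuit(null));
        Console.WriteLine(IsNullOrEmptyShortCircuit(""));
        Console.WriteLine(IsNullOrEmptyShortCircuit("abc"));

        try
        {
            Console.WriteLine(IsNullOrEmptyEager(null));
        }
        catch (NullReferenceException)
        {
            Console.WriteLine("The eager version evaluated s.Length on a null reference.");
        }
    }
}
```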

The && and || operators are not “lifted to nullable” because doing so is problematic. The whole point of the short-circuiting operator is to avoid evaluating the right hand side, but we cannot do so and still match the behaviour of the unlifted version! Suppose we have x && y for two nullable Boolean expressions. Let’s break down all the cases:

x is false: We do not evaluate y, and the result is false.
x is true: We do evaluate y, and the result is the value of y.
x is null: Now what do we do? We have two choices:
- We evaluate y, violating the nice property that y is only evaluated if x is true. The result is false if y is false, null otherwise.
- We do not evaluate y. The result must be either false or null.
  - If the result is false even though y would have evaluated to null, then we have resulted in false incorrectly.
  - If the result is null even though y would have evaluated to false, then we have resulted in null incorrectly.

In short, either we sometimes evaluate y when we shouldn’t, or we sometimes return a value that does not match the value that x & y would have produced. The way out of this dilemma is to cut the feature entirely.
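The dilemma can be sketched in code. Since the language has no lifted &&, the two methods below are hypothetical stand-ins for the two design choices, with the right hand side passed as a delegate so that we can control whether it is evaluated. All the names here (LiftedAndAlsoSketch, AndAlsoEager, AndAlsoLazy) are invented for this illustration and are not part of C# or the framework.

```csharp
using System;

public static class LiftedAndAlsoSketch
{
    // Choice one: evaluate y whenever x is not false.
    // Always agrees with x & y, but evaluates y even though x is not known to be true.
    public static bool? AndAlsoEager(bool? x, Func<bool?> y)
    {
        if (x == false) return false;
        return x & y();
    }

    // Choice two: evaluate y only when x is true.
    // Never evaluates y "unsafely", but has to guess a result when x is null.
    public static bool? AndAlsoLazy(bool? x, Func<bool?> y)
    {
        if (x == false) return false;
        if (x == null) return null;   // the guess; returning false would be wrong in other cases
        return y();
    }

    public static void Main()
    {
        bool? x = null;
        int evaluations = 0;
        Func<bool?> y = () => { evaluations++; return false; };

        bool? eager = AndAlsoEager(x, y);
        Console.WriteLine($"eager result has value: {eager.HasValue}, y evaluated {evaluations} time(s)");

        evaluations = 0;
        bool? lazy = AndAlsoLazy(x, y);
        Console.WriteLine($"lazy result has value: {lazy.HasValue}, y evaluated {evaluations} time(s)");

        // The non-short-circuiting operator says null & false is false,
        // so the lazy version disagrees with it in this case.
        bool? reference = x & false;
        Console.WriteLine($"eager matches x & y: {eager == reference}");
        Console.WriteLine($"lazy matches x & y: {lazy == reference}");
    }
}
```

With x null and a y that would produce false, the eager version gets the same answer as x & y at the cost of running y, and the lazy version avoids running y at the cost of producing null where x & y produces false.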

I said last time that I’d talk about the role of operator true and operator false in C#, but I think I shall leave that to the next episode in this series. Next time on FAIC we’ll digress briefly and then conclude this series after that.

Null is not false, part one

Posted on March 26, 2012 by ericlippert

The way you typically represent a “missing” or “invalid” value in C# is to use the “null” value of the type. Every reference type has a “null” value; that is, the reference that does not actually refer to anything. And every “normal” value type has a corresponding “nullable” value type which has a null value.

The way these concepts are implemented is completely different. A reference is typically implemented behind the scenes as a 32 or 64 bit number. As we’ve discussed previously, that number should logically be treated as an “opaque” handle that only the garbage collector knows about, but in practice that number is the offset into the virtual memory space of the process that the referred-to object lives at, inside the managed heap. The number zero is reserved as the representation of null because the operating system reserves the first few pages of virtual memory as invalid, always. There is no chance that by some accident, the zero address is going to be a valid address in the heap.

Continue reading →

Why not automatically infer constraints?

Posted on March 9, 2012 by ericlippert

Suppose you have a generic base type with a constraint:

class Bravo<T> where T : IComparable<T> { ... }

If you make a generic derived class in the obvious way:

class Delta<U> : Bravo<U> { ... }

then the C# compiler gives you an error: Continue reading →
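For reference, here is a minimal sketch of a version that does compile: the derived class restates the constraint on its own type parameter. The Max method and the Program class are additions of mine, there only to give the constraint something to do.

```csharp
using System;

public class Bravo<T> where T : IComparable<T>
{
    // Illustrative member: relies on the constraint to call CompareTo.
    public T Max(T a, T b) => a.CompareTo(b) >= 0 ? a : b;
}

// The constraint is stated again on U; it is not inferred from the base type.
public class Delta<U> : Bravo<U> where U : IComparable<U>
{
}

public static class Program
{
    public static void Main()
    {
        var delta = new Delta<int>();
        Console.WriteLine(delta.Max(3, 5));
    }
}
```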

Why are local variables definitely assigned in unreachable statements?

Posted on March 5, 2012 by ericlippert

You’re probably all familiar with the feature of C# which disallows reading from a local variable before it has been “definitely assigned”:

void M()
{
  int x; 
  if (Q()) 
    x = 123; 
  if (R()) 
    Console.WriteLine(x); // illegal! 
}

This is illegal because there is a path through the code which, if taken, results in the local variable being read from before it has been assigned; in this case, if Q() returns false and R() returns true.

The reason why we want to make this illegal is not, as many people believe, because the local variable is going to be initialized to garbage and we want to protect you from garbage. We do in fact automatically initialize locals to their default values. (The C and C++ programming languages do not necessarily, and will cheerfully allow you to read garbage from an uninitialized local.) Rather, it is because the existence of such a code path is probably a bug, and we want to throw you into the Pit of Success; you should have to work hard to write that bug.

The way in which the compiler determines if there is any path through the code which causes x to be read before it is written is quite interesting, but that’s a subject for another day. The question I want to consider today is: why are local variables considered to be definitely assigned inside unreachable statements?

void M() 
{ 
  int x; 
  if (Q()) 
    x = 123; 
  if (false) 
    Console.WriteLine(x); // legal! 
}

First off, obviously the way I’ve described the feature immediately gives the intuition that this ought to be legal. Clearly there is no path through the code which results in the local variable being read before it is assigned. In fact, there is no path through the code that results in the local variable being read, period!

On the other hand: that code looks wrong. We do not allow syntax errors, or overload resolution errors, or convertibility errors, or any other kind of error, in an unreachable statement, so why should we allow definite assignment errors?

It’s a subtle point, I admit. Here’s the thing. You have to ask yourself “why is there unreachable code in the method in the first place?” Either that unreachable code is deliberate, or it is an error.

If it is an error, then something is deeply messed up here. The programmer did not intend the written control flow in the first place. It seems premature to guess at what the definite assignment errors are in the unreachable code, since the control flow that would be used to determine definite assignment state is wrong. We are going to give a warning about the unreachable code; the user can then notice the warning and fix the control flow. Once it is fixed, then we can consider whether there are definite assignment problems with the fixed control flow.

Now, why on earth would someone deliberately make unreachable code? It does in fact happen; actually it happens quite frequently when dealing with libraries made by another team that are not quite done yet:

// If we can resrov the frob into a glob, do that and then blorg 
// the result. Even if the frob is not a glob, we know it is 
// definitely a resrovable blob, so resrov it as a blob and then
// blorg the result. Finally, fribble the blorgable result, 
// regardless of whether it was a glob or a blob. 
void BlorgFrob(Frob frob) 
{ 
  IBlorgable blorgable;   
  // TODO: Glob.TryResrov has not been ported to C# yet. 
  if (false /* Glob.TryResrov(out blorgable, frob) */) 
  { 
    BlorgGlob(blorgable); 
  } 
  else 
  { 
    blorgable = Blob.Resrov(frob);
    BlorgBlob(blorgable); 
  } 
  blorgable.Fribble(frob); 
}

Should BlorgGlob(blorgable) be an error? It seems plausible that it should not be an error; after all, it’s never going to read the local. But it is still nice that we get overload resolution errors reported inside the unreachable code, just in case there is something wrong there.

The solution to the simple puzzle

Posted on February 27, 2012 by ericlippert

Last time I asked if you could find the bug in the original version of my histogram code. Here’s how I found it:
Continue reading →

A simple puzzle

Posted on February 24, 2012 by ericlippert

The original version of the histogram-generating code that I whipped up for the previous episode of FAIC contained a subtle bug. Can you spot it without going back and reading the corrected code? Continue reading →

Generating random non-uniform data in C#

Posted on February 21, 2012 by ericlippert

UPDATE: I’ve posted a related article here.

When building simulations of real-world phenomena, or when generating test data for algorithms that will be consuming information from the real world, it is often highly desirable to produce pseudo-random data that conform to some non-uniform probability distribution.

But perhaps I have already lost some readers who do not remember STATS 101 all those years ago. I sure don’t. Let’s take a step back. Continue reading →

Bad metaphors

Posted on February 13, 2012 by ericlippert

The standard way to teach beginner OO programmers about classes is to make a metaphor to the real world. And indeed, I do this all the time in this blog, usually to the animal kingdom. A “class” in real life codifies a commonality amongst a certain set of objects: mammals, for example, have many things in common; they have backbones, can grow hair, can make their own heat, and so on. A class in a programming language does the same thing: codifies a commonality amongst a certain set of objects via the mechanism of inheritance. Inheritance ensures commonalities because, as we’ve already discussed, “inheritance” by definition means “all the members of the base type are also members of the derived type” (excepting constructors and destructors).
Continue reading →

What is late binding?

Posted on February 6, 2012 by ericlippert

“Late binding” is one of those computer-sciency terms that, like “strong typing” or “duck typing”, means different things to different people. I thought I might describe what the term means to me.

First off, what is “binding”? We can’t understand what it means to bind late if we don’t know what it is to bind at all. Continue reading →

What’s the difference between a trenchcoat and a duster?

Posted on February 3, 2012 by ericlippert

Today, yet another episode in my ongoing series “What’s the difference?” This time, a non-computer-related topic.

I am often complimented on my choice of outerwear in the Seattle rainy season, and I hate to respond to a well-meant compliment with a correction. So I usually let all those “Nice trenchcoat!” comments slide and just say “Thanks!” But as a public service, let me lay it out for you so that you don’t make the same mistake. Here we see David Tennant as the Tenth Doctor wearing a classic example of a trenchcoat:

The trenchcoat is a long waterproof coat, traditionally made of gabardine. The term originated in the trenches of the First World War, due to the popularity of this style of coat amongst officers in the British armed forces. The trenchcoat is not merely a functional warm raincoat but also stylish, with long wide lapels and decorative buttons. The trenchcoat is often belted and might be tailored in at the waist, particularly for women’s trenchcoats.

A duster is also a long waterproof coat that is often referred to as a “trenchcoat” — but as you’ll see, it is quite different in its details. Here’s the duster I wear, an Australian-made Driza-Bone:

Note the lack of decorative elements, the flap over the closure, the no-lapel collar (which clasps shut, completely enclosing the neck if necessary) and the built-in extra rain protection on the shoulders. (*) Dusters are typically made of oilcloth and are built for handling the practicalities of herding sheep in the rain, not for style (**).

Not shown in this view: the interior includes straps that let you attach the bottom of the coat to your legs, so that it does not blow around when you are on horseback. Also, the back is cut in such a way that you can cover both your legs and the rear portion of the saddle with the coat. I usually take the bus and not a horse to work, but still it’s nice to know that options are available should I need them. These practical elements are usually not present in trenchcoats.

Right, glad we got that sorted out.

Next time on FAIC: What is binding, and why is it always either early or late? Can’t it ever be on time?

(*) Duster manufacturers always hasten to point out that the shoulders are already waterproof; the extra layer keeps your shoulders warmer by shedding rain more effectively.

(**) There are, of course, some dusters built for style; if you watch the “Matrix” series of movies you’ll see the heroes wear an assortment of extremely stylish dusters and trenchcoats both.

Fabulous adventures in coding

Eric Lippert's blog

Author Archives: ericlippert