Static analysis of “is”

Returning now to the subject we started discussing last time on FAIC: sometimes the compiler can know via static analysis[1. That is, analysis done knowing only the compile-time types of expressions, rather than knowing their possibly more specific run-time types] that an is operator expression is guaranteed to produce a particular result.

But before we get into that, let’s briefly review the meaning of is in C#. The expression

x is T

for an expression x and a type T produces a bool. Generally speaking, if there is a reference, boxing or unboxing conversion from the runtime value of x to the type T then the result is true, otherwise the result is false. Note that in particular user-defined conversions are not considered. The intention of the operator is to determine if the runtime value of x is actually a useful value of the type T,[2. It is not to determine “is there any way to associate a value of type T with this value?” or “can the value x be legally assigned to a variable of type T?”] and therefore we add a few additional caveats:

  • T may not be a pointer type
  • x may not be a lambda or anonymous method
  • if x is classified as a method group or is the null literal[3. There are some small spec holes here and around these holes are minor inconsistencies between the C# 5 compiler and Roslyn. The spec does not say what to do if the expression x is a namespace or a type; those are of course valid expressions. The compiler produces an error stating that an expression with a value is expected. Similarly for write-only properties and write-only indexers. The original-recipe C# 5 compiler produces an error if the expression is a void-returning call, which technically is a spec violation; Roslyn produces a warning, though frankly, I’d be more inclined to change the spec in this case. The specification does say that x is T is an error if T is a static type; the C# 5 compiler erroneously allows this and produces false whereas the Roslyn compiler produces an error.] then the result is false
  • if the runtime type of x is a reference type and its value is a null reference then the result is false
  • if the runtime type of x is a nullable value type and the HasValue property of its value is false then the result is false
  • if the runtime type of x is a nullable value type and the HasValue property of its value is true then the result is computed as though you typed x.Value is T

So, knowing that, try to think of some situations in which you know for certain that x is T is going to always produce true, or always produce false. Here are a few that you and I know are always true:

int i = 123; 
bool b1 = i is int; 
bool b2 = i is IComparable; 
bool b3 = i is object; 
bool b4 = "hello" is string;

Here in every case we know first, that the operand is not null, and second, that the operand will always be of the given type, and therefore the operator will always produce “true”.

Before we go on, I want to briefly review our criteria for when to make a warning: the warning in question has got to be (1) thought of, (2) possible to implement cheaply, (3) identify code that is both plausibly written by a real user and almost certainly wrong, but non-obviously wrong (4) be work-around-able should the code actually be correct and (5) not turn huge amounts of existing code into spurious errors.

Of these four lines, only the third strikes me as plausibly written by a real user and almost certainly wrong; the user must not realize that every int always implements IComparable. The other ones are just weird. Interestingly enough, the C# 5 compiler warns about the first three, but not the fourth.

There are a lot more situations where you know that the result will always be false. Can you think of some? Here are just a few off the top of my head:

bool b5 = M is Func<object>; // M is a method group 
bool b6 = null is object; 
bool b7 = b5 is IEnumerable; 
bool b8 = E.Blah is uint; // E is an enum type bool 
b9 = i is double;

The first two follow the rules of the spec. The latter three are cases where we can know via static analysis that the value cannot possibly be converted to the given type by a reference, boxing or unboxing conversion. The compiler produces a warning for all these trivially-analyzed cases. (Though of course again some of these examples — b6 in particular — are unlikely to show up in real code.)

That was a whole lot of preamble to the question I actually want to consider today, which is: how far should we go when making these sorts of static analyses for the purpose of warning that an is expression is always false? We could go a lot farther! I started this series off with a puzzle about the cases where there is no compile-time conversion from x to T, but nevertheless x is T can be true; today I want to talk about what we do when there is no compile-time conversion from x to T, and hey, x is T really cannot possibly be true as a result. There are lots of cases where you and I know that a given expression will never be of a given type, but these cases can get quite complex. Let’s look at three complex ones:

class C<T> {} 
... 
static bool M10<X>(X x) where X : struct 
{ return x is string; } 
static bool M11<X>(C<X> cx) where X : struct 
{ return cx is C<string>; } 
static bool M12<X>(Dictionary<int, int> d) 
{ return d is List<X>; }

In case M10 we know that X will always be a value type and no value type is convertible to string via a reference, boxing or unboxing conversion. The type check must be false.

In case M11 we know that cx is of type C<some-value-type>, or a type derived from C<some-value-type>. You and I know that there is no way to get the same generic class type into a type hierarchy twice; there is no way to make something derive from both C<some-value-type> and C<string>. So the type check must be false.

In case M12 we know that there is no way to make an object that inherits from both the dictionary and list base classes, no matter what X is. The type check must be false.

In all these cases the compiler could produce a warning, but we are rapidly running up against the “possible to implement cheaply” criterion! The compiler team could spend a lot of expensive time coming up with heuristics which would not make a difference to real users writing real code. They have to draw the line somewhere.

Where is that line? The heuristic the compiler team actually uses to determine whether or not to report a warning when the compiler detects that there is no compile-time conversion from x to T is as follows:

  • If neither the compile time type of x nor the type T is an open type (that is, a type that involves generic type parameters) then we know that the result will always be false. There’s no conversion, and nothing to substitute at runtime to make there be a conversion. Give a warning.
  • One of the types is open. If the compile time type of x is a value type and T is a class type then we know that the result will always be false.[4. This is a lie; there is one case where the compile time type of x is an open value type and T is a class type, and there is no compile-time conversion from x to T, and x is T can be true, and therefore the warning must be suppressed. Can you write a program that demonstrates this scenario? I’ll put the answer in the comments.] (This is case M10 above.) Give a warning.
  • Otherwise, abandon further analysis and do not produce a warning.

This is far from perfect, but it is definitely in the “good enough” bucket. And “good enough” is, by definition, good enough. Of course, this heuristic is subject to change without notice, should the compiler team discover that some real user-impacting scenario motivates changing it.


Next time on FAIC: How does the compiler ensure that the method type inference algorithm terminates?

Advertisements

3 thoughts on “Static analysis of “is”

  1. Wow that was unusual. I just wrote an extremely long comment but after I clicked submit my comment didn’t appear.
    Grrrr… well I’m not writing all that over again. Anyways,
    just wanted to say excellent blog!

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s