Casts and type parameters do not mix

Here’s a question I’m asked occasionally:

void M<T>(T t) where T : Animal
{
  // This gives a compile-time error:
  if (t is Dog)
    ((Dog)t).Bark(); 
  // But this does not:
  if (t is Dog)
    (t as Dog).Bark();
}

What’s going on here? Why is the cast operator rejected? The reason illustrates yet again that the cast operator is super weird.
Continue reading

What is “duck typing”?

Seriously, what is it? It’s not a rhetorical question. I realized this morning that I am totally confused about this.

First off, let me say what I thought “duck typing” was. I thought it was a form of typing.

So what is “typing”? We’ve discussed this before on this blog. (And you might want to check out this post on late binding and this post on strong typing.) To sum up:

Continue reading

Why does a foreach loop silently insert an “explicit” conversion?

The C# specification defines

foreach (V v in x) 
  embedded-statement

as having the same semantics as:

{
  E e = ((C)(x)).GetEnumerator();
  try 
  {
    V v;  // Inside the while in C# 5.
    while (e.MoveNext()) 
    {
      v = (V)e.Current;
      embedded-statement
    }
  }
  finally 
  {
    // necessary code to dispose e
  }
}

(Actually this is not exactly what the spec says; I’ve made one small edit because I don’t want to get into the difference between the element type and the loop variable type in this episode.)

There are a lot of subtleties here that we’ve discussed before; what I want to talk about today is the explicit conversion from e.Current to V. On the face of it this seems very problematic; that’s an explicit conversion. The collection could be a list of longs and V could be int; normally C# would not allow a conversion from long to int without a cast operator appearing in the source code. (Or the long being a constant that fits into an int.) What justifies this odd design choice?

Continue reading

Why not allow double/decimal implicit conversions?

I’ve talked a lot about floating point math over the years in this blog, but a quick refresher is in order for this episode.

A double represents a number of the form +/- (1 + F / 252 ) x 2E-1023, where F is a 52 bit unsigned integer and E is an 11 bit unsigned integer; that makes 63 bits and the remaining bit is the sign, zero for positive, one for negative. You’ll note that there is no way to represent zero in this format, so by convention if F and E are both zero, the value is zero. (And similarly there are other reserved bit patterns for infinities, NaN and denormalized floats which we will not get into today.)

A decimal represents a number in the form +/- V / 10X where V is a 96 bit unsigned integer and X is an integer between 0 and 28.

Both are of course “floating point” because the number of bits of precision in each case is fixed, but the position of the decimal point can effectively vary as the exponent changes.
Continue reading

Static analysis of “is”

Returning now to the subject we started discussing last time on FAIC: sometimes the compiler can know via static analysis[1. That is, analysis done knowing only the compile-time types of expressions, rather than knowing their possibly more specific run-time types] that an is operator expression is guaranteed to produce a particular result.
Continue reading

When is a cast not a cast?

I’m asked a lot of questions about conversion logic in C#, which is not that surprising. Conversions are common, and the rules are pretty complicated. Here’s some code I was asked about recently; I’ve stripped it down to its essence for clarity:

class C<T> {} 
class D 
{
  public static C<U> M<U>(C<bool> c)   
  { return =something=; } 
} 
public static class X 
{ 
  public static V Cast<V>(object obj) 
  { return (V)obj; } 
}

where there are three possible texts for “=something=“:

  1. (C<U>)c
  2. X.Cast<C<U>>(c);
  3. (C<U>)(object)c

Version 1 fails at compile time. Versions 2 and 3 succeed at compile time, and then fail at runtime if U is not bool.

Question: Why does the first version fail at compile time?

Because the compiler knows that the only way this conversion could possibly succeed is if U is bool, but U can be anything! The compiler assumes that most of the time U is not going to be constructed with bool, and therefore this code is almost certainly an error, and the compiler is bringing that fact to your attention.

Question: Then why does the second version succeed at compile time?

Because the compiler has no idea that a method named X.Cast<V> is going to perform a cast to V! All the compiler sees is a call to a method that takes an object, and you’ve given it an object, so the compiler’s work is done. The method is a “black box” from the caller’s perspective; the compiler does not look inside that box to see whether the mechanisms in that box are likely to fail given the input. This “cast” is not really a cast from the compiler’s perspective, it’s a method call.

Question: So what about the third version? Why does it not fail like the first version?

This one is actually the same thing as the second version; all we’ve done is inlined the body of the call to X.Cast<V>, including the intermediate conversion to object! That conversion is relevant.

Question: In both the second and third cases, the conversion succeeds at compile time because there is a conversion to object in the middle?

That’s right. The rule is: if there is a conversion from a type S to object, then there is an explicit conversion from object to S.[1. Of course it is not the case that there is a conversion from every type to object. There is no conversion from any pointer type to object, from the void return type to object, and there are also some special “typed reference” helper types that cannot be converted to object. Maybe I’ll discuss those in another episode of FAIC.]

By making a conversion to object before doing the “offensive” conversion, you are basically telling the compiler “please throw away the compile-time information you have about the type of the thing I am converting”. In the third version we do so explicitly; in the second version we do so sneakily, by making an implicit conversion to object when the argument is converted to the parameter type.

Question: So this explains why compile-time type checking doesn’t seem to work quite right on LINQ expressions?

Yes! You would think that the compiler would disallow nonsense like:

from bool b in new int[] { 123, 345 } select b.ToString()

because obviously there is no conversion from int to bool, so how can range variable b take on the values in the array? Nevertheless, this succeeds because the compiler translates this to

(new int[] { 123, 345 }).Cast<bool>().Select(b=>b.ToString())

and the compiler has no idea that passing a sequence of integers to the extension method Cast<bool> is going to fail at runtime. That method is a black box. You and I know that it is going to perform a cast, and that the cast is going to fail, but the compiler does not know that.

And maybe we do not actually know it either; perhaps we are using some library other than the default LINQ-to-objects query provider that does know how to make conversions between types that the C# language would not normally allow. This is actually an extensibility feature masquerading as a compiler deficiency: it’s not a bug, it’s a feature! [2. My glib statement here conveniently ignores that this method had quite a nasty bug in its initial release, a bug that was mostly my fault. Late in the game before the release one of the developers changed the implementation of the extension method so that it allowed more conversions than were specified, as a convenience to users. I reviewed the change while under the incorrect impression that the implemented behaviour was the specified behaviour. It was not, and the implementation was quite slow in a common code path. We took a breaking change in a service pack as a result. The cost of breaking a few people who might have been relying on the unintended behaviour was considered to be low compared to the cost to everyone of the slow implementation. This was a tough, controversial call but I think we did the right thing in the end. I regret the error.]


Next time on FAIC: Should C# warn on null dereferences known to the compiler?

What’s the difference between “as” and cast operators?

Most people will tell you that the difference between (Alpha) bravo and bravo as Alpha is that the former throws an exception if the conversion fails, whereas the latter produces null. Though this is correct, and this is the most obvious difference, it’s not the only difference. There are pitfalls to watch out for here.

First off, since the result of the as operator can be null, the resulting type has to be one that includes a null value: either a reference type or a nullable value type. You cannot do x as int, that doesn’t make any sense. If the operand isn’t an int, then what should the expression’s value be? The type of the as expression is always the type named in the expression, so it needs to be a type that can take a null.

Second, the cast operator, as I’ve discussed before, is a strange beast. It means two contradictory things: “check to see if this object really is of this type, throw if it is not” and “this object is not of the given type; find me an equivalent value that belongs to the given type”. The latter meaning of the cast operator is not shared by the as operator. If you say

short s = (short)123;
int? i = s as int?;

then you’re out of luck. The as operator will not make the representation-changing conversions from short to nullable int like the cast operator would. Similarly, if you have class Alpha and unrelated class Bravo, with a user-defined conversion from Bravo to Alpha, then (Alpha) bravo will run the user-defined conversion, but bravo as Alpha will not. The as operator only considers reference, boxing and unboxing conversions.

And finally, of course the use cases of the two operators are superficially similar, but semantically quite different. A cast communicates to the reader “I am certain that this conversion is legal and I am willing to take a runtime exception if I’m wrong”. The as operator communicates “I don’t know if this conversion is legal or not; we’re going to give it a try and see how it goes”.


Notes from 2020

There were a number of pragmatic comments to the original post:

  • There is pressure to use as over casting for legibility reasons; (x as C).M() is perceived as easier to read than ((C)x).M(). Indeed, the cast operator has other syntactic problems than just being ugly; is (C)-1 casting -1 to C, or subtracting 1 from (C)?
  • It seems like there is a missed opportunity for a syntactic sugar that both tests the type and gives you a value of that type in one step; those comments and similar comments from many other users eventually led to if (x as C c) being added to the language.
  • One commenter pointed out that use of type checking operators in generic code is a bad smell; I thoroughly agree.
  • Another pointed out that any sort of type checking at runtime is an indication that the programmer was unable to produce a proof of type safety that could be checked by the compiler, and is therefore suspect. It would be nice if those points were easier to find quickly in a program because they need deeper code review. This is an often neglected aspect of language design; I should write more about that!
  • There was also a long conversation about the use of various operators for different kinds of dynamic dispatch. I discussed these in my Wizards and Warriors series in more detail.

Representation and identity

(Note: not to be confused with Inheritance and Representation.)

I get a fair number of questions about the C# cast operator. The most frequent question I get is:

short sss = 123;
object ooo = sss;            // Box the short.
int iii = (int) sss;         // Perfectly legal.
int jjj = (int) (short) ooo; // Perfectly legal
int kkk = (int) ooo;         // Invalid cast exception?! Why?

Why? Because a boxed T can only be unboxed to T (or Nullable<T>.) Once it is unboxed, it’s just a value that can be cast as usual, so the double cast works just fine.
Continue reading