What is “duck typing”?

Seriously, what is it? It’s not a rhetorical question. I realized this morning that I am totally confused about this.

First off, let me say what I thought “duck typing” was. I thought it was a form of typing.

So what is “typing”? We’ve discussed this before on this blog. (And you might want to check out this post on late binding and this post on strong typing.) To sum up:


Why does a foreach loop silently insert an “explicit” conversion?

The C# specification defines

foreach (V v in x) 
  embedded-statement

as having the same semantics as:[1. This is not exactly what the spec says; I’ve made one small edit because I don’t want to get into the difference between the element type and the loop variable type in this episode.]

{
  E e = ((C)(x)).GetEnumerator();
  try 
  {
    V v;  // Inside the while in C# 5.
    while (e.MoveNext()) 
    {
      v = (V)e.Current;
      embedded-statement
    }
  }
  finally 
  {
    // necessary code to dispose e
  }
}

There are a lot of subtleties here that we’ve discussed before; what I want to talk about today is the explicit conversion from e.Current to V. On the face of it this seems very problematic; that’s an explicit conversion. The collection could be a list of longs and V could be int; normally C# would not allow a conversion from long to int without a cast operator appearing in the source code.[2. Or the long being a constant that fits into an int.] What justifies this odd design choice?
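To make the conversion concrete, here is a minimal sketch. The class and method names are hypothetical, chosen only for illustration. The first method relies on the conversion the compiler inserts; the second writes out roughly the same expansion by hand, with the cast visible.

```csharp
using System.Collections.Generic;

public static class ForeachConversionSketch
{
    // The loop variable type (int) is narrower than the element type (long).
    // No cast appears in the source; the compiler inserts (int)e.Current.
    public static int SumWithForeach(IEnumerable<long> items)
    {
        int total = 0;
        foreach (int item in items)
        {
            total += item;
        }
        return total;
    }

    // Roughly what the loop above expands to, written out by hand.
    public static int SumExpanded(IEnumerable<long> items)
    {
        int total = 0;
        IEnumerator<long> e = items.GetEnumerator();
        try
        {
            while (e.MoveNext())
            {
                int item = (int)e.Current; // the "silent" explicit conversion
                total += item;
            }
        }
        finally
        {
            e.Dispose();
        }
        return total;
    }
}
```

Outside of a foreach, the equivalent assignment `int item = someLong;` does not compile without the cast.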


Why not allow double/decimal implicit conversions?

I’ve talked a lot about floating point math over the years in this blog, but a quick refresher is in order for this episode.

A double represents a number of the form +/- (1 + F / 2^52) x 2^(E - 1023), where F is a 52 bit unsigned integer and E is an 11 bit unsigned integer; that makes 63 bits and the remaining bit is the sign, zero for positive, one for negative. You’ll note that there is no way to represent zero in this format, so by convention if F and E are both zero, the value is zero. (And similarly there are other reserved bit patterns for infinities, NaN and denormalized floats which we will not get into today.)

A decimal represents a number in the form +/- V / 10^X where V is a 96 bit unsigned integer and X is an integer between 0 and 28.

Both are of course “floating point” because the number of bits of precision in each case is fixed, but the position of the decimal point can effectively vary as the exponent changes.
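The two layouts can be inspected directly. The following is a sketch with hypothetical helper names; it uses BitConverter.DoubleToInt64Bits to pull out the sign, E and F fields of a double, and decimal.GetBits to read the power-of-ten exponent X of a decimal.

```csharp
using System;

public static class FloatingPointBitsSketch
{
    // Sign bit of a double: 0 for positive, 1 for negative.
    public static long SignBit(double value)
    {
        return (BitConverter.DoubleToInt64Bits(value) >> 63) & 1;
    }

    // The 11 bit exponent field E.
    public static long ExponentField(double value)
    {
        return (BitConverter.DoubleToInt64Bits(value) >> 52) & 0x7FF;
    }

    // The 52 bit fraction field F.
    public static long FractionField(double value)
    {
        return BitConverter.DoubleToInt64Bits(value) & ((1L << 52) - 1);
    }

    // The exponent X of a decimal: the power of ten that V is divided by.
    // It is stored in bits 16 to 23 of the fourth element returned by GetBits.
    public static int DecimalScale(decimal value)
    {
        int[] parts = decimal.GetBits(value);
        return (parts[3] >> 16) & 0xFF;
    }
}
```

For 1.0 the fields are E = 1023 and F = 0, which gives (1 + 0) x 2^0. For the literal 1.50m the scale is 2, that is, V = 150 and X = 2.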

Static analysis of “is”

Returning now to the subject we started discussing last time on FAIC: sometimes the compiler can know via static analysis[1. That is, analysis done knowing only the compile-time types of expressions, rather than knowing their possibly more specific run-time types] that an is operator expression is guaranteed to produce a particular result.
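Two simple cases illustrate the idea. This is a sketch with hypothetical names, not taken from the post. In both methods the compile-time type of x alone determines the answer. My understanding is that the compiler reports these as warnings (CS0184 for the first, CS0183 for the second) rather than errors; treat the warning numbers as something to check against your own compiler.

```csharp
public static class IsAnalysisSketch
{
    // An int can never be a string, so the result is statically known to be false.
    public static bool IntIsString(int x)
    {
        return x is string;
    }

    // A non-nullable int is always an int, so the result is statically known to be true.
    public static bool IntIsInt(int x)
    {
        return x is int;
    }
}
```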

When is a cast not a cast?

I’m asked a lot of questions about conversion logic in C#, which is not that surprising. Conversions are common, and the rules are pretty complicated. Here’s some code I was asked about recently; I’ve stripped it down to its essence for clarity:

class C<T> {} 
class D 
{
  public static C<U> M<U>(C<bool> c)   
  { return =something=; } 
} 
public static class X 
{ 
  public static V Cast<V>(object obj) 
  { return (V)obj; } 
}

where there are three possible texts for “=something=”:

  1. (C<U>)c
  2. X.Cast<C<U>>(c)
  3. (C<U>)(object)c

Version 1 fails at compile time. Versions 2 and 3 succeed at compile time, and then fail at runtime if U is not bool.

Question: Why does the first version fail at compile time?

Because the compiler knows that the only way this conversion could possibly succeed is if U is bool, but U can be anything! The compiler assumes that most of the time U is not going to be constructed with bool, and therefore this code is almost certainly an error, and the compiler is bringing that fact to your attention.

Question: Then why does the second version succeed at compile time?

Because the compiler has no idea that a method named X.Cast<V> is going to perform a cast to V! All the compiler sees is a call to a method that takes an object, and you’ve given it an object, so the compiler’s work is done. The method is a “black box” from the caller’s perspective; the compiler does not look inside that box to see whether the mechanisms in that box are likely to fail given the input. This “cast” is not really a cast from the compiler’s perspective, it’s a method call.

Question: So what about the third version? Why does it not fail like the first version?

This one is actually the same thing as the second version; all we’ve done is inlined the body of the call to X.Cast<V>, including the intermediate conversion to object! That conversion is relevant.

Question: In both the second and third cases, the conversion succeeds at compile time because there is a conversion to object in the middle?

That’s right. The rule is: if there is a conversion from a type S to object, then there is an explicit conversion from object to S.[1. Of course it is not the case that there is a conversion from every type to object. There is no conversion from any pointer type to object, from the void return type to object, and there are also some special “typed reference” helper types that cannot be converted to object. Maybe I’ll discuss those in another episode of FAIC.]

By making a conversion to object before doing the “offensive” conversion, you are basically telling the compiler “please throw away the compile-time information you have about the type of the thing I am converting”. In the third version we do so explicitly; in the second version we do so sneakily, by making an implicit conversion to object when the argument is converted to the parameter type.
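Versions 2 and 3 can be put side by side in a compilable sketch. The names Wrapper, ViaMethodCall, ViaObject and ThrowsFor are hypothetical stand-ins for the C, D and X of the example above. Version 1 is left out because it does not compile.

```csharp
using System;

public class Wrapper<T> { }

public static class ObjectConversionSketch
{
    // A black box from the caller's perspective: it just takes an object.
    public static V Cast<V>(object obj)
    {
        return (V)obj;
    }

    // Version 2: the argument is implicitly converted to object at the call.
    public static Wrapper<U> ViaMethodCall<U>(Wrapper<bool> c)
    {
        return Cast<Wrapper<U>>(c);
    }

    // Version 3: the conversion to object is written explicitly.
    public static Wrapper<U> ViaObject<U>(Wrapper<bool> c)
    {
        return (Wrapper<U>)(object)c;
    }

    // Reports whether the conversion fails at runtime for a given U.
    public static bool ThrowsFor<U>()
    {
        try
        {
            ViaObject<U>(new Wrapper<bool>());
            return false;
        }
        catch (InvalidCastException)
        {
            return true;
        }
    }
}
```

Both methods compile for every U. At runtime the conversion succeeds when U is bool and throws InvalidCastException otherwise.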

Question: So this explains why compile-time type checking doesn’t seem to work quite right on LINQ expressions?

Yes! You would think that the compiler would disallow nonsense like:

from bool b in new int[] { 123, 345 } select b.ToString()

because obviously there is no conversion from int to bool, so how can range variable b take on the values in the array? Nevertheless, this succeeds because the compiler translates this to

(new int[] { 123, 345 }).Cast<bool>().Select(b => b.ToString())

and the compiler has no idea that passing a sequence of integers to the extension method Cast<bool> is going to fail at runtime. That method is a black box. You and I know that it is going to perform a cast, and that the cast is going to fail, but the compiler does not know that.

And maybe we do not actually know it either; perhaps we are using some library other than the default LINQ-to-objects query provider that does know how to make conversions between types that the C# language would not normally allow. This is actually an extensibility feature masquerading as a compiler deficiency: it’s not a bug, it’s a feature! [2. My glib statement here conveniently ignores that this method had quite a nasty bug in its initial release, a bug that was mostly my fault. Late in the game before the release one of the developers changed the implementation of the extension method so that it allowed more conversions than were specified, as a convenience to users. I reviewed the change while under the incorrect impression that the implemented behaviour was the specified behaviour. It was not, and the implementation was quite slow in a common code path. We took a breaking change in a service pack as a result. The cost of breaking a few people who might have been relying on the unintended behaviour was considered to be low compared to the cost to everyone of the slow implementation. This was a tough, controversial call but I think we did the right thing in the end. I regret the error.]
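With the default LINQ-to-objects provider, the failure can be observed with a sketch like the following; the class and method names are hypothetical. Note that the query compiles, and that nothing fails until the sequence is actually enumerated.

```csharp
using System;
using System.Linq;

public static class QueryCastSketch
{
    // Builds the query from the text and reports what happens when it runs.
    public static string RunQuery()
    {
        // Compiles: translated to a call to Cast<bool>() followed by Select.
        var query = from bool b in new int[] { 123, 345 } select b.ToString();
        try
        {
            // Enumeration happens here, and so does the failing cast.
            return string.Join(",", query);
        }
        catch (InvalidCastException)
        {
            return "InvalidCastException";
        }
    }
}
```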


Next time on FAIC: Should C# warn on null dereferences known to the compiler?

Representation and identity

(Note: not to be confused with Inheritance and Representation.)

I get a fair number of questions about the C# cast operator. The most frequent question I get is:

short sss = 123;
object ooo = sss;            // Box the short.
int iii = (int) sss;         // Perfectly legal.
int jjj = (int) (short) ooo; // Perfectly legal.
int kkk = (int) ooo;         // Invalid cast exception?! Why?

Why? Because a boxed T can only be unboxed to T.[1. Or Nullable<T>.] Once it is unboxed, it’s just a value that can be cast as usual, so the double cast works just fine.
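The same behaviour can be captured in a small sketch, using hypothetical helper names, that catches the exception instead of letting it escape:

```csharp
using System;

public static class UnboxingSketch
{
    // Unbox to the exact boxed type first, then convert the value.
    public static int UnboxThenConvert(object boxedShort)
    {
        return (int)(short)boxedShort;
    }

    // Unboxing directly to int only works if the box really holds an int.
    // Returns null when the unboxing fails.
    public static int? TryUnboxAsInt(object boxed)
    {
        try
        {
            return (int)boxed;
        }
        catch (InvalidCastException)
        {
            return null;
        }
    }
}
```

Passing a boxed short to TryUnboxAsInt yields null, while passing a boxed int yields its value.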