Casts and type parameters do not mix

Here’s a question I’m asked occasionally:

void M<T>(T t) where T : Animal
{
  // This gives a compile-time error:
  if (t is Dog)
    ((Dog)t).Bark(); 
  // But this does not:
  if (t is Dog)
    (t as Dog).Bark();
}

What’s going on here? Why is the cast operator rejected? The reason illustrates yet again that the cast operator is super weird.

Let me begin by again saying that if you are doing a run-time type check of something generic, you’re probably doing something wrong. Generics should be actually generic, and work the same on any possible type. This code has a bad smell about it. But that’s not what I want to talk about today.

To illustrate why the language disallows this cast, let’s consider a few scenarios. Suppose we have three types derived from Animal: Dog, Wolf and Cat. Moreover, we have a user-defined conversion from Wolf to Dog. The various conversions imposed by the cast operator all have different semantics:

Dog dog = new Dog();
Cat cat = new Cat();
Wolf wolf = new Wolf();
object obj = cat;
Dog d1 = (Dog) dog;  // identity conversion -- no op
Dog d2 = (Dog) wolf; // user-defined conversion -- call to static method
Dog d3 = (Dog) cat;  // compile-time error
Dog d4 = (Dog) obj;  // run-time error

These examples give us two justifications for producing an error when casting t to Dog. The first is: if T is Cat then we would expect a cast to Dog to be a compile-time failure. Since we have no reason to suppose that T is not Cat, similarly this should produce a compile-time error.

Note that this justification again illustrates that generics are not templates. In C++ you know at compile time whether the template is being instantiated with Cat or not; in C#, generics are required to type check for any possible type argument, not just for the particular type arguments provided.

The second justification is: if T is Wolf then we should expect that a cast of t to Dog is a call to a method, but if T is Dog then the conversion is a no-op. Again, generics are not templates; the code is only generated once and has to work for any instantiation. The cast cannot be generated as both a call to a static method and as a no-op.

Of course, if instead of taking a T the method took a dynamic, the code for the cast would be generated as a call to a method that starts up the compiler again at runtime and generates fresh code; if the argument were a Wolf then a call to the conversion method would be dynamically generated, and if the argument were a Cat then the program would crash with a type error. Again, the point of generic types is to avoid the danger and performance cost of doing the analysis at runtime.
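To make the contrast concrete, here is a sketch of the dynamic version, reusing the Animal hierarchy and the user-defined Wolf-to-Dog conversion from above:

```csharp
static void M(dynamic d)
{
    // The compiler emits a dynamic call site here; whether the cast is an
    // identity conversion, a user-defined conversion, or a type error is
    // decided fresh at run time for each invocation.
    Dog dog = (Dog)d;
    dog.Bark();
}

// M(new Dog());  // identity conversion; succeeds
// M(new Wolf()); // the Wolf-to-Dog user-defined conversion is found and called at run time
// M(new Cat());  // crashes with a run-time binding error
```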

Now, one could reasonably point out that in the actual code we started with, we’ve already done a run-time check on t, and so we know that t is not a Wolf or Cat at the point of the cast. This feature — computing additional type information on variables in different portions of the code — has been suggested a number of times for C#, but it is a surprisingly hard feature to get right in the general case. There are a lot of TOCTOU-like problems; you must be able to prove that the variable you type checked did not mutate to have a different run-time type between the time of the check and the time of its use. Since C# does not attempt to implement this feature, the fact that the cast is protected by a condition, and therefore must be an identity or reference conversion, is not taken into consideration.

(Aside: There are of course cheaper ways to make this feature work; for example, a proposed feature for C# 7 is if(t is Dog d) d.Bark(); where the is may introduce a new variable and assign the reference. Or allow variable declarations to be expressions that produce the value of the declared variable as their value: if((Dog d = t as Dog) != null) d.Bark(); There are lots of ways to solve this problem by introducing new syntax, but this is a good subject for another day.)
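Under the first proposed syntax, the original method would be written like this (a sketch; this compiles as shown in C# 7 and later):

```csharp
void M<T>(T t) where T : Animal
{
    // The test and the reference conversion happen as one operation, and
    // 'd' is a fresh local, so there is no check-to-use window in which
    // the value could change its run-time type.
    if (t is Dog d)
        d.Bark();
}
```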

So why then does the as version work? Because is and as do not pay any attention to user-defined conversions! The set of conversions supported by as is much smaller, and the compiler can generate the same code for any as conversion regardless of how T is instantiated.

Similarly, ((Dog)(object)t).Bark() would compile, because this code tells the compiler to simply do a run-time type check similar to what as does; if t were a Wolf or a Cat then this would be a run-time error.
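A sketch making the difference concrete, using the Wolf-to-Dog user-defined conversion from earlier:

```csharp
Wolf wolf = new Wolf();

Dog d1 = (Dog)wolf;      // compiles as a call to the user-defined conversion
object obj = wolf;
Dog d2 = obj as Dog;     // run-time type check only; yields null, the operator is ignored
// Dog d3 = (Dog)obj;    // run-time type check only; would throw, since a Wolf is not a Dog
// Dog d4 = wolf as Dog; // does not compile: 'as' never considers user-defined conversions
```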

The moral of the story here is that once again we see that the cast operator tries to do too much. The cast operator can be both a representation-preserving and a representation-changing operation, and the compiler must know which it is at compile time in order to generate code. With generics there often is not enough information available at compile time to know that, and so many casts must be made illegal.

41 thoughts on “Casts and type parameters do not mix”

  1. Even though the “is” and “as” in the above example work, I think it’s important to point out that it is not such a good idea to do it like that for performance reasons – you’re actually performing two casts here (one for is, one for as) where only one is needed. Therefore, I usually go for this approach:

    Dog dog = t as Dog;
    if (dog != null) {
        dog.Bark();
    }

    If you plan to use the object after the conversion, it’s redundant to use “is” first.

    • Your approach is fine in many cases, but keep in mind that it’s not the same thing. There might be a case where you need to differentiate “it’s not a Dog”, from “it’s a Dog, but it’s null”. For example, what if ‘Bark’ were an extension method that takes a Dog, and you want to call it even if ‘t’ is null?

        • I benchmarked it too, and in my test, is-test-then-cast is actually slower. I checked the link, and likely the author there benchmarked wrong; he tested on a Core 2 Duo, I on an AMD K8 and with a more recent version of the runtime. With his processor I have seen, a few times, a very tight loop being slower than a slightly bigger one.

          Also, remember modern processors do lots of clever tricks to improve performance, like executing many instructions in parallel, so in this case the processor may execute both the test and the cast in parallel, and, for some weird reason, in his case doing more work was faster; but in actual code with actual work to be done it may be slower, since it takes processing resources for redundant work.

          The link for the other benchmarking in his page contains the same problem by ignoring branch prediction and cache pressure.

          • I fail to see how my tests ignore branch prediction and cache pressure, or what that even means in this context. Test-then-cast could be easily recognized by the JIT and optimized into a single test and branch. Different JITs have different optimizations, and different processors will have different performance profiles. A newer version of the runtime may not do this optimization for various reasons, but that doesn’t make my results “wrong”.

            Furthermore, I bet your claim that it’s “actually slower” amounts to less than 5%, which is probably within the accepted margin for error (test-then-cast was actually a few % faster in my tests, but within error margins).

            Even if it were 5% slower, given the relative infrequency of this pattern it wouldn’t even be noticeable in literally any real program. Which means you should use the test/cast pattern that’s most appropriate for a given scenario, not contort your thinking by some micro-optimization that has no measurable impact.

          • Branch prediction and cache pressure are related to this link: http://higherlogics.blogspot.com.br/2008/10/vtable-dispatching-vs-runtime-tests-and.html

            Test-then-cast could be recognized by the JIT, but currently it isn’t; the fact that a difference exists at all between the two test cases means the JIT produced two different code sequences, so something was not optimized away. A 5% difference in one run may be within the error margins, but not a consistent 5% difference.

            Besides the benchmarking methodology, you are correct: casts are too fast and too uncommon to matter, so maybe that is the reason the JIT team didn’t bother optimizing it. But Visual Studio’s good-practices analyzer (or whatever cool name it has that I don’t remember) identifies this as a performance bug.

          • > between the two test cases means the JIT produced two different codes, so something was not optimized away, a 5% difference in one run may be error margins, but not a consistent 5% difference.

            This is incorrect. Even identical code can have consistent performance differences if it’s aligned differently such that it falls on different cache lines, or is accessing memory at sufficiently different alignments, or any number of other reasons. Those error margins are broader than you imply.

          • Pick the right tools for the right job. In this case we are not actually interested in the speed of the two methods, we want to know what code the compiler generates and if it avoids the redundant checks, respectively calls.

            So let’s see what VS2015 with .NET 4.6 in release mode for Any CPU generates for
            public static void MakeCatSound(object o)
            {
                if (o is Cat)
                {
                    Cat c = o as Cat;
                    c.Meow();
                }
            }

            Now, I will say that I don’t know much about CLR details, but this should work close enough to HotSpot (Java) that I shouldn’t make any grave mistakes… that’s the hope.

            32bit code with annotations:
            if (o is Cat)
            # remember the CLR uses fastcall calling convention so the first two arguments are passed in ecx, edx.
            # ecx = o
            # prologue
            01832E18 push esi
            # ecx = o, esi = o
            01832E19 mov esi,ecx
            # ecx = o, esi = o, edx = o
            01832E1B mov edx,esi
            # ecx = Cat class tag
            01832E1D mov ecx,5524D18h
            # call “as” with (CatClass, possibleInstance) as arguments
            01832E22 call 61AD2944
            # return true (!= 0) if the object is of the class, false (0) otherwise
            # ecx and edx are trashed, esi is callee-saved so
            # esi = o which is of class Cat
            01832E27 test eax,eax
            01832E29 je 01832E33
            {
            Cat c = o as Cat;
            # move instance to first argument and call Meow function
            01832E2B mov ecx,esi
            01832E2D call dword ptr ds:[5524D0Ch]
            # epilogue
            01832E33 pop esi
            01832E34 ret

            So clearly we do NOT generate a redundant check when doing as/is and everybody who benchmarked something different, probably wants to go back to the drawing board with their benchmark methodology 🙂

          • @Sandro, if two pieces of code compile to the same machine code (which is what “optimized away” means), then there will be no alignment issue or anything else preventing them from achieving the same execution time.

            @DaS, I am not sure what you used to disassemble; with VS2013 now, compiled with “Release”, I still got different execution times and the following machine code, so I assume the optimization starts at VS2015. I used OllyDbg to break in the infinite loop and check the disassembly (I added comments to guide those not fluent in x86 assembly):
            ;if (o is string)
            00F000B0 85F6 TEST ESI,ESI
            00F000B2 74 10 JE SHORT 00F000C4
            00F000B4 813E 183E765B CMP DWORD PTR DS:[ESI],5B763E18
            00F000BA B8 00000000 MOV EAX,0
            00F000BF 0F44C6 CMOVE EAX,ESI
            00F000C2 EB 02 JMP SHORT 00F000C6
            00F000C4 8BC6 MOV EAX,ESI
            00F000C6 85C0 TEST EAX,EAX
            00F000C8 74 18 JE SHORT 00F000E2
            ;(o as string).Clone();
            00F000CA 85F6 TEST ESI,ESI
            00F000CC 74 10 JE SHORT 00F000DE
            00F000CE 813E 183E765B CMP DWORD PTR DS:[ESI],5B763E18
            00F000D4 B8 00000000 MOV EAX,0
            00F000D9 0F44C6 CMOVE EAX,ESI
            00F000DC EB 02 JMP SHORT 00F000E0
            00F000DE 8BC6 MOV EAX,ESI
            00F000E0 3800 CMP BYTE PTR DS:[EAX],AL
            ;for (int i = 0; i < 1000000000000; i++)
            00F000E2 43 INC EBX
            00F000E3 8BC3 MOV EAX,EBX
            00F000E5 C1F8 1F SAR EAX,1F
            00F000E8 3D E8000000 CMP EAX,0E8
            00F000ED 7F 0A JG SHORT 00F000F9
            00F000EF ^ 7C BF JL SHORT 00F000B0

          • > @Sandro, if two pieces of code compile to the same machine code (what “optimized away” means) then there will be no alignment issue or any other thing that prevent it from achieving same execution time.

            This is trivially incorrect. Machine code not aligned on 16-byte boundaries executes more slowly [1]. This is only the most basic example of how distinct but identical instruction sequences could have observably different timing behaviour.

            Because “same code implies same observable performance” is false, you then can’t conclude “different performance implies it’s not same code”.

            [1] See section 3.4.1.5, Code Alignment, of Intel’s optimization manual, where two rules are given for optimizing code by aligning it, with medium impact on performance: http://www.intel.com/content/dam/doc/manual/64-ia-32-architectures-optimization-manual.pdf

          • @Sandro, a compiler won’t take a piece of code and generate machine code at random alignment; if I compile the same code (or the same optimized intermediate code, for that matter) twice with the same settings, it will generate the same machine code, at the same alignment, etc. If, for any weird reason, the compiler did mess with the alignment, the difference wouldn’t be consistent anyway.

          • @EduardoS, You’re making an assumption about compiler internals and its memory management based on no evidence. Are you that familiar with the internals of the .NET 2.0 CLR?

            Regardless, code alignment’s impact on memory fetches is but one possible way two identical instruction sequences could have observable performance differences, as I initially said.

            Furthermore, DaS clearly showed that the test-then-cast is coalesced in his runtime version yielding no performance difference, and my tests showed no observable performance impact on the .NET 2.0 CLR. It’s a little presumptuous to simply claim that benchmarks that show any results different from yours are wrong, simply because your CLR version doesn’t perform the optimization.

            The CLR JIT has changed significantly since those tests were first run, as have CPUs. The fact is, test-then-cast has no consistent or meaningful performance difference to cast-then-test, and the advice to prefer one over the other for performance reasons should just die already.

    • As was mentioned above, this is even optimized away; but even if it weren’t, the odds that a type check is going to have any meaningful real-world performance impact are quite low. It’s just not an expensive enough operation to be so concerned about optimizing it out, unless it’s an operation that you’re expecting to call millions of times a second. Odds are that in any non-trivial program you almost certainly spend your time elsewhere and can have a more meaningful impact on the program’s performance.

  2. I don’t get it. Don’t we have casts like this all over LINQ?

    http://referencesource.microsoft.com/#System.Core/System/Linq/Enumerable.cs

        public static class Enumerable
        {
            public static IEnumerable Where<TSource>(this IEnumerable<TSource> source, Func<TSource, bool> predicate) {
                if (source == null) throw Error.ArgumentNull("source");
                if (predicate == null) throw Error.ArgumentNull("predicate");
                if (source is Iterator<TSource>) return ((Iterator<TSource>)source).Where(predicate);
                if (source is TSource[]) return new WhereArrayIterator((TSource[])source, predicate);
                if (source is List<TSource>) return new WhereListIterator((List<TSource>)source, predicate);
                return new WhereEnumerableIterator(source, predicate);
            }
        }
    
    • Looks like the angle brackets disappeared while cutting and pasting. But we all know that Enumerable.Where is generic. Are the rules different for extension methods?

      • Rules are the same, but in this example, you are casting from an interface to a class, not from a type parameter to a class. Normal C# casts.

    • Those aren’t casting anything of type TSource to anything else, and they are not checking what the type argument provided for TSource is. Everything is fully generic in TSource.

      • But we are casting from IEnumerable to TSource[]

        if (source is TSource[]) return new WhereArrayIterator((TSource[])source, predicate);

        Would I be right if I said we are allowed to cast the non-generic IEnumerable part of the type, but the generic TSource itself is off limits?
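        To illustrate the distinction drawn in this thread, a sketch (the commented-out cast is the illegal case):

```csharp
using System.Collections.Generic;

static class Sketch
{
    static void N<TSource>(IEnumerable<TSource> source, TSource item)
    {
        // Legal: 'source' is statically typed as an interface, so the only
        // conversions this cast could possibly be are reference conversions,
        // and those are checked at run time with one code sequence.
        if (source is TSource[])
        {
            TSource[] array = (TSource[])source;
        }

        // Illegal: 'item' has the type parameter type itself; the compiler
        // cannot tell whether the cast would be an identity conversion,
        // a user-defined conversion, or an error.
        // string s = (string)item; // error CS0030
    }
}
```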

  3. > This feature — computing additional type information on variables in different portions of the code — has been suggested a number of times for C#, but it is a surprisingly hard feature to get right in the general case. There are a lot of TOCTOU-like problems

    Some languages support it, like Kotlin.
    Why is it so difficult to prove that the variable didn’t change? I suppose that even if the compiler is not very smart about it, that would be enough for the most common cases.

    • Suppose the variable is a field, dereferenced pointer, closed-over local or array element. Any method call between the test and the use could change the value of the variable. Any await or yield return between the test and the use could change the value of the variable. (And if the variable is being modified on another thread then you probably have a race condition anyways.)

      C# unfortunately does not have “final” locals; if it did, then the feature could be restricted to them.

      • Most variables I work with are just plain method parameters, most of which I never assign to. Restricting the feature to them would be good enough.
        Of course, just having final vars would be a nice improvement of C# to begin with. I like Scala’s ‘val’ syntax, it goes well with ‘var’.

        • The fact that you choose not to modify method parameters as a convention, even though you’re able to, doesn’t mean that the compiler can assume that method parameters are never mutated.

          • Yes, obviously I’m not asking the compiler to trust me, that would be ruining the whole point of compiling.
            But the compiler could seemingly easily prove that a lot of parameters are: not pointers, not in arrays, not closed over, not in an async or yielding method, and not mutated.
            Sure, sometimes it wouldn’t be able to prove non-mutation even when the developer knows it to be true, and then we wouldn’t get the benefits, but if it works in the most common cases then it’s worth it.
            Then again it would be simpler to start by adding ‘val’ to the language (and I think it would fit well into C#).

      • If the code within the conditional block isn’t allowed to use a “yield return” or other such things, such code could, semantically, be moved into a static generic method (passing any necessary variables as “ref” parameters) which could then, if nothing else, be invoked via delegate cached in a static generic class. That wouldn’t work if code would need to “yield return” from within that method, but otherwise I would think it should be possible to get good semantics; I wouldn’t be at all surprised if a compiler could use some trickery to eliminate the need for the delegate dispatch.

  5. While it’s not universally true, often this situation arises because generic code returns a more derived type as its base type, even though the more derived type is well known. I’ve found a useful technique for avoiding that scenario actually does come from C++ – the Curiously Recurring Template pattern can be applied to generics, and helps to reduce ambiguity about what type is being returned. Not a universal solution, but a useful tool.

  9. Pingback: Inferring from “is”, part one | Fabulous adventures in coding

  12. One could create perfect semantics for:

    void foo[T](T thing)
    {
        if (thing is IDog d)
            … code using d as an IDog
        else
            … code for non-IDog case
    }
    via:

    private static class fooHelper[T]
    {
        public delegate bool procType(classOfFoo it, ref T d, ref otherVarType otherVar, etc.);
        public static procType proc = findMethod;
        public static bool dummyProc(…same parameters as delegate) { return false; }
        public static bool findMethod(…same parameters as delegate)
        { code that sets proc to outer method foo_helper[T] if possible, or else to dummyProc }
    }

    bool foo_helper[T](…same parameters as delegate) where T : IDog
    {
        … code using d as an IDog
        return true;
    }

    void foo[T](T thing)
    {
        if (!fooHelper[T].proc(this, ref thing, ref otherVar, etc.))
            … code for non-IDog case
    }

    Note that this approach will work correctly, and without GC pressure, even if T happens to be a structure that implements a mutating IDog interface, and performance is pretty reasonable since Reflection is only needed once for each type T. The only problem is that having to pull the method into its own class is really ugly syntactically.

  13. I have actually been bitten by having to use the cast-to-object-then-to-the-intended-type trick to avoid the compiler error. It still aggravates me that I have that code in my library, despite it being plenty fast for its use, simply because it looks (and felt) ugly and broken. I never realized it was actually a thing, where it really didn’t make sense beyond that’s how it had to be done.
