Why does a foreach loop silently insert an “explicit” conversion?

The C# specification defines

foreach (V v in x) 
  embedded-statement

as having the same semantics as:

{
  E e = ((C)(x)).GetEnumerator();
  try 
  {
    V v;  // Inside the while in C# 5.
    while (e.MoveNext()) 
    {
      v = (V)e.Current;
      embedded-statement
    }
  }
  finally 
  {
    // necessary code to dispose e
  }
}

(Actually this is not exactly what the spec says; I’ve made one small edit because I don’t want to get into the difference between the element type and the loop variable type in this episode.)

There are a lot of subtleties here that we’ve discussed before; what I want to talk about today is the explicit conversion from e.Current to V. On the face of it this seems very problematic; that’s an explicit conversion. The collection could be a list of longs and V could be int; normally C# would not allow a conversion from long to int without a cast operator appearing in the source code (unless the long is a constant that fits into an int). What justifies this odd design choice?
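To make the long-to-int case concrete, here is a minimal sketch. The class and method names are hypothetical, invented for illustration. The loop compiles even though no cast appears in the source, whereas a plain assignment such as `int v = values[0];` would be rejected for lack of an implicit conversion.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical demo type; not from the specification or the framework.
public static class ForeachNarrowingDemo
{
    // Sums a list of longs while declaring the loop variable as int.
    public static int SumAsInts(List<long> values)
    {
        int total = 0;
        // No cast in the source: the foreach expansion supplies the
        // explicit long-to-int conversion on every element.
        foreach (int v in values)
        {
            total += v;
        }
        return total;
    }
}
```

Calling `ForeachNarrowingDemo.SumAsInts(new List<long> { 1, 2, 3 })` adds the three converted values. An element too large for an int is handled exactly as an explicit `(int)` cast would handle it in the surrounding checked or unchecked context.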

The answer is: the foreach loop semantics were designed before generics were added to the language; a highly likely scenario is that the collection being enumerated is an ArrayList or other collection where the element type is unknown to the compiler, but is known to the developer. It is rare for an ArrayList to contain ints and strings and Exceptions and Customers; usually an ArrayList contains elements of uniform type known to the developer. In a world without generics you typically have to know that ahead of time by some means other than the type system telling you. So just as a cast from object to string is a hint to the compiler that the value is really a string, so too is

foreach(string name in myArrayList)

a hint to the compiler that the collection contains strings. You don’t want to force the user to write:

foreach(object obj in myArrayList)
{
  string name = (string)obj;
  // ...
}

In a world with generics, where the vast majority of sequences enumerated are now statically typed, this is a misfeature. But it would be a large breaking change to remove it, so we’re stuck with it.
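Here is a small sketch of the pre-generics scenario, using hypothetical class and method names. It shows that the hint works when the developer is right about the contents of the ArrayList, and that the hidden cast fails at run time when the developer is wrong.

```csharp
using System;
using System.Collections;

// Hypothetical demo type; the names are illustrative only.
public static class ArrayListForeachDemo
{
    // Joins the elements of an untyped collection, treating each as a string.
    public static string JoinNames(ArrayList names)
    {
        string result = "";
        // Each element is statically an object; foreach casts it to string.
        foreach (string name in names)
        {
            result += name + ";";
        }
        return result;
    }

    // Returns true if enumerating the list as strings throws InvalidCastException.
    public static bool ThrowsOnMixedContent(ArrayList items)
    {
        try
        {
            JoinNames(items);
            return false;
        }
        catch (InvalidCastException)
        {
            return true;
        }
    }
}
```

With an ArrayList holding only strings, JoinNames runs to completion. With an ArrayList that also holds a boxed int, the cast inserted by foreach throws, so ThrowsOnMixedContent returns true.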

I personally find this feature quite confusing. When I was a beginner C# programmer I mistakenly believed the semantics of the foreach loop to be:

    while (e.MoveNext()) 
    {
      current = e.Current;
      if (!(current is V)) 
        continue;
      v = current as V;
      embedded-statement
    }

That is, the real feature is “assert that every item in the sequence is of type V and crash if it is not”, whereas I believed it was “for every element in this sequence of type V…”. (If the latter is the behavior you actually want, the OfType extension method has those semantics.)
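The difference between the two readings can be sketched side by side. The class and method names below are hypothetical; OfType is the real System.Linq extension method mentioned above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical demo type contrasting "filter" with "assert".
public static class FilterVersusAssertDemo
{
    // The "filter" reading: visit only the strings and skip everything else.
    public static int CountStringsByFiltering(IEnumerable<object> items)
    {
        int count = 0;
        foreach (string s in items.OfType<string>())
        {
            count++;
        }
        return count;
    }

    // What foreach actually does: cast every element to string,
    // throwing InvalidCastException on the first one that is not.
    public static int CountStringsByAsserting(IEnumerable<object> items)
    {
        int count = 0;
        foreach (string s in items)
        {
            count++;
        }
        return count;
    }
}
```

Given a sequence containing "a", the boxed int 1, and "b", the filtering version counts the two strings, while the asserting version throws when it reaches the int.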

You might wonder why the C# compiler does not produce a warning in modern code, where generics are being used. When I was on the C# compiler team I implemented such a warning and tried it on the corpus of C# code within Microsoft. The number of warnings produced in correct code (where someone had a sequence of Animal but knew via other means that they were all Giraffe) was large. Warnings which fire too often in correct code are bad warnings, so we opted out of adding the feature.
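The kind of correct code that would have triggered the warning looks something like the following sketch. Animal and Giraffe echo the example above, but the member NeckLengthCm and the GiraffeDemo class are assumptions made up for illustration.

```csharp
using System.Collections.Generic;

// Hypothetical types for illustration only.
public class Animal { }

public class Giraffe : Animal
{
    public int NeckLengthCm { get; set; }
}

public static class GiraffeDemo
{
    // The caller knows, by means outside the type system, that every
    // Animal in the herd is really a Giraffe.
    public static int TotalNeckLength(List<Animal> herd)
    {
        int total = 0;
        // Statically typed sequence of Animal, loop variable of type Giraffe:
        // foreach inserts the Animal-to-Giraffe downcast silently.
        foreach (Giraffe g in herd)
        {
            total += g.NeckLengthCm;
        }
        return total;
    }
}
```

This code is correct as long as the herd really contains only giraffes, which is why a warning on it would be noise.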

The moral of the story is: sometimes you get stuck with weird legacy misfeatures when you massively change the type system in version two of a language. Try to get your type system right the first time when next you design a new language.

34 thoughts on “Why does a foreach loop silently insert an “explicit” conversion?”

  1. I tend to think of the type specified in the foreach as a request for an explicit conversion. In “foreach(String str in myArrayList)”, for example, I am specifying String because I want the compiler to put in a conversion for me. If I didn’t want the conversion, I would just specify “var” as the type, and get whatever type the enumerator returns.

    • And now that you mention ‘var’, I’m curious how the spec changed when implicit type inference for locals was added (if it changed at all). Given that you cannot cast to var, I’m guessing type inference is performed before expanding the foreach.

  2. Does this affect performance, or can the unnecessary explicit cast be optimized away by the jitter?

    Also you did move the loop variable declaration inside the loop at some point (because of closures), didn’t you?

  3. Eric, what does the compiler produce if we do not specify the type but use the var keyword?

    Is there any point in an explicit cast if the compiler has already inferred the type from the collection’s items?

  4. Maybe a warning when the cast happens to be a conversion that is not allowed implicitly (like long to int), and no warning when it’s a cast?

  6. Does that mean that if you use a foreach loop on a generic struct collection, you pay the (tiny, but real) performance cost of boxing, as e.Current is an object before being cast back to a struct?

  7. I disagree that this is now a misfeature. The .NET framework and friends (such as the SharePoint API) have numerous collections which do not implement IEnumerable<T> and for which enumeration using var results in type Object being inferred. These are not deprecated APIs either (MatchCollection anyone?). Directly specifying the type remains an elegant way to write the foreach loop without a lot of additional cruft.

    • Personally, I consider it a problem that all these old collection classes are not generic. It’s a bit late to be worth fixing it now (and when they were initially implemented, generics did not exist), but I regret the existence of so many non-generic collection classes.

      It would make me happy if a new iteration of these classes implemented the generic IEnumerable<T> interface, but I’m sure the costs of this change are much higher than the rather minimal benefits.

    • Even for dealing with those legacy collections, using the Cast or OfType extension methods (depending on the desired semantics) may be better, as it makes crystal clear what you are actually doing.
      It’s a good idea never to hide a cast behind a somewhat obscure language feature, even more so a feature a lot of people misinterpret at first. (Before reading this article, I assumed that foreach (int i in myCollection) was the same as foreach (var i in myCollection.OfType<int>()). It appears it is more like foreach (var i in myCollection.Cast<int>()).)

    • What I would like to see for this and many other similar situations (e.g. invoking a method on a read-only struct, or trying to pass a property as a `ref` parameter) would be attributes which specify in what cases certain compiler inferences or substitutions would be semantically correct and helpful, and in which cases they should generate warnings or errors. For example, in the absence of an attribute or explicit rule, a compiler might helpfully permit the above-described inference (without a warning) only if `Current` returns a non-generic `Object`, but in the presence of an attribute or explicit rule would allow or forbid the type inference. That would seem to offer the best of all worlds.

  8. I’m not so sure this language feature is all that obscure. The behavior of foreach is one of the very first things you learn when studying a language. The added ugliness and (admittedly minuscule) overhead of calling .Cast() is not worth it to me when it’s already baked right into foreach.

  9. If this is a misfeature why was the same feature added to LINQ?

    from T x in enumerable

    Very handy for .NET legacy collections (MatchCollection, …), but was that the full justification?

  10. Is there any relation with implementation of interfaces as well?
    Consider:
    public interface IFoo
    {
        void Bar();
    }

    public class MyFoo : IFoo
    {
        public void Bar()
        {
            // do something
        }

        void IFoo.Bar()
        {
            // do something completely different
        }
    }

    Is it correct that the explicit cast is always going to make MyFoo::IFoo.Bar() be called instead of MyFoo::Bar()?

  11. Now that we have Cast() and OfType(), I think it makes a lot of sense to warn in this situation.

    However, that statement only applies to new code, as there is obviously a lot of legacy code that relies on the magic.

    I think the best solution would be to add a Code Analysis rule for this. That way, on older projects the rule could easily be disabled, but for new projects, the rule would be on by default and devs would be discouraged from relying on the now-mis-feature.

    • That’s for some subtle cases that I deliberately did not mention. Suppose you are in the incredibly unlikely and foolish situation of having a collection class F : IEnumerable { public int GetEnumerator; IEnumerator IEnumerable.GetEnumerator() { ... } }. What should happen when you do a foreach(string s in f) on this thing? Surely not f.GetEnumerator(), because f.GetEnumerator is an int! The right method can only be accessed by converting to the interface type. Thus the compiler generates ((IEnumerable)f).GetEnumerator(). The C is just a stand-in for “whatever type was determined to be the type necessary to get the right GetEnumerator”. This is all explained in the specification; see it for more details.

      • Why does this work for class F : IEnumerable { as you stated above }, but not for interface I { IEnumerator GetEnumerator(); } and class C : I? Is IEnumerable a special case? If so, why say “the type necessary” as though it could be any type other than IEnumerable or the declared type of x?

  12. “The collection could be a list of longs and V could be int”: I never realised that was possible before. For that to work, the enumerator’s Current property must have type long. If it has type object, say because you’re dealing with an ArrayList, even if the collection only contains longs, the conversion to int will cause an exception to be thrown. Exactly like an explicit cast would. In that case, you could do something like foreach (int i in a.Cast<long>()) { … }

  13. Why did you write:

    v = current as V;

    after you’ve checked that current is of type V (the continue)?

    Doesn’t the “as” operator also do a type check? I would think

    if (!(current is V)) continue;
    v = (V)current;

    would be better.
