Why does a foreach loop silently insert an “explicit” conversion?

Posted on July 22, 2013 by ericlippert

The C# specification defines

foreach (V v in x) 
  embedded-statement

as having the same semantics as:

{
  E e = ((C)(x)).GetEnumerator();
  try 
  {
    V v;  // Inside the while in C# 5.
    while (e.MoveNext()) 
    {
      v = (V)e.Current;
      embedded-statement
    }
  }
  finally 
  {
    // necessary code to dispose e
  }
}

(Actually this is not exactly what the spec says; I’ve made one small edit because I don’t want to get into the difference between the element type and the loop variable type in this episode.)
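As a concrete illustration, the sketch below hand-expands foreach (string s in list) for an ArrayList. Here C is ArrayList, E is the non-generic IEnumerator and V is string. The class and method names are hypothetical. The dispose step uses the run-time IDisposable check that the specification describes when E is not known to be disposable, and actual compiler output may differ in detail.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical demo class; not taken from the post.
static class ForeachExpansionDemo
{
    // The ordinary loop: the compiler inserts (string)e.Current for us.
    public static List<string> WithForeach(ArrayList list)
    {
        var result = new List<string>();
        foreach (string s in list)
            result.Add(s);
        return result;
    }

    // A hand-written approximation of the same loop's expansion.
    public static List<string> ByHand(ArrayList list)
    {
        var result = new List<string>();
        IEnumerator e = ((ArrayList)(list)).GetEnumerator();
        try
        {
            while (e.MoveNext())
            {
                string s = (string)e.Current; // the "explicit" conversion
                result.Add(s);
            }
        }
        finally
        {
            // E is not known to implement IDisposable, so check at run time.
            IDisposable d = e as IDisposable;
            if (d != null) d.Dispose();
        }
        return result;
    }
}
```

For an ArrayList that holds only strings, both methods return the same list. If the list holds something else, both throw InvalidCastException at the same element.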

There are a lot of subtleties here that we’ve discussed before; what I want to talk about today is the explicit conversion from e.Current to V. On the face of it this seems very problematic; that’s an explicit conversion. The collection could be a list of longs and V could be int; normally C# would not allow a conversion from long to int without a cast operator appearing in the source code, even when the long is a constant whose value fits into an int. What justifies this odd design choice?

The answer is: the foreach loop semantics were designed before generics were added to the language; a highly likely scenario is that the collection being enumerated is an ArrayList or other collection where the element type is unknown to the compiler, but is known to the developer. It is rare for an ArrayList to contain ints and strings and Exceptions and Customers; usually an ArrayList contains elements of uniform type known to the developer. In a world without generics you typically have to know that ahead of time by some means other than the type system telling you. So just as a cast from object to string is a hint to the compiler that the value is really a string, so too is

foreach(string name in myArrayList)

a hint to the compiler that the collection contains strings. You don’t want to force the user to write:

foreach(object obj in myArrayList)
{
  string name = (string)obj;
  // ...
}

In a world with generics, where the vast majority of sequences enumerated are now statically typed, this is a misfeature. But it would be a large breaking change to remove it, so we’re stuck with it.
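The long-to-int case mentioned earlier shows why the feature is surprising once sequences are statically typed. This is a minimal sketch with a hypothetical class name. It assumes the default unchecked context; with checked arithmetic enabled, the out-of-range element should throw OverflowException instead.

```csharp
using System.Collections.Generic;

// Hypothetical demo class; not taken from the post.
static class NarrowingDemo
{
    public static List<int> Narrow(List<long> longs)
    {
        var ints = new List<int>();
        // Compiles: foreach inserts (int)e.Current, an explicit long-to-int conversion.
        foreach (int i in longs)
            ints.Add(i);
        // By contrast, this line would not compile without a cast:
        // int first = longs[0]; // error CS0266
        return ints;
    }
}
```

For the input { 1L, 4294967297L } the method returns { 1, 1 }. 4294967297 is 2^32 + 1, and the unchecked conversion keeps only the low 32 bits, so data is silently lost.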

I personally find this feature quite confusing. When I was a beginner C# programmer I mistakenly believed the semantics of the foreach loop to be:

    while (e.MoveNext()) 
    {
      current = e.Current;
      if (!(current is V)) 
        continue;
      v = current as V;
      embedded-statement
    }

That is, the real feature is “assert that every item in the sequence is of type V and crash if it is not”, whereas I believed it was “for every element in this sequence of type V…”. (If the latter is the behavior you actually want, the OfType extension method has those semantics.)
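The difference between the two readings is easy to see on a mixed ArrayList. In this sketch, which uses hypothetical names, the foreach version stops with InvalidCastException at the first non-string. OfType<string>() instead quietly skips that element. Cast<string>() behaves like the foreach loop, although it throws only when the offending element is actually enumerated.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

// Hypothetical demo class; not taken from the post.
static class CastVersusOfTypeDemo
{
    // foreach (string s in items): asserts that every element is a string.
    public static bool ForeachThrows(ArrayList items)
    {
        try
        {
            foreach (string s in items) { }
            return false;
        }
        catch (InvalidCastException)
        {
            return true;
        }
    }

    // What the beginner expected: "for every element of type string".
    public static List<string> OnlyStrings(ArrayList items) =>
        items.OfType<string>().ToList();
}
```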

You might wonder why the C# compiler does not produce a warning in modern code, where generics are being used. When I was on the C# compiler team I implemented such a warning and tried it on the corpus of C# code within Microsoft. The number of warnings produced in correct code (where someone had a sequence of Animal but knew via other means that they were all Giraffe) was large. Warnings which fire too often in correct code are bad warnings, so we decided against adding it.

The moral of the story is: sometimes you get stuck with weird legacy misfeatures when you massively change the type system in version two of a language. Try to get your type system right the first time when next you design a new language.

34 thoughts on “Why does a foreach loop silently insert an “explicit” conversion?”

Gabe on July 22, 2013 at 11:02 am said:

I tend to think of the type specified in the foreach as a request for an explicit conversion. In “foreach(String str in myArrayList)”, for example, I am specifying String because I want the compiler to put in a conversion for me. If I didn’t want the conversion, I would just specify “var” as the type, and get whatever type the enumerator returns.

- Federico on July 22, 2013 at 12:32 pm said:
  
  And now that you mention ‘var’, I’m curious how the spec changed when implicit type inference for locals was added (if it changed at all). Given that you cannot cast to var, I’m guessing type inference is performed before expanding the foreach.
  
  - Eric Lippert on July 22, 2013 at 1:00 pm said:
    
    Since the element type is inferred from the type returned by Current, the cast becomes a no-op “identity conversion”.
    
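Eric's answer about var can be observed directly. StaticTypeName is a hypothetical helper. Generic type inference uses the compile-time type of its argument, so typeof(T) reports the type that var was inferred to be. Over an ArrayList the loop variable is object. Over a List<string> it is string, and the conversion from Current is an identity conversion.

```csharp
using System.Collections;
using System.Collections.Generic;

// Hypothetical demo class; not taken from the post.
static class VarInferenceDemo
{
    // T is inferred from the static (compile-time) type of the argument.
    static string StaticTypeName<T>(T value) => typeof(T).Name;

    public static string NonGeneric()
    {
        var list = new ArrayList { "a" };
        foreach (var x in list)   // x is object: IEnumerator.Current is object
            return StaticTypeName(x);
        return null;
    }

    public static string Generic()
    {
        var list = new List<string> { "a" };
        foreach (var x in list)   // x is string: identity conversion from Current
            return StaticTypeName(x);
        return null;
    }
}
```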
Chris B on July 22, 2013 at 11:44 am said:

foreach(object obj in myArrayList)
{
string name = (object)obj; <— I think you mean (string)obj

🙂

- Eric Lippert on July 22, 2013 at 12:58 pm said:
  
  Whoops. Thanks!
  
Mauricio Scheffer on July 22, 2013 at 12:23 pm said:

A few months ago I suggested ReSharper should implement such a warning: http://youtrack.jetbrains.com/issue/RSRP-332263
Sadly, it seems nobody’s interested.
In my humble opinion, this almost tacit downcast *is* bad enough to justify forcing the user to do it explicitly.

Stilgar on July 22, 2013 at 1:23 pm said:

Does this affect performance, or can the unnecessary explicit cast be optimized away by the jitter?

Also you did move the loop variable declaration inside the loop at some point (because of closures), didn’t you?

- Eric Lippert on July 22, 2013 at 2:09 pm said:
  
  You can determine the performance cost by measuring it. I never have.
  
  And yes, I cut-n-pasted from the C# 4 spec there.
  
Pavel Voronin on July 22, 2013 at 2:42 pm said:

Eric, what does compiler produce if we do not specify the type but use var keyword?

Is there any sense in explicit casting if compiler already inferred the type from the collection’s items?

- Pavel Voronin on July 22, 2013 at 2:44 pm said:
  
  Oops, missed your comment above…
  
Eugene on July 22, 2013 at 8:21 pm said:

Maybe a warning when the cast happens to be a conversion that is not allowed implicitly (like long to int), and no warning when it’s a cast?

Falanwe on July 23, 2013 at 3:23 am said:

Does that mean that if you use a foreach loop on a generic struct collection, you get the (tiny, but existing) performance cost of a boxing as e.Current is an object before being cast back to a struct?

- Eric Lippert on July 23, 2013 at 9:06 am said:
  
  No; the whole point of this scheme is to avoid that cost if possible.
  
  - Falanwe on July 25, 2013 at 8:48 am said:
    
    So I guess in this context we have
    * C is IEnumerable of T
    * E is IEnumerator of T
    
    to prevent boxing. Am I right?
    
    - Eric Lippert on July 25, 2013 at 10:28 am said:
      
      Correct; however, even in C# 1.0 it was possible to avoid boxing if you were clever. That’s a subject for another day.
      
      - Random832 on August 9, 2013 at 11:43 am said:
        
        I know the answer to how (I won’t spoil it for others), but I’m curious – do Arrays in C# 1.0 do this?
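Eric does not spell out the C# 1.0 technique, so this is one possibility rather than his stated answer. foreach is pattern-based, which means a collection can expose a public GetEnumerator method that returns a struct with a strongly typed Current property. No interface is involved, Current never returns object, and the conversion to V is an identity conversion. IntBag and its Enumerator are hypothetical types. The syntax is modern C#, but the collection pattern itself is described in the C# 1.0 specification.

```csharp
// Hypothetical collection type illustrating pattern-based foreach.
public sealed class IntBag
{
    private readonly int[] items;
    public IntBag(params int[] items) { this.items = items; }

    // No interface needed: foreach looks for a public GetEnumerator method.
    public Enumerator GetEnumerator() => new Enumerator(items);

    public struct Enumerator
    {
        private readonly int[] items;
        private int index;
        public Enumerator(int[] items) { this.items = items; index = -1; }

        public bool MoveNext() => ++index < items.Length;

        // Strongly typed Current: no object, so no boxing per element.
        public int Current => items[index];
    }
}

public static class IntBagDemo
{
    public static int Sum(IntBag bag)
    {
        int total = 0;
        foreach (int i in bag)   // E is IntBag.Enumerator, V is int
            total += i;
        return total;
    }
}
```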
Sam on July 23, 2013 at 5:52 am said:

I disagree that this is now a misfeature. The .NET framework and friends (such as the SharePoint API) have numerous collections which do not implement IEnumerable<T> and for which enumeration using var results in type Object being inferred. These are not deprecated APIs either (MatchCollection anyone?). Directly specifying the type remains an elegant way to write the foreach loop without a lot of additional cruft.

- Brian on July 23, 2013 at 6:27 am said:
  
  Personally, I consider it a problem that all these old collection classes are not generic. It’s a bit late to be worth fixing it now (and when they were initially implemented, generics did not exist), but I regret the existence of so many non-generic collection classes.
  
  It would make me happy if a new iteration of these classes implemented the generic IEnumerable<T> interface, but I’m sure the costs of this change are much higher than the rather minimal benefits.
  
- Falanwe on July 23, 2013 at 7:56 am said:
  
  Even for dealing with those legacy collections, using the Cast or OfType extension methods (depending on the desired semantics) may be better, as it makes crystal clear what you are actually doing.
  It’s a good idea never to hide a cast behind a somewhat obscure language feature, all the more so a feature that a lot of people misinterpret at first. (Before reading this article I assumed that foreach (int i in myCollection) was the same as foreach (var i in myCollection.OfType<int>()). It appears it is closer to foreach (var i in myCollection.Cast<int>()).)
  
- John Payson on July 23, 2013 at 9:43 am said:
  
  What I would like to see for this and many other similar situations (e.g. invoking a method on a read-only struct, or trying to pass a property as a `ref` parameter) would be attributes which specify in what cases certain compiler inferences or substitutions would be semantically correct and helpful, and in which cases they should generate warnings or errors. For example, in the absence of an attribute or explicit rule, a compiler might helpfully permit the above-described inference (without a warning) only if `Current` returns a non-generic `Object`, but in the presence of an attribute or explicit rule would allow or forbid the type inference. That would seem to offer the best of all worlds.
  
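Sam's MatchCollection example can be sketched both ways. The class and method names are hypothetical. On .NET Framework, MatchCollection implements only the non-generic IEnumerable, and newer versions of .NET add generic interfaces, but the sketch works either way. The foreach version relies on the built-in conversion to Match. The Cast<Match>() version, which Falanwe prefers, states that conversion explicitly.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical demo class; not taken from the post.
static class MatchCollectionDemo
{
    // Relies on foreach's built-in conversion from Current to Match.
    public static List<string> WithForeach(string input)
    {
        var words = new List<string>();
        foreach (Match m in Regex.Matches(input, @"\w+"))
            words.Add(m.Value);
        return words;
    }

    // Makes the same conversion visible with Cast<Match>().
    public static List<string> WithCast(string input) =>
        Regex.Matches(input, @"\w+").Cast<Match>().Select(m => m.Value).ToList();
}
```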
Sam on July 23, 2013 at 8:38 am said:

I’m not so sure this language feature is all that obscure. The behavior of foreach is one of the very first things you learn when studying a language. The added ugliness and (admittedly minuscule) overhead of calling .Cast<T>() is not worth it to me when it’s already baked right into foreach.

tobi on July 23, 2013 at 9:05 am said:

If this is a misfeature why was the same feature added to LINQ?

from T x in enumerable

Very handy for .NET legacy collections (MatchCollection, …), but was that the full justification?

- Eric Lippert on July 23, 2013 at 9:08 am said:
  
  Yes, that’s the justification. Also note that the feature does not introduce a cast; it introduces a call to the Cast extension method, which of course you can implement to do whatever you want.
  
  - Falanwe on July 26, 2013 at 6:19 am said:
    
    Now you got me curious… 😉
    
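Eric's description of the query rewrite can be sketched as follows, using hypothetical names. An explicitly typed range variable such as from string s in list is translated into a call to list.Cast<string>(). Here that call binds to Enumerable.Cast from System.Linq, so both methods produce the same lengths.

```csharp
using System.Collections;
using System.Collections.Generic;
using System.Linq;

// Hypothetical demo class; not taken from the post.
static class QueryCastDemo
{
    // Explicitly typed range variable: the source becomes list.Cast<string>().
    public static List<int> QuerySyntax(ArrayList list) =>
        (from string s in list select s.Length).ToList();

    // The same query after that rewrite, written out by hand.
    public static List<int> MethodSyntax(ArrayList list) =>
        list.Cast<string>().Select(s => s.Length).ToList();
}
```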
TD on July 23, 2013 at 11:53 am said:

Is there any relation with implementation of interfaces as well?
Consider:
public interface IFoo
{
  void Bar();
}

public class MyFoo : IFoo
{
  public void Bar()
  {
    // do something
  }
  void IFoo.Bar()
  {
    // do something completely different
  }
}

Is it correct that the explicit cast is always going to make MyFoo::IFoo.Bar() be called instead of MyFoo::Bar()?

- Eric Lippert on July 23, 2013 at 1:29 pm said:
  
  Well it would be awfully strange if you got a different method called out of foreach(IFoo ifoo in foos) ifoo.Bar(); and IFoo ifoo = foos[0]; ifoo.Bar();.
  
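Here is a sketch of TD's scenario. Bar is changed to return a string, which is an assumption made only so the result can be checked. The conversion that foreach inserts is an ordinary reference conversion to IFoo. Calling Bar through the loop variable therefore reaches the explicit implementation, exactly as it does through an IFoo local.

```csharp
using System.Collections;
using System.Collections.Generic;

public interface IFoo
{
    string Bar();
}

public class MyFoo : IFoo
{
    public string Bar() => "public MyFoo.Bar";
    string IFoo.Bar() => "explicit IFoo.Bar";
}

// Hypothetical demo class; not taken from the post.
static class InterfaceDispatchDemo
{
    public static List<string> ViaForeach(ArrayList foos)
    {
        var calls = new List<string>();
        foreach (IFoo ifoo in foos)   // same conversion as (IFoo)e.Current
            calls.Add(ifoo.Bar());
        return calls;
    }

    public static string ViaLocal(MyFoo foo)
    {
        IFoo ifoo = foo;
        return ifoo.Bar();
    }
}
```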
Rob Siklos on July 23, 2013 at 1:16 pm said:

Now that we have Cast() and OfType(), I think it makes a lot of sense to warn in this situation.

However, that statement only applies to new code, as there is obviously a lot of legacy code that relies on the magic.

I think the best solution would be to add a Code Analysis rule for this. That way, on older projects the rule could easily be disabled, but for new projects, the rule would be on by default and devs would be discouraged from relying on the now-mis-feature.

Justin on July 23, 2013 at 5:51 pm said:

Nice post, but I am confused on one part:

E e = ((C)(x)).GetEnumerator();

What is the C?

- Eric Lippert on July 24, 2013 at 8:44 am said:
  
  That’s for some subtle cases that I deliberately did not mention. Suppose you are in the incredibly unlikely and foolish situation of having a collection class F : IEnumerable { public int GetEnumerator; IEnumerator IEnumerable.GetEnumerator() { ... } }. What should happen when you do a foreach(string s in f) on this thing? Surely not f.GetEnumerator(), because f.GetEnumerator is an int! The right method can only be accessed by converting to the interface type. Thus the compiler generates ((IEnumerable)f).GetEnumerator(). The C is just a stand-in for “whatever type was determined to be the type necessary to get the right GetEnumerator”. This is all explained in the specification; see it for more details.
  
  - Random832 on August 9, 2013 at 11:56 am said:
    
    Why does this work for class F : IEnumerable { ... } as you stated above, but not for interface I { IEnumerator GetEnumerator(); } and class C : I? Is IEnumerable a special case? If so, why say “the type necessary” as though it could be any type other than IEnumerable or the declared type of x?
    
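A less contrived case where C differs from the collection's own type is a class that implements IEnumerable only explicitly. Member lookup finds no public GetEnumerator on such a class, so foreach falls back to the interface, effectively calling ((IEnumerable)letters).GetEnumerator(). Letters and the demo class are hypothetical.

```csharp
using System.Collections;
using System.Collections.Generic;

// Hypothetical collection with no public GetEnumerator method.
public class Letters : IEnumerable
{
    private readonly string[] items = { "x", "y" };

    // Explicit implementation: reachable only through the IEnumerable interface.
    IEnumerator IEnumerable.GetEnumerator() => items.GetEnumerator();
}

// Hypothetical demo class; not taken from the post.
static class InterfaceFallbackDemo
{
    public static List<string> Collect(Letters letters)
    {
        var result = new List<string>();
        // letters.GetEnumerator() would not compile here; foreach instead
        // goes through IEnumerable, so C is IEnumerable.
        foreach (string s in letters)
            result.Add(s);
        return result;
    }
}
```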
Harald van Dijk on July 24, 2013 at 4:50 am said:

“The collection could be a list of longs and V could be int”: I never realised that was possible before. For that to work, the enumerator’s Current property must have type long. If it has type object, say because you’re dealing with an ArrayList, even if the collection only contains longs, the conversion to int will cause an exception to be thrown. Exactly like an explicit cast would. In that case, you could do something like foreach (int i in a.Cast<long>()) { … }
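Harald's point can be sketched with hypothetical names. When Current is object, (int)e.Current is an unboxing conversion, and unboxing a boxed long as int throws InvalidCastException. Going through Cast<long>() first makes Current a long, so the explicit long-to-int conversion succeeds.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

// Hypothetical demo class; not taken from the post.
static class BoxedLongDemo
{
    // Current is object here, so (int)e.Current must unbox an int;
    // a boxed long does not qualify.
    public static bool DirectThrows(ArrayList longs)
    {
        try
        {
            foreach (int i in longs) { }
            return false;
        }
        catch (InvalidCastException)
        {
            return true;
        }
    }

    // Cast<long>() unboxes to long first; foreach then converts long to int.
    public static List<int> ViaCast(ArrayList longs)
    {
        var ints = new List<int>();
        foreach (int i in longs.Cast<long>())
            ints.Add(i);
        return ints;
    }
}
```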

Jaap on July 25, 2013 at 1:51 pm said:

Why did you write:

v = current as V;

after you’ve checked that current is of type V (the continue)?

Doesn’t the “as” operator also do a type check? I would think

if (!(current is V)) continue;
v = (V)current;

would be better.

- Mikant on April 18, 2014 at 8:30 am said:
  
  http://blogs.msdn.com/b/ericlippert/archive/2010/09/16/is-is-as-or-is-as-is.aspx
  