Odious ambiguous overloads, part two

There were a number of ideas in the comments for what we should do about the unfortunate situation I described yesterday. We considered all of these options and more.  I thought I might talk a bit about the pros and cons of each.

I suggested do nothing.  We shipped with this behaviour in C# 2.0, the world didn’t come to an end, we can keep on shipping with it. The pro side is that this is the cheapest and easiest thing to do.  The con side is that as a reader pointed out, this is a gross “gotcha” just waiting for someone to stumble upon it.

DrPizza suggested that we come up with a syntax that disambiguates between the two ambiguous cases.  On the pro side, the problem would be clearly solved.  On the con side, this might be a breaking change (because suddenly one of the methods that used to bind one way might start binding the other way, depending on what the choice of syntax was.)

Also on the con side, adding new language syntax is something that we do not do lightly.  There is considerable work involved in changing the specification, and this is kind of an edge case.

Ayende Rahien suggested that we add an attribute that hints to the compiler which is the right binding. This has the nice property of not introducing new language syntax, but is kind of ugly.

These are both good ideas but unfortunately, as it turns out, neither of these work for technical reasons which I will go into in detail below.

Chris D suggested that the constructed interface should be treated as an interface with only one method. That’s a really nice idea, but unfortunately it doesn’t work with the CLR definition of generics.  An interface with n methods has n vtable slots that need filling, period, and n is invariant under construction. We’d have to change both the semantics of constructed interfaces and their implementation in the CLR, which means changing both the C# spec and the CLR spec, and that’s way too much work for this edge case.

Kyle Lahnakoski suggested that it simply be an error.  There are several ways this could be an error.

It could be an error to have an unconstructed generic type which could possibly be constructed to produce a duplicated signature.  This was how generics originally worked in C#, but we decided that we could allow these cases and we can’t really go back on that decision now – that would be a huge breaking change. There are many serializable generic objects which use a pattern of having a method that takes a type parameter to overload a method that takes a serialization object parameter, and all of those would suddenly be errors.

It could be an error to construct a generic type so as to actually produce a duplicate signature, but again, we decided to allow that in C# 2.0 and it would be a large breaking change to go back now.

It could be an error to have any explicit implementation of an interface method which matches ambiguously.  This would be a smaller breaking change, but still breaking.

What do any of these breaking changes buy the user though?  Not more functionality, but less.  Basically we’d be saying “you could do this before, and now you can’t, too bad.” That’s a difficult sale to make.

Carlos suggested unification – that is, have an explicit implementation map to every method that it matches, not just the first one.  This is a breaking change, but it seems like the most natural thing to do.  Since an implicit implementation maps to every interface slot that it matches, why shouldn’t an explicit implementation do the same? This means adding new language to the spec, but no new syntax.

This is what we decided to do, and I tried to implement it, only to discover that, uh, actually that stuff I told you yesterday about the implicit implementation matching the first one in source code order is a bit of a mistruth. That’s what I thought when I first looked at the problem, but upon closer examination, something deeper was wrong.

In fact what is going on here is that when the CLR loads the class for the first time, it does a verification check to ensure that the class does in fact implement every method of every interface that it says it does.  It is the CLR, not C#, which matches the explicit implementation to the first matching slot in the constructed interface type! It just so happens that since metadata for the interface is emitted in source-code order, that the CLR appears to do the match in source code order.  It actually does it in a CLR-implementation-defined order that just happens to be source code order.

To understand why we can’t fix this, you need to understand how explicit implementations are represented in metadata.  In the CLR metadata for a class there is a table called the MethodImpl table which says “method blah of class foo explicitly implements method bar of interface IFred”.  The “method bar of interface IFred” portion of that mapping may be either a token to a method of the unconstructed interface (a “MethodDef”) , or it may be a token representing a method on the constructed interface (a “MemberRef”).

We can’t use a MethodDef because there is a signature mismatch between the class’s implementation and the method on the unconstructed interface.  One takes an int, the other takes a type parameter, and the CLR flags that as illegal. So that’s out.

But we can’t use the latter because the MemberRef identifies the constructed interface method by – you guessed it – its signature!  And now we’re in the same hideous bind all over again.  We have no way to tell the CLR “no, really, use the generic-substituted version of the method” , because it can’t tell the two signatures apart either.

To fix this problem by either unifying, or adding a syntax/attribute to disambiguate, we need some way to represent that unification/disambiguation in metadata. But the CLR simply lacks the concept of “method token for constructed method identified by something more unique than signature”.

Though we can certainly suggest that this is a weakness in the CLR metadata design, waiting around for that to be addressed to fix this edge-case problem seems like a non-starter. Given that we can’t fix it in the current CLR the way we’d like, and making it an error seems like an egregious overreaction that buys the user nothing, but at the same time doing nothing is leaving a gotcha in the code waiting to happen, I took the only course of action left:

Warning: Explicit interface implementation 'I1<int>.M(int)' matches 
more than one interface member. Which interface member is actually 
chosen is implementation-dependent. Consider using a non-explicit 
implementation instead."

Unfortunately, sometimes that’s the best you can do.


Update from 2020: C# language designer Neal Gafter informs me that the bug in the CLR metadata processing which led to this situation was fixed in .NET 5.0, which is at the time of this writing in preview release. See his comment below for details.

Microsoft understandably gets some flak when it takes fifteen years for a bug to go from reported to fixed, but as always, you’ve got to consider the opportunity costs. There is a finite amount of effort available to fix bugs, and fixing bugs often introduces new problems in the form of subtle compatibility issues, so you’ve got to ruthlessly prioritize. This is a rare scenario for realistic code, and when it does arise in realistic code it indicates that there’s an opportunity to refactor that code into something less ambiguous.

Odious ambiguous overloads, part one

As you might have gathered, a lot of the decisions we have to make day-to-day here involve potential breaking changes on odd edge cases. Here’s another one.

Consider the following terrible but legal code:

public interface I1<U> {
  void M(U i);
  void M(int i);
}

My intense pain begins when the user writes:

public class C1 : I1<int> 
{

In the early version of the C# specification it was actually illegal to have an ambiguity like this, but the spec was changed so that when doing overload resolution on a call to M we can choose one, according to section 14.4.2.2:

If one [argument type] is non-generic, but the other is generic, then the non-generic is better.

But still, as you will soon see, we’re in a world of hurt for another reason, namely that this class must now implement both M(int i) and, uh, M(int i). Fortunately, a nonexplicit implementation binds to both methods, so this works just fine:

public class C1 : I1<int> 
{
  public void M(int i)
  {
    Console.WriteLine("class " + i);
  }
}

The method implements both versions of M and the contract is satisfied.  But we have problems if we try to do an explicit interface implementation:

public class C2 : I1<int> 
{
  void I1<int>.M(int i) 
  {
    Console.WriteLine("explicit " + i);
  }
}

Does this explicitly implement both members of I1?  Or just one?  If so, which one?

In the current compiler this code produces a terrible, terrible error:

error CS0535: 'C2' does not implement interface member 'I1<int>.M(int)'

Is that so?  It sure looks like it implements it!

What happens when we have both an explicit implementation and a class implementation?  The spec does not actually say what to do. It turns out that we end up in a situation where runtime behaviour depends on source code order of the interface! Check this out:

public interface I1<U> 
{
  void M(U i); // generic first
  void M(int i);
}
public interface I2<U> 
{
  void M(int i);
  void M(U i); // generic second
}
public class C3: I1<int>, I2<int> 
{
  void I1<int>.M(int i) 
  {
    Console.WriteLine("c3 explicit I1 " + i);
  }
  void I2<int>.M(int i) 
  {
    Console.WriteLine("c3 explicit I2 " + i);
  }
  public void M(int i) 
  {
    Console.WriteLine("c3 class " + i);
  }
}
class Test 
{
  static void Main() 
  {
    C3 c3 = new C3();
    I1<int> i1_c3 = c3;
    I2<int> i2_c3 = c3;
    i1_c3.M(101);
    i2_c3.M(102);
  }
}

What happens here is that the explicit interface implementation mappings in the class match the methods in the interfaces in a first-come-first-served manner:

  • void I1<int>.M(U) maps to explicit implementation void I1<int>.M(int i)
  • void I1<int>.M(int) maps to implicit implementation public void M(int i)
  • void I2<int>.M(int) maps to explicit implementation void I2<int>.M(int i)
  • void I2<int>.M(U) maps to implicit implementation public void M(int i)

Then (because of the aforementioned section 14.4.2.2) when we see

i1_c3.M(101);
i2_c3.M(102);

we prefer the typed-as-int versions to the generic substitution versions, so this program calls the two non-generic versions and produces the output:

c3 class 101
c3 explicit I2 102

And as you’d expect, if we force the compiler to pick the generic versions then we get similar behaviour:

static void Main() 
{
  C3 c3 = new C3();
  Thunk1<int>(c3,103);
  Thunk2<int>(c3, 104);
}
static void Thunk1<U>(I1<U> i1, U u) 
{
  i1.M(u);
}
static void Thunk2<U>(I2<U> i2, U u) 
{
  i2.M(u);
}

The binding of the overload resolution in the thunk bodies happens before the substitution of the type parameters, so these always bind to the generic versions of the methods. As you would expect from the mappings above, this outputs

c3 explicit I1 103
c3 class 104

Again, this shows that source code order has an unfortunate semantic import.

Given this unfortunate situation — no spec guidance and an existing implementation that behaves strangely — what would you do? (Of course “do nothing” is an option.) I’m interested to hear your ideas, and I’ll describe what we actually did next time.


Notes from 2020:

This article produced a large number of interesting comments, which I summarize in the next episode.