Optional argument corner cases, part three

A lot of people seem to think that this:

void M(string x, bool y = false) 
{ 
  ... whatever ... 
}

is actually a syntactic sugar for the way you used to have to write this in C#, which is:

void M(string x) 
{ 
  M(x, false); 
} 
void M(string x, bool y) 
{ 
  ... whatever ... 
}

But it is not. Continue reading

Optional argument corner cases, part two

Last time we saw that the declared optional arguments of an interface method need not be optional arguments of an implementing class method. That seems potentially confusing; why not require that an implementing method on a class exactly repeat the optional arguments of the declaration?

Because the cure is worse than the disease, that’s why.

Continue reading

Optional argument corner cases, part one

In C# 4.0 we added “optional arguments”; that is, you can state in the declaration of a method’s parameter that if certain arguments are omitted, then constants can be substituted for them:

void M(int x = 123, int y = 456) { }

can be called as M(), M(0) and M(0, 1). The first two cases are treated as though you’d said M(123, 456) and M(0, 456) respectively.

This was a controversial feature for the design team, which had resisted adding this feature for almost ten years despite numerous requests for it and the example of similar features in languages like C++ and Visual Basic. Though obviously convenient, the convenience comes at a pretty high price of bizarre corner cases that the compiler team definitely needs to deal with, and customers occasionally run into by accident. I thought I might talk a bit about a few of those bizarre corner cases.

Continue reading

Implementing the virtual method pattern in C#, part one

If you’ve been in this business for any length of time you’ve undoubtedly seen some of the vast literature on “design patterns” — you know, those standard solutions to common problems with names like “factory” and “observer” and “singleton” and “iterator” and “composite” and “adaptor” and “decorator” and… and so on. It is frequently useful to be able to take advantage of the analysis and design skills of others who have already given considerable thought to codifying patterns that solve common problems. However, I think it is valuable to realize that everything in high-level programming is a design pattern. Some of those patterns are so good, we’ve baked them right into the language so thoroughly that most of us don’t even think of them as examples patterns anymore, patterns like “type” and “function” and “local variable” and “call stack” and “inheritance”.

I was asked recently how virtual methods work “behind the scenes”: how does the CLR know at runtime which derived class method to call when a virtual method is invoked on a variable typed as the base class? Clearly it must have something upon which to make a decision, but how does it do so efficiently? I thought I might explore that question by considering how you might implement the “virtual and instance method pattern” in a language which did not have virtual or instance methods. So, for the rest of this series I am banishing virtual and instance methods from C#. I’m leaving delegates in, but delegates can only be to static methods. Our goal is to take a program written in regular C# and see how it can be transformed into C#-without-instance-methods. Along the way we’ll get some insights into how virtual methods really work behind the scenes.
Continue reading

Never say never, part two

This is part two of a two-part series about determining whether the endpoint of a method is never reachable. Part one is here. A follow-up article is here.


Whether we have a “never” return type or not, we need to be able to determine when the end point of a method is unreachable for error reporting in methods that have non-void return type. The compiler is pretty clever about working that out; it can handle situations like

int M()
{
  try
  {
    while(true) N();
  }
  catch(Exception ex)
  {
    throw new WrappingException(ex);
  }
}

The compiler knows that N either throws or it doesn’t, and that if it doesn’t, then the try block never exits, and if it does, then either the construction of the exception throws, or the construction succeeds and the catch throws the new exception. No matter what, the end point of M is never reached.

However, the compiler is not infinitely clever. It is easy to fool it:
Continue reading

Strange but legal

“Can a property or method really be marked as both abstract and override?” one of my coworkers just asked me. My initial gut response was “of course not!” but as it turns out, the Roslyn codebase itself has a property getter marked as both abstract and override. (Which is why they were asking in the first place.)

I thought about it a bit more and reconsidered. This pattern is quite rare, but it is perfectly legal and even sensible. The way it came about in our codebase is that we have a large, very complex type hierarchy used to represent many different concepts in the compiler. Let’s call it “Thingy”:

abstract class Thingy 
{
  public virtual string Name { get { return ""; } } 
  ...

There are going to be a lot of subtypes of Thingy, and almost all of them will have an empty string for their name. Or null, or whatever; the point is not what exactly the value is, but rather that there is a sensible default name for almost everything in this enormous type hierarchy.

However, there is another abstract kind of Thingy, a FrobThingy, which always has a non-empty name. In order to prevent derived classes of FrobThingy from accidentally using the default implementation from the base class, we said:

abstract class FrobThingy : Thingy 
{   
  public abstract override string Name { get; } }
  ...

Now if you make a derived class BigFrobThingy, you know that you have to provide an implementation of Name for it because it will not compile if you don’t.

Why are anonymous types generic?

Suppose you use an anonymous type in C#:

var x = new { A = "hello", B = 123.456 };

Ever taken a look at what code is generated for that thing? If you crack open the assembly with ILDASM or some other tool, you’ll see this mess in the top-level type definitions

.class '<>f__AnonymousType0`2'<'<A>j__TPar','<B>j__TPar'>

What the heck? Let’s clean that up a bit. We’ve mangled the names so that you are guaranteed that you cannot possibly accidentally use this thing “as is” from C#. Turning the mangled names back into regular names, and giving you the declaration and some of the body of the class in C#, that would look like:

[CompilerGenerated]
internal sealed class Anon0<TA, TB>
{
  private readonly TA a;
  private readonly TB b;
  public TA A { get { return this.a; } }
  public TB B { get { return this.b; } }
  public Anon0(TA a, TB b)
  { this.a = a; this.b = b; }
  // plus implementations of Equals, GetHashCode and ToString
}

And then at the usage site, that is compiled as:

var x = new Anon0<string, double>("hello", 123.456);

Again, what the heck? Why isn’t this generated as something perfectly straightforward, like:

[CompilerGenerated]
internal sealed class Anon0
{
  private readonly string a;
  private readonly double b;
  public string A { get { return this.a; } }
  public double B { get { return this.b; } }
  public Anon0(string a, double b)
  { this.a = a; this.b = b; }
  // plus implementations of Equals, GetHashCode and ToString
}

Good question. Consider the following. Suppose you have a library assembly, not written by you, that contains the following types:

public class B
{
  protected class P {}
}

Now, in your source code you have:

class D1 : B
{
  void M() 
  { 
    var x = new { P = new B.P() }; 
  }
}

class D2 : B
{
  void M() 
  { 
    var x = new { P = new B.P() }; 
  }
}

We need to generate an anonymous type, or types, somewhere. Suppose we decide that we want the two anonymous types – which have the same types and the same property names – to unify into one type. (We desire anonymous types that are structurally identical to unify within an assembly because that enables scenarios where multiple methods use generic type inference to infer the same anonymous type; you want to be able to pass instances of that anonymous type around between such methods. Perhaps I’ll do an example of that in the new year.)

Where do we generate that type? How about inside D1:

class D1 : B
{
  [CompilerGenerated]
  ??? sealed class Anon0 { public P P { get { ... } } ... }
  void M() 
  { 
    var x = new { P = new B.P() }; 
  }
}

What is the desired accessibilty of Anon0 in the location marked with question marks? It cannot be private or protected, because then D2 cannot see it. It cannot be either public or internal, because then you’d have a public/internal type with a public property that exposes a protected type, which is illegal. Nor can it be either “protected and internal” or “protected or internal” by similar logic. It cannot have any accessibility! Therefore the anonymous type cannot go in D1.  Obviously by identical logic it cannot go in D2. It cannot go in B because you don’t have the ability to modify the assembly containing class B. The only remaining place it can go is in the global namespace. But at the top level an internal type cannot refer to P, a protected nested type. P is only accessible inside a derived class of B but we’ve already ruled all those out.

But we can put the anonymous type at the top level if it never actually refers to P. If we make generic class Anon0<TP> and construct it with P for TP, then P only ever appears inside D1 and D2, and yet the types unify as desired.

Rather than coming up with some weird heuristic that determined when anonymous types needed to be generic, and making them normally typed otherwise, we simply decided to embrace the general solution. Anonymous types are always generated as generic types even when doing so is not strictly necessary. We did extensive performance testing to ensure that this choice did not adversely impact realistic scenarios, and as it turned out, the CLR is really quite buff when it comes to construction of generic types with lots of type parameters.

And with that, I’m off for the rest of the year. Air travel is too expensive this year, so I’m going to miss my traditional family Boxing Day celebration, but I’m sure it’ll be delightful to spend some time in Seattle for the holidays. I hope you all have a safe and festive holiday season, and we’ll see you for more fabulous adventures in 2011.

Continuing to an outer loop

When you have a nested loop, sometimes you want to “continue” the outer loop, not the inner loop. For example, here we have a sequence of criteria and a sequence of items, and we wish to determine if there is any item which matches every criterion:

match = null;
foreach(var item in items)
{
  foreach(var criterion in criteria)
  {
    if (!criterion.IsMetBy(item))
    {
      // No point in checking anything further; this is not
      // a match. We want to “continue” the outer loop. How?
    }
  }
  match = item;
  break;
}

There are many ways to achieve this. Here are a few, in order of increasing awesomeness:

Technique #1 (ugly): When you have a loop, the continue statement is essentially a goto which branches to the bottom of the loop, just before it does the loop condition test and pops back up to the top of the loop. (Apparently when you spell goto as continue, the “all gotos are all evil all the time” crowd doesn’t bother you as much.) You can of course simply make this explicit:

match = null;
foreach(var item in items)
  {
  foreach(var criterion in criteria)
  {
    if (!criterion.IsMetBy(item))
    {
      goto OUTERCONTINUE;
    }
  }
  match = item;
  break;
  OUTERCONTINUE:
  ;
}

Technique #2 (better): When I see a nested loop, I almost always consider refactoring the interior loop into a method of its own.

match = null;
foreach(var item in items)
{
  if (MeetsAllCriteria(item, critiera))
  {
    match = item;
    break;
  }
}

where the body of MeetsAllCriteria is obviously

foreach(var criterion in criteria)
{
  if (!criterion.IsMetBy(item))
  {
    return false;
  }
}
return true;

Technique #3 (awesome): The “mechanism” code here is emphasizing how the code works and hiding the meaning of the code. The purpose of the code is to answer the question “what is the first item, if any, that meets all the criteria?” If that’s the meaning of the code, then write code that reads more like that meaning:

var matches = from item in items
              where criteria.All(criterion=>criterion.IsMetBy(item))
              select item;
match = matches.FirstOrDefault();

That is, search the items for items that meet every criterion, and take the first of them, if there is one. The code now reads like what it is supposed to be implementing, and of course, if there are no loops in the first place then there’s no need to “continue” at all!

The thing I love most about LINQ is that it enables a whole new way to think about problem solving in C#. It’s gotten to the point now that whenever I write a loop, I stop and think about two things before I actually type “for”:

  • Have I described anywhere what this code is doing semantically, or does the code emphasize more what mechanisms I am using at the expense of obscuring the semantics?
  • Is there any way to characterize the solution to my problem as the result of a query against a data source, rather than as the result of a set of steps to be followed procedurally?

Closing over the loop variable considered harmful, part two

(This is part two of a two-part series on the loop-variable-closure problem. Part one is here.)


UPDATE: We are taking the breaking change. In C# 5, the loop variable of a foreach will be logically inside the loop, and therefore closures will close over a fresh copy of the variable each time. The for loop will not be changed. We return you now to our original article.


Thanks to everyone who left thoughtful and insightful comments on last week’s post.

More countries really ought to implement Instant-Runoff Voting; it would certainly appeal to the geek crowd. Many people left complex opinions of the form “I’d prefer to make the change, but if you can’t do that then make it a warning”. Or “don’t make the change, do make it a warning”, and so on. But what I can deduce from reading the comments is that there is a general lack of consensus on what the right thing to do here is. In fact, I just did a quick tally:

  • Commenters who expressed support for a warning: 26
  • Commenters who expressed the sentiment “it’s better to not make the change”: 24
  • Commenters who expressed the sentiment “it’s better to make the change”: 25

Wow. I guess we’ll flip a coin. :-)[1. Mr. Smiley Face indicates that Eric is indulging in humourous japery.]

Continue reading