Implementing the virtual method pattern in C#, part one

If you’ve been in this business for any length of time you’ve undoubtedly seen some of the vast literature on “design patterns” — you know, those standard solutions to common problems with names like “factory” and “observer” and “singleton” and “iterator” and “composite” and “adaptor” and “decorator” and… and so on. It is frequently useful to be able to take advantage of the analysis and design skills of others who have already given considerable thought to codifying patterns that solve common problems. However, I think it is valuable to realize that everything in high-level programming is a design pattern. Some of those patterns are so good, we’ve baked them right into the language so thoroughly that most of us don’t even think of them as examples patterns anymore, patterns like “type” and “function” and “local variable” and “call stack” and “inheritance”.

I was asked recently how virtual methods work “behind the scenes”: how does the CLR know at runtime which derived class method to call when a virtual method is invoked on a variable typed as the base class? Clearly it must have something upon which to make a decision, but how does it do so efficiently? I thought I might explore that question by considering how you might implement the “virtual and instance method pattern” in a language which did not have virtual or instance methods. So, for the rest of this series I am banishing virtual and instance methods from C#. I’m leaving delegates in, but delegates can only be to static methods. Our goal is to take a program written in regular C# and see how it can be transformed into C#-without-instance-methods. Along the way we’ll get some insights into how virtual methods really work behind the scenes.
Continue reading

To box or not to box

Suppose you have an immutable value type that is also disposable. Perhaps it represents some sort of handle.

struct MyHandle : IDisposable
{
  public MyHandle(int handle) : this() { this.Handle = handle; }
  public int Handle { get; private set; }
  public void Dispose() 
  {
    Somehow.Close(this.Handle);
  }
}

You might think hey, you know, I’ll decrease my probability of closing the same handle twice by making the struct mutate itself on disposal!

public void Dispose()
{
  if (this.Handle != 0)
    Somehow.Close(this.Handle);
  this.Handle = 0;
}

This should already be raising red flags in your mind. We’re mutating a value type, which we know is dangerous because value types are copied by value; you’re mutating a variable, and different variables can hold copies of the same value. Mutating one variable does not mutate the others, any more than changing one variable that contains 12 changes every variable in the program that also contains 12. But let’s go with it for now.

What does this do?
Continue reading

Never say never, part two

This is part two of a two-part series about determining whether the endpoint of a method is never reachable. Part one is here. A follow-up article is here.


Whether we have a “never” return type or not, we need to be able to determine when the end point of a method is unreachable for error reporting in methods that have non-void return type. The compiler is pretty clever about working that out; it can handle situations like

int M()
{
  try
  {
    while(true) N();
  }
  catch(Exception ex)
  {
    throw new WrappingException(ex);
  }
}

The compiler knows that N either throws or it doesn’t, and that if it doesn’t, then the try block never exits, and if it does, then either the construction of the exception throws, or the construction succeeds and the catch throws the new exception. No matter what, the end point of M is never reached.

However, the compiler is not infinitely clever. It is easy to fool it:
Continue reading

Looking inside a double

Occasionally when I’m debugging the compiler or responding to a user question I’ll need to quickly take apart the bits of a double-precision floating point number. Doing so is a bit of a pain, so I’ve whipped up some quick code that takes a double and tells you all the salient facts about it. I present it here, should you have any use for it yourself.[1. Note that this code was built for comfort, not speed; it is more than fast enough for my purposes so I’ve spent zero time optimizing it.]

Continue reading

Strange but legal

“Can a property or method really be marked as both abstract and override?” one of my coworkers just asked me. My initial gut response was “of course not!” but as it turns out, the Roslyn codebase itself has a property getter marked as both abstract and override. (Which is why they were asking in the first place.)

I thought about it a bit more and reconsidered. This pattern is quite rare, but it is perfectly legal and even sensible. The way it came about in our codebase is that we have a large, very complex type hierarchy used to represent many different concepts in the compiler. Let’s call it “Thingy”:

abstract class Thingy 
{
  public virtual string Name { get { return ""; } } 
  ...

There are going to be a lot of subtypes of Thingy, and almost all of them will have an empty string for their name. Or null, or whatever; the point is not what exactly the value is, but rather that there is a sensible default name for almost everything in this enormous type hierarchy.

However, there is another abstract kind of Thingy, a FrobThingy, which always has a non-empty name. In order to prevent derived classes of FrobThingy from accidentally using the default implementation from the base class, we said:

abstract class FrobThingy : Thingy 
{   
  public abstract override string Name { get; } }
  ...

Now if you make a derived class BigFrobThingy, you know that you have to provide an implementation of Name for it because it will not compile if you don’t.

Curiouser and curiouser

Here’s a pattern you see all the time in C#:

class Frob : IComparable<Frob>

At first glance you might ask yourself why this is not a “circular” definition; after all, you’re not allowed to say class Frob : Frob(*). However, upon deeper reflection that makes perfect sense; a Frob is something that can be compared to another Frob. There’s not actually a real circularity there.

This pattern can be genericized further:

class SortedList<T> where T : IComparable<T>

Again, it might seem a bit circular to say that T is constrained to something that is in terms of T, but actually this is just the same as before. T is constrained to be something that can be compared to T. Frob is a legal type argument for a SortedList because one Frob can be compared to another Frob.

But this really hurts my brain:

class Blah<T> where T : Blah<T>

That appears to be circular in (at least) two ways. Is this really legal?

Yes it is legal, and it does have some legitimate uses. I see this pattern rather a lot(**). However, I personally don’t like it and I discourage its use.

This is a C# variation on what’s called the Curiously Recurring Template Pattern in C++, and I will leave it to my betters to explain its uses in that language. Essentially the pattern in C# is an attempt to enforce the usage of the CRTP.

So, why would you want to do that, and why do I object to it?

One reason why people want to do this is to enforce a particular constraint in a type hierarchy. Suppose we have

abstract class Animal
{
  public virtual void MakeFriends(Animal animal);
}

But that means that a Cat can make friends with a Dog, and that would be a crisis of Biblical proportions! (***) What we want to say is

abstract class Animal
{
  public virtual void MakeFriends(THISTYPE animal);
}

so that when Cat overrides MakeFriends, it can only override it with Cat.

Now, that immediately presents a problem in that we’ve just violated the Liskov Substitution Principle. We can no longer call a method on a variable of the abstract base type and have any confidence that type safety is maintained. Variance on formal parameter types has to be contravariance, not covariance, for it to be typesafe. And moreover, we simply don’t have that feature in the CLR type system.

But you can get close with the curious pattern:

abstract class Animal<T> where T : Animal<T>
{
  public virtual void MakeFriends(T animal);
}
class Cat : Animal<Cat>
{
  public override void MakeFriends(Cat cat) {}
}

and hey, we haven’t violated the LSP and we have guaranteed that a Cat can only make friends with a Cat. Beautiful.

Wait a minute… did we really guarantee that?

class EvilDog : Animal<Cat>
{
  public override void MakeFriends(Cat cat) { }
}

We have not guaranteed that a Cat can only make friends with a Cat; an EvilDog can make friends with a Cat too. The constraint only enforces that the type argument to Animal be good; how you use the resulting valid type is entirely up to you. You can use it for a base type of something else if you wish.

So that’s one good reason to avoid this pattern: because it doesn’t actually enforce the constraint you think it does. Everyone has to play along and agree that they’ll use the curiously recurring pattern the way it was intended to be used, rather than the evil dog way that it can be used.

The second reason to avoid this is simply because it bakes the noodle of anyone who reads the code. When I see List<Giraffe> I have a very clear idea of what the relationship is between the List<> part — it means that there are going to be operations that add and remove things — and the Giraffe part — those operations are going to be on giraffes. When I see FuturesContract<T> where T : LegalPolicy I understand that this type is intended to model a legal contract about a transaction in the future which has some particular controlling legal policy. But when I read Blah<T> where T : Blah I have no intuitive idea of what the intended relationship is between Blah<T> and any particular TIt seems like an abuse of a mechanism rather than the modeling of a concept from the program’s “business domain”.

All that said, in practice there are times when using this pattern really does pragmatically solve problems in ways that are hard to model otherwise in C#; it allows you to do a bit of an end-run around the fact that we don’t have covariant return types on virtual methods, and other shortcomings of the type system. That it does so in a manner that does not, strictly speaking, enforce every constraint you might like is unfortunate, but in realistic code, usually not a problem that prevents shipping the product.

My advice is to think very hard before you implement this sort of curious pattern in C#; do the benefits to the customer really outweigh the costs associated with the mental burden you’re placing on the code maintainers?


(*) Due to an unintentional omission, some past editions of the C# specification actually did not say that this was illegal! However, the compiler has always enforced it. In fact, the compiler has over-enforced it, sometimes accidentally catching non-cycles and marking them as cycles.

(**) Most frequently in emails asking “is this really legal?”

(***) Mass hysteria!

Bad comparisons, part four

One more easy one. I want to “sort” a list into a random, shuffled order. I can do that by simply randomizing whether any two elements are greater than, less than, or equal to each other:

myList.Sort((x, y) => (new Random()).Next(-1, 2));

That generates a random -1, 0 or 1 for every comparison, right? So it will sort the list into random order, right?

.

.

.

.

.

.

.

There are multiple defects here. First off, clearly this violates all our rules for comparison functions. It does not produce a total ordering, and in fact it can tell you that two particular elements are equal and then later tell you that they have become unequal. The sort algorithm is allowed to go into infinite loops or crash horribly when given such an ill-behaved comparison function. And in fact, some implementations of sorting algorithms attempt to detect this error and will throw an exception if they determine that the comparison is inconsistent over time.

Second, every time this thing is called it creates a new Random instance seeded to the current time, and therefore it produces the same result over and over again if called multiple times in the same timeslice; hardly random.

Shuffling is not sorting; it is the opposite of sorting, so don’t use a sort algorithm to shuffle. There are lots of efficient shuffle algorithms that are easy to implement. (That said, it is legal to shuffle by sorting a list ordered by a randomly chosen key. But the key must be chosen exactly once for each item in the list and there must be a correct comparison function on that key.)

Bad comparisons, part three

Did you notice how last time my length comparison on strings was unnecessarily verbose? I could have written it like this:

static int ByLength(string x, string y)
{ 
  if (x == null && y == null) return 0;
  if (x == null) return -1; 
  if (y == null) return 1; 
  return CompareInts(x.Length, y.Length); 
}

static int CompareInts(int x, int y) { 
  // positive if x is larger
  // negative if y is larger
  //  zero if equal   
  return x - y; 
}

static Comparison<T> ThenBy<T>(this Comparison<T> firstBy, Comparison<T> thenBy) 
{ 
  return (x,y)=> 
  { 
    int result = firstBy(x, y); 
    return result != 0 ? result : thenBy(x, y); 
  }
}

Much nicer! My string length comparison method is greatly simplified, I can compose comparisons easily with the ThenBy extension method, and I can reuse CompareInts as a helper function in other comparisons I might want to write.

What’s the defect now?

.

.

.

.

.

.

This one should be easy after last time. The lengths of strings are always positive integers and in practice they never go above a few hundred million; the CLR does not allow you to allocate truly enormous multi-gigabyte strings. But though CompareInts is safe for inputs which are string lengths, it is not safe in general. In particular, for the inputs Int32.MinValue and Int32.MaxValue, the difference is 1. Clearly the smallest possible integer is smaller than the largest possible integer, but this method gives the opposite result! CompareInts should read:

static int CompareInts(int x, int y) 
{ 
  if (x > y) return 1; 
  if (x < y) return -1;
  return 0; 
}

The moral of the story here is that a comparison function that doesn’t compare something is probably wrong. Subtraction is not comparison.

Next time on FAIC: One more bad comparison.