About ericlippert

http://ericlippert.com

Is is as or is as is?

Today a question about the is and as operators: is the is operator implemented as a syntactic sugar for the as operator, or is the as operator implemented as a syntactic sugar for the is operator? More briefly, is is as or is as is?

Perhaps some sample code would be more clear. This code

bool b = x is Foo;

could be considered as a syntactic sugar for

bool b = (x as Foo) != null;

in which case is is a syntactic sugar for as. Similarly,

Foo f = x as Foo;

could be considered to be a syntactic sugar for

var temp = x;
Foo f = (temp is Foo) ? (Foo)temp : (Foo)null;

in which case as is a syntactic sugar for is. Clearly we cannot have both of these be sugars because then we have an infinite regress!

The specification is clear on this point: as (in the non-dynamic case) is defined as a syntactic sugar for is.

However, in practice the CLR provides us with the instruction isinst, which ironically acts like as. We have an instruction which implements the semantics of as pretty well, from which we can build an implementation of is. In short, de jure is is is, and as is is, but de facto is is as and as is isinst.
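Both lowerings can be checked against the operators themselves. A minimal sketch, using made-up Animal and Giraffe types:

```csharp
using System;

class Animal { }
class Giraffe : Animal { }

static class Program
{
    static void Main()
    {
        object x = new Giraffe();

        // The de jure view: "as" is defined in terms of "is".
        Giraffe g1 = (x is Giraffe) ? (Giraffe)x : (Giraffe)null;

        // The de facto view: "is" is compiled in terms of "as",
        // which maps directly onto the CLR's isinst instruction.
        bool b1 = (x as Giraffe) != null;

        // Both agree with the operators themselves:
        Console.WriteLine(b1 == (x is Giraffe)); // True
        Console.WriteLine(g1 == (x as Giraffe)); // True
    }
}
```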

I now invite you to leave the obvious jokes about President Clinton in the comments.


Commentary from 2020:

  • This article was based on a serious question I had been asked, but of course I relished the opportunity to make silly sentences such as “is is as or is as is?”
  • The bit at the end about President Clinton confused some readers, understandably. Read the explainer here for details. The TLDR is: President Clinton famously attempted to explain a lie with “it depends on what the meaning of ‘is’ is”.
  • In the original comments a reader asked why anyone should care about this level of implementation detail. My response to that question gets straight to the philosophy of why I’ve written a blog for 17 years:

I reject the premise of the question; I’m not saying that you or anyone else should care about this detail. I do not necessarily write about things that are important for developers to understand; I write about things that are fabulous! And the person who gets to decide what is fabulous is me. If some of those fabulous things happen to be important, that’s super, but that’s not my aim. The name of the blog is not “Important Stuff You Should Know About Coding”.


I blog because I’ve learned a bunch of interesting stuff — interesting to me — and I like to share things that I find interesting.

    • Another good question in the comments was “does the compiler statically analyze is and as expressions?” Yes! If you write something like if (x is string) and we already know that x is some class type Foo, then the expression will always be false and you get a warning. However, there are subtle problems here. The rules of isinst and the rules of C# are slightly different. In particular, it is illegal in C# to convert int[] to uint[] via reference conversion, but legal in the CLR. You can therefore get into situations where the compiler warns about “impossible” situations that are actually possible.
    • Several readers pointed out that there is a missing language feature: once we know that x is Foo is true, it should be legal to use the members of Foo from x. A form of this feature, the declaration pattern if (x is Foo f), was added in C# 7.
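A quick sketch of what the declaration pattern looks like in practice (Foo and its Size member are made-up names):

```csharp
using System;

class Foo
{
    public int Size = 42;
}

static class Program
{
    static void Main()
    {
        object x = new Foo();

        // Before C# 7: test, then cast again.
        if (x is Foo)
            Console.WriteLine(((Foo)x).Size);

        // C# 7 declaration pattern: test and introduce f in one step.
        if (x is Foo f)
            Console.WriteLine(f.Size);
    }
}
```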

Don’t repeat yourself; consts are already static

Today, another entertaining question from StackOverflow. Presented again as a dialogue, as is my wont.

The specification says “even though constants are considered static members, a constant-declaration neither requires nor allows a static modifier.” Why was the decision made to not force constants to use the static modifier if they are considered static?

Let’s grant that it is sensible that constants are considered to be “static” – that is, members associated with the type itself, rather than members of a particular instance. Let’s also grant that “const” has to be in there somewhere. There are three possible choices for how to notate a constant declaration:

1) Make static optional: “const int x…” or “static const int x…” are both legal.
2) Make static required: “const int x…” is illegal, “static const int x…” is legal.
3) Make static illegal: “const int x…” is legal, “static const int x…” is illegal.

Does that pretty much sum it up?

Yes. Why did the design team choose option (3) instead of (1) or (2)?

The design notes from 1999 do not say. But we can deduce what was probably going through the language designers’ heads.

The problem with (1) is that you could read code that uses both “const int x…” and “static const int y…” and then you would naturally ask yourself “what’s the difference?” Since the default for non-constant fields and methods is “instance” unless marked “static”, the natural conclusion would be that some constants are per-instance and some are per-type, and that conclusion would be wrong. This is bad because it is misleading.

The problem with (2) is that first off, it is redundant. It’s just more typing without adding clarity or expressiveness to the language. And second, I don’t know about you, but I personally hate it when the compiler gives me the error “You forgot to say the magic word right here. I know you forgot to say the magic word, I am one hundred percent capable of figuring out that the magic word needs to go there, but I’m not going to let you get any work done until you say the magic word”. I once heard that tendency of compilers compared to the guy at the party who interrupts your conversation to point out that you said “a historic occasion” when clearly you meant to say “an historic occasion”. No one likes that guy.

The problem with (3) is that the developer is required to know that const logically implies static. However, once the developer learns this fact, they’ve learned it. It’s not like this is a complex idea that is hard to figure out.

The solution which presents the fewest problems and costs to the end user is (3).
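As a sketch of what option (3) looks like in practice (Physics and SpeedOfLight are made-up names):

```csharp
using System;

class Physics
{
    // No "static" modifier is allowed here, but the member is
    // associated with the type, not with any instance.
    public const double SpeedOfLight = 299792458.0; // metres per second
}

static class Program
{
    static void Main()
    {
        // Accessed exactly like a static field:
        Console.WriteLine(Physics.SpeedOfLight);

        // Accessing it through an instance is illegal (error CS0176):
        // var p = new Physics();
        // Console.WriteLine(p.SpeedOfLight);
    }
}
```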

That seems reasonable. Does the C# language apply this principle of eschewing redundancy consistently?

Nope! It is interesting to compare and contrast this with other places in the language where different decisions were made.

For example, overloaded operators are required to be both public and static. In this case, again we are faced with three options:

(1) make public static optional,
(2) make it required, or
(3) make it illegal.

For overloaded operators we chose (2). Since the “natural state” of a method is private/instance it seems bizarre and misleading to make something that looks like a method public/static invisibly, as (1) and (3) both require.

For another example, a virtual method with the same signature as a virtual method in a base class is supposed to have either “new” or “override” on it. Again, three choices.

(1) make it optional: you can say new, or override, or nothing at all, in which case we default to new.
(2) make it required: you have to say new or override, or
(3) make it illegal: you cannot say new at all, so if you don’t say override then it is automatically new.

In this case we chose (1) because that works best for the brittle base class situation, in which someone adds a virtual method to a base class and you don’t realize you are now overriding it. This produces a warning, but not an error.
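A sketch of the brittle base class scenario (Base, Derived and M are hypothetical names):

```csharp
using System;

class Base
{
    // Added in version 2 of this hypothetical library:
    public virtual string M() => "Base.M";
}

class Derived : Base
{
    // Derived was written against version 1, which had no M.
    // Recompiling against version 2 produces warning CS0114
    // ("...hides inherited member... use the new keyword if hiding
    // was intended, or override if not"), but it still compiles,
    // defaulting to "new" (hiding) semantics.
    public virtual string M() => "Derived.M";
}

static class Program
{
    static void Main()
    {
        Base b = new Derived();
        // Hiding, not overriding: the call goes to Base.M.
        Console.WriteLine(b.M()); // Base.M
    }
}
```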

Each of these situations has to be considered on a case-by-case basis. The general guidance is not, say, “always pick option 3”, but rather to figure out what is least misleading to the typical user and go with that.

Knights, knaves, protected and internal

When you override a virtual method in C# you are required to ensure that the stated accessibility of the overridden method – that is, whether it is public, internal, protected or protected internal(*) – is exactly re-stated in the overriding method. Except in one case. I refer you to section 10.6.4 of the specification, which states:

an override declaration cannot change the accessibility of the virtual method. However, if the overridden base method is protected internal and it is declared in a different assembly than the assembly containing the override method then the override method’s declared accessibility must be protected.

What the heck is up with that? Surely if an overridden method is protected internal then it only makes sense that the overriding method should be exactly the same: protected internal.

I’ll explain why we have this rule, but first, a brief digression.

A certain island is inhabited by only knights and knaves. Knights make only true statements and only answer questions truthfully; knaves make only false statements and only answer questions untruthfully. If you walk up to an inhabitant of the (aptly-named) Island of Knights and Knaves you can rapidly ascertain whether a particular individual is a knight or a knave by asking a question you know the answer to. For example “does two plus two equal four?” A knight will answer “yes” (**), and a knave will answer “no”. Knaves are prone to saying things like “my mother is a male knight”, which is plainly false.

It might seem at first glance that there is no statement which could be made by both a knight and a knave. Since knights tell the truth and knaves lie, they cannot both make the same statement, right? But in fact there are many statements that can be made by both. Can you think of one?

.

.

.

.

.

.

.

Both a knight and a knave can say “I am a knight.”

How does that work? The reason this works is because the pronoun “I” refers to different people when uttered by different people. If Alice, a knight, makes the statement “I am a knight”, she is asserting the truth that “Alice is a knight”. If Bob, a knave, makes the statement “I am a knight”, he is not asserting the true statement “Alice is a knight” but rather the false statement “Bob is a knight”. Similarly, both Alice and Bob can assert the statement “My name is Alice,” for the same reason.

And that’s why overriding methods in a different assembly aren’t “protected internal”. The modifier “internal” is like a pronoun; it refers to the current assembly. When used in two different assemblies it means different things. The purpose of the rule is to ensure that a derived class does not make the accessibility domain of the virtual member any larger or smaller.
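In code, the rule looks like this sketch. The two parts live in two different assemblies, so this cannot compile as a single file; the class and method names are made up:

```csharp
// Assembly One:
public class Base
{
    // Accessible to derived classes anywhere, plus any code
    // inside Assembly One.
    protected internal virtual void M() { }
}

// Assembly Two:
public class Derived : Base
{
    // "internal" here would mean "Assembly Two", which would change
    // the accessibility domain Alice... er, Assembly One set up.
    // So the override must drop it and say only "protected":
    protected override void M() { }
}
```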

An analogy might help. Suppose a protected resource is a car, an assembly is a dwelling, a person is a class, and a descendent is a derived class.

  • Alice has a Mazda and lives in House with her good friend Charlie.
  • Charlie has a child, Diana, who lives in Apartment.
  • Alice has a child, Elroy, who lives in Condo with his good friend Greg.
  • Elroy has a child, Alice’s grandchild, Frank, who lives in Yurt.

Alice grants access to Mazda to anyone living in House and any descendent of Alice. The people who can access Mazda are Alice, Charlie, Elroy, and Frank.

Diana does not get access to Mazda because she is not Alice, not a descendent of Alice, and not a resident of House. That she is a child of Alice’s housemate is irrelevant.

Greg does not get access to Mazda for the same reason: he is not Alice, not a descendent of Alice, and not a resident of House. That he is a housemate of a descendent of Alice is irrelevant.

Now we come to the crux of the matter. Elroy is not allowed to extend his access to Mazda to Greg. Alice owns that Mazda and she said “myself, my descendents and my housemates”. Her children don’t have the right to extend the accessibility of Mazda beyond what she initially set up. Nor may Elroy deny access to Frank; as a descendent of Alice, Frank has a right to borrow the car, and Elroy cannot stop him by making it “private”.

When Elroy describes what access he has to Mazda he is only allowed to say “I grant access to this to myself and my descendents” because that is what Alice already allowed. He cannot say “I grant access to Mazda to myself, my descendents and to the other residents of Condo”.


(*) Private virtual methods are illegal in C#, which irks me to no end. I would totally use that feature if we had it.

(**) Assuming that they answer at all, of course; there could be mute knights and knaves.

Socks, birthdays and hash collisions

Suppose you’ve got a huge mixed-up pile of white, black, green and red socks, with roughly equal numbers of each. You randomly choose two of them. What is the probability that they are a matched pair?

There are sixteen ways of choosing a pair of socks: WW, WB, WG, WR, BW, BB, … Of those sixteen pairs, four of them are matched pairs. So chances are 25% that you get a matched pair.
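That 25% figure is easy to verify by brute force; here is a quick sketch:

```csharp
using System;
using System.Linq;

static class Program
{
    static void Main()
    {
        var colours = new[] { "White", "Black", "Green", "Red" };

        // Enumerate all sixteen ordered pairs of colours and count
        // how many of them are matched pairs.
        var pairs = (from first in colours
                     from second in colours
                     select (first, second)).ToList();

        int total = pairs.Count;                             // 16
        int matched = pairs.Count(p => p.first == p.second); // 4

        Console.WriteLine((double)matched / total);          // 0.25
    }
}
```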

Suppose you choose three of them. What is the probability that amongst the socks you chose, there exists at least one matched pair?

Continuing to an outer loop

When you have a nested loop, sometimes you want to “continue” the outer loop, not the inner loop. For example, here we have a sequence of criteria and a sequence of items, and we wish to determine if there is any item which matches every criterion:

match = null;
foreach(var item in items)
{
  foreach(var criterion in criteria)
  {
    if (!criterion.IsMetBy(item))
    {
      // No point in checking anything further; this is not
      // a match. We want to “continue” the outer loop. How?
    }
  }
  match = item;
  break;
}

There are many ways to achieve this. Here are a few, in order of increasing awesomeness:

Technique #1 (ugly): When you have a loop, the continue statement is essentially a goto which branches to the bottom of the loop, just before it does the loop condition test and pops back up to the top of the loop. (Apparently when you spell goto as continue, the “all gotos are all evil all the time” crowd doesn’t bother you as much.) You can of course simply make this explicit:

match = null;
foreach(var item in items)
{
  foreach(var criterion in criteria)
  {
    if (!criterion.IsMetBy(item))
    {
      goto OUTERCONTINUE;
    }
  }
  match = item;
  break;
  OUTERCONTINUE:
  ;
}

Technique #2 (better): When I see a nested loop, I almost always consider refactoring the interior loop into a method of its own.

match = null;
foreach(var item in items)
{
  if (MeetsAllCriteria(item, criteria))
  {
    match = item;
    break;
  }
}

where the body of MeetsAllCriteria is obviously

foreach(var criterion in criteria)
{
  if (!criterion.IsMetBy(item))
  {
    return false;
  }
}
return true;

Technique #3 (awesome): The “mechanism” code here is emphasizing how the code works and hiding the meaning of the code. The purpose of the code is to answer the question “what is the first item, if any, that meets all the criteria?” If that’s the meaning of the code, then write code that reads more like that meaning:

var matches = from item in items
              where criteria.All(criterion=>criterion.IsMetBy(item))
              select item;
match = matches.FirstOrDefault();

That is, search the items for items that meet every criterion, and take the first of them, if there is one. The code now reads like what it is supposed to be implementing, and of course, if there are no loops in the first place then there’s no need to “continue” at all!

The thing I love most about LINQ is that it enables a whole new way to think about problem solving in C#. It’s gotten to the point now that whenever I write a loop, I stop and think about two things before I actually type “for”:

  • Have I described anywhere what this code is doing semantically, or does the code emphasize more what mechanisms I am using at the expense of obscuring the semantics?
  • Is there any way to characterize the solution to my problem as the result of a query against a data source, rather than as the result of a set of steps to be followed procedurally?

What’s the difference between covariance and assignment compatibility?

I’ve written a lot about this already, but I think one particular point bears repeating.

As we’re getting closer to shipping C# 4.0, I’m seeing a lot of documents, blogs, and so on, attempting to explain what “covariant” means. This is a tricky word to define in a way that is actually meaningful to people who haven’t already got degrees in category theory, but it can be done. And I think it’s important to avoid defining a word to mean something other than its actual meaning.

A number of those documents have led with something like:

“Covariance is the ability to assign an expression of a more specific type to a variable of a less specific type. For example, consider a method M that returns a Giraffe. You can assign the result of M to a variable of type Animal, because Animal is a less specific type that is compatible with Giraffe. Methods in C# are ‘covariant’ in their return types, which is why when you create a covariant interface, it is indicated with the keyword ‘out’ — the returned value comes ‘out’ of the method.”

But that’s not at all what covariance means. That’s describing “assignment compatibility” — the ability to assign a value of a more specific type to a storage of a compatible, less specific type is called “assignment compatibility” because the two types are compatible for the purposes of verifying legality of assignments.

So what does covariance mean then?

First off, we need to work out precisely what the adjective “covariant” applies to. I’m going to get more formal for a bit here but try to keep it understandable.

Let’s start by not even considering types. Let’s think about integers. (And here I am speaking of actual mathematical integers, not of the weird behaviour of 32-bit integers in unchecked contexts.) Specifically, we’re going to think about the ≤ relation on integers, the “less than or equal to” relation. (Recall that of course a “relation” is a function which takes two things and returns a bool which indicates whether the given relationship holds or does not hold.)

Now let’s think about a projection on integers. What is a projection? A projection is a function which takes a single integer and returns a new integer. So, for example, z → z + z is a projection; call it D for “double”.  So are z → 0 - z, call it N for “negate” and z → z * z, call it S for “square”.

Now, here’s an interesting question. Is it always the case that (x ≤ y) = (D(x) ≤ D(y))?  Yes, it is. If x is less than y, then twice x is less than twice y. If x is equal to y then twice x is equal to twice y. And if x is greater than y, then twice x is greater than twice y. The projection D preserves the direction of the comparison.

What about N? Is it always the case that (x ≤ y) = (N(x) ≤ N(y))?  Clearly not. 1 ≤ 2 is true, but -1 ≤ -2 is false.

But we notice that the reverse is always true!  (x ≤ y) = (N(y) ≤ N(x)). The projection N reverses the direction of the comparison.

What about S? Is it always the case that (x ≤ y) = (S(x) ≤ S(y))? No. -1 ≤ 0 is true, but S(-1) ≤ S(0) is false.

What about the opposite? Is it always the case that (x ≤ y) = (S(y) ≤ S(x)) ?

Again, no. 1 ≤ 2 is true, but S(2) ≤ S(1) is false.

The projection S does not preserve the direction of the comparison relation, and nor does it reverse it.

The projection D is “covariant” — it preserves the ordering relationship on integers. The projection N is “contravariant”. It reverses the ordering relationship on integers. The projection S does neither; it is “invariant”.
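These claims about D, N and S can be brute-force checked over a small range of integers; a quick sketch:

```csharp
using System;
using System.Linq;

static class Program
{
    static void Main()
    {
        Func<int, int> D = z => z + z; // double
        Func<int, int> N = z => 0 - z; // negate
        Func<int, int> S = z => z * z; // square

        var ints = Enumerable.Range(-10, 21).ToArray(); // -10 .. 10

        // p is covariant if (x <= y) == (p(x) <= p(y)) for all x, y;
        // contravariant if (x <= y) == (p(y) <= p(x)) for all x, y.
        bool Covariant(Func<int, int> p) =>
            ints.All(x => ints.All(y => (x <= y) == (p(x) <= p(y))));
        bool Contravariant(Func<int, int> p) =>
            ints.All(x => ints.All(y => (x <= y) == (p(y) <= p(x))));

        Console.WriteLine(Covariant(D));                     // True
        Console.WriteLine(Contravariant(N));                 // True
        Console.WriteLine(Covariant(S) || Contravariant(S)); // False
    }
}
```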

Now I hope it is more clear exactly what is covariant or contravariant. The integers themselves are not variant, and the comparison is not variant. It’s the projection that is covariant or contravariant with respect to the comparison.

So now let us abandon integers and think about reference types. Instead of the ≤ relation on integers, we have the ≤ relation on reference types. A reference type X is smaller than (or equal to) a reference type Y if a value of type X can be stored in a variable of type Y. That is, if X is “assignment compatible” with Y.

Now consider a projection from types to types. Say, the projection “T goes to IEnumerable<T>“.  That is, we have a projection that takes a type, say, Giraffe, and gives you back a new type, IEnumerable<Giraffe>. Is that projection covariant in C# 4 (with respect to assignment compatibility)?  Yes. It preserves the direction of ordering. A Giraffe may be assigned to a variable of type Animal, and a sequence of Giraffes may be assigned to a variable that can hold a sequence of Animals.

We can think of generic types as “blueprints” that produce constructed types. Let’s take the projection that takes a type T and produces IEnumerable<T> and simply call that projection “IEnumerable<T>”. We can understand from context that when we say “IEnumerable<T> is covariant” what we mean is “the projection which takes a reference type T and produces a reference type IEnumerable<T> is a covariant projection on the assignment compatibility relation”. And since IEnumerable<T> only has one type parameter, it is clear from the context that we mean that the parameter to the projection is T. After all, it is a lot shorter to say “IEnumerable<T> is covariant” than that other mouthful.

So now we can define covariance, contravariance and invariance. A generic type I<T> is covariant (in T) if construction with reference type arguments preserves the direction of assignment compatibility. It is contravariant (in T) if it reverses the direction of assignment compatibility. And it is invariant if it does neither. And by that, we simply are saying in a concise way that the projection which takes a T and produces I<T> is a covariant/contravariant/invariant projection.
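In C# 4 code, with hypothetical Animal and Giraffe classes, the two directions look like this:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Animal { }
class Giraffe : Animal { }

static class Program
{
    static void Main()
    {
        // IEnumerable<T> is covariant in T: the projection
        // T -> IEnumerable<T> preserves assignment compatibility.
        IEnumerable<Giraffe> giraffes = new List<Giraffe> { new Giraffe() };
        IEnumerable<Animal> animals = giraffes; // legal in C# 4 and later

        // Action<T> is contravariant in T: the projection
        // T -> Action<T> reverses assignment compatibility.
        Action<Animal> feedAnimal = a => Console.WriteLine("fed");
        Action<Giraffe> feedGiraffe = feedAnimal; // legal in C# 4 and later

        feedGiraffe(new Giraffe());
        Console.WriteLine(animals.Count()); // 1
    }
}
```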


 UPDATE: My close personal friend (and fellow computer geek) Jen notes that in the Twilight series of novels, the so-called “werewolves” (who are not transformed by the full moon and therefore not actually werewolves) maintain their rigid social ordering in both wolf and human form; the projection from human to wolf is covariant in the social-ordering relation. She also notes that in high school, programming language geeks are at the bottom of the social order, but the projection to adulthood catapults them to the top of the social order, and therefore, growing up is contravariant. I am somewhat skeptical of the latter claim; the former, I’ll take your word for it. I suppose the question of how social order works amongst teenage werewolves who are computer geeks is a subject for additional research. Thanks Jen!



Closing over the loop variable considered harmful, part two

(This is part two of a two-part series on the loop-variable-closure problem. Part one is here.)


UPDATE: We are taking the breaking change. In C# 5, the loop variable of a foreach will be logically inside the loop, and therefore closures will close over a fresh copy of the variable each time. The for loop will not be changed. We return you now to our original article.
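A sketch of the behaviour the change affects:

```csharp
using System;
using System.Collections.Generic;

static class Program
{
    static void Main()
    {
        var actions = new List<Action>();
        foreach (var v in new[] { 10, 20, 30 })
            actions.Add(() => Console.WriteLine(v));

        foreach (var action in actions)
            action();
        // C# 4 and earlier: every closure shares a single variable v,
        // which holds 30 by the time the actions run, so this printed
        // 30 30 30.
        // C# 5 and later: each iteration gets a fresh copy of v, so
        // this prints 10 20 30.
    }
}
```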


Thanks to everyone who left thoughtful and insightful comments on last week’s post.

More countries really ought to implement Instant-Runoff Voting; it would certainly appeal to the geek crowd. Many people left complex opinions of the form “I’d prefer to make the change, but if you can’t do that then make it a warning”. Or “don’t make the change, do make it a warning”, and so on. But what I can deduce from reading the comments is that there is a general lack of consensus on what the right thing to do here is. In fact, I just did a quick tally:

  • Commenters who expressed support for a warning: 26
  • Commenters who expressed the sentiment “it’s better to not make the change”: 24
  • Commenters who expressed the sentiment “it’s better to make the change”: 25

Wow. I guess we’ll flip a coin. :-) [Mr. Smiley Face indicates that Eric is indulging in humourous japery.]


Closing over the loop variable considered harmful, part one

UPDATE: We are taking the breaking change. In C# 5, the loop variable of a foreach will be logically inside the loop, and therefore closures will close over a fresh copy of the variable each time. The for loop will not be changed. We return you now to our original article.


I don’t know why I haven’t blogged about this one before; this is the single most common incorrect bug report we get. That is, someone thinks they have found a bug in the compiler, but in fact the compiler is correct and their code is wrong. That’s a terrible situation for everyone; we very much wish to design a language which does not have “gotcha” features like this.

But I’m getting ahead of myself.

Simple names are not so simple, part two

Today, the solution to the puzzle from last time. The code is correct, and compiles without issue. I was quite surprised when I first learned that; it certainly looks like it violates our rule about not using the same simple name to mean two different things in one block.

The key to understanding why this is legal is that query comprehensions and foreach loops are specified as syntactic sugars for another program, and it is that program which is actually analyzed for correctness. Our original program:

static void Main()
{
  int[] data = { 1, 2, 3, 1, 2, 1 };
  foreach (var m in from m in data orderby m select m)
    System.Console.Write(m);
}

is transformed into

static void Main()
{
  int[] data = { 1, 2, 3, 1, 2, 1 };
  {
    IEnumerator<int> e = 
      ((IEnumerable<int>)data.OrderBy(m => m)).GetEnumerator();
    try
    {
      int m; // Inside the "while" in C# 5 and above, outside in C# 1 through 4.
      while(e.MoveNext())
      {
        m = (int)(int)e.Current;
        Console.Write(m);
      }
    }
    finally
    {
      if (e != null) ((IDisposable)e).Dispose();
    }
  }
}

There are five usages of m in this transformed program; m is:

  1. declared as the formal parameter of a lambda.
  2. used in the body of the lambda; here it refers to the formal parameter.
  3. declared as a local variable
  4. written to in the loop; here it refers to the local variable
  5. read from in the loop; here it refers to the local variable

Is there any usage of a local variable before its declaration? No.

Are there any two declarations that have the same name in the same declaration space? It would appear so. The body of Main defines a local variable declaration space, and clearly the body of Main contains, indirectly, two declarations for m, one as a formal parameter and one as a local. But I said last time: local variable declaration spaces have special rules for determining overlaps. It is illegal for a local variable declaration space to directly contain a declaration such that another nested local variable declaration space contains a declaration of the same name. But an outer declaration space which indirectly contains two such declarations is not an error. So in this case, no, there are no local variable declaration spaces which directly contain a declaration for m, such that a nested local variable declaration space also directly contains a declaration for m. Our two local variable declaration spaces which directly contain a declaration for m do not overlap anywhere.

Is there any declaration space which contains two inconsistent usages of the simple name m? Yes, again, the outer block of Main contains two inconsistent usages of m. But again, this is not relevant. The question is whether any declaration spaces directly containing a usage of m have an inconsistent usage. Again, we have two declaration spaces but they do not overlap each other, so there’s no problem here either.

The thing which makes this legal, interestingly enough, is the generation of the loop variable declaration logically within the try block. Were it to be generated outside the try block then this would be a violation of the rule about inconsistent usage of a simple name throughout a declaration space.