What is the type of the null literal?

The C# 2.0 specification says

The null literal evaluates to the null value, which is used to denote a reference not pointing at any object or array, or the absence of a value. The null type has a single value, which is the null value.

But no version of the specification since then contains this language. So what, then, is the type of the null literal expression?
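
To see why the question matters, consider this small sketch of mine (not from the original post): the null literal converts to any reference type or nullable value type, yet the compiler refuses to infer a type for it.

string s = null;  // fine: null converts to any reference type
int? n = null;    // fine: null converts to any nullable value type
var v = null;     // error CS0815: cannot assign <null> to an implicitly-typed variable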


Why does a foreach loop silently insert an “explicit” conversion?

The C# specification defines

foreach (V v in x) 
  embedded-statement

as having the same semantics as:

{
  E e = ((C)(x)).GetEnumerator();
  try 
  {
    V v;  // Inside the while in C# 5.
    while (e.MoveNext()) 
    {
      v = (V)e.Current;
      embedded-statement
    }
  }
  finally 
  {
    // necessary code to dispose e
  }
}

(Actually this is not exactly what the spec says; I’ve made one small edit because I don’t want to get into the difference between the element type and the loop variable type in this episode.)

There are a lot of subtleties here that we’ve discussed before; what I want to talk about today is the explicit conversion from e.Current to V. On the face of it this seems very problematic; that’s an explicit conversion. The collection could be a list of longs and V could be int; normally C# would not allow a conversion from long to int without a cast operator appearing in the source code. (Or the long being a constant that fits into an int.) What justifies this odd design choice?
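
A quick sketch of mine to illustrate: the foreach compiles only because the compiler silently inserts the cast, whereas the equivalent assignment written out by hand is rejected.

var longs = new List<long> { 1, 2, 3 };  // assumes using System.Collections.Generic
foreach (int i in longs)  // compiles: the expansion contains (int)e.Current
  Console.WriteLine(i);
long first = longs[0];
int j = first;            // error CS0266: cannot implicitly convert long to int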


Why not allow double/decimal implicit conversions?

I’ve talked a lot about floating point math over the years in this blog, but a quick refresher is in order for this episode.

A double represents a number of the form +/- (1 + F/2^52) x 2^(E-1023), where F is a 52-bit unsigned integer and E is an 11-bit unsigned integer; that makes 63 bits, and the remaining bit is the sign: zero for positive, one for negative. You’ll note that there is no way to represent zero in this format, so by convention if F and E are both zero, the value is zero. (And similarly there are other reserved bit patterns for infinities, NaN and denormalized floats which we will not get into today.)

A decimal represents a number in the form +/- V / 10^X, where V is a 96-bit unsigned integer and X is an integer between 0 and 28.

Both are of course “floating point” because the number of bits of precision in each case is fixed, but the position of the decimal point can effectively vary as the exponent changes.
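
Since neither conversion is implicit, and each explicit conversion is lossy in its own way, here is a sketch of mine showing what the compiler accepts and rejects:

double d = 0.1;
decimal m = 0.1m;
// decimal m2 = d;       // error: no implicit conversion from double to decimal
// double d2 = m;        // error: no implicit conversion from decimal to double
decimal m3 = (decimal)d; // legal, but throws OverflowException for out-of-range doubles
double d3 = (double)m;   // legal, but can silently lose precision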

Why are generic constraints not inherited?

Today’s episode is presented as a dialogue:

Why are generic constraints not inherited?

Members are inherited. Generic constraints are not members of a type.

But generic constraints are inherited on generic methods, right?

That’s true, though what is inherited is the method and the constraint comes along with it. What is a bit odd is: generic constraints on methods are invisibly inherited when overriding, which has always vexed me. My preferred design would have been to require that the constraint be re-stated. That is, when you say:

class B
{
  public virtual void M<T>() where T : struct { }
}
class D : B
{
  public override void M<U>() { }
}

U in D.M<U> is still constrained to struct, but it is actually illegal in C# to re-state that fact redundantly! This is a bit of a misfeature in my opinion; it works against code clarity for the reader. Particularly since the base class might not even be in source code; it might be in a different assembly entirely.
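
To make that concrete, here is a sketch of mine (class D2 and its comments are not from the post, and later versions of C# relaxed this rule for some constraints):

class D2 : B
{
  // error CS0460: constraints for override methods are inherited from the
  // base method, so they cannot be specified directly:
  // public override void M<U>() where U : struct { }

  public void Caller()
  {
    M<int>();       // fine: int satisfies the inherited struct constraint
    // M<string>(); // error: string does not satisfy the inherited constraint
  }
}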

(Note that I’m emphasizing here that there is no requirement that the type parameters be named the same. Similarly there is no requirement that the formal parameters be named the same either, but it is a good idea to do so. We’ll come back to this point in a moment.)


The psychology of C# analysis

The organizers of the recent Static Analysis Symposium — conveniently held four blocks from my office — were kind enough to invite me to give the opening talk. Now, this is a conference where the presentations have titles like “Efficient Generation of Correctness Certificates for the Abstract Domain of Polyhedra”; I know what all those words mean individually, it’s just them next to each other in that order that I don’t understand. Fortunately for me, the SAS organizers invite people in industry to give talks about the less academic, more pragmatic aspects of program analysis, which I was happy to do.

They also let me pad my presentation with funny pictures of cats, which helped a lot.

Unfortunately I don’t have a recording of the talk, but my slides are posted here if you want to check them out.

Special thanks to Scott Meyer of BasicInstructions.net who was kind enough to allow me to use his comic about informative presentations in my informative presentation.

Why are braces required in try-catch-finally?

Developers who use C-like languages typically conceive of if, while, for, and so on as taking either a single statement, or a group of any number of statements in a block:

if (x)
  M();
if (x)
{
  M();
  N();
}

However, that’s not how programming language designers think of it. Rather, if and while and for and so on each take a single statement, and a braced block is a single statement. (C# has an additional rule that the statement in an if, while and so on may not be a single local variable declaration; that’s a good subject for another day.)

No matter how we choose to think about the grammar, it is certainly the case that try-catch-finally is different than if and while and for and so on; try-catch-finally requires a braced block. That seems inconsistent; is there a justification for this inconsistency?

Before we dig into the try-catch-finally case, let’s first consider the problems with the single-statement approach. The looping structures are unambiguous:

while(A())
  while(B())
    C();

The inner while statement composes nicely with the outer while statement; no braces are required to make sense of this. But that is not the case with if, thanks to the famous “dangling else problem”:

if (A())
  if (B())
    C();
else
  D();

OK, quick, is the indenting correct there? Is else D() associated with the inner if statement or the outer one?

It’s associated with the inner one; the else matches the nearest containing if. But in a language where whitespace doesn’t matter, it is very easy to accidentally indent this wrong and get the wrong impression when reading the code. I’ve also seen badly-written macros in C and C++ that caused the dangling-else problem to arise.

When adding try-catch-finally to the language, the designers wished to avoid adding a second kind of dangling else problem. Suppose that you could put any statement after a try, catch or finally, rather than having to put a block statement. How do you analyze this program fragment?

try
  try
    A();
  catch AException
    B();
catch BException
  C();

OK, quick, is the indenting correct there? Is B() protected? That is, should we parse this as

try
{
  try
  {
    A();
  }
  catch AException
  {
    B(); // protected by the outer try
  }
}
catch BException
{
  C();
}

Or is it this erroneous program?

try // try without associated catch!
{
  try
  {
    A();
  }
  catch AException
  {
    B(); // not protected
  }
  catch BException
  {
    C();
  }
}

Rather than attempt to come up with a rule to disambiguate the ambiguous parse, it is better to simply avoid the ambiguity altogether and require the braces. The last thing we need in this language is more ambiguity.

While we’re on the subject, an interesting thing about the try block is that of course the try keyword is completely unnecessary from a grammatical perspective. We could simply have said that any block can be followed by any number of catch blocks or a finally block, and the block thus followed is implicitly a try block. This is a good example of how building redundancy into the language makes it more readable; the try keyword calls the reader’s attention to the fact that the control flow of this part of the method needs to deal with exceptional situations.

And one additional fun fact: in the initial design of C#, there was no such thing as try-catch-finally. There was try-catch and try-finally. If you wanted to have a try-catch-finally then you’d write:

try
{
  try
  {
    A();
  }
  catch (AException)
  {
    B();
  }
}
finally
{
  C();
}

The language designers realized that this was a common pattern and unnecessarily wordy, so they allowed the syntactic sugar of eliminating the outer try. The C# compiler actually generates the code as though you’d written the nested blocks, since at the CIL level there is no try-catch-finally.


Eric is on vacation; this posting was pre-recorded.


Next time on FAIC: What happens when unmanaged code holds on to a fixed pointer? Nothing good.

Why is deriving a public class from an internal class illegal?

In C# it is illegal to declare a class D whose base class B is in any way less accessible than D. I’m occasionally asked why that is. There are a number of reasons; today I’ll start with a very specific scenario and then talk about a general philosophy.

Suppose you and your coworker Alice are developing the code for assembly Foo, which you intend to be fully trusted by its users. Alice writes:

public class B
{
  public void Dangerous() {...}
}

And you write

public class D : B
{
  ... other stuff ...
}

Later, Alice gets a security review from Bob, who points out that method Dangerous could be used as a component of an attack by partially-trusted code, and who further points out that customer scenarios do not actually require B to be used directly by customers in the first place; B is actually only being used as an implementation detail of other classes. So in keeping with the principle of least privilege, Alice changes B to:

internal class B
{
  public void Dangerous() {...}
}

Alice need not change the accessibility of Dangerous, because of course public means “public to the people who can see the class in the first place”.

So now what should happen when Alice recompiles before she checks in this change? The C# compiler does not know if you, the author of class D, intended method Dangerous to be accessible by a user of public class D. On the one hand, it is a public method of a base class, and so it seems like it should be accessible. On the other hand, the fact that B is internal is evidence that Dangerous is supposed to be inaccessible outside the assembly. A basic design principle of C# is that when the intention is unclear, the compiler brings this fact to your attention by failing. The compiler is identifying yet another form of the Brittle Base Class Failure, which long-time readers know has shown up in numerous places in the design of C#.
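
Concretely, Alice’s recompile now fails on your class; here is a sketch of mine (CS0060 is the error the compiler actually reports):

internal class B
{
  public void Dangerous() { }
}
public class D : B  // error CS0060: Inconsistent accessibility:
{                   // base class 'B' is less accessible than class 'D'
}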

Rather than simply making this change and hoping for the best, you and Alice need to sit down and talk about whether B really is a sensible base class of D; it seems plausible that either (1) D ought to be internal also, or (2) D ought to favour composition over inheritance. Which brings us to my more general point:

The inheritance mechanism is, as we’ve discussed before, simply the fact that all heritable members of the base type are also members of the derived type. But the inheritance relationship semantics are intended to model the “is a kind of” relationship. It seems reasonable that if D is a kind of B, and D is accessible at a location, then B ought to be accessible at that location as well. It seems strange that you could use the fact that “a Giraffe is a kind of Animal” at only specific locations.

In short, this rule of the language encourages you to use inheritance relationships to model the business domain semantics rather than as a mechanism for code reuse.

Finally, I note that as an alternative, it is legal for a public class to implement an internal interface. In that scenario there is no danger of accidentally exposing dangerous functionality from the interface to the implementing type because of course an interface is not associated with any functionality in the first place; an interface is logically “abstract”. Implementing an internal interface can be used as a mechanism that allows public components in the same assembly to communicate with each other over “back channels” that are not exposed to the public.
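
Here is a minimal sketch of that back-channel pattern; the names IBackChannel and Component are mine:

internal interface IBackChannel
{
  void Coordinate();
}
public class Component : IBackChannel
{
  // Explicit implementation: callable only through IBackChannel,
  // and IBackChannel is visible only inside this assembly.
  void IBackChannel.Coordinate() { /* ... */ }
}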

Dynamic contagion, part two

This is part two of a two-part series on dynamic contagion. Part one is here.


Last time I discussed how the dynamic type tends to spread through a program like a virus: if an expression of dynamic type “touches” another expression then that other expression often also becomes of dynamic type. Today I want to describe one of the least well understood aspects of method type inference, which also uses a contagion model when dynamic gets involved.
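
As a quick refresher, a small sketch of mine of the contagion described in part one:

dynamic d = 1;
var sum = d + 10;       // sum is dynamic: one dynamic operand makes the whole addition dynamic
var abs = Math.Abs(d);  // abs is dynamic too: overload resolution on a dynamic argument is deferred to runtime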

Is C# a strongly typed or a weakly typed language?

Presented as a dialogue, as is my wont!

Is C# a strongly typed or a weakly typed language?

Yes.

That is unhelpful.

I don’t doubt it. Interestingly, if you rephrased the question as an “and” question, the answer would be the same.

What? You mean, is C# a strongly typed and a weakly typed language?

Yes, C# is a strongly typed language and a weakly typed language.

I’m confused.

Me too. Perhaps you should tell me precisely what you mean by “strongly typed” and “weakly typed”.

Um. I don’t actually know what I mean by those terms, so perhaps that is the question I should be asking. What does it really mean for a language to be “weakly typed” or “strongly typed”?

“Weakly typed” means “this language uses a type verification system that I find distasteful”, and “strongly typed” means “this language uses a type system that I find attractive”.

No way!

Way, dude.

Really?

These terms are meaningless and you should avoid them. Wikipedia lists eleven different meanings for “strongly typed”, several of which contradict each other. Any time two people use “strongly typed” or “weakly typed” in a conversation about programming languages, odds are good that they have two subtly or grossly different meanings in their heads for those terms, and are therefore automatically talking past each other.

But surely they mean something other than “unattractive” or “attractive”!

I do exaggerate somewhat for comedic effect. So let’s say: a more-strongly-typed language is one that has some restriction in its type system that a more-weakly-typed language it is being compared to lacks. That’s all you can really say without more context.

How can I have sensible conversations about languages and their type systems then?

You can provide the missing context. Instead of using “strongly typed” and “weakly typed”, actually describe the restriction you mean. For example, C# is for the most part a statically typed language, because the compiler determines facts about the types of every expression. C# is for the most part a type-safe language because it prevents values of one static type from being stored in variables of an incompatible type (and other similar type errors). And C# is for the most part a memory-safe language because it prevents accidental access to bad memory.

Thus, someone who thinks that “strongly typed” means “the language encourages static typing, type safety and memory safety in the vast majority of normal programs” would classify C# as a “strongly typed” language. C# is certainly more strongly typed than languages that do not have these restrictions in their type systems.

But here’s the thing: because C# is a pragmatic language there is a way to override all three of those safety systems. Cast operators and “dynamic” in C# 4 override compile-time type checking and replace it with runtime type checking, and “unsafe” blocks allow you to turn off type safety and memory safety should you need to. Someone who thinks that “strongly typed” means “the language absolutely positively guarantees static typing, type safety and memory safety under all circumstances” would quite rightly classify C# as “weakly typed”. C# is not as strongly typed as languages that do enforce these restrictions all the time.
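
A sketch of mine showing all three escape hatches in one place (the unsafe block requires compiling with /unsafe):

object o = "hello";
string s = (string)o;  // cast: the static check is replaced by a runtime check

dynamic d = 5;
var r = d.ToString();  // dynamic: member lookup is deferred to runtime

unsafe
{
  int x = 42;
  int* p = &x;         // unsafe: type safety and memory safety are off
  *p = 43;
}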

So which is it, strong or weak? It is impossible to say because it depends on the point of view of the speaker, it depends on what they are comparing it to, and it depends on their attitude towards various language features. It’s therefore best to simply avoid these terms altogether, and speak more precisely about type system features.


Next time on FAIC: What happens when a dynamic call’s method group has a single member?

Should C# warn on null dereference?

As you probably know, the C# compiler does flow analysis on constants for the purposes of finding unreachable code. In the following method, the statement containing the calls is known to be unreachable, and the compiler warns about it.

const object x = null; 
void Foo() 
{
  if (x != null)
  {
     Console.WriteLine(x.GetHashCode());   
  }
}

Now suppose we removed the if statement and just had the call:

const object x = null; 
void Foo() 
{   
  Console.WriteLine(x.GetHashCode()); 
}

The compiler does not warn you that you’re dereferencing null! The question is, as usual, why not?