Nullable micro-optimization, part four

Last time on FAIC I described how the C# compiler elides the conversion from int to int? when you add an int? to an int, and thereby manages to save unnecessary calls to HasValue and GetValueOrDefault(). Today I want to talk a bit about another kind of nullable conversion that the compiler can optimize. Consider the following, in which w is an expression of type int? :

double? z = w;

There is an implicit conversion from int to double, and so there is a “lifted” conversion from int? to double?. As I’m sure you’d expect, given the previous entries in this series, this would be code-generated the same as:

double? z;
int? temp = w;
z = temp.HasValue ? 
    new double?((double)temp.GetValueOrDefault()) : 
    new double?();

If you don’t know anything more about w then that’s about as good as it gets. But suppose we did know more. For example, suppose we have:

double? z = new int?();

That might seem crazy, but bear with me. In this case, obviously the compiler need not ever call HasValue in the first place because you and I both know it is going to be false. And we know that there are no side effects of the expression that need to be preserved, so the compiler can simply generate:

double? z = new double?();

Similarly, suppose we have an expression q of type int, and the assignment:

double? z = new int?(q);

Again, clearly we do not need to go through the rigamarole of making a temporary and checking to see if its HasValue property is true. We can skip straight to:

double? z = new double?((double)q);

So this is all well and good. The Roslyn and “original recipe” C# compilers both perform these optimizations. But now let’s think about a trickier case. Suppose we have expressions x and y both of type int?, and suppose for the sake of argument that we do not know anything more about the operands:

double? z = x + y;

Now, reason like the compiler. We do not know whether x and y have values or not, so we need to use the un-optimized version of addition. So this is the same as:

double? z;
int? temp1 = x;
int? temp2 = y;
int? sum = temp1.HasValue & temp2.HasValue ? 
  new int?(temp1.GetValueOrDefault() + temp2.GetValueOrDefault()) : 
  new int?();
z = (double?)sum;

We don’t know whether sum has a value or not, so we must then generate the full lifted conversion, right? So this is then generated as:

double? z;
int? temp1 = x;
int? temp2 = y;
int? sum = temp1.HasValue & temp2.HasValue ? 
  new int?(temp1.GetValueOrDefault() + temp2.GetValueOrDefault()) : 
  new int?();
z = sum.HasValue ? 
  new double?((double)sum.GetValueOrDefault()) :
  new double?();

Is that the best we can do? No! The key insight here is that the conversion can be distributed into the consequence and alternative of the conditional, and that doing so enables more optimizations. That is to say that:

z = (double?) (temp1.HasValue & temp2.HasValue ? 
  new int?(temp1.GetValueOrDefault() + temp2.GetValueOrDefault()) : 
  new int?());

Gives the exact same result as:

z = temp1.HasValue & temp2.HasValue ? 
  (double?) new int?(temp1.GetValueOrDefault() + temp2.GetValueOrDefault()) : 
  (double?) new int?();

But we already know how to optimize those! I said above that only crazy people would convert new int?() to double?, and of course you would not do that in your user-written code. But when the compiler itself generates that code during an optimization, it can optimize it further. The compiler generates a lifted conversion from a lifted arithmetic expression by distributing the conversion into both branches of the conditional, and then optimizes each branch. Therefore, double? z = x + y; is actually generated as:

double? z;
int? temp1 = x;
int? temp2 = y;
z = temp1.HasValue & temp2.HasValue ? 
  new double?((double)(temp1.GetValueOrDefault() + temp2.GetValueOrDefault())) : 
  new double?();

The compiler does not need to generate the sum variable at all, and it certainly does not need to check to see if it has a value. This optimization eliminates one entire temporary and the entire second conditional expression.


Next time on FAIC: We’ll digress for some brief news on the publishing front. We’ll then continue this series and ask: are there other “chained” lifted operations that can be optimized?

Nullable micro-optimization, part three

Happy New Year all; I hope you had as pleasant a New Year’s Eve as I did.

Last time on FAIC I described how the C# compiler first uses overload resolution to find the unique best lifted operator, and then uses a small optimization to safely replace a call to Value with a call to GetValueOrDefault(). The jitter can then generate code that is both smaller and faster. But that’s not the only optimization the compiler can perform, not by far. To illustrate, let’s take a look at the code you might generate for a binary operator, say, the addition of two expressions of type int?, x and y:

int? z = x + y;

Last time we only talked about unary operators, but binary operators are a straightforward extension. We have to make two temporaries, so as to ensure that side effects are executed exactly once:

int? z;
int? temp1 = x;
int? temp2 = y;
z = temp1.HasValue & temp2.HasValue ? 
  new int?(temp1.GetValueOrDefault() + temp2.GetValueOrDefault()) :
  new int?();

A brief aside: shouldn’t that be temp1.HasValue && temp2.HasValue?

Both versions give the same result; is the short-circuiting one more efficient? Not necessarily! AND-ing together two bools is extremely fast, possibly faster than doing an extra conditional branch to avoid what is going to be an extremely fast property lookup. And the code is certainly smaller. Roslyn uses the non-short-circuiting AND, and I seem to recall that the earlier compilers do as well.

Anyway, when you do a lifted addition of two nullable integers, that’s the code that the compiler generates when it knows nothing about either operand. Suppose however that you added an expression q of type int? and an expression r of type int:

int? s = q + r;

OK, reason like the compiler here. First off, the compiler has to determine what the addition operator means, so it uses overload resolution and discovers that the unique best applicable operator is the lifted integer addition operator. Therefore both operands have to be converted to the operand type expected by the lifted operator, int?. So immediately we have determined that this means:

int? s = q + (int?)r;

Which of course is equivalent to:

int? s = q + new int?(r);

And now we have an addition of two nullable integers. We already know how to do that, so the compiler generates:

int? s;
int? temp1 = q;
int? temp2 = new int?(r);
s = temp1.HasValue & temp2.HasValue ? 
  new int?(temp1.GetValueOrDefault() + temp2.GetValueOrDefault()) :
  new int?();

And of course you are saying to yourself well that’s stupid. You and I both know that temp2.HasValue is always going to be true, and that temp2.GetValueOrDefault() is always going to be whatever value r had when the temporary was built. The compiler can optimize this to:

int? s;
int? temp1 = q;
int temp2 = r;
s = temp1.HasValue ? 
  new int?(temp1.GetValueOrDefault() + temp2) :
  new int?();

Just because the conversion from int to int? is required by the language specification does not mean that the compiler actually has to generate code that does it; rather, all the compiler has to do is generate code that produces the correct results!

A fun fact is that the Roslyn compiler’s nullable arithmetic optimizer actually optimizes it to temp1.HasValue & true ? ..., and then Roslyn’s regular Boolean arithmetic optimizer gets rid of the unnecessary operator. It was easier to write the code that way than to be super clever in the nullable optimizer.

Roslyn will also optimize lifted binary operator expressions where both sides are known to be null, where one side is known to be null, and where both sides are known to be non-null. Since these scenarios are rare in user-written code, I’m not going to discuss them in this series.


Next time on FAIC: What happens when we throw some lifted conversions into the mix?

Nullable micro-optimization, part two

I hope you’ve all had a pleasant Christmas; I sure did, though once again I was unable to return to Waterloo region to visit my family. Hopefully I’ll make it for Easter this coming year.

Last time on FAIC I described why calling GetValueOrDefault() instead of Value allows the jitter to generate smaller, faster code. Of course this optimization is first, tiny, and second, only a valid optimization in the case where you are certain that the nullable value is not actually null. Over the next few episodes I’ll describe how the C# compiler uses that fact to generate better code for you, but in order to do that, I first need to talk a bit about lifted arithmetic.

Back in 2007 I described what mathematicians mean by “lifted arithmetic”, and how the C# specification uses this term in a subtly wrong way. It’s been a long time, so here’s a quick refresher. Mathematically, by “lifted” we mean that if there is a function f : S → S, and we make a new set S' = S ∪ { null }, then the lifted function f' : S' → S' is defined by f'(null) = null and f'(s) = f(s) for every s ∈ S. Or, in English, the lifted function gives null when given null, and agrees with the unlifted function otherwise.

We then extend the definition of “lifted” to functions of the form f : S → T in the obvious manner: the lifted function is f' : S' → T'. Similarly for functions of two, three or more parameters: the lifted function is null if any argument is null, and agrees with the unlifted function otherwise.
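
To put the same idea in code rather than set notation, here is a minimal sketch of a helper method that lifts a function on int to a function on int?. The name Lift is my own invention for illustration; it is not anything the compiler actually uses.

static Func<int?, int?> Lift(Func<int, int> f)
{
  // null goes to null; otherwise the lifted function agrees with f.
  return x => x.HasValue ? new int?(f(x.GetValueOrDefault())) : new int?();
}

For example, Lift(n => ~n) produces null when given null and otherwise agrees with ~, which is exactly the behaviour of the lifted ~ operator described below.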

Lifted arithmetic operators in C# work similarly. In C#, if there is an operator, let’s say the unary ~ operator that takes an int and produces an int, then there is also a lifted ~ operator that takes an int? and produces an int?. The lifted operator produces null if given null, and otherwise agrees with the unlifted operator.

Some so-called “lifted” operators do not follow this pattern, but for the purposes of this series we’ll mostly be talking about the ones that do.


I want to make a brief aside here to discuss how the C# compiler knows to use a lifted operator in the first place. The answer is straightforward: it uses overload resolution.

Continuing our example, when you say ~x, the compiler pretends that you did a method call operator~(x) and creates a candidate set that consists of “methods” corresponding to the signatures of the user-defined and built-in ~ operators. If overload resolution produces a unique best applicable operator then it is chosen and the operand is implicitly converted to the “parameter type” of the chosen “operator method”, otherwise the compiler produces an error. That’s an oversimplification; consult the specification for the exact details.
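
To make that concrete, here is a minimal sketch with an invented struct; when such an operator is declared on a non-nullable value type, the candidate set automatically includes a lifted form of it as well:

struct Fraction
{
  public int Numerator;
  public int Denominator;
  // A user-defined ~ operator; the details are unimportant.
  public static Fraction operator ~(Fraction f)
  {
    return new Fraction { Numerator = f.Denominator, Denominator = f.Numerator };
  }
}

Fraction? f = null;
Fraction? g = ~f; // overload resolution chooses the lifted form; g is null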

Unfortunately, the specification sections on operator overload resolution are not strictly speaking entirely accurate: there are some known discrepancies between the compiler and the specification. In some of these cases the compiler is wrong and in some the specification is wrong. The areas with small discrepancies include (1) precisely when a user-defined operator is considered to be “liftable” and what the resulting semantics are, (2) how the candidate set for operators on enumerated and delegate types are determined, and (3) how the “betterness” rules treat lifted operators.

Mads and I have attempted a number of times to come up with better spec language, but I don’t think the proposed changes made it into the latest revision. I might choose to do blog articles on these interesting and difficult corner cases in the future.

The important fact that will come into play later in this series is that if overload resolution chooses a lifted operator then the operand is implicitly converted to the nullable type. Just like how when normal overload resolution chooses a method, the arguments are implicitly converted to the corresponding formal parameter types.


Returning now to the subject at hand: how does the C# compiler generate code for a lifted operator? When you say:

int? y = ~x;

what happens? Let’s suppose that x is a legal expression of type int?, just to keep it easy. Overload resolution determines that the lifted ~ operator that takes an int? and produces an int? is the unique best applicable operator. The expression is already of the correct type. Now, you might naively think that the compiler would pretend that you’d typed:

int? y = x.HasValue ? ~x.Value : null;

but of course that code is wrong in two ways.

First, it doesn’t compile because the type of the conditional operator expression cannot be determined.

Astonishingly, I’ve never written a blog article about this specific aspect of the conditional operator, though it has certainly come up on StackOverflow a lot. This is probably the blog article that came the closest to describing this common problem.

And second, what if the expression x has a side effect? We would not want to generate

int? y = ~M(++i);

as:

int? y = M(++i).HasValue ? ~M(++i).Value : null;

because then the variable gets incremented twice and the method gets called twice if the result of the first call is not null. And of course the value returned the second time might be different! We can fix these two problems easily enough:

int? y;
int? temp = x;
y = temp.HasValue ? new int?(~temp.Value) : new int?();

And now we’re good.

At this point the C# compiler can say “but wait a moment! If we are on the ‘consequence’ branch of the conditional operator then we know for sure that temp.HasValue is true.” Therefore the compiler can generate the more optimal code:

int? y;
int? temp = x;
y = temp.HasValue ? new int?(~temp.GetValueOrDefault()) : new int?();

Which is in fact what both the “original recipe” and the “extra crispy Roslyn” compilers do. The savings is tiny, but it is real, and these savings add up as the expressions get more and more complicated, as we’ll see.


Next time on FAIC: Is that the only optimization a C# compiler can perform when generating code for lifted arithmetic? Of course not! In the next few episodes we’ll look at some ways the compiler can be more clever, and compare the Roslyn compiler’s heuristics to the previous compiler’s heuristics. Happy New Year all, and we’ll see you in 2013 for more fabulous adventures.

Nullable micro-optimizations, part one

Which is faster, Nullable<T>.Value or Nullable<T>.GetValueOrDefault()?

Before I answer that question, my standard response to “which horse is faster?” questions applies. Read that first.

.
.
.

Welcome back. But again, before I answer the question I need to point out that the potential performance difference between these two mechanisms for obtaining the non-nullable value of a nullable value type is a consequence of the fact that these two mechanisms are not semantically equivalent. The former may legally only be called if you are sure that the nullable value is non-null; put another way, calling Value without knowing that HasValue is true is a boneheaded exception. The latter may be called on any nullable value. A glance at a simplified version of the source code illustrates the difference.

struct Nullable<T> where T : struct
{
  private bool hasValue;
  private T value;
  public Nullable(T value)
  {
    this.hasValue = true;
    this.value = value;
  }
  public bool HasValue { get { return this.hasValue; } }
  public T Value
  {
    get
    {
      if (!this.HasValue) throw new InvalidOperationException();
      return this.value;
    }
  }
  public T GetValueOrDefault() 
  {
    return this.value; 
  }
  // ... and then all the other conversion gear and so on ...
}

The first thing to notice is that a nullable value type’s ability to represent a “null” integer or decimal or whatever is not magical. (Nullable value types are magical in other ways; for example, there’s no way to write your own struct that has the strange boxing behaviour of a nullable value type; an int? boxes to either an int or null, never to a boxed int?. But let’s not worry about these magical features today.) A nullable value type is nothing more than an instance of the value type plus a bool saying whether it’s null or not.
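
A quick illustration of that boxing behaviour, since it surprises people:

int? n = null;
object a = n;                    // boxes to a null reference, not to a boxed int?
int? m = 123;
object b = m;                    // boxes to a boxed int
Console.WriteLine(a == null);    // True
Console.WriteLine(b.GetType());  // System.Int32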

If a variable of nullable value type is initialized with the default constructor then the hasValue field will be its default value, false, and the value field will be default(T). If it is initialized with the declared constructor then of course the hasValue field is true and the value field is any legal value, including possibly T's default value. Thus, the implementation of GetValueOrDefault() need not check the flag; if the flag is true then the value field is set correctly, and if it is false, then it is set to the default value of T.

Looking at the code it should be clear that Value is almost certainly not faster than GetValueOrDefault() because obviously the former does exactly the same work as the latter in the success case, plus the additional work of the flag check. Moreover, because GetValueOrDefault() is so brain-dead simple, the jitter is highly likely to perform an inlining optimization.

An inlining optimization is where the jitter eliminates an unnecessary “call” and “return” instruction by simply generating the code of the method body “inline” in the caller. This is a great optimization because doing so can make code both smaller and faster in some cases, though it does make it harder to debug because the debugger has no good way to generate breakpoints inside the inlined method.

How the jitter chooses to inline or not is an implementation detail, but it is reasonable to assume that it is less likely to perform an inlining optimization on code that contains more than one “basic block” and has a throw in it.

A “basic block” is a region of code where you know that the code will execute from the top of the block to the bottom without any “normal” branches in or out of the middle of the block. (A basic block may of course have exceptions thrown out of it.) Many optimizing compilers use “basic blocks” as an abstraction because it abstracts away the unnecessary details of what the block actually does, and treats it solely as a node in a flow control graph.

It should also be clear that though the relative performance difference might be large, the absolute difference is small. A call, a field fetch, a conditional jump and a return make up the difference in the typical case, and each of those costs only nanoseconds.

Now, this is of course not to say that you should willy-nilly change all your calls to Value to GetValueOrDefault() for performance reasons. Read my rant again if you have the urge to do that! Don’t go changing working, debugged, tested code in order to obtain a performance benefit that is (1) highly unlikely to be a real bottleneck, and (2) highly unlikely to be your worst performance problem.

And besides, using Value has the nice property that if you have made a mistake and fetched the value of a null, you’ll get an exception that informs you of where your bug is! Code that draws attention to its faults is a good thing.

Finally, I note that here we have one of those rare cases where the framework design guidelines have been deliberately bent. We have a “Get” method that is actually faster than a property getter, and the property getter throws! Normally you expect the opposite: the “Get” method is usually the one that is slow and can throw, and the property is the one that is fast and never throws. Though this is somewhat unfortunate, remember, the design guidelines are our servants, not our masters, and they are guidelines, not rules.


Next time on FAIC: How does the C# compiler use its knowledge of the facts discussed today to your advantage? Have a great Christmas everyone; we’ll pick up this subject again in a week.

Which is faster?

Which is faster, QueryLightBulbFrobStatusEx() or __WGetBulbFrobberState2()?

Hold it right there, buddy. Before answering that question I must give you my standard six-part rant about why I probably cannot sensibly answer questions that begin “which is faster“.

Part the first: Why are you even asking me?

If you have two horses and you want to know which of the two is the faster then race your horses. Don’t write short descriptions of the horses, post them on the Internet, and ask random strangers to guess which is faster! Even if by sheer chance you got an accurate answer, how would you have any confidence in its accuracy? You can easily and accurately discover which of two programs is faster by running both yourself and measuring them with a stopwatch.

(Caveat: writing performance benchmarks that provide high-quality, meaningful results can be somewhat tricky in a garbage-collected, JIT-compiled runtime. I see this done wrong a lot. I will return to this topic in a future blog post.)

Moreover: performance is highly sensitive to things like what hardware you are using, what software you have installed, what other processes are doing, and a host of other factors. You’re the only person who knows what environment the code is going to be run in, so you’re the only person who can do realistic performance testing in that environment.
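
If you do decide to race your own horses, the usual shape of the race looks something like the minimal sketch below; Measure, action and iterations are names I have invented for illustration, and a serious benchmark needs rather more care than this:

static TimeSpan Measure(Action action, int iterations)
{
  action(); // warm up, so that jit compilation is not charged to the measurement
  var watch = System.Diagnostics.Stopwatch.StartNew();
  for (int i = 0; i < iterations; i += 1)
    action();
  watch.Stop();
  return TimeSpan.FromTicks(watch.Elapsed.Ticks / iterations);
}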

Part the second: Do you really need to answer that question?

The question presupposes that there actually is a performance problem to be solved. If the code as it stands — the working, debugged, tested code — is already fast enough for your customer then knowing which of two ways to write the code is the faster is just trivia. Spend your valuable time worrying about something else, like testing, robustness, security, and so on. Unnecessary code changes are expensive and dangerous; don’t make performance-based changes unless you’ve identified a performance problem.

Part the third: Is that really the bottleneck?

Suppose that you really do have an identified performance problem: asking which of two things is faster is premature if neither thing is the cause of the problem! The most important thing that I’ve learned about performance analysis is that my highly educated and experienced guess about the root cause of a performance problem is dead wrong probably more than a third of the time. Use a profiler or other analysis tool to determine empirically where the bottleneck is before you start investigating alternatives.

Part the fourth: Is the difference relevant?

Suppose that you have actually identified a bottleneck: now the relevant question is not actually “which horse is faster?” Rather, the relevant question is actually “are either of these horses fast enough to meet my customer’s needs?” If neither horse is fast enough for your purposes then knowing which is faster is irrelevant. And if both are fast enough then you can base your decision on important factors other than performance, as discussed in part the second.

(The question also presupposes that the two alternatives proposed are semantic equivalents. If one of those is a horse and the other is a refrigerator, asking which one runs faster is maybe a non-starter.)

Part the fifth: What is this “faster” you speak of?

There are lots of kinds of speed, and optimizing for one kind can deoptimize for another. I’m sure you’ve all encountered situations where normal-case-scenario performance is acceptable but worst-case-scenario performance is terrible. Anyone who has been unable to get a long-distance phone call placed on Mother’s Day knows what I’m talking about; the telephone network was designed to handle slightly-higher-than-average load, not highest-likely load. Unfortunately, implementing code that ensures an upper bound on your worst-possible-scenario behaviour often ends up making the typical-case-scenario unacceptably slower, and vice versa.

Leaving the difference between best, worst and typical aside, there are all kinds of speed metrics. When I was working on Visual Studio Tools For Office we did comparatively little work making the customization framework code run faster because our tests showed that it typically ran fast enough to satisfy customers. But we did an enormous amount of work making the framework code load faster, because our research showed that Office power users were highly irritated by noticeable-by-humans delays when loading customized documents for the first time. Similarly, in the web services realm there is a big difference between optimizing for time-to-first-byte, time-to-last-byte and throughput. You have to know what kind of speed is really important to the customer.

Part the sixth: Are you looking at the big picture?

Almost all performance analysis questions I see are solely about improving speed. There are lots of non-speed metrics like memory usage, disk usage, network usage, processor usage, and so on, that might be more relevant than raw speed to your customer. Making code faster often involves trading less time for more memory, which might be a bad trade. Don’t over-focus on speed metrics; look at the big picture.


Well, that rant used up this whole episode. Next time on FAIC: Which is faster, Nullable<T>.Value or Nullable<T>.GetValueOrDefault()?


The photo credits are on Wikimedia Commons.

My Kauai vacation

No technology today; just some photos I took on my recent trip to Kauai. (Click on the small photos for a larger version of each.)

Kauai is the oldest of the Hawaiian islands and has fabulous topography and rich soil as a result of its violent five-million-year history of repeated volcanic eruptions followed by heavy erosion. A few of the highlights:

The Allerton Garden on the south shore is an amazing collection of native, endemic and exotic (that is, introduced recently) plants artfully arranged and carefully tended. My favourite arrangement highlighting a single tree was this one:

Allerton Garden

The Allerton Garden is also the home of the famous ficus trees seen in Jurassic Park:

Allerton Garden Ficus

To get a sense of the scale of those amazing roots and for some more background on these incredible trees, check out this little tourism video:

Kauai tops out at 1600 metres today; it was far, far higher than that when it originally formed. The immense erosion has produced the “Grand Canyon of the Pacific”, Waimea Canyon, on the interior:

Waimea Canyon

Of course each horizontal line you can see in the eroded layer is an individual lava flow. On the exterior, the vulcanism and erosion have produced the Na Pali cliffs. (*) Here you can see an interesting feature: a sea cave with a tiny waterfall going over it. This was useful because you could stock up on fresh water without ever beaching your canoe!

Na Pali Sea Cave

All in all it was a lovely vacation, both relaxing and educational. I hope to some day go back and experience the north side of the island.


I’ve used some of the photos above as the header images for the blog; if you’re interested in seeing the full-size versions of rest of the header images, see the photo credits page.


Next time on FAIC: Why it is very hard to give a sensible answer to “which is faster?” questions.


(*) Na Pali means “many cliffs”, so those would be the “many cliffs cliffs”. The Microsoft cafeteria once offered a sandwich “with au jus sauce”, which is even worse.

Taking responsibility

Today I answer the question “what’s the deal with the fixed statement?” in the form of a dialogue, as is my wont. So:

What’s the deal with the fixed statement?

As I noted back in 2009, the purpose of the fixed statement is to tell the garbage collector that your code has made an unsafe, unmanaged pointer into a block of managed memory. Since the garbage collector reserves the right to move that memory around, it is important that you inform the garbage collector that it needs to “pin in place” that memory until you tell it otherwise.
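
A typical use looks something like this minimal sketch, where M stands in for some hypothetical call into unmanaged code:

unsafe static void Example(int[] array)
{
  fixed (int* p = array)  // the array is pinned in place for the duration of the block
  {
    M(p);                 // hypothetical unmanaged call that uses the pointer
  }                       // here the array becomes movable again
}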

Suppose I am calling unmanaged code from my C# program and I need to pass the code a pointer to a managed array. Eventually control will leave the fixed statement; what if the unmanaged code holds onto that pointer and uses it after the memory becomes unpinned?

Describing what happens in that scenario is not interesting because you are required to not get into that situation in the first place. As the C# specification helpfully points out:

It is the programmer’s responsibility to ensure that pointers created by fixed statements do not survive beyond execution of those statements. For example, when pointers created by fixed statements are passed to external APIs, it is the programmer’s responsibility to ensure that the APIs retain no memory of these pointers.

If you abdicate that responsibility then arbitrarily bad things can happen to your computer; the program can literally do anything that the current process has the right to do, including erasing all your files.

So what if I do that anyway? How do I prevent that undefined behaviour?

If it hurts when you do that then don’t do that. Asking “how do I not die from fatally shooting myself?” is a non-starter; don’t fatally shoot yourself in the first place if you’d prefer to not die!

No, really, I need to solve this problem! I really do have unmanaged code that captures the pointers I hand to it and dereferences them at an unknown time in the future. What can I do that is responsible?

There are a number of ways to mitigate this terrible situation.

First, you could ensure that control never leaves the fixed block. This essentially throws a wrench into the garbage collector’s performance and also makes it quite difficult to write your program, so I don’t recommend it.

Second, you could make a GCHandle object and use it to pin the array in place. It will stay pinned until you free the handle. This will, again, throw a wrench into the garbage collector because there will now be a pinned block that cannot move; the garbage collector will literally have to work around it.[1. To mitigate the performance problem you could make the array really big. Large arrays go on a large object heap, which is not compacted like the regular heap is, so the penalty of having an immovable block in the middle of the heap goes away. Of course, making an array far, far larger than it needs to be in order to solve a performance problem is likely to cause performance problems of its own. And also, the behaviour of the large object heap is an implementation detail subject to change at any time, not a contract you can rely on. This is, again, probably a bad idea, but it will work.]
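
A sketch of that second approach, using GCHandle directly:

int[] array = new int[100];
GCHandle handle = GCHandle.Alloc(array, GCHandleType.Pinned);
try
{
  IntPtr p = handle.AddrOfPinnedObject(); // a stable address the GC will not move
  // ... hand p to the unmanaged code ...
}
finally
{
  handle.Free(); // only once the unmanaged code is guaranteed to be done with p
}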

Third, you could allocate the array out of fixed-in-place unmanaged storage in the first place. For example, you could use AllocHGlobal and FreeHGlobal to do your own memory management. That’s what I’d probably do if faced with this unfortunate situation.
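
A sketch of that third approach, using the Marshal class:

// Unmanaged memory is never moved or collected by the garbage collector.
IntPtr buffer = Marshal.AllocHGlobal(100 * sizeof(int));
try
{
  // ... hand buffer to the unmanaged code; it remains valid until freed ...
}
finally
{
  Marshal.FreeHGlobal(buffer); // free only when the unmanaged code is done with it
}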


Next time on FAIC: What I did on my Kauai vacation.

Why are braces required in try-catch-finally?

Developers who use C-like languages typically conceive of if, while, for, and so on as taking either a single statement, or a group of any number of statements in a block:

if (x)
  M();
if (x)
{
  M();
  N();
}

However, that’s not how programming language designers think of it. Rather, if and while and for and so on each take a single statement, and a braced block is a single statement.[1. C# has an additional rule that the statement in an if, while and so on may not be a single local variable declaration; that’s a good subject for another day.]

No matter how we choose to think about the grammar, it is certainly the case that try-catch-finally is different than if and while and for and so on; try-catch-finally requires a braced block. That seems inconsistent; is there a justification for this inconsistency?

Before we dig into the try-catch-finally case, let’s first consider the problems with this approach. The looping structures are unambiguous:

while(A())
  while(B())
    C();

The inner while statement composes nicely with the outer while statement; no braces are required to make sense of this. But that is not the case with if, thanks to the famous “dangling else problem”:

if (A())
  if (B())
    C();
else
  D();

OK, quick, is the indenting correct there? Is else D() associated with the inner if statement or the outer one?

It’s associated with the inner one; the else matches the nearest containing if. But in a language where whitespace doesn’t matter, it is very easy to accidentally indent this wrong and get the wrong impression when reading the code. I’ve also seen badly-written macros in C and C++ that caused the dangling-else problem to arise.

When adding try-catch-finally to the language, the designers wished to avoid adding a second kind of dangling else problem. Suppose that you could put any statement after a try, catch or finally, rather than having to put a block statement. How do you analyze this program fragment?

try
  try
    A();
  catch (AException)
    B();
catch (BException)
  C();

OK, quick, is the indenting correct there? Is B() protected? That is, should we parse this as

try
{
  try
  {
    A();
  }
  catch (AException)
  {
    B(); // protected by the outer try
  }
}
catch (BException)
{
  C();
}

Or is it this erroneous program?

try // try without associated catch!
{
  try
  {
    A();
  }
  catch (AException)
  {
    B(); // not protected
  }
  catch (BException)
  {
    C();
  }
}

Rather than attempt to come up with a rule to disambiguate the ambiguous parse, it is better to simply avoid the ambiguity altogether and require the braces. The last thing we need in this language is more ambiguity.

While we’re on the subject, an interesting thing about the try block is that of course the try keyword is completely unnecessary from a grammatical perspective. We could simply have said that any block can be followed by any number of catch blocks or a finally block, and the block thus followed is implicitly a try block. This is a good example of how building redundancy into the language makes it more readable; the try keyword calls the reader’s attention to the fact that the control flow of this part of the method needs to deal with exceptional situations.

And one additional fun fact: in the initial design of C#, there was no such thing as try-catch-finally. There was try-catch and try-finally. If you wanted to have a try-catch-finally then you’d write:

try
{
  try
  {
    A();
  }
  catch (AException)
  {
    B();
  }
}
finally
{
  C();
}

The language designers realized that this was a common pattern and unnecessarily wordy, so they allowed the syntactic sugar of eliminating the outer try. The C# compiler actually generates the code as though you’d written the nested blocks, since at the CIL level there is no try-catch-finally.
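
That is, the single statement:

try
{
  A();
}
catch (AException)
{
  B();
}
finally
{
  C();
}

is emitted exactly as though you had written the try-finally wrapped around a try-catch shown above.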


Eric is on vacation; this posting was pre-recorded.


Next time on FAIC: What happens when unmanaged code holds on to a fixed pointer? Nothing good.

Fabulous adventures

Hello world, this is the new home of Fabulous Adventures in Coding. (The previous site is here.) Long-time readers will need no introduction, but if you are new here, please check out this short bio.

Today, November 29th 2012, is, as I noted in my final post on the MSDN blog, my second-last day at Microsoft. After tomorrow I will be taking the next few weeks off and not thinking about programming languages for once. And after that, I’m starting a new gig in 2013 at Coverity.

Most of you probably have not heard of Coverity, but you have almost certainly used software that was affected by their tools. Coverity makes static analysis tools for software developers; these tools analyze source code written in C, C++, Java and C# and tell you about correctness and security issues before they ship to customers. Among their high-profile customers are the Jet Propulsion Lab team that wrote the software for the Curiosity rover now running around on Mars and the software team for the Large Hadron Collider, which recently confirmed the existence of the Higgs boson. They also serve more down-to-earth customers; it’s not all weird science.

As an expert on the design and implementation of static analyzers for C# code — because, after all, that’s what the compiler is! — the opportunity to work in downtown Seattle on a small team to improve the C# analysis product was too good to pass up. And so here I am, continuing to try to improve the tools available for C# programmers.

Though I am no longer an “insider” on the C# design team, I intend to continue to blog about the design and implementation of C#, as well as other fabulous adventures in coding. If this sort of thing interests you, please subscribe to the RSS feed at ericlippert.com/feed, and please follow me on Twitter where I am @ericlippert.

Once I’m back from my short vacation we’ll get right back into it. Thanks for reading, and I look forward to sharing more fabulous adventures with you.


Next time on FAIC: Why are the bracing rules inconsistent in C#?

Why is deriving a public class from an internal class illegal?

In C# it is illegal to declare a class D whose base class B is in any way less accessible than D. I’m occasionally asked why that is. There are a number of reasons; today I’ll start with a very specific scenario and then talk about a general philosophy.

Suppose you and your coworker Alice are developing the code for assembly Foo, which you intend to be fully trusted by its users. Alice writes:

public class B
{
  public void Dangerous() {...}
}

And you write

public class D : B
{
  ... other stuff ...
}

Later, Alice gets a security review from Bob, who points out that method Dangerous could be used as a component of an attack by partially-trusted code, and who further points out that customer scenarios do not actually require B to be used directly by customers in the first place; B is actually only being used as an implementation detail of other classes. So in keeping with the principle of least privilege, Alice changes B to:

internal class B
{
  public void Dangerous() {...}
}

Alice need not change the accessibility of Dangerous, because of course public means “public to the people who can see the class in the first place”.

So now what should happen when Alice recompiles before she checks in this change? The C# compiler does not know if you, the author of class D, intended method Dangerous to be accessible by a user of public class D. On the one hand, it is a public method of a base class, and so it seems like it should be accessible. On the other hand, the fact that B is internal is evidence that Dangerous is supposed to be inaccessible outside the assembly. A basic design principle of C# is that when the intention is unclear, the compiler brings this fact to your attention by failing. The compiler is identifying yet another form of the Brittle Base Class Failure, which long-time readers know has shown up in numerous places in the design of C#.

Rather than simply making this change and hoping for the best, you and Alice need to sit down and talk about whether B really is a sensible base class of D; it seems plausible that either (1) D ought to be internal also, or (2) D ought to favour composition over inheritance. Which brings us to my more general point:

More generally: the inheritance mechanism is, as we’ve discussed before, simply the fact that all heritable members of the base type are also members of the derived type. But the inheritance relationship semantics are intended to model the “is a kind of” relationship. It seems reasonable that if D is a kind of B, and D is accessible at a location, then B ought to be accessible at that location as well. It seems strange that you could only use the fact that “a Giraffe is a kind of Animal” at specific locations.

In short, this rule of the language encourages you to use inheritance relationships to model the business domain semantics rather than as a mechanism for code reuse.

Finally, I note that as an alternative, it is legal for a public class to implement an internal interface. In that scenario there is no danger of accidentally exposing dangerous functionality from the interface to the implementing type because of course an interface is not associated with any functionality in the first place; an interface is logically “abstract”. Implementing an internal interface can be used as a mechanism that allows public components in the same assembly to communicate with each other over “back channels” that are not exposed to the public.
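
For example, something along these lines is perfectly legal; the names are invented for illustration:

internal interface IBackChannel
{
  void Signal();
}

public class Widget : IBackChannel
{
  // Explicit implementation: code outside this assembly cannot even name
  // IBackChannel, let alone call Signal through it.
  void IBackChannel.Signal() { }
}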