What the meaning of is is

Today a follow-up to my 2010 article about the meaning of the is operator. Presented as a dialog, as is my wont!

I’ve noticed that the is operator is inconsistent in C#. Check this out:

string s = null; // Clearly null is a legal value of type string
bool b = s is string; // But b is false!

What’s up with that?

Let’s suppose you and I are neighbours.

Um… ok, I’m not sure where this is going, but sure.
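A minimal sketch of the behaviour in question, with made-up names. Roughly speaking, the C# specification defines "x is T" to be true when x is non-null and its value can be converted to T by a reference, boxing or unboxing conversion. The test is about the runtime value, not the declared type of the variable, so a null reference always fails it.

```csharp
using System;

static class IsDemo
{
    // "value is string" is true only for a non-null reference to a string instance.
    public static bool IsString(object value) => value is string;

    public static void Main()
    {
        string s = null;                   // null is a legal value of type string...
        Console.WriteLine(s is string);    // False: ...but null is not an instance of string
        Console.WriteLine(IsString("hi")); // True
        Console.WriteLine(IsString(42));   // False: a boxed int is not a string
        object boxed = 42;
        Console.WriteLine(boxed is int);   // True: an unboxing conversion to int exists
    }
}
```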


Benchmarking mistakes, part two

NOTE: I wrote this series of articles for a web site which went out of business soon thereafter. They asked for a beginner-level series on how to benchmark C# programs. I’ve copied the articles as I wrote them below. Part one is here and part three is here.


So far in this series we’ve learned about the jitter and how it compiles each method in your program “on the fly”. We’ll come back to jitter issues in the next episode; in this episode I want to look at some actual (terrible) code for measuring performance.

Let’s suppose part of a larger program will involve sorting a long list of integers, and we’d like to know how much time that is going to take. Rather than testing the entire “real” program and trying to isolate the specific part, we’ll write a benchmark that just answers the question at hand.

There are a lot of things wrong with the code below that we’ll get to in later episodes; the issue I’m going to focus on today is:

Mistake #5: Using a clock instead of a stopwatch.

using System;
using System.Collections.Generic;
class P
{
  public static void Main()
  {
    var list = new List<int>();
    const int size = 10000;
    var random = new Random();
    for(int i = 0; i < size; i += 1)
      list.Add(random.Next());
    var startTime = DateTime.Now;
    list.Sort();
    var stopTime = DateTime.Now;
    var elapsed = stopTime - startTime;
    Console.WriteLine(elapsed);
  }
}

That looks pretty reasonable, right? We create some random data to test. Let’s assume that ten thousand elements is about the typical problem size that the customer will see. We are careful to time only the actual sort and not include the creation of the random data. If we run it on my laptop I see that it reports about 3 milliseconds. How accurate is this?

That result is off by a factor of more than two from the correct answer; the sort actually takes about 1.2 milliseconds. What explains this enormous discrepancy, and how did I know what the correct answer was?

Well, suppose you were trying to time this program by actually watching a grandfather clock that ticks once per second. This is of course ludicrous because you know what would happen: either the clock would tick during the few milliseconds the program was active, or it wouldn’t. If it did, you’d record one second; if not, you’d record zero. The resolution of the clock is nowhere near good enough to actually measure the elapsed time with the necessary precision.

DateTime.Now is only slightly better than that grandfather clock: it only ticks every few milliseconds. Worse, the rate at which it ticks is not fixed from machine to machine. On modern Windows machines you can expect that the tick resolution will be around 100 ticks per second or better under normal circumstances, which is a resolution of 10 milliseconds or less. Apparently things went well for me on this laptop and I got a resolution of 3 milliseconds, which is great for DateTime.Now but still nowhere near fast enough to accurately measure the time spent sorting.
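If you would rather measure the tick size on your own machine than take it on faith, here is a rough sketch. It is my own, and the class and method names are hypothetical. It spins until the clock reading changes and reports the smallest step it sees. The number depends on the operating system and runtime version. Newer .NET runtimes may report a much finer step than the figures above, though DateTime.Now is still the wrong tool for the other reasons given below.

```csharp
using System;

static class ClockResolution
{
    // Spin until clock() returns something other than 'previous', then return the new reading.
    static DateTime WaitForChange(Func<DateTime> clock, DateTime previous)
    {
        DateTime current;
        do
        {
            current = clock();
        } while (current == previous);
        return current;
    }

    // Estimate the clock's step size: wait for a tick boundary, then time the gap
    // to the next boundary. Keep the smallest gap seen over several samples.
    public static TimeSpan ObserveStep(Func<DateTime> clock, int samples = 5)
    {
        TimeSpan smallest = TimeSpan.MaxValue;
        for (int i = 0; i < samples; i += 1)
        {
            DateTime edge = WaitForChange(clock, clock());
            DateTime nextEdge = WaitForChange(clock, edge);
            TimeSpan step = nextEdge - edge;
            if (step < smallest)
                smallest = step;
        }
        return smallest;
    }

    public static void Main()
    {
        TimeSpan step = ObserveStep(() => DateTime.Now);
        Console.WriteLine("Observed DateTime.Now step: " + step.TotalMilliseconds + " ms");
    }
}
```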

The correct tool to use when writing benchmarks is Stopwatch in the System.Diagnostics namespace. (I note that this namespace is well-named; everything in here is good for diagnosing problems but should probably not be used in “production” code.) Let’s take a look at the code rewritten to use Stopwatch instead. (There is still plenty wrong with this benchmark, but at least we’re improving.)

using System;
using System.Collections.Generic;
using System.Diagnostics;
class P
{
  public static void Main()
  {
    var list = new List<int>();
    const int size = 10000;
    var random = new Random();
    for(int i = 0; i < size; i += 1)
      list.Add(random.Next());
    var stopwatch = new Stopwatch();
    stopwatch.Start();
    list.Sort();
    stopwatch.Stop();
    Console.WriteLine(stopwatch.Elapsed);
  }
}

Notice first of all how much more pleasant it is to use the right tool for the job; the Stopwatch class is designed for exactly this problem so it has all the features you’d want: you can start, pause and stop the timer, computing the elapsed time does not require a subtraction, and so on.

Second, the results now show far more accuracy and precision: this run took 1.1826 milliseconds on my laptop. We have gone from ~10 millisecond precision to sub-microsecond precision!
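Here is a concrete example of the start, pause and stop features mentioned above. Calling Stop and then Start again does not reset a Stopwatch; Elapsed keeps accumulating until you call Reset or Restart. This sketch uses that behaviour to time only the sorting across several runs, leaving data generation outside the measured intervals. The method name, run count and fixed seed are illustrative choices, and the benchmark still has the other problems this series is about.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

static class SortTiming
{
    // Times only the List<int>.Sort calls across several runs; the random
    // data for each run is generated while the stopwatch is stopped ("paused").
    public static TimeSpan TimeSortsOnly(int size, int runs, int seed)
    {
        var random = new Random(seed);
        var stopwatch = new Stopwatch();
        for (int run = 0; run < runs; run += 1)
        {
            var list = new List<int>(size);
            for (int i = 0; i < size; i += 1)
                list.Add(random.Next());
            stopwatch.Start();   // resume: Elapsed keeps accumulating
            list.Sort();
            stopwatch.Stop();    // pause: the next run's setup is not counted
        }
        return stopwatch.Elapsed; // total time spent sorting, summed over all runs
    }

    public static void Main()
    {
        const int runs = 10;
        TimeSpan total = TimeSortsOnly(10000, runs, seed: 12345);
        Console.WriteLine("Total sorting time: " + total);
        Console.WriteLine("Average per sort:   " + TimeSpan.FromTicks(total.Ticks / runs));
    }
}
```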

How precise is the stopwatch, anyway? It depends on your hardware. When a high-resolution timer is available, the Stopwatch class uses it, via the operating system’s high-resolution performance counter, to get such high precision. The Stopwatch.Frequency read-only field tells you how many ticks per second that timer counts, which determines its resolution. On my laptop it is two million ticks per second, which you’ll note is considerably larger than the operating system’s weak promise of around 100 ticks per second for the grandfather clock that is DateTime.Now. This laptop is pretty low-end; better hardware can provide even higher resolution than that. Remember, one two-millionth of a second is still hundreds of processor cycles on modern hardware, so there’s room for improvement.
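A small sketch for checking your own hardware. Stopwatch.Frequency, Stopwatch.IsHighResolution and ElapsedTicks are real members of System.Diagnostics.Stopwatch; the helper method is hypothetical. One trap is worth knowing about: ElapsedTicks counts timer ticks at Stopwatch.Frequency per second, not the 100-nanosecond ticks used by TimeSpan. Either convert using Frequency or just use Elapsed.

```csharp
using System;
using System.Diagnostics;

static class TimerInfo
{
    // Convert raw Stopwatch ticks (Stopwatch.Frequency per second) to milliseconds.
    // These are NOT TimeSpan ticks, which are always 100 ns.
    public static double TicksToMilliseconds(long stopwatchTicks) =>
        stopwatchTicks * 1000.0 / Stopwatch.Frequency;

    public static void Main()
    {
        Console.WriteLine("High-resolution timer available: " + Stopwatch.IsHighResolution);
        Console.WriteLine("Stopwatch ticks per second: " + Stopwatch.Frequency);
        Console.WriteLine("Resolution in nanoseconds: " + 1e9 / Stopwatch.Frequency);

        var stopwatch = Stopwatch.StartNew();
        Array.Sort(new[] { 3, 1, 2 });
        stopwatch.Stop();
        // ElapsedTicks is in Stopwatch ticks; Elapsed is already a TimeSpan.
        // The two figures should agree to within about 100 ns.
        Console.WriteLine(TicksToMilliseconds(stopwatch.ElapsedTicks) + " ms");
        Console.WriteLine(stopwatch.Elapsed.TotalMilliseconds + " ms");
    }
}
```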

Suppose for the sake of argument though that the code we were benchmarking was going to run for seconds, minutes or even hours. In those cases the difference between a resolution of 100 ticks per second and 2 million ticks per second is irrelevant, right? If you’re going to be measuring minutes then you don’t need the second hand to be super accurate, so why not use DateTime.Now in those cases?

It’s still a bad idea because Stopwatch has been specifically designed to solve the problem of easily measuring time spent in code; DateTime.Now was designed to solve a completely different problem, namely, to tell you what time it is right now. When you use the wrong tool for the job, you sometimes can get strange results.

For example, suppose you use DateTime.Now to measure the time in a long-running performance benchmark that you run overnight on a machine in Seattle. Say, on Sunday November 3rd, 2013, when clocks in Seattle fell back one hour at 2 AM. The result you get is going to be wrong by one hour, and might in fact even be a negative number, because DateTime.Now pays attention to Daylight Saving Time changes.
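Here is a sketch of how that goes wrong, under stated assumptions. It does not rely on the machine's time zone database. Instead it builds a hypothetical “Pacific-like” zone by hand with the 2013 US rules: UTC-8, with daylight time from the second Sunday in March to the first Sunday in November, switching at 2 AM local time. It then converts two UTC instants 30 minutes apart that straddle the fall-back, and compares the elapsed time computed from local readings with the elapsed time computed from UTC readings. Using DateTime.UtcNow sidesteps this particular problem, but not the resolution problem or clock adjustments, so Stopwatch is still the better tool.

```csharp
using System;

static class DaylightSavingDemo
{
    // Hypothetical zone approximating the 2013 US Pacific rules, built by hand
    // so the demo does not depend on the operating system's time zone data.
    public static TimeZoneInfo MakePacificLikeZone()
    {
        var twoAm = new DateTime(1, 1, 1, 2, 0, 0);
        var start = TimeZoneInfo.TransitionTime.CreateFloatingDateRule(twoAm, 3, 2, DayOfWeek.Sunday);  // 2nd Sunday in March
        var end = TimeZoneInfo.TransitionTime.CreateFloatingDateRule(twoAm, 11, 1, DayOfWeek.Sunday);   // 1st Sunday in November
        var rule = TimeZoneInfo.AdjustmentRule.CreateAdjustmentRule(
            DateTime.MinValue.Date, DateTime.MaxValue.Date, TimeSpan.FromHours(1), start, end);
        return TimeZoneInfo.CreateCustomTimeZone(
            "Pacific-like", TimeSpan.FromHours(-8), "Pacific-like", "PST", "PDT", new[] { rule });
    }

    public static void Main()
    {
        TimeZoneInfo zone = MakePacificLikeZone();
        // A benchmark that really runs for 30 minutes across the 2 AM "fall back".
        var utcStart = new DateTime(2013, 11, 3, 8, 50, 0, DateTimeKind.Utc);
        var utcStop = utcStart.AddMinutes(30);
        DateTime localStart = TimeZoneInfo.ConvertTimeFromUtc(utcStart, zone); // 1:50 AM daylight time
        DateTime localStop = TimeZoneInfo.ConvertTimeFromUtc(utcStop, zone);   // 1:20 AM standard time
        Console.WriteLine("Elapsed from UTC readings:   " + (utcStop - utcStart));
        Console.WriteLine("Elapsed from local readings: " + (localStop - localStart));
    }
}
```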

Just don’t even go there. DateTime.Now is the wrong tool for the job, was designed to solve a different problem, is harder to use than Stopwatch, and has thousands or millions of times less precision. Avoid it entirely when writing benchmarks in C#.

Next time in this series we’ll take a closer look at how the jitter can affect your benchmarks.

What is lexical scoping?

Happy Eliza Doolittle day all; today seems like an appropriate day for careful elocution of technical jargon. So today, yet another question about “scope”. It is one of the more over-used jargon terms in programming languages, so I get a lot of questions about it.

I’ll remind you all again that in C# the term “scope” has a very carefully defined meaning: the scope of a named entity is the region of program text in which the unqualified name can be used to refer to the entity.[1. Scope is often confused with the closely related concepts of declaration space (the region of code in which no two things may be declared to have the same name), accessibility domain (the region of program text in which a member’s accessibility modifier permits it to be looked up), and lifetime (the portion of the execution of the program during which the contents of a variable are not eligible for garbage collection.)]
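A small illustration of that definition, with hypothetical names. A local declared in a nested block is in scope only within that block, so two sibling blocks may each declare their own y. A local x can also hide a field x. In that case the simple name x refers to the local throughout the method body, and the field has to be reached with a qualified name.

```csharp
using System;

static class ScopeDemo
{
    static int x = 10; // field: in scope throughout the body of ScopeDemo

    public static int Sum()
    {
        int total = 0;
        {
            int y = 1; // this y is in scope only within this inner block
            total += y;
        }
        {
            int y = 2; // legal: a different y, declared in a sibling block
            total += y;
        }
        int x = 100;   // local x: its scope is the whole method body, where it hides the field
        return total + x + ScopeDemo.x; // the qualified name still reaches the field
    }

    public static void Main()
    {
        Console.WriteLine(Sum()); // 113
    }
}
```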


Quality assurance fail

Some fun for Friday. I just opened up a box containing a brand-new bit of telecommunications equipment, and the power supply arrived looking like this, fresh out of the box.

How bad does your quality assurance have to be to ship to customers a power supply that cannot possibly fit into a power socket?

 

Spot the defect: rounding, part two

Last time I challenged you to find a value which does not round correctly using the algorithm:

Math.Floor(value + 0.5)

The value which does not round correctly is the double 0.49999999999999994, which is the largest double that is smaller than 0.5. With the given algorithm this rounds up to 1.0, even though clearly 0.49999999999999994 is less than one half, and therefore should round down.

What the heck is going on here?
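Here is a sketch that makes the problem visible, plus one possible fix of my own, which is not necessarily the one this post goes on to recommend. The value 0.49999999999999994 is exactly 0.5 - 2^-54. Its exact sum with 0.5 is 1 - 2^-54, which is not representable as a double. That sum lies exactly halfway between the neighbouring doubles 1 - 2^-53 and 1.0, and the default IEEE round-half-to-even rule picks 1.0; Math.Floor then faithfully returns 1. The sketched fix avoids the rounding step by comparing the fractional part against one half directly, instead of adding first. I believe the subtraction d - floor is either exact or rounds in a way that cannot flip the comparison, but treat that as my reasoning rather than a proven guarantee.

```csharp
using System;

static class Rounding
{
    // The algorithm from the puzzle: add one half, then take the floor.
    public static double MyRound(double d) => Math.Floor(d + 0.5);

    // One possible fix (a sketch): compare the fractional part to one half,
    // so no rounding happens in an addition before the comparison.
    public static double RoundHalfUp(double d)
    {
        double floor = Math.Floor(d);
        double fraction = d - floor; // mathematically in [0, 1)
        return fraction >= 0.5 ? floor + 1.0 : floor;
    }

    public static void Main()
    {
        // The largest double below 0.5: step the bit pattern of 0.5 down by one.
        double justBelowHalf = BitConverter.Int64BitsToDouble(BitConverter.DoubleToInt64Bits(0.5) - 1);
        Console.WriteLine(justBelowHalf == 0.49999999999999994); // True
        Console.WriteLine(justBelowHalf + 0.5 == 1.0);           // True: a tie, rounded to even
        Console.WriteLine(MyRound(justBelowHalf));               // 1
        Console.WriteLine(RoundHalfUp(justBelowHalf));           // 0
        Console.WriteLine(RoundHalfUp(-1.5));                    // -1
    }
}
```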


Benchmarking mistakes, part one

NOTE: I wrote this series of articles for a web site which went out of business soon thereafter. They asked for a beginner-level series on how to benchmark C# programs. I’ve copied the articles as I wrote them below. Part two is here.


In this series of articles, I’m going to go through some of the mistakes I frequently see people make when they attempt to write benchmarks in C#. But before we get into the mistakes, I suppose I should introduce myself and define the term.

Hi, I’m Eric Lippert; I work at Coverity where I design and implement static analyzers to find bugs in C# programs. Before that I was at Microsoft for 16 years working on the C#, VBScript, JScript and Visual Basic compilers, amongst other things. I write a blog about language design issues at EricLippert.com. Many thanks to the editors here at Tech.pro for inviting me to write this series.

OK, let’s get into it. What exactly do I mean when I say benchmark?

The term comes originally from surveying; a surveyor would mark an object that was at an already-known position and then use that mark to determine the relative and absolute positions of other objects. In computing the term, like so much other jargon, has been liberally borrowed and now means pretty much any sort of performance comparison between two alternatives.

Benchmarking is often used to describe the performance of computer hardware: you write a program, compile and execute it on two different computers, and see which one performs better. That’s not the kind of benchmarking I’m going to talk about in this series; rather, I want to talk about performance benchmark tests for software.

I want to clarify that further, starting with the meaning of “performance” itself. The first rule of software metrics is well known: you get what you measure.

If you reward people for making a measurable improvement in memory usage, don’t be surprised if time performance gets worse, and vice versa. If you reward improvement rather than achieving a goal then you can expect that they’ll keep trying to make improvements even after the goal has been achieved (or worse, even if it is never achieved!)

This brings us to our first benchmarking mistake:

Mistake #1: Choosing a bad metric.

If you’ve chosen a bad metric then you’re going to waste a lot of effort measuring and improving an aspect of the software that is not relevant to your users, so choose carefully.

For the rest of this series I’m going to assume that the relevant performance metric that your benchmark measures is average execution time, and not one of the hundreds of potential other metrics, like worst-case time, memory usage, disk usage, network usage, and so on. This is the most common metric for performance benchmarks and hence the one I see the most mistakes in.
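A rough sketch of what choosing the metric looks like in code: run the operation several times, keep each measurement, and report the mean (the metric assumed for the rest of this series) next to the worst case, one of the alternatives listed above. The helper and its signature are hypothetical. It deliberately ignores the jitter and the other issues covered later in the series.

```csharp
using System;
using System.Diagnostics;

static class Metrics
{
    // Runs 'action' the given number of times and returns the mean and the
    // worst (maximum) elapsed time. Which one matters depends on your users.
    public static (TimeSpan Mean, TimeSpan Worst) Measure(Action action, int runs)
    {
        if (runs <= 0) throw new ArgumentOutOfRangeException(nameof(runs));
        long totalTicks = 0;
        long worstTicks = 0;
        var stopwatch = new Stopwatch();
        for (int i = 0; i < runs; i += 1)
        {
            stopwatch.Restart();
            action();
            stopwatch.Stop();
            long ticks = stopwatch.Elapsed.Ticks; // TimeSpan ticks (100 ns each)
            totalTicks += ticks;
            worstTicks = Math.Max(worstTicks, ticks);
        }
        return (TimeSpan.FromTicks(totalTicks / runs), TimeSpan.FromTicks(worstTicks));
    }

    public static void Main()
    {
        var random = new Random(1);
        int[] source = new int[10000];
        for (int i = 0; i < source.Length; i += 1)
            source[i] = random.Next();
        // Sort a fresh copy each run so every run does the same work
        // (the copy itself is included in the timing here).
        var (mean, worst) = Measure(() =>
        {
            var copy = (int[])source.Clone();
            Array.Sort(copy);
        }, runs: 20);
        Console.WriteLine("Mean: " + mean + ", worst: " + worst);
    }
}
```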

I also want to clarify one other thing before we dive in. I’m assuming here that the purpose of the benchmark is to empirically determine the performance of a small part of a larger software project so that an informed decision can be made.

For example, you might have a program that, among its many other tasks, sometimes has to sort a large set of data. The benchmarks I’m talking about in this series are the narrowly targeted tests of, say, half a dozen different sort algorithms to determine which ones yield acceptable performance on typical data; I’m not talking about “end to end” performance testing of the entire application. Often in large software projects the individual parts have good performance in isolation, but bad performance in combination; you’ve got to test both.

That brings us to:

Mistake #2: Over-focusing on subsystem performance at the expense of end-to-end performance.

This series of articles is going to be all about subsystem performance; don’t forget to budget some time for end-to-end testing as well.

So far we’ve seen some very general mistakes; now let’s start to dig into the actual mistakes people make in implementing and executing their subsystem performance benchmarks in C#. The number one most common mistake I see is, no kidding:

Mistake #3: Running your benchmark in the debugger.

This is about the worst thing you can possibly do. The results will be totally unreliable. Think about all the things that are happening when you run a managed program in a debugger that are not happening when your customer runs the program: the CLR is sending information to the debugger about the state of the program, debug output is being displayed, heck, an entire other enormous process is running.

But it gets worse, far worse.

The jit compiler knows that a debugger is attached, and it deliberately de-optimizes the code it generates to make it easier to debug. The garbage collector knows that a debugger is attached; it works with the jit compiler to ensure that memory is cleaned up less aggressively, which can greatly affect performance in some scenarios.
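One cheap defence, sketched here as a hypothetical helper, is to have the benchmark check System.Diagnostics.Debugger.IsAttached and refuse to report numbers when it is true. Note that IsAttached only detects a managed debugger, so treat this as a safety net rather than a guarantee.

```csharp
using System;
using System.Diagnostics;

static class BenchmarkGuard
{
    // Returns true (and prints a warning) if a managed debugger is attached,
    // in which case any timings should not be trusted.
    public static bool WarnIfDebuggerAttached()
    {
        if (Debugger.IsAttached)
        {
            Console.Error.WriteLine("Warning: debugger attached; benchmark results are unreliable.");
            return true;
        }
        return false;
    }

    public static void Main()
    {
        if (WarnIfDebuggerAttached())
            return;
        // ... run the benchmark here ...
        Console.WriteLine("No debugger attached; proceeding.");
    }
}
```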

But perhaps I am getting ahead of myself. What is this “jit compiler” thing? In order to make sense of the next episode in this series you’ll need to have a pretty solid understanding of how compilation works in .NET. Here’s the high level view.

Let’s suppose you write some source code in C# using Visual Studio. When you build that project the IDE starts up the C# compiler. A compiler is by definition a program which translates a program written in one language into “the same” program written in another language. The C# compiler translates C# code into a different language, IL, the Intermediate Language. (Also sometimes notated CIL for Common IL or MSIL for Microsoft IL, but we’ll just stick with “IL”.)

IL is a very low-level language designed so that in its compressed binary form it is reasonably compact but also reasonably fast to analyze. A managed assembly (a .exe or .dll file) contains the IL for every method in the project as well as the “metadata” for the project: a compact description of all the classes, structs, enums, delegates, interfaces, fields, properties, methods, events, and so on in your program.

When you run code in a managed assembly, the Common Language Runtime (CLR) reads the metadata out of the assembly to determine what the types and methods and so on are. But the real miracle of the CLR is the Just In Time compiler, “the jitter” for short. The jitter is a compiler, so again, it translates from one language to another. The CLR runs the jitter on the IL associated with a method immediately before that method is about to run for the first time; hence the name “Just In Time compiler”. It translates the IL into the machine code that will actually execute on the processor.
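A sketch that makes the “just in time” part visible; the method names are hypothetical. The first call to a method pays for jitting it, so it is usually noticeably slower than the second call to the same method. Exact numbers vary from machine to machine. Newer runtimes with tiered compilation may also recompile frequently called methods later, which is one more reason warm-up matters in the episodes to come.

```csharp
using System;
using System.Diagnostics;
using System.Runtime.CompilerServices;

static class JitDemo
{
    // NoInlining keeps this a separate method, so its first call pays for jitting it.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static long SumOfSquares(int n)
    {
        long sum = 0;
        for (int i = 1; i <= n; i += 1)
            sum += (long)i * i;
        return sum;
    }

    // Times a single call of SumOfSquares.
    static TimeSpan TimeOneCall(int n, out long result)
    {
        var stopwatch = Stopwatch.StartNew();
        result = SumOfSquares(n);
        stopwatch.Stop();
        return stopwatch.Elapsed;
    }

    public static void Main()
    {
        TimeSpan first = TimeOneCall(1000, out long r1);  // includes jitting SumOfSquares
        TimeSpan second = TimeOneCall(1000, out long r2); // already compiled
        Console.WriteLine("First call:  " + first + " (result " + r1 + ")");
        Console.WriteLine("Second call: " + second + " (result " + r2 + ")");
    }
}
```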

So now perhaps it is more clear why understanding the jitter behaviour is so important when benchmarking code; the jitter is dynamically generating the actual machine code on the fly, and therefore determining how heavily optimized that machine code is. The jitter knows whether there is a debugger attached or not, and if there is then it figures it had better not be aggressive about optimizations because you might be trying to inspect the code in the debugger; heavily optimized code is harder to understand. But obviously the unoptimized code will be less performant, and therefore the benchmark is ruined.

Even if you don’t run your benchmark program in the debugger it is still important to make sure that you are not telling the jitter to go easy on the optimizations:

Mistake #4: Benchmarking the debug build instead of the release build.

If you compile a project in Visual Studio in “debug” mode, then both the C# compiler and the jit compiler will again deliberately generate less-optimized code, even if you run the program outside of the debugger, on the assumption that clarity is better than speed when you are attempting to diagnose problems.

And of course the debug version of your program might contain special-purpose code of your own devising to make debugging easier. For example, expensive assertions might be checked which would be ignored in the release build.
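A sketch of a guard against benchmarking the wrong build; the helper name is hypothetical. The #if DEBUG check relies on the DEBUG symbol that Visual Studio defines by default in the debug configuration. The reflection check reads the DebuggableAttribute that the C# compiler emits; its IsJITOptimizerDisabled property is true when the assembly asks the jitter not to optimize.

```csharp
using System;
using System.Diagnostics;
using System.Reflection;

static class BuildCheck
{
    // True if the assembly asks the jitter to disable optimizations,
    // which is what a typical debug build does via DebuggableAttribute.
    public static bool IsJitOptimizerDisabled(Assembly assembly)
    {
        var attribute = assembly.GetCustomAttribute<DebuggableAttribute>();
        return attribute != null && attribute.IsJITOptimizerDisabled;
    }

    public static void Main()
    {
#if DEBUG
        Console.WriteLine("Warning: this is a DEBUG build; benchmark the release build instead.");
#endif
        if (IsJitOptimizerDisabled(typeof(BuildCheck).Assembly))
            Console.WriteLine("Warning: jit optimizations are disabled for this assembly.");
        else
            Console.WriteLine("Jit optimizations are enabled for this assembly.");
    }
}
```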

Both mistakes #3 and #4 are actually specific versions of a more general mistake: testing the code in an environment radically different from the customer’s environment. The customer is not going to be running the debug version of your product, so don’t test that version. We’ll come back to this point in a later episode.

Next time in this series I’ll talk about mistakes made in specific measurement techniques; after that we’ll take a look at some more subtle ways in which forgetting about the jitter can lead to bad benchmarks.

Spot the defect: rounding

The intention of this method is to round a double to the nearest integer. If the double is exactly half way between two integers then it rounds to the larger of the two possibilities:[1. For negative numbers, -1.5 should round to -1.0, since -1.0 is larger than -2.0; I do not mean larger in the sense of absolute magnitude. That would be characterized as “midpoint rounding away from zero”.]

static double MyRound(double d)
{
  return Math.Floor(d + 0.5);
}

Is it correct? Can you find a value for which it does not give the mathematically correct value?[2. HINT: The value I’m thinking of is small.]

UPDATE: The answer is in the comments, so if you don’t want spoilers, don’t read the comments.

Next time on FAIC: The answer, of course.

I have a mysterious fifth sense (rerun)

Today, another of my ongoing series of reruns of my fun-for-Friday non-computer posts. Here’s one from the dot-com recovery of 2004.


The economy must be picking up — I’m getting cold calls from recruiters again for the first time in about four years.  Today was the second – and third – this month.

However, apparently some of them are just a wee bit disorganized. I just had the following conversations:

[Ring ring]

Me: Hi, this is Eric.

Her: Hi, this is Barbara[1. I learned later from some of my fellow Microsoftie bloggers that it appeared that Barbara was calling everyone at Microsoft with an MSDN blog. She must have been repeatedly calling the switchboard and asking to speak with each.] at XYZ Recruiters. How are you today?
