Benchmarking mistakes, part two

NOTE: I wrote this series of articles for a web site which went out of business soon thereafter. They asked for a beginner-level series on how to benchmark C# programs. I’ve copied the articles as I wrote them below. Part one is here and part three is here.

So far in this series we’ve learned about the jitter and how it compiles each method in your program “on the fly”; we’ll come back to jitter issues in the next episode; in this episode I want to look at some actual (terrible) code for measuring performance.

Let’s suppose part of a larger program will involve sorting a long list of integers, and we’d like to know how much time that is going to take. Rather than testing the entire “real” program and trying to isolate the specific part, we’ll write a benchmark that just answers the question at hand.

There are a lot of things wrong with the code below that we’ll get to in later episodes; the issue I’m going to focus on today is:

Mistake #5: Using a clock instead of a stopwatch.

using System;
using System.Collections.Generic;
class P
{
  public static void Main()
  {
    var list = new List<int>();
    const int size = 10000;
    var random = new Random();
    for(int i = 0; i < size; i += 1)
      list.Add(random.Next());
    var startTime = DateTime.Now;
    list.Sort();
    var stopTime = DateTime.Now;
    var elapsed = stopTime - startTime;
    Console.WriteLine(elapsed);
  }
}

That looks pretty reasonable, right? We create some random data to test. Let’s assume that ten thousand elements is about the typical problem size that the customer will see. We are careful to time only the actual sort and not include the creation of the random data. If we run it on my laptop I see that it reports about 3 milliseconds. How accurate is this?

That result is off by a factor of more than two from the correct answer; the sort actually takes about 1.2 millseconds. What explains this enormous discrepancy, and how did I know what the correct answer was?

Well, suppose you were trying to time this program by actually watching a grandfather clock that ticks once per second. This is of course ludicrous because you know what would happen: either the clock would tick during the few milliseconds the program was active, or it wouldn’t. If it did then you’d record one second, if not, then you’d record zero. The resolution of the clock is nowhere near good enough to actually measure the elapsed time with the necessary precision.

DateTime.Now is only slightly better than that grandfather clock: it only ticks every few milliseconds. Worse, the rate at which it ticks is not fixed from machine to machine. On modern Windows machines you can expect that the tick resolution will be around 100 ticks per second or better under normal circumstances, which is a resolution of 10 milliseconds or less. Apparently things went well for me on this laptop and I got a resolution of 3 milliseconds, which is great for DateTime.Now but still nowhere near fast enough to accurately measure the time spent sorting.

The correct tool to use when writing benchmarks is Stopwatch in the System.Diagnostics namespace. (I note that this namespace is well-named; everything in here is good for diagnosing problems but should probably not be used in “production” code.) Let’s take a look at the code written to use Stopwatch instead: (There is still plenty wrong with this benchmark, but at least we’re improving.)

using System;
using System.Collections.Generic;
using System.Diagnostics;
class P
{
  public static void Main()
  {
    var list = new List<int>();
    const int size = 10000;
    var random = new Random();
    for(int i = 0; i < size; i += 1)
      list.Add(random.Next());
    var stopwatch = new System.Diagnostics.Stopwatch();
    stopwatch.Start();
    list.Sort();
    stopwatch.Stop();
    Console.WriteLine(stopwatch.Elapsed);
  }
}

Notice first of all how much more pleasant it is to use the right tool for the job; the Stopwatch class is designed for exactly this problem so it has all the features you’d want: you can start, pause and stop the timer, computing the elapsed time does not require a subtraction, and so on.

Second, the results now show far more accuracy and precision: this run took 1.1826 milliseconds on my laptop. We have gone from ~10 millisecond precision to sub-microsecond precision!

How precise is the stopwatch, anyway? It depends on what kind of CPU you have; the Stopwatch class actually queries the CPU hardware to get such high precision. The Stopwatch.Frequency read-only field will tell you what the resolution is; on my laptop it is two million ticks per second, which you’ll note is considerably larger than the operating system’s weak promise of around 100 ticks per second for the grandfather clock that is DateTime.Now. This laptop is pretty low-end; better hardware can provide even higher resolution than that. Remember, one two-millionth of a second is still hundreds of processor cycles on modern hardware, so there’s room for improvement.

Suppose for the sake of argument though that the code we were benchmarking was going to run for seconds, minutes or even hours. In those cases the difference between a resolution of 100 ticks per second and 2 million ticks per second is irrelevant, right? If you’re going to be measuring minutes then you don’t need the second hand to be super accurate, so why not use DateTime.Now in those cases?

It’s still a bad idea because Stopwatch has been specifically designed to solve the problem of easily measuring time spent in code; DateTime.Now was designed to solve a completely different problem, namely, to tell you what time it is right now. When you use the wrong tool for the job, you sometimes can get strange results.

For example, suppose you use DateTime.Now to measure the time in a long-running performance benchmark that you run overnight on a machine in Seattle. Say, on Sunday November 3rd, 2013. The result you get is going to be wrong by one hour, and might in fact even be a negative number because DateTime.Now pays attention to Daylight Savings Time changes.

Just don’t even go there. DateTime.Now is the wrong tool for the job, was designed to solve a different problem, is harder to use than Stopwatch, and has thousands or millions of times less precision. Avoid it entirely when writing benchmarks in C#.

Next time in this series we’ll take a closer look at how the jitter can affect your benchmarks.

16 thoughts on “Benchmarking mistakes, part two”

(I was going to comment on the techpro article itself, but it looks like they’re still the Dark Ages of having to create an account specifically for their site just to leave a single comment.)

There looks to be an (unintentional) error in your Mistake 5 example: you have stopwatch.Stop(); while the context of the example suggests it should be var stopTime = DateTime.Now;.

Reply ↓

While I don’t disagree with the general theme, there is in fact DateTime.UtcNow which addresses your complaint about a DST change occurring during timing.

I would also disagree with the statement that one should use the System.Diagnostics namespace objects only for non-production code. That namespace includes things such as interacting with the event log and other logging mechanisms, retrieving information about file versions, and interacting with other processes on the system, all of which can be perfectly valid activities for production code.

Beware generalizations. They are always wrong. 🙂

Reply ↓

Jim Mischel on May 21, 2013 at 1:04 pm said:

Whereas DateTime.UtcNow will solve the problem for the specific case of DST changes, it doesn’t address NTP updates or the user changing the clock. My experience is that using the system clock (anything associated with DateTime, in this case) to measure elapsed time will get you into trouble.

Reply ↓

Regarding DateTime.Now: I’ve seen it be as accurate as 1ms. Not that I’m advocating its use for benchmarking – QueryPerformanceCounter and especially QueryThreadCycleTime are both much, much better – but still.

http://stackoverflow.com/questions/307582/how-frequent-is-datetime-now-updated

Reply ↓

jdhardy on May 21, 2013 at 1:01 pm said:

Sounds like some process (any process) called timeBeginPeriod(1).

Reply ↓

Unless I’ve missed something obvious, both DateTime and Stopwatch will show you the change in “real-world” time between the two statements. Because of thread/process scheduling is this really what you want?

I guess this depends on your definition of “execution time” (hopefully this isn’t the obvious thing I missed), but my understanding is that what most benchmarks want to know how long your task (and it alone) kept a processing unit busy. It seems easy at first because {amount_of_cpu_cycles} is directly related to {real_time_elapsed}.

But in a multi-threaded task this becomes more difficult to judge. Do you add up all the cores execution times (which is what Process.TotalProcessorTime does)? Probably not. Use just the longest-running thread that’s a part of the task? That seems better, but probably has it’s own problems.

An older (and likely now out-of-date) article that talks about some of this is here: http://kristofverbiest.blogspot.com/2008/10/beware-of-stopwatch.html

Sorry for my naive ramblings… I’m looking forward to the rest of this series 🙂

Reply ↓

A bit off-topic, but that article got me thinking about what is the best class to use when implementing an ID generation algorithm similar to what Instagram has done (http://instagram-engineering.tumblr.com/post/10853187575/sharding-ids-at-instagram). I coded a similar algorithm in .NET using DateTime.UtcNow and since a server ID and sequence is used in addition to a timestamp in the ID, I suspect the actual precision of DateTime matters less, but still, would you advise using another class in that case ?

Reply ↓

Interestingly, when a high frequency timer is now available, StopWatch falls back to DateTime.UtcNow.

That means, if you somehow found a machine without a high frequency timer, you could change the result of your benchmark by changing your system date as the benchmark ran. The operating system might also change the result if it syncs the current time over NTP while your benchmark runs.

Reply ↓

I would disagree that System.Diagnostics is well-named and not for production use; Process is essential for production code and firing off new processes, and has got nothing to do with diagnostics. Some things like Debug asserts and DebuggableAttribute are useful to put in production code when you’re debugging an issue in it.

Reply ↓

The real reason to use DateTime.UtcNow in a benchmark (vs. DateTime.Now) is that it doesn’t have the thousands-of-cycles overhead of computing the timezone offset.

The other thing to keep in mind is that, while Stopwatch is likely to be far more precise, DateTime is likely to be more accurate. In other words, if your computer’s clock runs slow, the mechanism DateTime uses may kept accurate via NTP, while the Stopwatch mechanism won’t be.

Reply ↓

Jim Mischel on May 22, 2013 at 12:39 pm said:

NTP updates won’t make things more accurate unless the thing you’re timing takes a really long time. The default polling interval for NTP is usually quite long. Windows default, for example, is 7 days. Even if you use “constant synchronization,” updates aren’t going to occur more than once per minute, and usually much longer than that.

If NTP updates can change the clock, then there is some bit of uncertainty in your timings. Given two start/end timestamp pairs, you can’t compare them reliably because one pair might include adjustments to the clock and the other doesn’t. I can’t reliably compare two measurements obtained by taking the difference of DateTime.UtcNow values because NTP updates or other changes to the system clock might have occurred.

The advantage of using Stopwatch (the high frequency timer) is that even though it might be inaccurate, it’s consistent. I know that its report of 100 milliseconds, for example, represents the same time period every time. It might be only 99.987 milliseconds, but it’s *always* 99.987 milliseconds. I can reliably compare two Stopwatch values obtained from the same machine.

Reply ↓

“DateTime.Now pays attention to Daylight Savings Time changes.”
First of all, that seems absurd. Also, looking at the .Net DateTime documentation (http://msdn.microsoft.com/en-us/library/system.datetime.aspx) it appears to be false, given the existence of the DateTime.Ticks. It would not be feasible to support a Ticks property if a DateTime was inextricably tied to the current timezone and daylight savings state.

If “elapsed = stopTime.Ticks – startTime.Ticks” would give a consistent answer across DST boundaries, then surely it is the operation of subtracting times from one another which pays attention to DST changes, not DateTime.Now.

I would also argue that the behaviour is wrong, given that “A TimeSpan object represents a time interval (duration of time or elapsed time)”, and I cannot think of any logical argument that would support someone reasonably claiming that the duration of time or elapsed time across a DST boundary can be negative.

Reply ↓

Hello there! I know this is kinda off topic but I was wondering if you knew where I could locate a captcha plugin for my
comment form? I’m using the same blog platform as yours and I’m having difficulty
finding one? Thanks a lot!

Reply ↓

Pingback: Benchmarking mistakes, part one | Fabulous adventures in coding

Pingback: Benchmarking mistakes, part three | Fabulous adventures in coding

Pingback: Benchmarking mistakes, part four | Fabulous adventures in coding

Fabulous adventures in coding

Eric Lippert's blog

Benchmarking mistakes, part two

16 thoughts on “Benchmarking mistakes, part two”

Leave a comment Cancel reply

Share this:

Related

16 thoughts on “Benchmarking mistakes, part two”

Leave a comment Cancel reply