Which is faster?

Which is faster, QueryLightBulbFrobStatusEx() or __WGetBulbFrobberState2()?

Hold it right there, buddy. Before answering that question I must give you my standard six-part rant about why I probably cannot sensibly answer questions that begin “which is faster“.

Part the first: Why are you even asking me?

Horserace 520133030If you have two horses and you want to know which of the two is the faster then race your horses. Don’t write short descriptions of the horses, post them on the Internet, and ask random strangers to guess which is faster! Even if by sheer chance you got an accurate answer, how would you have any confidence in its accuracy? You can easily and accurately discover which of two programs is faster by running both yourself and measuring them with a stopwatch.

(Caveat: writing performance benchmarks that provide high-quality, meaningful results can be somewhat tricky in a garbage-collected, JIT-compiled runtime. I see this done wrong a lot. I will return to this topic in a future blog post.)

Moreover: performance is highly sensitive to things like what hardware you are using, what software you have installed, what other processes are doing, and a host of other factors. You’re the only person who knows what environment the code is going to be run in, so you’re the only person who can do realistic performance testing in that environment.

Part the second: Do you really need to answer that question?

The question presupposes that there actually is a performance problem to be solved. If the code as it stands — the working, debugged, tested code — is already fast enough for your customer then knowing which of two ways to write the code is the faster is just trivia. Spend your valuable time worrying about something else, like testing, robustness, security, and so on. Unnecessary code changes are expensive and dangerous; don’t make performance-based changes unless you’ve identified a performance problem.

Part the third: Is that really the bottleneck?

Suppose that you really do have an identified performance problem: asking which of two things is faster is premature if neither thing is the cause of the problem! The most important thing that I’ve learned about performance analysis is that my highly educated and experienced guess about the root cause of a performance problem is dead wrong probably more than a third of the time. Use a profiler or other analysis tool to determine empirically where the bottleneck is before you start investigating alternatives.

Part the fourth: Is the difference relevant?

Suppose that you have actually identified a bottleneck: now the relevant question is not actually “which horse is faster?” Rather, the relevant question is actually “are either of these horses fast enough to meet my customer’s needs?” If neither horse is fast enough for your purposes then knowing which is faster is irrelevant. And if both are fast enough then you can base your decision on important factors other than performance, as discussed in part the second.

(The question also presupposes that the two alternatives proposed are semantic equivalents. If one of those is a horse and the other is a refrigerator, asking which one runs faster is maybe a non-starter.)

Part the fifth: What is this “faster” you speak of?

There are lots of kinds of speed, and optimizing for one kind can deoptimize for another. I’m sure you’ve all encountered situations where normal-case-scenario performance is acceptable but worst-case-scenario performance is terrible. Anyone who has been unable to get a long-distance phone call placed on Mother’s Day knows what I’m talking about; the telephone network was designed to handle slightly-higher-than-average load, not highest-likely load. Unfortunately, implementing code that ensures an upper bound on your worst-possible-scenario behaviour often ends up making the typical-case-scenario unacceptably slower, and vice versa.

Leaving the difference between best, worst and typical aside, there are all kinds of speed metrics. When I was working on Visual Studio Tools For Office we did comparatively little making the customization framework code run faster because our tests showed that it typically ran fast enough to satisfy customers. But we did an enormous amount of work making the framework code load faster, because our research showed that Office power users were highly irritated by noticable-by-humans delays when loading customized documents for the first time. Similarly, in the web services realm there is a big difference between optimizing for time-to-first-byte, time-to-last-byte and throughput. You have to know what kind of speed is really important to the customer.

Part the sixth: Are you looking at the big picture?

Almost all performance analysis questions I see are solely about improving speed. There are lots of non-speed metrics like memory usage, disk usage, network usage, processor usage, and so on, that might be more relevant than raw speed to your customer. Making code faster often involves trading less time for more memory, which might be a bad trade. Don’t over-focus on speed metrics; look at the big picture.

Well, that rant used up this whole episode. Next time on FAIC: Which is faster, Nullable<T>.Value or Nullable<T>.GetValueOrDefault()?

The photo credits are on Wikimedia Commons.

80 thoughts on “Which is faster?

  1. I might suggest an addendum question of “Are these things even comparable?” I had a guy once ask me which was faster: DSL or a Pentium 4. I didn’t know how to respond…

    • Eric mentioned that in Part the fourth:

      > If one of those is a horse and the other is a refrigerator, asking which one runs faster is maybe a non-starter.

      • Then again, maybe not. Assuming the same starting velocity (ignoring the means by which it is imparted) and Earth atmosphere, horse is probably faster because it’s more aerodynamic. A straightforward experiment would be dropping both the horse and the refrigerator from a sufficiently tall building, and timing the moment they hit the ground.

        • Except generally when you ask how fast a horse is, you want its running speed, not its falling speed. And, while people don’t normally talk about it as “speed”, the obvious meaningful ‘speed’ is how fast it can chill a given amount of air or other material to a given temperature. This quantity is normally thought of as a power quantity, since that’s what dimensional analysis gives you if you factor out the need to specify particular materials and temperatures.

        • A falling horse will fall at the same speed as of a falling refigerator, since they are both submited to the same gravity. But the heavier will hit the ground with more force then the other. But in the end, it will not matter because you will loose both in devastating terms.

          • Every object that falls on the ground, no matter from how high, will fall at an average of 10m/sec^2, depending on where you are on the earth. And considering the 2 objects we are talking about, the athmosphere influence will be unsignificant.

  2. Well, that rant used up this whole episode. Next time on FAIC: Which is faster, Nullable.Value or Nullable.GetValueOrDefault()?

    Haha, brilliant. 🙂 That ending remark sums up exactly one big point you left out. It’s still interesting to ask about such things just to increase your general understanding. Curiosity is a powerful motivator.

    For minor optimizations I tend to prefer readability over performance, following the mantra “Write code for humans, not machines”. Where possible optimization should be handled by the compiler anyhow.

      • I’d guess `GetValueOrDefault()` simply returns a field, `Value` needs a `if(HasValue)` check. While branch prediction will obviously work well, that’s certainly a bit more expensive.

        • Winner winner chicken dinner! That’s precisely what the implementation does. So when I say “it depends”, I mean it’s comparing apples and oranges. If your requirements are such that you need an exception/conditional if there’s no value, then only `Value` will do. If your requirements are set to deal with `default(TValue)`, then `GetValueOrDefault()` is your huckleberry. And yes, the latter will be “faster” because of its complete lack of branching logic.

          • If `GetValueOrDefault` simply returns the value field, does that imply that a `Nullable` may simultaneously be `null` and yet return an arbitrary value from `GetValueOrDefault` (even if properly written code would never generate such a value, such values could be passed in from outside code)?

          • I expect that copying/pasting could would produce a mess, but the behavior can be demonstrated by having one thread read a Nullable(of somethingBig) will another thread is setting it to null. The fact that copying a struct in one thread while another thread is mutating may yield a struct with a mix of old and new field values it is hardly surprising; the slightly “surprising” aspect is that GetValueOrDefault doesn’t ignore the backing field for Value if the backing field for HasValue is false. Of course, if one grabs a Nullable(of T) at an inopportune time, one should expect that the field values may be garbage even if `HasValue` is true; that doesn’t mean that one should expect that a “null” instance won’t report a default value for GetValueOrDefault().

  3. Eric, I value the fact that you think through the motive before trying to address the problem. I am working very hard at learning to do the same, not only in my career, but in my life in general. Reading things like this enhance my life as a whole.

    Thanks Eric!

  4. In many cases, someone who asks “Which is faster–XX or YY” isn’t really interested in knowing, “On my own system, under some nominal set of conditions, which is faster–XX or YY”, but rather “Are there likely to be any systems or conditions where XX is apt to be significantly faster than YY, or vice versa”? If, for example, it turns out that method XX runs 5% faster than YY on Intel processors, but 50 times slower than YY on AMD processors, and if the overall time taken by XX is significant in the latter situation but not the former, someone who is writing a program for the general market should probably use YY. I’m not sure how someone should expect to determine that, though, other than by change, or other than by asking whether anyone knows of any situations where the performance of XX and YY are drastically different.

    • I think what Eric is trying to say is that the “what if” scenario is so difficult to pin down that the conversation becomes academic and much less practical and the solutions generally have holes in them because they haven’t been implemented.

      Performance is affected by so many variables, and generally a significant number of them out of your control, that you need to know “what is wrong” and “what is my motive for optimizing” before even concerning yourself with it.

      • It’s true that one can’t take all possible scenarios into account, but there are a number of factors which will often affect relative performance of two different methods. For example, in some programs Gen0 collections will tend to be relatively cheap, while in others they will tend to be relatively expensive. If a program’s heap usage patterns are such that Gen0 collections will be expensive but rare, methods which are fast but generate and abandon multiple heap objects may perform much worse than those which take longer but don’t generate any garbage. If a benchmark loop doesn’t touch any complex objects on the heap, any gen0 collections after the first are going to be fast, but that won’t necessarily be the case when code is running in the real world.

  5. What leads me to start thinking about performance is usually not the question “which is faster?”. It is “why the hell is this thing so slow?”. And what I find after profiling is something I had no idea was there.

  6. I generally agree with all your “part the X’s”, but I do find myself contemplating “Big O” types of speed questions every now and then, particularly when dealing with LINQ. Things like, should I use ToArray() on my query or just leave it as an enumerable? If you get in the habit of giving a little thought to these things, and maybe doing some quick and dirty stopwatch tests when you’re not sure, it seems like you would save yourself a good number of future bottlenecks, particularly if your app scales beyond your initial expectations.

  7. Thanks as always for the great tech writings, and I hope the new work adventure goes great! Totally unrelated to this post, the new blog platform is somewhat less friendly to RSS readers, or at least the Newsify reader on iOS. The RSS content only contains the first few sentences and a link to the main site, so we lose RSS reader capabilities like offline reading, dyslexia-friendly fonts, and other such things. Is there a WordPress setting that might change the feed to include the full article text? Worst case I can use Instapaper, but if it was easy to change it seems worth it.

    • The setting would be in the wordpress control panel, Settings->Reading, then change the radio button for “For each article in a feed, show ” from “summary” to “full text”.
      Assuming of course Eric want’s to appease you at all 😛

      related to the Post itself, I had somebody ask me what the fastest sort algorithm was. I was going to say “Quick sort, usually”, but then stopped myself and said that the quickest way to sort is to not have to sort at all.

      • Ok, let’s assume the bugs do affect you, but that they can be worked around, albeit inconveniently.

        Now which is better?

        • I don’t really mean to be pedantic but it kind of depends on how you mean “better.” If the workaround still take half the time of the bugfree version I might prefer the faster one.

          Obviously if they slow the process down to the point where the faster program is no longer faster, there’s no point in using it unless there is some mitigating advantage.

        • If the program is a part of a much bigger process, and that process has a lot of constraints outside the world of computers, then I would choose the program that has a more predictable running time and a more predictable quality of results. At the very least, I’ll know for sure whether or not the program is good enough for a specific project, and I’ll know what must be done in order to compensate for the program’s deficiencies.

    • I forget where I heard this; some details may be inaccurate: One day, a group of students was given a task: Find the 10 most frequent words within an input document. They were then told to optimize for speed. Programmers used various clever solutions (e.g., hashing). The fastest program in the class ignored all but the first 100 words, but still passed all of the test cases.

  8. And, don’t always assume faster is better….

    We worked really hard on the performance of an insurance search web site and were super pleased and very proud when it presented the results within milliseconds of clicking the search button.

    Our client made us put in an artificial delay and a spinning progress indicator so that the “customer would really think the site was working hard for him!”

  9. In part 5 you mention deoptimizing other areas and in part 6 you mention looking at the big picture. What’s even more insidious (and harder for a novice to spot) is the combination of these two: when the area you are deoptimizing is somewhere else in the application. One common mistake I’ve seen is where someone saves a calculated result in a private member variable to ‘save having to recalculate it’. Net result the whole system has less memory for the cache and runs slower even though their code appears to run faster.

  10. Then again, sometimes it’s pretty clear-cut which is faster. That happens sometimes on, say, stackoverflow, and then most of the time, the poor sod who dared ask a question about performance gets utterly lambasted.
    You’re certainly not wrong, but “optimization is hard and should be done carefully and analytically” has somehow morphed into “optimization is bad and you should feel bad” for a lot of people. They’ll throw around Knuth’s famous quote about premature optimization (often implicitly assuming that all optimization is premature), and forget that the article in which Knuth wrote that was actually about the importance of optimization, not the unimportance.

    • Right? And often, the same guy who would rattle the poor sod’s cage about asking such a question will be the first to chime in post-facto about what a poor choice a poster made of the options available. Can be a no win for people still learning the ropes.

  11. I think that often what someone means when they ask ” which is faster?” is actually “which is more efficient?”

    More specifically, often there are multiple reasonable-sounding (by name) choices within a language or framework. Prime example might be the .net string vs. stringbuilder class. While string will tend to feel a vinous to a newcomer for performing concatenations and such, the reality is that the stringbuilder class was purpose-built to overcome limitations related to multiple concatenations using plain old string. Fortunately, this specific case is well-documented.

    Not-so clear, in both java and c#, are the differences between various array and list-type structures, some of which are more efficient for specific use-cases than others. Further, the names of these are not always helpful in figuring out which one to use for a given situation.

    I think most of the time people realize you can’t possibly know the profiling conditions for their system when they pose this type of question. Instead they are seeking to understand if there is something they should know about the implementation options they have. This is especially true among newcomers (I am one of those, although I try to do my research first!).

  12. I guess it depends upon the world you live in. I’m an embedded systems engineer so which is faster or which is smaller turns out to be important a lot of the time. Due to processor (cheap as possible to do the job) and memory constraints, I’m always asking these kinds of questions. Things like, should I use a sin table to generate tones or should I use a 2nd order digital oscillator? I’m almost always fighting the low memory, low performance processor battle. I don’t do premature optimization but on most of the projects I’ll usually end up have to either optimize for space or speed. This is invariable due to feature creep after the 1.0 release.

    • The fight between space and time is “classic”. When working on software for space craft and ground data systems, one could find oneself writing “spaghetti” code or equally obscure code on purpose because some compiled (assembled) instruction’s numeric value is the constant you need for some arithmetic operation, or you can’t quite get the 5 needed instructions in because there is only room for 3 or 4.

  13. Interesting article. I’d think the best approach (and the one I usually take) would be to make everything as fast as it can be: any remaining bottlenecks, then, will be in the infrastructure, not the code.

    Erratum: in Part The Fourth, the relevant question is actually, “Is either of these horses fast enough to meet my customer’s needs?”

    • I guarantee you that you do not make anything as fast as it can be, and you certainly do not make everything as fast as it possibly can be. Rather, since you are a sensible person with finite resources, you make things faster until making them faster still would cost more than it is worth the effort to do so.

      Making code faster costs effort, and there is only a finite amount of effort available in the world. Effort costs time and money.

  14. Great and often ignored advice. I have seen architects pushing redesigns into teams based on flawed measurements which did have no effect on an real installed machine.

    You should mention which profilers you can use. The Windows Performance Toolkit is a great piece of work (http://geekswithblogs.net/akraus1/archive/2012/12/19/151604.aspx). The only strange this is that the call stacks seem to be broken with .NET 4.0 (http://social.msdn.microsoft.com/Forums/en-US/wptk_v4/thread/b7098a78-6600-456a-813e-79a84b994f32).

  15. Pingback: Liens de la semaine #9 « frenchcoding

  16. Pingback: How to time code « Jim's Random Notes

  17. Pingback: How to Create Early Bound Entity Classes with the Code Generation Tool? | Zsolt Zombik's Professional Blog

  18. Pingback: How to Create Early Bound Entity Classes with the Code Generation Tool? - Microsoft Dynamics CRM Community

  19. Pingback: How to: Ternary operator is twice as slow as an if-else block? | SevenNet

  20. Pingback: Fixed Ternary operator is twice as slow as an if-else block? #dev #it #asnwer | Good Answer

  21. Pingback: Solution: Ternary operator is twice as slow as an if-else block? #answer #programming #dev | IT Info

  22. Great read and many, many valid points. Thanks

    Re point 6 though and the relative importance of the various performance measurements, I think your “bigger picture” is too small.

    Its worth pointing out that as far as the end-user who’s using your (or your customer’s) system, raw “speed” is invariably the *only* thing that matters to end-users. They don’t care one bit about your carefully balanced server load vs cost so I think you also need to expand your big picture to include the end-users needs – that may well bump up speed (with any potential costs) in your priorities.

  23. Pingback: » Python:memoization library for python 2.7

  24. Pingback: Which is faster? | Tim's Dev Blog

  25. Pingback: How I Found CUDA, Or: Rewriting the Tag Engine – Teknoids News

  26. Pingback: How I found CUDA, or: Rewriting the Tag Engine–part 2 | 神刀安全网

  27. Pingback: SUT Double | 神刀安全网

  28. Pingback: SUT Double | Tech News

  29. Pingback: Nullable micro-optimizations, part one | Fabulous adventures in coding

  30. Pingback: 三元运算符的速度是其他块的两倍吗? – CodingBlog

  31. Pingback: System.Math.Round – Round to zero digits and divide vs. Round to one or more digits – program faq

  32. Pingback: Microbenchmarking and the Importance of Being Thorough – The Developing Coder

  33. Pingback: Avoiding async/await if I know most of the time the result will be cached2019 Community Moderator ElectionUsing async/await for multiple tasksHow and when to use ‘async’ and ‘await’Do you have to put Task.Run in a method to make it async?.NET asyn

  34. Pingback: How Bad Is Good Enough? | Fabulous adventures in coding

  35. Pingback: boolean - La forma más rápida de generar una aleatorias booleanas

  36. Pingback: boolean - Modo più veloce per generare casualmente un booleano

  37. Pingback: [C#] Ternary operator is twice as slow as an if-else block? - Pixorix

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s