Analysis vs code review

Thanks to everyone who came out to my “webinar” talk today; we had an excellent turnout. Apologies for the problems with the slides; there is a performance issue in the system whereby it works fine when it is not under load, but when lots of people are using it, the slides do not advance as quickly as they should. Hopefully the hosting service will get it sorted out.

As I mentioned last time, the recording will be edited and posted on the Coverity blog; I’ll post a link when I have one.

We got far, far more questions from users than we could possibly answer in the few minutes we had left at the end, and far too many to fit into one reasonably-sized blog post, so I’m going to split them up over the next few episodes. Today:

What percentage of defects does the Coverity analyzer find that should have been caught by code review?

We don’t keep formal statistics to answer that question. Instead of trying to guess at a percentage, let me describe a bit about what we do internally in this area.

We have hundreds of millions of lines of open-source code in C, C++, Java and C# stored locally which we have run our analyzer over, producing a massive database of defects. Every week all the developers here take a few minutes to review a few defects that no one has looked at yet and essentially do a code review — of code that we didn’t write and do not understand well! — to see if the defect reported is a “true positive” or “false positive”. This is static analysis jargon: “positive” because a defect was reported, and then “true” or “false” depending on whether the defect reported corresponds to a real bug, or was a mistake to report.

Along the way we typically browse around nearby code to see if there are any other defective-looking patterns that the analyzer did not spot. These are “false negatives”: the tool should have reported a defect but did not. We then consider ways to improve the analysis to catch them.

By doing this every week for many years we get a good notion of what sorts of bugs are easy or hard to find by code review; a great many defects are quite straightforward and easy to see once you know they’re there. Others are really quite difficult to see, requiring a search through many different files to understand the exact semantics of the various methods called by a particular bit of code.

Let me give you an example of a defect that is trivial to verify once you know it is there, but difficult to see if you don’t. Probably the simplest checker we have searches for this pattern:

try 
{ 
  Foo(); 
} 
catch(Exception ex) 
{ 
  new WrapperException(ex); 
}

You, reading this, review the code right now. Did you spot the defect? Pretty easy. But would you have spotted it if the try block had been a hundred lines long and I hadn’t told you there was a defect somewhere on the page?

On the one hand, you can always say that a “good” code review process should spot a trivial defect like that. On the other hand, we see a few instances of this per million lines of code analyzed because it is so very easy to do by accident, and so very difficult to notice; code reviewers often do not look very carefully at the error case.
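For the record, the fix is a single missing keyword: the wrapper must actually be thrown, not merely constructed. Here is a runnable sketch of both versions, with a hypothetical WrapperException standing in for whatever wrapper type the real code used:

```java
// Demonstrates the defect: constructing a wrapper exception without
// throwing it silently swallows the original failure.
public class WrapperDemo {
    static class WrapperException extends RuntimeException {
        WrapperException(Throwable cause) { super(cause); }
    }

    static void foo() {
        throw new IllegalStateException("abort for security reasons");
    }

    // Buggy: the wrapper is constructed and immediately discarded.
    static boolean buggy() {
        try {
            foo();
        } catch (Exception ex) {
            new WrapperException(ex); // defect: missing "throw"
        }
        return true; // reached even though foo() failed
    }

    // Fixed: one keyword makes the failure propagate.
    static boolean fixed() {
        try {
            foo();
        } catch (Exception ex) {
            throw new WrapperException(ex);
        }
        return true; // never reached when foo() throws
    }

    public static void main(String[] args) {
        System.out.println("buggy() returned: " + buggy());
        try {
            fixed();
        } catch (WrapperException ex) {
            System.out.println("fixed() threw, wrapping: " + ex.getCause().getMessage());
        }
    }
}
```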

Interestingly enough, we see about an order of magnitude more reports of this defect in Java than in C#, and I’m not sure why that is. If any Java experts would care to advance a hypothesis, I’d be interested to hear it; leave a comment below.

This was the only checker that I wrote myself, and as I said, it is extremely straightforward; it is entirely intraprocedural and does not need to analyze control flow within the method. I mentioned before that when reviewing a defect in code, we look near it to see if there are any other defects; a developer who made one mistake might have made another nearby. That’s what inspired this checker. I was reviewing defects in a customer’s code, and the method containing the defect began:

if (parameter == null) 
  new ArgumentNullException("parameter");

This version of the defect is likely harmless; the worst that can happen is that a null dereference exception is thrown instead of a null argument exception. But the “try-catch” version we saw above is very nasty indeed; we have seen real-world code where the thrown exception means “you must abort this method for security reasons”, and of course the try-catch version suppresses that exception, allowing code that is not supposed to run to execute, weakening the security of the program.
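The same single-keyword fix applies to the guard clause. A runnable sketch, using IllegalArgumentException as a Java stand-in for the C# ArgumentNullException, shows the buggy guard falling through to a null dereference instead of the intended exception:

```java
// The guard-clause variant: the exception is constructed but never thrown,
// so a null argument falls through to a null dereference instead.
public class GuardDemo {
    // Buggy guard: the defect is the missing "throw".
    static int lengthBuggy(String parameter) {
        if (parameter == null)
            new IllegalArgumentException("parameter"); // constructed, then discarded
        return parameter.length(); // NullPointerException happens here instead
    }

    // Fixed guard: the intended exception is actually thrown.
    static int lengthFixed(String parameter) {
        if (parameter == null)
            throw new IllegalArgumentException("parameter");
        return parameter.length();
    }

    public static void main(String[] args) {
        try {
            lengthBuggy(null);
        } catch (NullPointerException ex) {
            System.out.println("buggy guard fell through to: " + ex.getClass().getSimpleName());
        }
        try {
            lengthFixed(null);
        } catch (IllegalArgumentException ex) {
            System.out.println("fixed guard threw the intended exception for: " + ex.getMessage());
        }
    }
}
```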

I have digressed somewhat; let me get back to talking about code reviews.

Unfortunately, something we see a lot when we review defects in customers’ code with the developers present is denial that there is a problem at all. For example, we quite frequently get developers who agree that, yes, their C program dereferences a pointer after the memory has been free()d, but who claim that the contents of the memory are guaranteed to still be good on the next statement, so there is no defect that would affect the user of the code. One wonders when they believe the memory “goes bad” — like it’s milk or something. Some developers actually refuse to fix the program because “it’s not wrong” to use memory after it is freed. Developers who do not know which behaviours are dangerous cannot possibly have an effective code review regimen, but such developers are also unlikely to make effective use of a static analysis tool. The best we can do is find the defects and present them in a way that clearly encourages the developer to fix them correctly.


28 thoughts on “Analysis vs code review”

  1. I’d say the “why do we see this more in Java vs C#” question is pretty straightforward based on a single language feature:

    Java has checked exceptions. In C# it’s always completely optional (at a language level at least) to catch and wrap exceptions thrown by code you call. In Java it’s compulsory to either declare that you throw everything that your called methods might throw[1], or include a catch-and-wrap block to wrap everything into an exception you DO declare that you throw. I’m not sure what’s considered best practice in C# (which says something in itself, since I’ve been programming C# as the primary language I use in my day job since 2002) but even if catch-and-wrap is considered best practice, an awful lot of code is written by people either unaware of such best practices or who have different opinions about what’s best practice.

    It’s pretty easy to see why a language that forces you to write catch-and-wrap clauses would have more bugs in catch-and-wrap clauses than a language that lets you leave them out.

    [1] Yes, I know about RuntimeException and direct descendents of Throwable but the general point stands
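    The mechanics described above can be seen in a few lines of Java; the method name readConfig and the messages are invented for illustration. A checked exception must either appear in the caller’s throws clause or be caught, and catch-and-wrap is the usual way to keep the throws clause clean:

```java
import java.io.IOException;

public class CheckedDemo {
    // A callee that throws a checked exception.
    static void readConfig() throws IOException {
        throw new IOException("config file missing");
    }

    // Option 1: declare the checked exception and let it propagate.
    static void declares() throws IOException {
        readConfig();
    }

    // Option 2: catch-and-wrap into an unchecked exception, keeping the
    // throws clause clean -- the compulsory pattern that invites the defect,
    // since forgetting "throw" here silently swallows the failure.
    static void wraps() {
        try {
            readConfig();
        } catch (IOException ex) {
            throw new RuntimeException(ex);
        }
    }

    public static void main(String[] args) {
        try {
            wraps();
        } catch (RuntimeException ex) {
            System.out.println("wrapped cause: " + ex.getCause().getMessage());
        }
    }
}
```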

    • I had hypothesized that it was due to checked exceptions, but was not clear on what defective behaviours the language feature encourages in the developer, not being a Java developer myself. This is a plausible explanation; thanks!

      • Might even be a _testable_ explanation: given your tools and the library of code that you run them over regularly, it ought to be possible to identify catch-and-wrap blocks and see how many there are (a plausible definition would be “any catch block that always ends in a throw of an exception whose constructor is called within the catch block”).

        If my explanation is correct, you’d find that there are far more catch-and-wrap blocks in Java code than there are in C# code – and that the ratio approximately mirrors the ratio you see in the number of occurrences of this particular defect.

    • I agree. It’s a lot more common to see exception-wrapping catch clauses in Java. I expect that’s why there would be a corresponding increase in the frequency of problems in such clauses. No other reason springs to mind.

      I expect it’s the checked exceptions, too.

      I think exception-wrapping ought to be best practise in C#, but rarely see it in the code I encounter. This impacts code quality and makes maintenance harder – code just breaks from time to time and it’s hard to tell why. Rant rant etc.

      • I think it depends on what the code is used for and how big its userbase is. If you have a self-contained web application with a smallish userbase who have direct-ish access to the developers, leaving exceptions uncaught and unwrapped gives the developer the most directly useful stack trace. And also avoids the possibility that a poorly written catch-and-wrap will discard the necessary diagnostic information.

        I may be biased, though: my current project involves investigating a large codebase in which every exception is caught and mis-wrapped, discarding all stack traces from the original exception, until the end user ends up with no information except “Object reference not set to an instance of an object” and no clue as to which of a thousand methods the object in question might be in. Give me a codebase where exceptions propagate freely uncaught ANY day.

        • I certainly agree that exceptions should not be wrapped by every method call in a call stack, or even a majority.

          In a standard three-tier web app, I’d expect the exception to be wrapped about twice, usually. The bottom one might be the original SqlException or what have you; the first wrapper might be something specific to the business functionality that failed; the top one a UI-specific one intended to drive the user error feedback.

          If your project is not big enough to be split into layers like that, then I’d agree that wrapping is less of a priority.

          The exception handling in the large codebase sounds horrible. It is vital that error information is not lost. Java and C# both have the notion of an inner exception (the “cause”, in Java) that is well supported when logging the stack trace. If this codebase isn’t using it, or is messing it up somehow, that really sucks.

          I’d rather have just about anything than that. The C++ codebase I’m sometimes working with now is mostly better than that, and its error handling is based around HRESULTs and macros.
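          The “wrapped about twice” chain described above can be sketched as follows; the layer and exception names are invented, and the point is that the original failure survives as the innermost cause:

```java
// A three-tier wrapping chain: each layer wraps with a cause, so the
// original failure is never lost from the stack trace.
public class LayerDemo {
    static class DataAccessException extends RuntimeException {
        DataAccessException(String m, Throwable c) { super(m, c); }
    }
    static class OrderServiceException extends RuntimeException {
        OrderServiceException(String m, Throwable c) { super(m, c); }
    }

    // Stands in for the original SqlException at the bottom of the stack.
    static void dataLayer() {
        throw new IllegalStateException("connection refused");
    }

    // First wrap: something specific to the business functionality.
    static void businessLayer() {
        try {
            dataLayer();
        } catch (RuntimeException ex) {
            throw new DataAccessException("could not load order", ex);
        }
    }

    // Second wrap: a UI-facing exception driving user error feedback.
    static void uiLayer() {
        try {
            businessLayer();
        } catch (DataAccessException ex) {
            throw new OrderServiceException("order page failed", ex);
        }
    }

    public static void main(String[] args) {
        try {
            uiLayer();
        } catch (OrderServiceException ex) {
            // Walk the cause chain: nothing is lost.
            for (Throwable t = ex; t != null; t = t.getCause())
                System.out.println(t.getMessage());
        }
    }
}
```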

          • A problem with having exceptions bubble up is that there is no mechanism to distinguish “Method `foo` is throwing an `InvalidOperationException` for the reason described in `foo`’s documentation”, from “Method `foo` called some method which–for some reason unknown to `foo`, threw an `InvalidOperationException`.” What’s really needed is not a huge hierarchy of exceptions, but rather three: (1) method failed for documented reasons, and the object is presumably in the state indicated by that; (2) method failed for some unknown reason, but the system state is otherwise normal; (3) system state is compromised. Unexpected exceptions of the first type should be wrapped as the second, while those of the other types should percolate up as their own types. Unfortunately, since something like `InvalidOperationException` may represent any of the above conditions, there’s no general way a method can tell what should really be done with it.

    • What probably also doesn’t help is that because you need to do this so often in Java, developers pay less attention, both when writing the wrapping code and when reviewing it. It’s not just that the defect occurs more often because there are more instances of the pattern; developers and reviewers also get lazy about it.

  2. One possible reason for such a disparity between C# and Java could be the existence of Resharper (or alternatives), which catches this stuff immediately.

    In Java world, this kind of tooling is used much less frequently – at least from my observations.

    • Resharper is basically unnecessary because the widely-used IDEs (Eclipse and IntelliJ – since those are the ones I use, they must be the widely used ones :) have those checks themselves.

      Of course, I’m sure there are developers who turn them off or leave them off.

  3. I don’t do much Java (or C# either), but what I tend to do in Java is wrap all checked exceptions in an unchecked exception. This keeps the throws list clean while not swallowing the error. It would of course encourage exactly this bug.

    Honestly, this one is so simple it should be made into a compiler warning. I’m not sure about the precise condition to check for, but something like “an instance of a class derived from Exception is constructed but not used”.
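    As a rough sketch of the proposed condition, here is a deliberately crude, purely textual checker; a real implementation would of course work on the syntax tree rather than on raw source lines. Anchoring the match at the start of the statement is what (crudely) excludes constructions that are thrown, returned, or assigned:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Flags any statement that constructs a ...Exception but does nothing
// with it: the statement begins with "new", so thrown, returned, and
// assigned constructions do not match.
public class DiscardedExceptionChecker {
    private static final Pattern NEW_EXCEPTION =
        Pattern.compile("^\\s*new\\s+\\w*Exception\\s*\\(");

    static List<Integer> check(String[] sourceLines) {
        List<Integer> defects = new ArrayList<>();
        for (int i = 0; i < sourceLines.length; i++) {
            Matcher m = NEW_EXCEPTION.matcher(sourceLines[i]);
            if (m.find())
                defects.add(i + 1); // 1-based line number of the suspect statement
        }
        return defects;
    }

    public static void main(String[] args) {
        String[] source = {
            "try {",
            "    foo();",
            "} catch (Exception ex) {",
            "    new WrapperException(ex);",      // defect: discarded
            "}",
            "throw new IllegalStateException();", // fine: thrown
        };
        System.out.println("suspect lines: " + check(source)); // prints: suspect lines: [4]
    }
}
```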

    • Alternatively, a new language could require a new keyword to explicitly swallow an exception (perhaps call it “keep” to stay with the try-catch metaphor). If you catch an exception, you need to either explicitly rethrow it or keep it.

      try
      {
        /* … */
      }
      catch(SocketException ex)
      {
        LOG.Error(ex);
        keep;
      }

      • The C++ exception-handling model was in many ways superior to what came before, but it encapsulates way too much information into the *type* of the object which is thrown. Since C++ wanted to avoid having any type hard-baked into the compiler, there is no base exception type, and as a consequence there was no information available to a compiler about what should be caught other than the type of the exception, and no mechanism other than inheritance for allowing a single `catch` to handle multiple conditions. Java and .NET improve upon things by having a base exception type, but still rely upon inheritance to decide what should be caught, and assume that “catching” an exception and “resolving” it will generally be synonymous.

        One thing that might have been most helpful would have been for exception objects to include a virtual “IsResolved” method; rather than having the system prevent critical errors from being caught, have it allow code to catch those errors but only have those exceptions clear their “IsResolved” property if the code indicates its awareness of the particular critical exception that it caught (e.g. by calling an `AcknowledgeCpuOnFireException` method which is unique to `CpuOnFireException`.) Under such a model, there would be many more places where an execution layer could safely `catch (Exception ex)`, since that would only resolve exceptions whose meaning was “Operation failed without side-effects”, but not swallow exceptions which indicated more serious problems.

    • Personally, I wouldn’t mind if it were a compiler warning if ANY object is created and unused: much like “a + b” is a valid expression but “a + b;” is not a valid statement, I’d be okay with “new Foo();” not being a valid statement. In the rare cases where that IS what’s required (presumably because the code is relying on a side effect from the constructor) it’d be easy enough to have a public static void Unused(object o) {} and replace the “new Foo();” with “Unused(new Foo());” which would then be legal.

      • I’ve found that C# does a fairly good job of that, at least compared to other languages I have used.

        I’m sure there are further improvements possible. I’d want to keep

        new Foo().DoSomething();

        though; I use that from time to time.

  4. Pingback: The Morning Brew - Chris Alcock » The Morning Brew #1652

  5. Your last paragraph startled me — how can a C programmer argue that using already freed memory is anything but potentially dangerous? Oh well, he who blows in the fire will get sparks in his eyes.

    • They probably believe that it’s safe to use already freed memory up until the next call to malloc, and back this up by demonstrating a test program that uses memory after freeing and which happens to work.

    • Some people will say pretty much literally anything to avoid admitting fault. You’d think programmers in particular would get that trait knocked out of them by computers, but not always so.

      And some C programmers just don’t really distinguish between contracted behaviour and implementation-specific behaviour. To them, a test program really would override what the documentation says. It’s great if you enjoy arguments between developers about whose fault a bug is – the documentation is not an agreed arbiter of what a function call should do, so the caller wanting it to do something different is on an equal footing with the implementer who wants the caller to do something different.

    • In the case of `malloc`/`free` the usage is clearly and unambiguously 100% wrong. On the other hand, it’s not uncommon for functions to return information in a fashion which is only guaranteed to be valid until the next function call. For example, many libraries have historically had a function to return information about the last error. Further, it would not be difficult for a library to implement `malloc()` and `free()` in such a fashion that either operation would check whether a `free` was pending (freeing up the indicated memory if so), and `free` wouldn’t immediately release memory but instead merely establish the request as pending. If such a design also included a `__deferfree()` method which would return the specified pointer if it was pending (cancelling the operation) or null if it wasn’t [thus allowing code to unconditionally call `free` on the return value of `__deferfree`], such a design would make it easier to have methods which might return either a static or temporary object without requiring that the object itself include any indication of whether it was static or temporary. Provided the semantics were followed, it wouldn’t work for nested objects, but would work for nested methods.

      Nothing in the C standard would prohibit a library implementation from behaving as above, but I am unaware of any implementations that actually do so. A library designer who wanted to make such semantics available would be better advised to implement a `__schedulefree` method which would behave like the `free` method above, and have `free` itself release the directly-requested block in addition to any that were pending. That would ensure that any code which wanted to make a temporary object available to its caller would fail at link time if such semantics were unavailable (since libraries without such semantics wouldn’t have a `__schedulefree()` function).

  6. Pingback: Dew Drop – July 16, 2014 (#1815) | Morning Dew

    • Correct. The code is now swallowing an exception that the programmer expects it to not swallow, which means that it will both fail to report the error on up the line, as well as to execute code that shouldn’t be executing because the code is in an exceptional case.

  7. Reminds me of the (not very PC) novelty song “My Boomerang Won’t Come Back” where the subject of the song couldn’t get the boomerang to come back no matter what he did.

    An old man finally told him the trick. “If you want your boomerang to come back, well first you’ve got to *throw* it.”

  8. A long, long time ago, there was a version of UNIX in which it was documented that you could use memory after a call to free, until the next call to free/malloc/realloc etc. This was in the days before C had type checking on arguments to functions.

    One of the common problems when porting code between different systems that claimed to be UNIX was fixing these.
