Here’s another fragment of the video interview I posted recently.
In this fragment I discuss the difference between analyzing IL and analyzing source code when trying to figure out whether a defect exists. It turns out that you can get much better results by looking at the source code; there’s simply more information there. Being able to look at things like indentation and comments, where local variables were declared, and so on, makes a subtle but very real difference in the quality of the analysis.
This was an interesting video shoot for me. When I was at Microsoft I was used to Charles coming by with a hand-held camera and chatting with me off-camera, and then we’d pretty much post the whole thing. This was a much more involved production with lights, several camerapeople, multiple takes, and so on.[1. Also I find it amusing that they’ve changed the color saturation and tonality quite heavily; I’m not actually that pink, and the wall behind me is not green!]
Is “intendation” a typo for “indentation”? If so, it’s a particularly interesting one as presumably you’re trying to detect what the developer *intended* from the spacing…
Well, if indentation means code is indented, intendation would therefore mean code is “intended”, presumably in contrast to “implemented”. So I suppose analyzing code intendation would involve scanning for comments like “TODO – implement this later”.
I get to see a lot of real-world code with defects in it; a surprising amount of it has a comment that says “this code is wrong, don’t ship like this”.
By way of contrast: we have a checker that looks for unintentional switch fall-through in C/C++. (Not a problem in C# as fall-through is illegal.) One of the heuristics is: if the switch section ends with a comment then the defect is suppressed! A survey of real-world code shows that there is an extremely high probability that the comment says some variation of “I intend fall-through at this point”.
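The heuristic can be illustrated with a small sketch. (Java is used here as a stand-in because, like C and C++, it permits implicit fall-through; this is a toy illustration of the idea, not Coverity's actual checker.)

```java
// Toy illustration of the fall-through heuristic described above:
// a switch section that ends with a comment is presumed intentional,
// while one that silently falls through is flagged as a likely defect.
public class FallThrough {
    static int categorize(char c) {
        int score = 0;
        switch (c) {
            case 'a':
                score += 1;
                // I intend fall-through at this point -- a trailing
                // comment like this one is the evidence that would
                // suppress the warning.
            case 'b':
                score += 2;
                break;
            case 'c':
                score += 4;
            case 'd':
                // The section for 'c' above ends with no break and no
                // comment, so the checker would flag it as a defect.
                score += 8;
                break;
            default:
                break;
        }
        return score;
    }

    public static void main(String[] args) {
        System.out.println(categorize('a')); // 3: intentional fall-through
        System.out.println(categorize('c')); // 12: probably a bug
    }
}
```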
Ha ha!
So you’re saying that I could use grep rather than buying your tool?
🙂
Hah, yes, that is one of those typos where you are thinking one word and that affects the mistyping of another.
Why do you need to analyze the library to see if the returned object is disposable? Isn’t that obvious from the type of the returned object?
Console.Out is a property that returns an IDisposable, but it’s hard to contrive circumstances under which you’d ever want to call Dispose on it. What if you had a fluent interface where the IDisposable that’s passed in is also returned?
You need to look at the IL to see if the library method actually creates an object and returns it (and hence needs to be disposed) vs. just returning an existing object (which should not be disposed).
Gabe is exactly right. There are a LOT of objects that implement IDisposable because their base class does, but failing to dispose them is not a user-affecting bug. Telling you “this thing implements IDisposable, so you should dispose it!” is what we call a “style checker”, and that’s the sort of thing best left to FxCop. We’re looking for the short list of bugs that make you say “OMG, I am glad we found that before we shipped.”
Thus the IL analyzer looks at more than just “does this thing implement IDisposable?” It also digs into the implementation of IDisposable to see if it does anything interesting. If it can’t find some evidence that there’s really an effect there then the object is not treated as a resource.
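The distinction the analyzer draws can be sketched like this (Java's AutoCloseable is used here as a stand-in for IDisposable, and the type names are invented for illustration; the real analyzer inspects IL, not source):

```java
// A type that implements the resource interface only because a base
// type demanded it. Its close() has no observable effect, so forgetting
// to call it is at worst a style issue, not a user-affecting bug --
// the analyzer would not treat this type as a real resource.
class NoOpResource implements AutoCloseable {
    @Override public void close() { /* nothing interesting happens */ }
}

// A type whose close() does something "interesting". An analyzer that
// digs into the implementation finds an observable effect here and
// treats the type as a genuine resource, so a leak is worth reporting.
class RealResource implements AutoCloseable {
    boolean released = false;
    @Override public void close() { released = true; }
}
```

A checker that flagged every unclosed NoOpResource would just be generating noise; only the RealResource leaks are the "glad we found that before we shipped" bugs.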
Would it be possible for the language groups to agree on attributes which would identify whether particular method or constructor parameters accept responsibility for IDisposable objects, and whether particular methods and constructors hand off responsibility to their caller? Having attributes specify such things would seem better than forcing code analyzers to attempt inferences.
That’s not a bad idea per se. However in practice it has some problems.
Two problems are (1) what incentivizes library developers to tag their methods properly? This sounds like a tax, and developers hate taxes. Microsoft library developers of course have no problem paying this tax, but this is a bit of a “boil the ocean” problem; the benefit is not realized unless a majority of library developers correctly annotate their code.
And (2) that “correctly” is notable. One of my language designer colleagues used to say that every time you give a developer a choice about how they write a bit of code you’re giving them the opportunity to do it wrong. If the compiler cannot verify that an annotation is correct then you’re giving the developer the opportunity to write a bug. If you can verify that it is correct then you don’t need the annotation.
Neither is an unsolvable problem. But any time we start kicking around the idea “well let’s just add semantic annotations to the code”, that’s an indication that some language feature is missing.
More generally what we’re talking about here is *workflow*. That the contract is you open the socket, you send and receive on the socket, you close the socket. Violating that contract — sending on a closed socket, failing to close a socket that was opened, and so on — is an error that is not caught by the compiler because generalized workflow is not a basic concept in the language. Only specific workflows are baked into the language. The language *forces* you to call a ctor before you use an object and *forces* you to call a dtor when the finalizer thread runs. Instead of adding tags that are the inputs to static analyzers, maybe workflows ought to be a first-class language concept.
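One way to approximate "workflow as a first-class concept" with today's type systems is the typestate pattern: give each protocol state its own type, so that sending on a closed socket simply does not compile. This is a hedged sketch with hypothetical types, not a real socket API:

```java
// Each state of the open/send/close workflow is its own type.
final class ClosedSocket {
    OpenSocket open() { return new OpenSocket(); }
    // Note: no send() here -- "send on a closed socket" is now a
    // compile-time error rather than a runtime defect.
}

final class OpenSocket {
    private final StringBuilder sent = new StringBuilder();
    OpenSocket send(String msg) { sent.append(msg); return this; }
    ClosedSocket close() { return new ClosedSocket(); }
    String sentSoFar() { return sent.toString(); }
}
```

With this encoding, `new ClosedSocket().open().send("hello").close()` compiles, while `new ClosedSocket().send("hello")` is rejected by the compiler. What the pattern cannot express is the other half of the contract: the compiler still doesn't force you to eventually call close() on an OpenSocket, which is exactly the gap a first-class workflow feature would fill.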
Defining new attributes isn’t going to magically cause existing methods to be tagged with them, even when the attributes would be applicable. It would probably be good to have a means via which code that imports methods from outside libraries could “pretend” that those methods were tagged with certain attributes. Given such an ability, even if consumers of code would sometimes have to explicitly specify attributes that outside methods should have been tagged with but weren’t, being able to specify once the attributes that should apply to a widely-used outside method would be better than having to assist code-validation tools at every call site.
Also, consider the following hypothetical code snippets:
    float f;
    double d;
    // Do some computations
    if (f.Equals(d)) doSomething();

    List<Point> myList;
    // Initialize the list
    myList[3].Offset(3, 4);
Both of them are legal, but clearly wrong. Every type has an overload of `Equals` that takes type `Object`, and parameters of type `Object` can generally accept boxing conversions from value types, but non-broken code is very unlikely to use a boxing conversion when calling `Equals` on any type other than the one being converted. (Code which uses one generic type’s `Equals` method on another generic type that might require boxing might work correctly, but the meaning of such code would be clearer with an explicit cast to `Object`.) As for the second example, whether or not Offset should have been designed to mutate the underlying struct, its behavior is pretty well set in stone and cannot be changed without breaking existing code that uses it correctly. On the other hand, there’s no reason a compiler should have to allow code like the above if the author of the Offset method knows that such usage is certain to be erroneous.
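The first snippet's bug class exists in Java as well: Float.equals takes an Object, so comparing a Float to a Double compiles without complaint but is always false, even when the numeric values are identical. A minimal demonstration:

```java
// The boxed comparison compiles but can never succeed, because
// Float.equals(Object) first checks that the argument is a Float.
public class EqualsBug {
    public static void main(String[] args) {
        Float f = 1.0f;
        Double d = 1.0;
        System.out.println(f.equals(d));          // false: wrong boxed type
        System.out.println(f.doubleValue() == d); // true: compare the values
    }
}
```

This is exactly the kind of "legal but clearly wrong" code that a checker, or a compiler armed with better overload information, could reject outright.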
Is the new Coverity checker using Roslyn for any of its analysis?
The release version of Roslyn hasn’t shipped yet. Version 7 of Coverity’s analyzer has shipped to customers.
From these facts you can deduce the answer to your question.
Thanks Eric for this interesting interview fragment. Are you going to publish more pieces? I’m looking forward to the next one.
Roslyn provides an API to work with the compiler, doesn’t it? But static code analysis works with the source code. Is using ASTs the best way to work with source code, or is there a better, closer way?
Hello Eric,
You recently answered my post on Stack Overflow regarding roguelikes. I’ve been looking at your shadowcasting tutorials and am wondering the following: do I have to create a Visual C# console application and go from there with your code samples, or..? How many classes do I have to create, or is the code all done in one file? I don’t think that it’s explained anywhere. I’m eagerly awaiting your answer! Sorry if I’m annoying.