In the wake of the security disaster that is the Heartbleed vulnerability, a number of people have asked me if Coverity's static analyzer detects defects like this. It does not yet, but you'd better believe our security team is hard at work figuring out ways to detect and thereby prevent similar defects.
I'll post links to some articles below, but they're a bit jargon-heavy, so I thought that a brief explanation of the jargon might be appropriate. The basic idea is as follows:
- Data which flows into a system being analyzed is said to come from a source.
- Certain data manipulations are identified as sinks. Think of them as special areas that data is potentially flowing towards.
- A source can taint data. Typically the taint means something like "this data came from an untrusted and potentially hostile agent". Think of a taint as painting a piece of data red, so that every other piece of data it touches also becomes red. The taint thus spreads through the system.
- A sink can require that data flowing into it be free of taint.
- If there is a reasonable code path on which tainted data flows into a sink, that's a potential defect.
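To make the model above concrete, here is a minimal sketch of taint propagation in C++. This is purely illustrative, not how Coverity's analyzer works internally; the names `Value`, `from_network`, `from_literal`, `concat` and `sink` are all hypothetical. The point is just that the tainted flag spreads through every operation that touches tainted data, and a sink rejects anything still carrying the flag:

```cpp
#include <stdexcept>
#include <string>

// Hypothetical sketch: a value carries a "tainted" flag that spreads
// through any operation combining it with other data.
struct Value {
    std::string data;
    bool tainted;
};

// Source: data arriving from an untrusted client is painted red.
Value from_network(const std::string& s) { return {s, true}; }

// Data created locally is untainted.
Value from_literal(const std::string& s) { return {s, false}; }

// Combining values propagates taint: the red paint spreads.
Value concat(const Value& a, const Value& b) {
    return {a.data + b.data, a.tainted || b.tainted};
}

// Sink: requires taint-free input. A static analyzer reports a defect
// on any code path where a tainted value can reach this point.
void sink(const Value& v) {
    if (v.tainted)
        throw std::runtime_error("tainted data reached sink");
}
```

A static analyzer does this reasoning at compile time over all code paths, of course; the runtime check here just dramatizes the rule.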
So for example, a source might be user input to a web form. A taint might be "this data came from a client that we have no reason to trust". A sink might be code which builds a SQL string that will eventually be sent to a database. If there is a code path on which tainted data reaches the sink, then that's a potential SQL injection defect. Or a source might be a number sent over the internet from a client, and a sink might be code that indexes into an array. If a number from an untrustworthy client can become an index into an array, then the array might be indexed out of bounds. And so on; we have great flexibility in determining what sources and sinks are.
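The SQL case is worth seeing in miniature. The hypothetical `build_query` below is the sink: it splices raw input into a query string. With benign input it behaves; with hostile input the attacker's text becomes part of the query's logic, which is exactly why an analyzer flags any path where tainted data reaches such a sink:

```cpp
#include <string>

// Hypothetical sink: builds a SQL string by naive concatenation.
// If user_input can be tainted, this is a SQL injection defect.
std::string build_query(const std::string& user_input) {
    return "SELECT * FROM users WHERE name = '" + user_input + "';";
}

// Benign:  build_query("alice")
//   -> SELECT * FROM users WHERE name = 'alice';
// Hostile: build_query("x' OR '1'='1")
//   -> SELECT * FROM users WHERE name = 'x' OR '1'='1';
//   The WHERE clause is now always true and matches every row.
```

The fix, of course, is parameterized queries rather than string concatenation, at which point the sink no longer exists.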
Now that you understand what we mean by sources, sinks and taints, you can make sense of:
- This blog article by Professor John Regehr
- The response from Coverity CTO and security expert Andy Chou
- Professor Regehr's follow-up posting
For the TLDR crowd, basically what Andy is saying here is: identifying sinks is not too hard¹ but it can be tricky to determine when a source ought to be tainted. To get reasonable performance and a low false positive rate we need a heuristic that is both fast and accurate. The proposed heuristic is: if it looks like you're swapping bytes to change network endianness into local machine endianness, then it is highly likely that the data comes from an untrusted network client. That of course is far from the whole story; once the taint is applied, we still need an analyzer that correctly deduces whether tainted data makes it to a sink that requires untainted data.
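The byte-swapping pattern the heuristic keys on looks something like the following sketch (my illustration, not Coverity's actual rule; `read_be16` is a hypothetical name). Code of this shape is assembling a multi-byte integer from individual bytes in network (big-endian) order, and code rarely does that to anything other than data fresh off the wire, so it's a strong signal that the result should be tainted:

```cpp
#include <cstdint>

// Hypothetical example of the shape the heuristic looks for:
// reassembling a 16-bit big-endian value from two raw bytes.
// OpenSSL's n2s macro, which read the Heartbeat payload length,
// has essentially this form.
uint16_t read_be16(const unsigned char* p) {
    return static_cast<uint16_t>((p[0] << 8) | p[1]);
}
```

An analyzer that taints the return value here, and then tracks it into an allocation size or a copy length, is well on its way to catching a Heartbleed-shaped defect.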
Taking a step further back, I've got to say that this whole disaster should be a wake-up call: why is anyone still writing security-critical infrastructure in languages that lack memory safety at runtime? I'm fine with this infrastructure being written in C or C++, so long as at runtime the consequence of undefined behaviour is termination of the program rather than leaking passwords and private keys. A compiler and standard library are free to make undefined behaviour have whatever behaviour they like, so for security-critical infrastructure, let's have a C/C++ compiler and library that turns undefined behaviour into a predictable crash of the process. Somehow C# and Java manage to do just that without an outrageous runtime performance cost, so a C/C++ compiler could do the same. With such a runtime in place, the Heartbleed defect would have been a denial of service attack that calls attention to itself, rather than silently leaking the most valuable private data to whoever asks for it, without so much as even a log file to audit.
To argue that we cannot afford the cost of building such a compiler and using it consistently on security-critical infrastructure is to argue that it would be cheaper to just deal with arbitrarily many more Heartbleeds.
Stay safe out there everyone.
- In the case of Heartbleed, a call to memcpy could be the sink. ↩