About Eric Lippert

Eric Lippert is a developer on the C# analysis team at Coverity. During his sixteen years at Microsoft he worked on Visual Basic, VBScript, JScript and C#, and was a member of the C# language design team. He can be found on Twitter at @ericlippert and blogs about C# at http://ericlippert.com.

Heartbleed and static analysis

In the wake of the security disaster that is the Heartbleed vulnerability, a number of people have asked me if Coverity's static analyzer detects defects like this. It does not yet, but you'd better believe our security team is hard at work figuring out ways to detect and thereby prevent similar defects.

I'll post some links to some articles below, but they're a bit heavy on jargon, so I thought that a brief explanation of that jargon might be appropriate. The basic idea is as follows:

  • Data which flows into a system being analyzed is said to come from a source.
  • Certain data manipulations are identified as sinks. Think of them as sensitive operations that data is potentially flowing towards: places where untrusted data could do damage.
  • A source can taint data. Typically the taint means something like "this data came from an untrusted and potentially hostile agent". Think of a taint as painting a piece of data red, so that every other piece of data it touches also becomes red. The taint thus spreads through the system.
  • A sink can require that data flowing into it be free of taint.
  • If there is a reasonable code path on which tainted data flows into a sink, that's a potential defect.

So for example, a source might be user input to a web form. A taint might be "this data came from a client that we have no reason to trust". A sink might be code which builds a SQL string that will eventually be sent to a database. If there is a code path on which tainted data reaches the sink, then that's a potential SQL injection defect. Or a source might be a number sent over the internet from a client, and a sink might be code that indexes into an array. If a number from an untrustworthy client can become an index into an array, then the array might be indexed out of bounds. And so on; we have great flexibility in determining what sources and sinks are.
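The source-taint-sink pipeline can be sketched in a few lines. This is a toy model, not Coverity's analyzer: it tracks taint dynamically at runtime on string values, whereas a static analyzer deduces the same flows without running the program. All the names here are illustrative.

```python
class Tainted(str):
    """A string whose contents came from an untrusted source."""

def from_web_form(value: str) -> str:
    # Source: anything a client submits is tainted.
    return Tainted(value)

def concat(a: str, b: str) -> str:
    # Propagation: data touched by tainted data becomes tainted.
    result = a + b
    if isinstance(a, Tainted) or isinstance(b, Tainted):
        return Tainted(result)
    return result

def send_to_database(query: str) -> str:
    # Sink: requires that its input be free of taint.
    if isinstance(query, Tainted):
        raise ValueError("potential SQL injection: tainted data reached a sink")
    return query

user = from_web_form("'; DROP TABLE users; --")
query = concat("SELECT * FROM users WHERE name = '", user)
# send_to_database(query) would raise: tainted data flowed from source to sink.
```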

Now that you understand what we mean by sources, sinks and taints, you can make sense of Andy's article:

For the TLDR crowd, basically what Andy is saying here is: identifying sinks is not too hard,1 but it can be tricky to determine when a source ought to be tainted. To get reasonable performance and a low false positive rate, we need a heuristic that is both fast and accurate. The proposed heuristic is: if it looks like you're swapping bytes to change network endianness into local machine endianness, then it is highly likely that the data comes from an untrusted network client. That is of course far from the whole story; once the taint is applied, we still need an analyzer that correctly deduces whether tainted data makes it to a sink that requires untainted data.
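To make the heuristic concrete, here is the kind of code it is looking for: a network-to-host byte order conversion, sketched in Python. The packet layout is made up for illustration.

```python
import struct

def read_record_length(packet: bytes) -> int:
    # '!H' decodes a 16-bit unsigned integer in network byte order
    # (big-endian); on a little-endian host that is exactly the byte
    # swap the heuristic keys on. A value produced this way almost
    # certainly came from an untrusted network peer, so the analyzer
    # would taint it.
    (length,) = struct.unpack("!H", packet[:2])
    return length

print(read_record_length(b"\x01\x02payload"))  # 258, i.e. 0x0102
```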

Taking a step further back, I've got to say that this whole disaster should be a wake-up call: why is anyone still writing security-critical infrastructure in languages that lack memory safety at runtime? I'm fine with this infrastructure being written in C or C++, so long as at runtime the consequence of undefined behaviour is termination of the program rather than leaking passwords and private keys. A compiler and standard library are free to give undefined behaviour whatever behaviour they like, so for security-critical infrastructure, let's have a C/C++ compiler and library that turn undefined behaviour into a predictable crash of the process. Somehow C# and Java manage to do just that without an outrageous runtime performance cost, so a C/C++ compiler could do the same. With such a runtime in place, the Heartbleed defect would have been a denial-of-service attack that calls attention to itself, rather than silently leaking the most valuable private data to whoever asks for it, without so much as a log file to audit.
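To see what the difference looks like, here is the shape of the Heartbleed defect sketched in Python; the names and the protocol details are illustrative, not OpenSSL's actual code. The client supplies both a payload and a claimed payload length, and the server echoes back that many bytes:

```python
def heartbeat_response(payload: bytes, claimed_len: int) -> bytes:
    # This is the check OpenSSL was missing. In a bounds-checked
    # runtime an oversized claim fails loudly right here, instead of
    # silently copying whatever memory happens to sit past the payload.
    if claimed_len > len(payload):
        raise ValueError("claimed length exceeds actual payload")
    return payload[:claimed_len]

print(heartbeat_response(b"bird", 4))     # a well-behaved client
# heartbeat_response(b"bird", 16384) raises instead of leaking memory.
```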

To argue that we cannot afford the cost of building such a compiler and using it consistently on security-critical infrastructure is to argue that it would be cheaper to just deal with arbitrarily many more Heartbleeds.

Stay safe out there everyone.

  1. In the case of Heartbleed, the call to memcpy that builds the response could be the sink.

Standard and Daylight are different

First things first: I've been getting a number of questions about the "Heartbleed" issue and Coverity's static analyzer; I'll try to publish some links to some good articles before I head home to visit my family for Easter weekend.


A couple of weeks ago I had an online meeting with some European colleagues; I showed up in the chat room at what I thought was the agreed-upon time and they did not, which was odd, but whatever, I waited ten minutes and then rescheduled the meeting. It turns out they did the same an hour later. I'm sure you can guess why.

If you have been sent a link to this page, it is to remind you that "Eastern Standard Time" is not defined as "whatever time it is in New York City right now", it is defined as "Eastern Time not adjusted for Daylight Saving Time". Parts of the world in the eastern time zone that do not observe Daylight Saving Time -- Panama, for instance -- stay in Eastern Standard Time all year, so it is an error to assume that Eastern Standard Time and Eastern Time are the same time.

Put another way: Coordinated Universal Time (UTC) is the time in Greenwich not adjusted for British Summer Time, which is what the Brits sensibly call Daylight Saving Time. Eastern Standard Time is always UTC minus five hours, and Eastern Daylight Time is always UTC minus four hours. Eastern Time switches between EST and EDT depending on the time of year.

So when you tell someone that the meeting is at noon Eastern Standard Time on a day in April, you are saying that the meeting is at noon Jamaican time, not noon New York City time. Since computers do what you tell them, not what you mean, you might find that scheduling an online intercontinental meeting for a time in "EST" gives you a different time than you intended. Your best bet is to state the time in UTC, which is unambiguous.
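You can watch a computer take "EST" literally with Python's standard zoneinfo database; the meeting date here is made up:

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

# "EST" is a fixed offset, UTC minus five hours, all year round.
est = timezone(timedelta(hours=-5), "EST")
# "Eastern Time" follows New York's daylight saving rules.
eastern = ZoneInfo("America/New_York")

# Noon "EST" on a day in April...
meeting = datetime(2014, 4, 15, 12, 0, tzinfo=est)
# ...is 1 PM in New York, because New York is on EDT in April.
print(meeting.astimezone(eastern).strftime("%H:%M %Z"))  # 13:00 EDT
```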

ATBG: Why UTF-16?

I had a great time speaking at the Los Angeles .NET meetup Monday evening; thanks to everyone who came out for the warm welcome.

Today on the Coverity Development Testing Blog's continuing series Ask The Bug Guys I dive into the history of string representations in C# and Visual Basic to answer the question "why does C# use UTF-16 as the default encoding for strings?"
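The key consequence of that choice can be shown in a couple of lines. The sketch below uses Python's explicit encoder, but a C# string stores the same UTF-16 code units implicitly:

```python
# U+1D11E MUSICAL SYMBOL G CLEF lies outside the Basic Multilingual
# Plane, so UTF-16 represents it as a surrogate pair: two 16-bit
# code units for a single character.
clef = "\U0001D11E"
code_units = len(clef.encode("utf-16-be")) // 2
print(code_units)  # 2 -- one character, two UTF-16 code units
```

This is why a C# string's Length property counts code units, not characters.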


As always, if you have questions about a bug you've found in a C, C++, C# or Java program that you think would make a good episode of ATBG, please send your question along with a small reproducer of the problem to TheBugGuys@Coverity.com. We cannot promise to answer every question or solve every problem, but we’ll take a selection of the best questions that we can answer and address them on the dev testing blog every couple of weeks.

C# and VB are open sourced

For literally years now the Roslyn team has been considering whether or not to release the C# and VB analyzers as open source projects, so I was very happy, though not particularly surprised, to watch Anders announce on Channel 9 a few minutes ago that Roslyn is now available on CodePlex.

What astonished me was that it's not just a "reference" license, but a full-on liberal Apache 2.0 license. And then to have Miguel announce that Xamarin had already got Roslyn working on Linux was gobsmacking.

Believe me, we cloned that repo immediately.

I'm still mulling over the consequences of this awesome announcement; I'm watching Soma discuss Roslyn on Channel 9 right now, and Anders is coming up again soon for a Q&A session. (At 12:10 Pacific Daylight Time, here.)

I am also on a personal level very excited and a little nervous to finally have a product that I spent years of my life working on widely available in source code form. Since I always knew that open sourcing was a possibility I tried to write my portions of it as cleanly and clearly as possible; hopefully I succeeded.

Congratulations to the whole Roslyn team, and thanks for taking this big bold step into the open source world.

ATBG: Reordering optimizations

Last time on the Coverity Development Testing Blog's continuing series Ask The Bug Guys I discussed whether it was a good idea to remove a lock which protects an integer field. My conclusion was that it is not, because the lock prevents many potentially confusing optimizations. This week I follow up on that episode with an example where eliding locks on volatile reads and writes permits a surprising result.



Find a simpler problem

A very common unanswerable question I see on StackOverflow is of the form "my CS homework assignment is to solve problem X and I don't even know how to get started. How do I get started?" That's too vague and unfocussed for a site like StackOverflow, which is for specific technical questions that have specific answers.

My recent post on the similarly vague problem of how to debug small programs has gotten a lot of hits and great comments; thanks all for that. In light of that I thought I might do an irregular series of posts, each highlighting a basic problem-solving technique for beginner programmers, CS students and the like.1 So, how do you get started?

  1. Of course these apply to expert programmers too, but expert programmers often already know these techniques.

ATBG: Can I skip the lock when reading an integer?

Today on the Coverity Development Testing Blog's continuing series Ask The Bug Guys, I answer a question that made its way to me from a Coverity customer: is it a good idea to remove a lock which only protects the read of an integer field? After all, that read is guaranteed to be atomic, so it seems safe. As is usually the case, the situation is unexpectedly complicated!1


UPDATE: A number of commenters have asked if marking the field volatile magically prevents reordering bugs. The specific problem with that proposal is that volatile reads and locks do not have the same semantics. A volatile read can be moved backwards in time with respect to a volatile write, and the x86 processor will actually do so, but a read, volatile or otherwise, cannot be moved backwards in time past the beginning of a lock. The more general problem is: we have a toy example that is greatly simplified from the real code, and therefore we don't know what invariants the real code relies upon. Trying to deduce whether a real gun is safe by examining a toy gun is a dangerous proposition.
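One way to see why such a reordering is observable: take the classic store-buffering test, where thread 1 runs "x = 1; r1 = y" and thread 2 runs "y = 1; r2 = x", and enumerate every sequentially consistent interleaving. This Python sketch simulates the schedules rather than running real threads:

```python
def interleavings(a, b):
    # All orderings of the two threads' operations that preserve
    # each thread's own program order.
    if not a:
        yield list(b)
        return
    if not b:
        yield list(a)
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest

thread1 = [("write", "x"), ("read", "y", "r1")]
thread2 = [("write", "y"), ("read", "x", "r2")]

results = set()
for schedule in interleavings(thread1, thread2):
    mem = {"x": 0, "y": 0}
    regs = {}
    for op in schedule:
        if op[0] == "write":
            mem[op[1]] = 1
        else:
            regs[op[2]] = mem[op[1]]
    results.add((regs["r1"], regs["r2"]))

# (0, 0) is absent: no interleaving produces it. So if a run ever
# observes r1 == r2 == 0, a read must have moved before a write.
print(sorted(results))  # [(0, 1), (1, 0), (1, 1)]
```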

I have posted a follow-up article to address some of these concerns.



  1. Is "usually unexpected" an oxymoron?

High Performance Windows Store Apps

My former coworker on the Roslyn team, Brian Rasmussen, has written High-Performance Windows Store Apps, about professional-quality engineering techniques for writing fluid, high-performance applications. I got a sneak peek at the book during its production; it's going to have great content and look fantastic. I'm looking forward to picking up a copy.

Brian and the editors were kind enough to ask me to write a foreword, which I did gladly. You can check out the foreword and get more information about the book at the Microsoft Press blog.