What is the unchecked keyword good for? Part one

One of the primary design goals of C# in the early days was to be familiar to C and C++ programmers, while eliminating many of the “gotchas” of C and C++. It is interesting to see what different choices were possible when trying to reduce the dangers of certain idioms while still retaining both familiarity and performance. I thought I’d talk a bit about one of those today, namely, how integer arithmetic works in C#.

One of the biggest problems with C and C++ is that you don’t even know for sure what the range is of an integer; every compiler can differ, which makes it tricky to write truly portable code. C# solves this problem by simply removing the ambiguity. In C#, an int is a 32-bit two’s-complement number, end of story. So that problem is solved. But many more remain.

The fundamental problem is that integer arithmetic in C, C++ and C# behaves only superficially like the integer arithmetic you learned in school. In normal arithmetic there are nice properties like “adding two positive numbers results in a third positive number”, which do not hold in these languages because of integer overflow. Even the property that there is no highest integer is a very useful mathematical property that does not hold. (A mathematician would note that the integer arithmetic that we have in C# is a commutative ring, but few developers have studied ring theory.)
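To make that concrete, here is a minimal sketch of two positive numbers adding up to a negative one. The names `OverflowDemo` and `WrappingAdd` are made up for illustration; the `unchecked` operator is used only so that the sketch wraps regardless of how the compiler happens to be configured.

```csharp
using System;

public static class OverflowDemo
{
    // Hypothetical helper: adds two ints with two's-complement wraparound,
    // independent of the compiler's overflow-checking setting.
    public static int WrappingAdd(int a, int b)
    {
        return unchecked(a + b);
    }

    public static void Main()
    {
        int big = int.MaxValue;                   // 2147483647, the highest int
        int sum = WrappingAdd(big, 1);            // two positive operands...
        Console.WriteLine(sum);                   // -2147483648: ...and a negative result
        Console.WriteLine(sum == int.MinValue);   // True
    }
}
```

The sum is still perfectly well defined; it is just not the sum you learned in school.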

This is bad because it leads to bugs. It is good because the vast majority of integer arithmetic done in any of these languages involves integers whose magnitudes are tiny compared to the possible range of the integer type, and because this kind of arithmetic can be done extremely quickly by computers. So the question for the designers of C# is: how do we keep the desirable high performance while still enabling developers to detect and prevent bugs?

Of course, one choice would be to simply reject the premise that speed is the most important thing, and make math work correctly across the board. A “big integer” could be the default integer type, as it is in some other languages. Frankly, I spend billions of nanoseconds waiting for stuff to stream down from the network every day; I don’t really care if my arithmetic takes a few extra nanoseconds. It might be worthwhile to say that the default type is big integers, and if you want high performance integers, then you have to use a special type.

But when C# was developed, I doubt that this was even considered for a moment. Keeping the performance up, and being able to interface easily with existing libraries of unmanaged code, and leveraging the existing knowledge of developers used to 32 bit integers, were all high priorities. And we lived in a world where high latency was due mostly to the CPU taking a long time, not waiting for network I/O so much.

The decision the C# designers actually made was to have two kinds of integer arithmetic: checked, and unchecked. Unchecked arithmetic has all the speed and danger that you have learned to love from C and C++, and checked arithmetic throws an exception on overflow. It is slightly slower, but a lot safer.
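A small sketch of the two flavours side by side may help. The class `CheckedDemo` and the helper `AddOverflows` are hypothetical names chosen for this example; `checked`, `unchecked` and `System.OverflowException` are the real language and framework features.

```csharp
using System;

public static class CheckedDemo
{
    // Hypothetical helper: reports whether a + b overflows a 32-bit int.
    // The checked operator makes the addition throw instead of wrapping.
    public static bool AddOverflows(int a, int b)
    {
        try
        {
            int ignored = checked(a + b);
            return false;
        }
        catch (OverflowException)
        {
            return true;
        }
    }

    public static void Main()
    {
        Console.WriteLine(AddOverflows(1, 2));             // False
        Console.WriteLine(AddOverflows(int.MaxValue, 1));  // True
        Console.WriteLine(unchecked(int.MaxValue + 1));    // -2147483648
    }
}
```

In real code you would normally let the exception propagate rather than catch it; the helper catches it only to make the difference observable.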

How exactly is that safer? Because it is better to crash the program and bring the bug to the attention of the users and the developers, than to muddle on through. If the programmer believed that all arithmetic operations should be on integers that are tiny compared to the range of an integer, and in fact they are not, then something is fundamentally wrong with the program, it is unreliable, and it should be stopped before it does any more harm.

So it seems pretty clear why we would want to have a “checked” keyword in C#; it says “in this bit of the code, I assert that my integer arithmetic does not overflow, and if it does, I’d rather crash the program than muddle on through”. But what possible use is the unchecked keyword? Unchecked arithmetic is the default!

Next time on FAIC, we’ll answer that question.

33 thoughts on “What is the unchecked keyword good for? Part one”

  1. Great timing on this post.

    I have just written about (ab)using the overflow behaviour of integers on purpose to represent values in domains that naturally exhibit modular arithmetic (in my example directions/angles in the 2D plane).

    While I agree with your statement that this behaviour of integers is dangerous, it can be a useful ‘feature’, given that you get a very specific kind of behaviour without any extra code.

    Though, one probably should be very careful with doing things like this.

    Looking forward to the second post!

  2. Wait – unchecked is the default? I thought it was the opposite! I do a lot of hardware interfacing and often have a need for unchecked behavior. When I do, I always enclose that code in an “unchecked” block. So basically, this has been a no-op, and I should have instead been enclosing my other code in “checked”?

    • Code can be compiled with checked or unchecked as the default. If code may or may not be compiled with “unchecked” as the default, then an “unchecked” declaration may or may not be “necessary”. It is often better to include text in the source code which will do something “when needed”, and have no effect when not needed, than to omit such text at times when it’s not *presently* needed.

  3. It is notable that a checked addition only inserts one additional instruction that is actually executed. That instruction is an easily predicted branch that is not in the critical path of anything.

  4. I often use unchecked when I expect overflow to document the fact that overflow is being expected. It also enables me to later change compiler settings to make checked the default.

    Eric, since this is a post series I’d like to read what you think about language options such as the checked compiler setting. This is the same principle that the VB language options use (Option Strict and so on) and these are arguably not a good thing.

  5. > Unchecked arithmetic is the default!

    I believe that was a mistake. Checked should have been the default and unchecked should have been used only in the 3% of code where micro-optimization makes sense.

    The way it is, almost nobody uses checked in C#, so overflow bugs are still easy to write.

    • True, and the JIT should optimize away the overflow check for the induction variable in loops.

      Now given the low level of optimizations performed by the current and the next JIT this is not going to happen.

      checked should not only have been the default but always on except in unchecked contexts. The compiler switch is not necessary.

      On the other hand when .NET was introduced compatibility with Java was still a concern. There was the idea that people would port from Java to C# (not disproved). I believe that’s how covariant arrays came to be. And checked by default would break ported Java code, since Java arithmetic wraps silently.

      • I wonder if compatibility with Java is the reason for the rules surrounding float/double arithmetic and promotions? The designers of Java seem to have striven for simplicity without considering the consequences of some of their rules (e.g. passing an `int` or `long` to a method with overloads for `float` and `double` will choose the former; the one saving grace for .NET in that regard is that most .NET methods which include both `float` and `double` overloads also include `Decimal`, and passing an `int` or `long` without casting generates a compiler error rather than silently coercing to `float`) and I think it unfortunate that .NET followed their lead. A desire for Java compatibility could explain the decision, though.

  6. IMHO, there are three kinds of “integers”:

    1. Those which programmers expect to behave as numbers, and where programmers want the compiler to trap in cases where they can’t, and are willing to pay for such trapping.

    2. Those which programmers expect to behave as numbers, and where programmers would like the compiler to trap in cases where they can’t, but may be averse to the cost of such trapping.

    3. Those which programmers need to have behave as algebraic ring members.

    The existence of explicit “checked” and “unchecked” blocks allows programmers to distinguish them somewhat (things marked as “checked” are the first type, those which are “unchecked” are the third, and those which aren’t marked are the second), though the actual distinction should go beyond trapping. For example, if `x` is a 32-bit variable of the first type equal to 2000000000 and a programmer writes `Int64 L = x*2;`, the expectation that the variable should behave as a number would be satisfied by having the program trap, but it would be satisfied just as well (if not better) by having the program store 4000000000 into `L`. Even if a programmer wouldn’t be willing to have the computer spend more time performing the computation as `Int64`, on a 64-bit machine where performing the math as 64 bits would be faster than adding a 32-bit overflow check, a programmer shouldn’t mind if the compiler does the former.

    If I were designing a language, or had a chance to influence the C standard, I would include distinct types for numbers and algebraic rings and require that code which converts rings to numbers or *larger* rings must explicitly indicate the intended behavior. Given `unum32_t a=1,b=2;`, the meaning of `int64_t c=a-b;` would be clear (it should assign -1), but given `uwrap32_t e=1,f=2;` a compiler should squawk at `int64_t g=e-f;` and require that the programmer specify either `int64_t g=(unum32_t)(e-f);`, `int64_t g=(int32_t)(e-f);`, or `int64_t g=(unum32_t)e-(unum32_t)f;` depending upon the actual semantics desired. Presently, there’s no way a 64-bit compiler could have computations promote to 64-bit values without breaking a lot of 32-bit code in ways that would be hard to track down, but if code could indicate whether it *wants* arithmetic clipped to 32 bits or was merely willing to tolerate it, most 32-bit code could be made to run on 64-bit systems merely by changing variable declarations and possibly adding some #pragma directives to indicate how things like numeric literals should be regarded. What would you think of those ideas?

    • First off, I broadly agree with the general thrust of your ideas here. I would add a couple of things.

      First, there is a fourth use of integers, namely use as bit arrays. I recommend that in modern languages people use a dedicated bit array type rather than bit twiddling an integer.

      Second, I agree that accidental late promotion is a common bug. At Coverity we recently adapted the C++ checker which looks for accidental late promotion to work on C# code as well. It is a surprisingly common bug to see something like “someLong = someInt * someInt” where the computation is assigned to a long because it might overflow. However it is a tricky problem to weed out the false positives. We often see things like “milliseconds = seconds * 1000” where milliseconds is a long, and seconds is an int, but there is no chance in practical code that “seconds” is going to be more than two billion.

      • I would consider a bit array to be a form of modular algebraic ring with some added convenience operators. When working with unsigned values, “X |= 8;” is equivalent to “X = X - (X % 16) + 8 + (X % 8);”.

        As for your second point, is there any reason the latter code shouldn’t be written as either “milliseconds = (Int32)(seconds * 1000);” or “milliseconds = seconds * 1000L;” (depending upon programmer intent)? Imposing such a requirement by default would be a breaking change, but having a mode which could be disabled for existing code but would squawk at your example in new code would help prevent new bugs going forward. More generally, I would like to see a mode which allows values (variables, fields, parameters, constants, function returns, etc.) to be tagged with an indication of their intended usage, and applies type coercion rules based on whether the meaning of the code in question would match the intention.

        For example, given two `float` constants named “OneTenth” and “BestFloatRepresentationOfOneTenth”, both of which equal “0.1f”, it would seem helpful to have a way to tag things such that the compiler would accept “double d1=BestFloatRepresentationOfOneTenth;” but reject “double d2=OneTenth;”. From the CLR perspective, both variables would hold the precise fraction 13421773/134217728, but one should be tagged to indicate that it holds a precise value and the other to indicate that it holds a rough approximation of the value the programmer really wanted.

  7. It’s good to have you back!

    I’ve actually used ‘unchecked’ once… When wanting to specify that that piece of code is ok with overflow arithmetic, and should be compiled unchecked regardless of compiler switches, or even if it would be copied into ‘checked’ context (default or not, it should be possible to turn off ‘checked’ context when needed)

  8. If only more people were believers in “it’s better to crash the program”, like Eric… Entire programming environments have been built around the concept that “it’s better to limp along and never tell the user something went wrong”. I’m looking at you, JavaScript in web browsers.

    I suppose it used to make sense when JavaScript was intended for some minor *OPTIONAL* improvements, but now that entire systems are built in JavaScript, you click a button and nothing happens, how exactly is that better than when you click a button and an error message pops up? It’s not better in any way whatsoever, that’s how.

    Admittedly, giving the user the *option* to ignore further errors and proceed seems like a middle ground that’s better than both of the extremes…

    • Well, don’t forget that I was also one of the implementers of JavaScript and on the design committee, briefly.

      I agree with your critique, but you have to look at the historical perspective.

      As I have often said, we designed JS to make the monkey dance when you move the mouse over it; it was never intended to be an implementation language for programming-in-the-large by teams of professionals. Our (justified) belief was that programs would be short, that they would be written by amateurs or semi-professionals, and that the browser ecosystem was a highly heterogeneous mixture of buggy browsers from multiple vendors. “Muddle on through” is a pretty sensible design principle for that scenario. It is not sensible for how JS is used today of course.

      • Isn’t it funny how often programming projects that are intended to be short-lived hacks take on a life of their own, and how often problems that become apparent early on are enshrined forevermore because fixing them might break a handful of programs? C, make, JavaScript, many aspects of DOS, etc.

    • That seems to be the purpose of checked and unchecked specifiers but having more like it would be nice in a language. Off the top of my head at least half of my personal library works best with the “limp along” route due to the nature of the functions within it. However there’s still the other half where errors need to be handled with great care and the user must be notified that they occurred as well as bug reports generated. try-catch-finally is a nice way to handle those currently but when you’re using catches to sink exceptions you have to be very careful not to sink them all.

  9. Interesting topic – one that’s quite close to my heart, having written checked math operators for Ada projects on 680x0 processors – every clock cycle was precious… And of course, Ada is a language that has different fundamental types for twos complement integers vs rings (‘mod’ types).

    And I presume you’re aware of Haskell, where the default integer type is a bignum (and one of the first optimisations is to change ‘Integer’ to ‘Int’ to get fixed size integers of at least 30 bits).

    • I wonder if it might be possible to modernize Ada; it seems to have some good concepts, but it doesn’t get much attention nowadays.

      • It’s still being updated – the last update was in 2012. It’s always been aimed at the real-time/embedded domain, especially safety-critical software development (that’s where I encountered it), but has lost out heavily to C there purely because (in my opinion) it’s different to C and the correctness benefits you can get from using Ada instead of C don’t appear to be deemed worth the extra effort required to get developers capable of using Ada. It’s a shame in a lot of ways, as it is a nice language to use, especially for real-time and embedded systems.

        • Unfortunately, the state of the art in C compilers seems headed toward the idea that C code whose behavior wasn’t defined by standards but was implicitly defined by the way a platform worked should be mutated by the compiler into a bizarro-land version. For example, passing an even number to “void foo(int p1) { int p2; if (p1 & 1) p2 = bar(p1); boz(p1,p2); }” would, with many compilers, cause “boz” to be given the passed value along with an indeterminate one. If “boz” only looks at its second parameter whenever its first parameter is odd, the indeterminate value would have no effect; if the compiler had no way of knowing that “boz” would ignore its second parameter in such cases, passing an indeterminate value may save an instruction compared with passing an initialized one, and the ability to save that instruction was a big part of why the initialization of auto-variables was left unspecified in the first place.

          Some of today’s compilers, however, would change the code into “void foo(int p1) { boz(p1, bar(p1)); }”, even if “bar” had side-effects, on the grounds that the compiler is allowed to do anything it wants when p1 is even since the code engages in “undefined behavior”, with the net effect that code which engaged in Undefined Behavior, but would either work or be recognizably trapped, gets mutated into code which invokes “bar” with even values – behavior which could only be justified by saying that even behavior which many platforms might define as “benign” justifies random code execution.

          From a correctness standpoint, there would be nothing wrong with having a language refuse compilation if any code path would access an uninitialized variable, or even with having such a variable access trapped at runtime. The idea that safety-critical code might be written in a language where compilers try to find excuses for ignoring what the programmer wrote, however, is frightening.

  10. “But what possible use is the unchecked keyword? Unchecked arithmetic is the default!”

    I’d say because you can set the opposite behavior as the default (csc.exe app.cs /checked+) and then if you need to “relax” in some portions of the code you use unchecked. Which may hold if the compiler directive was introduced since version 1 (dunno).

    In hindsight, given that we are talking C#, I shouldn’t be surprised by the fact that you can nest these contexts, and the results made total sense:

    https://gist.github.com/hnh12358/42bee7e9903c163964ab [1]

    One thing that I notice is that, given that every method scope (including lambdas) gets the default checking, one way to opt out of the explicit scope for a certain operation without using the keyword would be to move that operation out to its own function. With lambdas you could have a generic function to do that [2]. But alas, why would you ever do that? 😉

    I tried forcing the inlining of a function but it seems the checks are made in a way that inlining doesn’t break them.

    Thinking about this, and the notion of a safe context for numeric operations, did this ever bring confusion with the unsafe/safe directive?

    Looking forward to the second part!

    • From what I understand, the behavior of “checked” and “unchecked” is equivalent to the language having two different versions of each operator, including conversion operators, and binding to one set or the other depending upon checked/unchecked context (actually, I think it might be sorta neat if a language used separate operators for unchecked, so `x = y |+| z;` would indicate that code was explicitly expecting the computation to wrap. Don’t know of any that does, though).

  11. Pingback: The Morning Brew - Chris Alcock » The Morning Brew #1837

  12. Pingback: Dew Drop – April 10, 2015 (#1991) | Morning Dew

  13. I too look forward to part two.

    My own primary use of the “unchecked” keyword is for compile-time literals or computations, as these are “checked” by default (as opposed to the run-time default of “unchecked”). It doesn’t come up often, but is convenient in e.g. interop scenarios where one has a hex literal representing a negative 32-bit integer, or one is specifically looking for the overflowed result of a computation but wants to write the computation out for readability in the code.

    And of course, one can change the default “checked” vs “unchecked” behavior with a compiler switch (i.e. make the run-time default “checked”). Then one would need the “unchecked” keyword to allow for unchecked computations when desired.

    I would not be surprised if the answer to “what possible use is the unchecked keyword?” involves far more complexity than the above and I look forward to learning something new. 🙂

  14. Pingback: Les liens de la semaine – Édition #127 | French Coding

  15. Pingback: What is the unchecked keyword good for? Part two | Fabulous adventures in coding

  16. Pingback: Fabulous adventures in coding

  17. Pingback: Long division | Fabulous adventures in coding

  18. How expensive is checked arithmetic? Let’s say you’re using XNA and so far performance has been good enough that you’re running in Debug rather than Release mode. Is it safe to say that chances are you might as well turn on checked arithmetic, or is there a real chance it will ruin everything? My intuition tells me that the cost can’t be much worse than a factor of 2 or 3, but the usual adage is to not trust intuition in such things.
