What is the unchecked keyword good for? Part two

Last time I explained why the designers of C# wanted to have both checked and unchecked arithmetic in C#: unchecked arithmetic is fast and dangerous, checked arithmetic is slightly slower but turns subtle, easy-to-miss mistakes into program-crashing exceptions. It seems clear why there is a “checked” keyword in C#, but since unchecked arithmetic is the default, why is there an “unchecked” keyword?

There are a bunch of reasons; here are the ones that immediately come to mind.

First reason: constant integer arithmetic is always checked by default. This can be irritating. Suppose for example you have some interop code and you wish to create a constant for the HRESULT E_FAIL:

const int E_FAIL = 0x80004005;

That’s an error because that number is too big to fit into an int. But you might not want to use a uint. You might think, well, I’ll just say

const int E_FAIL = (int)0x80004005;

But that is also illegal because constant arithmetic conversions are also always checked by default. So we still have a conversion that is going to fail. What you have to do is turn off checked constant arithmetic:

const int E_FAIL = unchecked((int)0x80004005);
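To see what that constant actually contains, here is a minimal sketch. The class name HResults is made up for illustration; only the constant declaration comes from the text above. The bit pattern is preserved, and the value is simply reinterpreted as a negative 32-bit integer.

```csharp
using System;

public static class HResults
{
    // 0x80004005 is a uint literal; unchecked lets the constant
    // conversion wrap around instead of being a compile-time error.
    public const int E_FAIL = unchecked((int)0x80004005);

    public static void Main()
    {
        Console.WriteLine(E_FAIL);                // -2147467259
        Console.WriteLine(E_FAIL.ToString("X8")); // 80004005
    }
}
```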

Second reason: you might have a block of code in which you want all the arithmetic to be checked, but there is one part — say, the inside of a performance-sensitive loop — where you want to get the maximum speed, and are willing to turn off checked arithmetic there and there alone.
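A sketch of what that might look like follows. The class ChecksumDemo and its method are hypothetical, invented only to show the nesting: the setup arithmetic is checked, and the loop body alone opts out.

```csharp
using System;

public static class ChecksumDemo
{
    // Hypothetical helper: checked setup, unchecked hot loop.
    public static int Checksum(byte[] data, int seed)
    {
        checked
        {
            // Checked: throws OverflowException if this addition overflows.
            int hash = seed + data.Length;

            unchecked
            {
                // Unchecked: the multiply and add silently wrap to 32 bits.
                for (int i = 0; i < data.Length; i++)
                    hash = hash * 31 + data[i];
            }

            return hash;
        }
    }
}
```

Note that the keywords apply lexically: they govern only the arithmetic written inside the block, not arithmetic performed by methods called from inside it.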

Third reason: C# allows you to change the default to checked arithmetic for non-constant integer math via a compiler flag. If you’ve done so, and you need to turn it back off again on a temporary basis, then you have to use the unchecked keyword.
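The flag in question is the compiler's -checked option, which corresponds to the CheckForOverflowUnderflow property in a project file. As a rough sketch of how the keywords interact with it (the class and method names here are hypothetical), only the first method below changes behavior depending on how the program was compiled:

```csharp
using System;

public static class SwitchIndependent
{
    // Depends on the compiler setting: throws OverflowException when the
    // project is compiled with checked arithmetic on, wraps otherwise.
    public static int IncrementDefault(int x) => x + 1;

    // Always wraps, regardless of the compiler setting.
    public static int IncrementWrapping(int x) => unchecked(x + 1);

    // Always throws OverflowException on overflow, regardless of the setting.
    public static int IncrementChecked(int x) => checked(x + 1);
}
```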

Fourth reason: the unchecked block can be used as a form of self-documenting code, to say “I am aware that the operation I’m doing here might overflow, and that’s fine with me.” For example, I’ll often write something like:

int GetHashCode()
{
    unchecked {
        int fooCode = this.foo == null ? 0 : this.foo.GetHashCode();
        int barCode = this.bar == null ? 0 : this.bar.GetHashCode();
        return fooCode + 17 * barCode;
    }
}

The “unchecked” emphasizes to the reader that we fully expect that multiplying and adding hash codes could overflow, and that this is OK; we want to be truncating to 32 bits and we expect that the numbers will be large.

There were a bunch of good comments to the previous post; among the questions posed in those comments were:

What do you think of the compiler switch that changes from unchecked to checked arithmetic as the default?

I’m not a big fan of this approach, for several reasons. First, hardly anyone knows about the switch; there’s a user education problem here. Second, I like it when the text of the program can be understood correctly by the reader without having to know the details of the compilation process. Third, it adds testing burden; now there are two ways that every program can be compiled, and that means that there are more test cases in the test matrix.

The C# team is often faced with problems where they have to balance breaking backwards compatibility with improving a feature, and many times the users advocating for the feature suggest “put in a compiler switch that preserves the backwards compatibility” (or, more rarely “put in a switch that turns on the feature”, which is the safer option.) The C# team has historically been quite resistant to adding more switches. We’re stuck with the “checked” switch now, but I think there’s some regret about that.

Should checked arithmetic have been the default?

I understand why the desire was there to make unchecked arithmetic the default: it’s familiar, it’s faster, a new language is going to be judged in part on benchmarks, and so on. But with hindsight, I would rather that checked arithmetic had been the default, and users be forced to turn it off for precisely those situations where the inner-loop performance is genuinely impacted by this nano-optimization. We have other safety features like array bounds checking on by default; it makes sense to me that arithmetic bounds checking would be on by default as well. But again, we’re stuck with it now.


18 thoughts on “What is the unchecked keyword good for? Part two”

  2. Is there any way to write switch-independent code, i.e., a way to control program flow with a check for the compiler switch (like preprocessor directives)? Assuming not, what are your thoughts on this as an alternative approach to the switch or whether it is even feasible?

  3. I would consider the switch useful if a shop has a policy that code whose correctness would be reliant upon checked or unchecked arithmetic must explicitly specify its requirements. In such a scenario, it may be advantageous to have a lot of arithmetic be unchecked for speed, but be able to recompile with checking enabled in the event that it becomes clear that something is overflowing somewhere but it’s difficult to figure out exactly where.

  4. const int E_FAIL = unchecked((int)0x80004005);

    I had to do this several times a few days ago; it is still irritating because it is way too verbose. Anyone typing a constant which starts with “0x” probably knows what he wants.

    • Yeah, I’m not sure if the “probably knows what he wants” is a strong enough justification, but the verbosity has always bugged me too.

      Frankly, it’s never been clear to me why hexadecimal literals have to be unsigned (and hence not fit in an int even when it’s a valid negative 32-bit value). But assuming unsigned as the default makes sense, it seems to me there should have been a type specifier (as in U, M, F, etc.) for the int type (maybe call it S) that I could append to the literal to tell the compiler to interpret the bit pattern as a signed int.

      Maybe Eric can one day write an article on the topic of behaviors of numeric literals, and motivations behind the behaviors of hexadecimal literals in particular. 🙂

      • What should be the expected behavior of `longVar &= ~0x0000000080000000`, which is one of the more common usages for hex literals with bit 31 set? Interestingly, it fails whether the constant is interpreted as Int32 or UInt32; the pattern only works if the value is Int64 or UInt64. If there were an “and-not” operator/method (IMHO a good language should include one, along with methods equivalent to `((x & y) != 0)` and `((x & y)==y)`) then using a `UInt32` mask on an `Int64` or `UInt64` would be fine, but since there isn’t, there’s no “safe” interpretation for hex constants with the high bit set.

        • I’m having a hard time understanding your objection. The example you describe doesn’t “work” today, if I understand the scenario you are considering. I.e. the literal you show is still treated as 32 bits, and so when it is inverted there are no bits 32 through 63 to invert. If e.g. “longVar” is 0xffffffffffffffff before the statement you show, it will be 0x7fffffff after the statement.

          You use the words “fail” and “safe”, but I don’t really understand what you intend for those words to mean. Once a type for the literal is determined, the interpretation of a hexadecimal value with the highest bit set is unambiguous. All we need is a consistent, predictable way to determine the type.

          You ask what the expected behavior should be. I would treat the literal you show exactly the same as 0x80000000. The ~ operator is irrelevant — it’s not part of the literal — and the leading 0’s should be ignored just as in any other literal.

          If you want to be able to get a 64-bit value with bit 31 cleared, you should just write “~0x80000000L” or “~0x80000000UL” (as appropriate to the scenario).

          Now, again…I would prefer that unadorned literals just always be a signed int, hexadecimal or not. But I’m open to the idea that the literal could default to being unsigned, even though I don’t understand the value of that (sorry, but your reply didn’t add anything to my comprehension of that issue). Even in that case though, we should have a numeric literal suffix we can apply to force a signed 32-bit value, to avoid the rigmarole of casting and adding “unchecked”.

          • My argument was a reason that 0x80000000 shouldn’t be an `Int32`; I guess it got muddled by the fact that the same argument would imply it shouldn’t be a `UInt32` (though it is). A better argument might be to observe that `Int64 L=0x80000000;` will work identically if the literal is interpreted as `UInt32` or `UInt64`, but differently if it is interpreted as `Int32`. Yes it’s possible to ensure correct behavior by adding a suffix, but it’s better to have code fail to compile when a suffix is omitted than have it yield unintentional behavior.

            I do agree with you 100% that there should be a suffix to denote signed integer hex literals (my choice would probably be something like “sL” or “sl” for Int64, “sw” for Int32, “sh” for Int16, and `sb` for `SByte`). I don’t see a reasonable way to support signed hex literals with the MSB set without a length-defining suffix, however.

  6. What’s saddest about that compiler switch is that my friend and I wanted to turn it on in our public projects, but decided against it on the basis that most C# developers out there expect it to be off 😦

    So the switch can’t even be used realistically, except in projects where it’s important enough to warrant the “developer surprise” factor.

    • IMHO, the proper convention should be that properly written code should satisfy its contract whether the default is checked or unchecked, but it would often be appropriate (or even desirable) for the setting to affect behavior when given invalid inputs. If a function is being given invalid inputs which cause overflows, having it throw an OverflowException may make it easier to debug the problem, but outside of debugging contexts it may be better to eliminate the checks for purposes of performance. Use “checked” when you’re relying upon overflow checking, or “unchecked” when you’re relying on wrapping. If you enable trapping in your projects but other people end up disabling it, your code will still trap important overflows. Further, if people get in the habit of enabling it, others might get in the habit of writing `unchecked` when appropriate.

  8. I ran across the MSDN docs page on Enumerable.Sum(IEnumerable<Int32>) (https://msdn.microsoft.com/en-us/library/vstudio/bb338442(v=vs.100).aspx) and noticed that it will throw an OverflowException if the sum is greater than the range of Int32.

    Interestingly, it still throws even if you wrap the call in an unchecked block. Why is that? Does the method internally explicitly perform checked arithmetic?

    Also interestingly, Resharper will warn you that you’re invoking Sum() inside an unchecked block.

    • .NET uses different instructions to perform checked and unchecked arithmetic. The `checked` and `unchecked` directives indicate what kind of math instructions the C# compiler should generate within them. They do not affect the kind of instructions generated for math operations located in other methods.

    • John is of course correct; I note that this is a bit of a design problem in the “checked” and “unchecked” blocks. For one thing, “checked { x = y + z; }” and an extract-method refactoring “checked { x = Add(y, z); }” can have totally different semantics. Compare that to, say, “try”, where you can do an extract-method on the inside of the block and still get exception protection.

      • The effects of `try` do not really extend into a block in .NET (in its predecessor VB6 the effects of some “ON ERROR” statements sometimes did and sometimes didn’t, which made the semantics really bizarre). If the third statement in a block triggers an exception, there’s no way the caller can request that the method ignore the exception and carry on. Once an exception is triggered in a method that doesn’t handle it, control is going to leave that method and return to the caller.

  10. I’ve been thinking about this some more and I wonder what you would think of the idea (if not in .NET, in the Next Major Framework) of having a cross between checked and unchecked, which added a `ThrowIfOverflow;` statement or method, and specified that if an overflow occurs, the exception is allowed (but not required) to be deferred until the next `ThrowIfOverflow;` statement (if any), provided it is thrown in the present method? Given:

    q.n1 += a1.x; // Assume an L3 cache miss on `a1`
    q.n2 += a2.x; // Assume `q` and `a2..a5` present in the L0 cache
    q.n3 += a3.x;
    q.n4 += a4.x;
    q.n5 += a5.x;

    It is possible for the second through fifth statements to be executed before the first when using “unchecked”, but in a checked context the execution of each statement would have to wait until the previous one was complete. If in case of overflow it wouldn’t matter how many statements executed (e.g. because `q` would get abandoned or invalidated during stack unwinding), adding a ThrowIfOverflow after the fifth statement might allow the generated code to use a conditional-load-if-overflow instruction or sequence to keep track of whether overflows occurred, so that the five statements above could all be scheduled independent of each other; a conditional jump based upon that flag would have to wait until all the preceding operations occurred, but the operations themselves would not be rigidly sequenced.

    What would you think of that as a way of reducing the performance drain from “checked”?
