What is the unchecked keyword good for? Part two

Last time I explained why the designers of C# wanted to have both checked and unchecked arithmetic in C#: unchecked arithmetic is fast and dangerous, checked arithmetic is slightly slower but turns subtle, easy-to-miss mistakes into program-crashing exceptions. It seems clear why there is a “checked” keyword in C#, but since unchecked arithmetic is the default, why is there an “unchecked” keyword?

There are a bunch of reasons; here are the ones that immediately come to mind.

First reason: constant integer arithmetic is always checked by default. This can be irritating. Suppose for example you have some interop code and you wish to create a constant for E_FAIL:

const int E_FAIL = 0x80004005;

That’s an error because that number is too big to fit into an int. But you might not want to use a uint. You might think, well, I’ll just say

const int E_FAIL = (int)0x80004005;

But that is also illegal because constant arithmetic conversions are also always checked by default. So we still have a conversion that is going to fail. What you have to do is turn off checked constant arithmetic:

const int E_FAIL = unchecked((int)0x80004005);
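To make the three attempts concrete, here is a minimal sketch. The class name `HResults` and the `Main` method are hypothetical scaffolding added for illustration; the two commented-out declarations are the ones the compiler rejects.

```csharp
using System;

public static class HResults
{
    // const int Bad1 = 0x80004005;       // error: the literal is a uint and does not fit in an int
    // const int Bad2 = (int)0x80004005;  // error: constant conversions are checked by default

    // unchecked turns off the compile-time overflow check for this conversion.
    public const int E_FAIL = unchecked((int)0x80004005);

    public static void Main()
    {
        Console.WriteLine(E_FAIL);                 // -2147467259
        Console.WriteLine(E_FAIL.ToString("X8"));  // 80004005
    }
}
```

The bit pattern is unchanged; only the interpretation as a signed number differs.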

Second reason: you might have a block of code in which you want all the arithmetic to be checked, but there is one part — say, the inside of a performance-sensitive loop — where you want to get the maximum speed, and are willing to turn off checked arithmetic there and there alone.
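A sketch of that second situation might look like the following. The `Checksums` class and its mixing loop are invented for illustration; the point is only that an `unchecked` block nested inside a `checked` block switches the checks off for the statements it encloses.

```csharp
using System;

public static class Checksums
{
    // Hypothetical example: the total is overflow-checked, the mixing loop is not.
    public static int Process(int[] data, int scale)
    {
        checked
        {
            int total = 0;
            foreach (int item in data)
            {
                total += item * scale;   // throws OverflowException on overflow
            }

            int mix = 17;
            unchecked
            {
                // Hot loop: wraparound is intended, so no overflow checks are emitted here.
                foreach (int item in data)
                {
                    mix = mix * 31 + item;
                }
            }

            return total ^ mix;
        }
    }

    public static void Main()
    {
        Console.WriteLine(Process(new[] { 1, 2, 3 }, 2));
        try
        {
            Process(new[] { int.MaxValue }, 2);
        }
        catch (OverflowException)
        {
            Console.WriteLine("The checked part overflowed");
        }
    }
}
```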

Third reason: C# allows you to change the default to checked arithmetic for non-constant integer math via a compiler flag. If you’ve done so, and you need to turn it back off again on a temporary basis, then you have to use the unchecked keyword.
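As a rough illustration, assuming the project may or may not be built with the compiler's checked option turned on, an explicit `unchecked` expression pins down the behaviour either way. The `Wrapping` class and its method names are hypothetical.

```csharp
using System;

public static class Wrapping
{
    // Wraps around regardless of how the project is compiled.
    public static int NextSequence(int current)
    {
        return unchecked(current + 1);
    }

    // Follows the project default: wraps by default, but throws
    // OverflowException if checked arithmetic was enabled via the compiler flag.
    public static int NextDefault(int current)
    {
        return current + 1;
    }

    public static void Main()
    {
        Console.WriteLine(NextSequence(int.MaxValue));  // -2147483648
    }
}
```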

Fourth reason: the unchecked block can be used as a form of self-documenting code, to say “I am aware that the operation I’m doing here might overflow, and that’s fine with me.” For example, I’ll often write something like:

public override int GetHashCode()
{
    unchecked 
    {
        int fooCode = this.foo == null ? 0 : this.foo.GetHashCode();
        int barCode = this.bar == null ? 0 : this.bar.GetHashCode();
        return fooCode + 17 * barCode;
    }
}

The “unchecked” emphasizes to the reader that we fully expect that multiplying and adding hash codes could overflow, and that this is OK; we want to be truncating to 32 bits and we expect that the numbers will be large.


There were a bunch of good comments to the previous post; among the questions posed in those comments were:

What do you think of the compiler switch that changes from unchecked to checked arithmetic as the default?

I’m not a big fan of this approach, for several reasons. First, hardly anyone knows about the switch; there’s a user education problem here. Second, I like it when the text of the program can be understood correctly by the reader without having to know the details of the compilation process. Third, it adds testing burden; now there are two ways that every program can be compiled, and that means that there are more test cases in the test matrix.

The C# team is often faced with problems where they have to balance breaking backwards compatibility with improving a feature, and many times the users advocating for the feature suggest “put in a compiler switch that preserves the backwards compatibility” (or, more rarely “put in a switch that turns on the feature”, which is the safer option.) The C# team has historically been quite resistant to adding more switches. We’re stuck with the “checked” switch now, but I think there’s some regret about that.

Should checked arithmetic have been the default?

I understand why the desire was there to make unchecked arithmetic the default: it’s familiar, it’s faster, a new language is going to be judged in part on benchmarks, and so on. But with hindsight, I would rather that checked arithmetic have been the default, and users be forced to turn it off for precisely those situations where the inner-loop performance is genuinely impacted by this nano-optimization. We have other safety features like array bounds checking on by default; it makes sense to me that arithmetic bounds checking would be on by default as well. But again, we’re stuck with it now.

23 thoughts on “What is the unchecked keyword good for? Part two”

  1. Pingback: What is the unchecked keyword good for? Part one | Fabulous adventures in coding

  2. Is there any way to write switch-independent code, i.e., a way to control program flow with a check for the compiler switch (like preprocessor directives)? Assuming not, what are your thoughts on this as an alternative approach to the switch or whether it is even feasible?

  3. I would consider the switch useful if a shop has a policy that code whose correctness would be reliant upon checked or unchecked arithmetic must explicitly specify its requirements. In such a scenario, it may be advantageous to have a lot of arithmetic be unchecked for speed, but be able to recompile with checking enabled in the event that it becomes clear that something is overflowing somewhere but it’s difficult to figure out exactly where.

  4. const int E_FAIL = unchecked((int)0x80004005);

    I had to do this several times a few days ago. It is still irritating because it is way too verbose; anyone typing a constant which starts with “0x” probably knows what he wants.

    • Yeah, I’m not sure if the “probably knows what he wants” is a strong enough justification, but the verbosity has always bugged me too.

      Frankly, it’s never been clear to me why hexadecimal literals have to be unsigned (and hence not fit in an int even when it’s a valid negative 32-bit value). But assuming unsigned as the default makes sense, it seems to me there should have been a type specifier (as in U, M, F, etc.) for the int type (maybe call it S) that I could append to the literal to tell the compiler to interpret the bit pattern as a signed int.

      Maybe Eric can one day write an article on the topic of behaviors of numeric literals, and motivations behind the behaviors of hexadecimal literals in particular. 🙂

      • What should be the expected behavior of `longVar &= ~0x0000000080000000`, which is one of the more common usages for hex literals with bit 31 set? Interestingly, it fails whether the constant is interpreted as Int32 or UInt32; the pattern only works if the value is Int64 or UInt64. If there were an “and-not” operator/method (IMHO a good language should include one, along with methods equivalent to `((x & y) != 0)` and `((x & y)==y)`) then using a `UInt32` mask on an `Int64` or `UInt64` would be fine, but since there isn’t, there’s no “safe” interpretation for hex constants with the high bit set.

        • I’m having a hard time understanding your objection. The example you describe doesn’t “work” today, if I understand the scenario you are considering. I.e. the literal you show is still treated as 32 bits, and so when it is inverted there are no bits 32 through 63 to invert. If e.g. “longVar” is 0xffffffffffffffff before the statement you show, it will be 0x7fffffff after the statement.

          You use the words “fail” and “safe”, but I don’t really understand what you intend for those words to mean. Once a type for the literal is determined, the interpretation of a hexadecimal value with the highest bit set is unambiguous. All we need is a consistent, predictable way to determine the type.

          You ask what the expected behavior should be. I would treat the literal you show exactly the same as 0x80000000. The ~ operator is irrelevant — it’s not part of the literal — and the leading 0’s should be ignored just as in any other literal.

          If you want to be able to get a 64-bit value with bit 31 cleared, you should just write “~0x80000000L” or “~0x80000000UL” (as appropriate to the scenario).

          Now, again…I would prefer that unadorned literals just always be a signed int, hexadecimal or not. But I’m open to the idea that the literal could default to being unsigned, even though I don’t understand the value of that (sorry, but your reply didn’t add anything to my comprehension of that issue). Even in that case though, we should have a numeric literal suffix we can apply to force a signed 32-bit value, to avoid the rigmarole of casting and adding “unchecked”.

          • My argument was a reason that 0x80000000 shouldn’t be an `Int32`; I guess it got muddled by the fact that the same argument would imply it shouldn’t be a `UInt32` (though it is). A better argument might be to observe that `Int64 L=0x80000000;` will work identically if the literal is interpreted as `UInt32` or `UInt64`, but differently if it is interpreted as `Int32`. Yes it’s possible to ensure correct behavior by adding a suffix, but it’s better to have code fail to compile when a suffix is omitted than have it yield unintentional behavior.

            I do agree with you 100% that there should be a suffix to denote signed integer hex literals (my choice would probably be something like “sL” or “sl” for Int64, “sw” for Int32, “sh” for Int16, and `sb` for `SignedByte`). I don’t see a reasonable way to support signed hex literals with the MSB set without a length-defining suffix, however.
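For readers following this thread, the behaviour under discussion can be sketched as below. The class and method names are hypothetical; the only things taken from the comments are the two mask expressions, `~0x80000000` and `~0x80000000L`.

```csharp
using System;

public static class HexMasks
{
    // The literal is a uint, so ~ yields the uint 0x7FFFFFFF. When that is
    // widened to long, bits 32 through 63 of the mask are zero and get cleared too.
    public static long ClearBit31Unsuffixed(long value)
    {
        return value & ~0x80000000;
    }

    // With the L suffix the literal is a long, so ~ yields 0xFFFFFFFF7FFFFFFF
    // and only bit 31 is cleared.
    public static long ClearBit31(long value)
    {
        return value & ~0x80000000L;
    }

    public static void Main()
    {
        Console.WriteLine(ClearBit31Unsuffixed(-1L).ToString("X16"));  // 000000007FFFFFFF
        Console.WriteLine(ClearBit31(-1L).ToString("X16"));            // FFFFFFFF7FFFFFFF
    }
}
```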

  5. Pingback: The Morning Brew - Chris Alcock » The Morning Brew #1839

  6. What’s saddest about that compiler switch is that my friend and I wanted to turn it on in our public projects, but decided against it on the basis that most C# developers out there expect it to be off 😦

    So the switch can’t even be used realistically, except in projects where it’s important enough to warrant the “developer surprise” factor.

    • IMHO, the proper convention should be that properly written code should satisfy its contract whether the default is checked or unchecked, but it would often be appropriate (or even desirable) for the setting to affect behavior when given invalid inputs. If a function is being given invalid inputs which cause overflows, having it throw an OverflowException may make it easier to debug the problem, but outside of debugging contexts it may be better to eliminate the checks for purposes of performance. Use “checked” when you’re relying upon overflow checking, or “unchecked” when you’re relying on wrapping. If you enable trapping in your projects but other people end up disabling it, your code will still trap important overflows. Further, if people get in the habit of enabling it, others might get in the habit of writing `unchecked` when appropriate.

  7. Pingback: Dew Drop – April 14, 2015 (#1992) | Morning Dew

  8. I ran across the MSDN docs page on Enumerable.Sum(IEnumerable) (https://msdn.microsoft.com/en-us/library/vstudio/bb338442(v=vs.100).aspx) and noticed that it will throw an OverflowException if the sum is greater than the range of Int32.

    Interestingly, it still throws even if you wrap the call in an unchecked block. Why is that? Does the method internally explicitly perform checked arithmetic?

    Also interestingly, ReSharper will warn you that you’re invoking Sum() inside an unchecked block.

    • .NET uses different instructions to perform checked and unchecked arithmetic. The `checked` and `unchecked` directives indicate what kind of math instructions the C# compiler should generate within them. They do not affect the kind of instructions generated for arithmetic located in other methods.

    • John is of course correct; I note that this is a bit of a design problem in the “checked” and “unchecked” blocks. For one thing, “checked { x = y + z; }” and an extract-method refactoring “checked { x = Add(y, z); }” can have totally different semantics. Compare that to, say, “try”, where you can do an extract-method on the inside of the block and still get exception protection.

      • The effects of `try` do not really extend into a block in .NET (in its predecessor VB6 the effects of some “ON ERROR” statements sometimes did and sometimes didn’t, which made the semantics really bizarre). If the third statement in a block triggers an exception, there’s no way the caller can request that the method ignore the exception and carry on. Once an exception is triggered in a method that doesn’t handle it, control is going to leave that method and return to the caller.
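The extract-method difference described in this thread can be sketched as follows, assuming the project is compiled with the default (unchecked) setting. `CheckedScope` and its method names are hypothetical.

```csharp
using System;

public static class CheckedScope
{
    // Compiled in the project's default context (unchecked unless the compiler
    // switch is on); a caller's checked block does not reach in here.
    private static int Add(int y, int z)
    {
        return y + z;
    }

    public static int Inline(int y, int z)
    {
        checked { return y + z; }       // the addition itself is checked
    }

    public static int Extracted(int y, int z)
    {
        checked { return Add(y, z); }   // the addition inside Add is not checked
    }

    public static void Main()
    {
        Console.WriteLine(Extracted(int.MaxValue, 1));  // wraps under the default setting
        try
        {
            Console.WriteLine(Inline(int.MaxValue, 1));
        }
        catch (OverflowException)
        {
            Console.WriteLine("Inline version overflowed");
        }
    }
}
```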

  9. Pingback: My readings in 2015 week 16 | My path to become awesome dev

  10. I’ve been thinking about this some more and I wonder what you would think of the idea (if not in .NET, in the Next Major Framework) of having a cross between checked and unchecked, which added a `ThrowIfOverflow;` statement or method, and specified that if an overflow occurs, the exception is allowed (but not required) to be deferred until the next `ThrowIfOverflow;` statement (if any), provided it is thrown in the present method? Given:

    q.n1 += a1.x; // Assume an L3 cache miss on `a1`
    q.n2 += a2.x; // Assume `q` and `a2..a5` present in the L0 cache
    q.n3 += a3.x;
    q.n4 += a4.x;
    q.n5 += a5.x;

    It is possible for the second through fifth statements to be executed before the first when using “unchecked”, but in a checked context the execution of each statement would have to wait until the previous one was complete. If in case of overflow it wouldn’t matter how many statements executed (e.g. because `q` would get abandoned or invalidated during stack unwinding), adding a ThrowIfOverflow after the fifth statement might allow the generated code to use a conditional-load-if-overflow instruction or sequence to keep track of whether overflows occurred, so that the five statements above could all be scheduled independently of each other; a conditional jump based upon that flag would have to wait until all the preceding operations occurred, but the operations themselves would not be rigidly sequenced.

    What would you think of that as a way of reducing the performance drain from “checked”?

  11. As a relative newcomer to C# compilation, I’m confused by the E_FAIL example. When you use the `unchecked` keyword, what value is actually stored in the E_FAIL constant? If it’s the stated value, doesn’t that cause a memory leak? Is it truncated? Or is it x such that `(x – int.MinValue) + int.MaxValue` is equal to the value? If it’s either of the latter two, that seems like a careless way to track a constant value, given the possibility of multiple integers from the other side of the interop mapping to the same constant value in your code. Of course, I may have missed the point entirely.

    • “When you use the `unchecked` keyword, what value is actually stored in the E_FAIL constant?”

      The value that is stored is the literal value specified in the code (possibly truncated to fit, but in the example given this isn’t needed, nor happens…the stated example has 32 bits of information, and the const is 32 bits in size, so it’s fine).

      “If it’s the stated value, doesn’t that cause a memory leak?”

      Where would a memory leak — i.e. a block of memory that is no longer used, but which remains allocated instead of being returned to the available memory pool — come from? We’re talking about a single 32-bit value here, and no need to allocate memory in any case.

      “Is it truncated?”

      No, it’s not truncated. There is no reason to. The literal specified in the example (and indeed, in any similar situation) is a perfectly reasonable 32-bit value.

      “Or is it x such that `(x – int.MinValue) + int.MaxValue` is equal to the value?”

      The expression “(x – int.MinValue) + int.MaxValue” will never equal “x”. The value of “int.MinValue” is 0x80000000 and the value of “int.MaxValue” is 0x7fffffff. In unchecked 32-bit arithmetic, the value of “(x – int.MinValue) + int.MaxValue” is always exactly “x – 1”.

      Mathematically, “-int.MinValue + int.MaxValue” is 4,294,967,295 (that is, 2^32 – 1), which is congruent to -1 modulo 2^32. So once the result is truncated to 32 bits, the net effect is “x – 1”, whether or not an intermediate step overflows. Even if “x” is “int.MinValue”, the value of “x – 1” is “int.MaxValue”, and the value of “(x – int.MinValue) + int.MaxValue” is also “int.MaxValue”. I.e. the latter is still equal to the former.

      More broadly, using 2’s complement arithmetic, the standard form of arithmetic used for computations with integers on most computer architectures, including any that would be relevant for .NET, the math follows all the usual rules for commutativity and associativity. You may get overflow, but the lower 32 bits will always still have their correct and expected value.

      “given the possibility of multiple integers from the other side of the interop mapping to the same constant value in your code”

      There is no chance of that. The bit representation of the value is the same no matter what. The reason the compiler requires the “unchecked” is for exactly the reason Eric’s explained in his article: the value 0x80004005 (decimal 2,147,500,037) is interpreted by the compiler as an unsigned, positive value, putting it outside the range of valid values for a signed 32-bit integer (which has a maximum of 2,147,483,647). Since the compiler can verify this overflow at compile time due to the constant expression, the “unchecked” context is required for that operation.

      The language *could* have been designed to allow the literal to be interpreted as a signed 32-bit value. In that case, its value would still be 0x80004005; when interpreted as a decimal value, that’s -2,147,467,259, which is larger than -2,147,483,648 (the minimum value for a 32-bit integer), and so would have been a legal assignment without any casting.

      But designing the language that way would introduce ambiguity to literals that could lead to unexpected behavior. E.g. if assigning 0x80004005 to a “long” variable, should the compiler treat the literal as a signed int, requiring sign extension to 0xffffffff80004005, or should it treat the literal as an unsigned int, making the assigned value 0x0000000080004005? The former would allow assignments to signed “int” variables without unchecked casting, but would require the literal to be interpreted based on the destination storage (if any) so that assignments to unsigned “int” variables work as expected (i.e. would result in the latter value, 0x0000000080004005).

      The way things are now, a numeric literal has a specific, simple definition that can be applied without having to inspect any of the code around the literal. This makes the compiler simpler, and also means the language is more predictable. It’s less convenient in some specific examples, but always in an understandable, predictable way. This is in my opinion much better than being more convenient, but less understandable and predictable (C++ is full of “convenient” syntaxes, which are hard to understand and which often lead to bugs…I myself prefer the C# design approach, but it’s clear that there’s room for multiple philosophies).

      • Thanks for taking the time to explain (and so thoroughly). I think my misunderstanding boils down to this:

        Eric says:
        > That’s an error because that number is too big to fit into an int.

        You say:
        > the stated example has 32 bits of information, and the const is 32 bits in size, so it’s fine.

        My understanding of low-level memory is that for a 32-bit int, 32 bits of memory are allocated to store it. If the developer tries to store a number larger than 32 bits, a naive system will allow the number to overflow to adjoining bits, thus corrupting another (arbitrary) piece of data (I guess this isn’t called a “memory leak.” That’s my mistake.) I was working on a couple of assumptions here:

        1) C# does its utmost to make sure that this can never happen, using both compile-time and runtime checks.
        2) The `unchecked` keyword, since it allows a number that’s “too big” to be stored as an int, must be disabling those checks.
        3) At that point, either we’re dealing with corrupted memory (what I incorrectly called a memory leak) or a fallback method of containing data in a type, such as truncation. And my other guess was integer overflow, where values above int.MaxValue roll over to int.MinValue and start counting up from there (this may also be incorrect, but it was the way I understood it.)

        I’ll have to do some more studying before I can fully parse your answer, but thanks again for being so thorough. Hopefully that will provide the fuel I need to figure out what’s going on here.

        • > “If the developer tries to store a number larger than 32 bits, a naive system will allow the number to overflow to adjoining bits, thus corrupting another (arbitrary) piece of data”

          I can’t rule out that there’s some system out there that would do that, but I can’t think of one. For sure, C-like languages will never do this, including C#. If the target storage in an assignment is declared as 32-bits in size, then that’s what’s written. Excess bits are discarded, always. This is known as “overflow”, and represents a loss of the data you’re working with, but not corruption of any other data. In a computation, overflow can be detected (e.g. by looking at the “carry bit”), but this still will not write over adjacent memory. In a simple assignment, you just lose the excess data.

          > “C# does its utmost to make sure that this can never happen”

          What you think can happen, can’t ever no matter what. So, I guess, yes…C# makes sure that can never happen. But it goes beyond that. What C# does in the default case is to guarantee that, if at compile time it’s able (by virtue of dealing with constant expressions) to detect an expression or assignment would overflow (thus losing data, but *not* corrupting adjacent data), an error is emitted. Unless you use “unchecked”. Even using “unchecked” though, all that happens is that you *potentially* lose data. You won’t corrupt any other data.

          And in the case being discussed, the only reason the compiler *thinks* there’s overflow is because it’s interpreting the literal as an unsigned int, the value is being assigned to a signed int, and the value is too large for a signed int. But, in fact, the only thing that actually matters for error codes like that is the binary representation. Casting to the signed int type causes the value to be reinterpreted as that type, but it never had more than 32 bits of information in the first place, so no actual information is lost. It’s just interpreted as a different value. You could cast such a value back and forth between “uint” and “int” all day long and never lose any information. It’s all just the same 32 bits. The reason you need to be explicit about it is that the cast causes a reinterpretation of those bits, where the two different interpretations are in fact two different numbers. If you had some other bit pattern, where in both representations the value is identical (e.g. 256 is hex 0x00000100, but that bit pattern is still 256 whether you interpret it as “uint” or “int”), the compiler wouldn’t complain at all.

          In all of this, it is useful to keep in mind that for the most part, all a computer ever really understands is patterns of bits. It has some special handling for certain kinds of patterns of bits — i.e. it does know numeric types like integers and floating point. But it only uses that special knowledge when it needs to do math. If you’re just moving those values around, the CPU generally doesn’t care…it’s just copying bits. It’s the compiler that is imposing an interpretation on the values, and adding rules to make sure those interpretations always match what the programmer expects.

          > “my other guess was integer overflow, where values above int.MaxValue roll over to int.MinValue and start counting up from there”

          Another thing that is important to understand is that the “roll over” behavior is a natural consequence of using 2’s complement representation for integers. int.MaxValue is represented as 0x7fffffff. If you ignore the sign of a number and add 1 to that value, you get 0x80000000. This just happens to be int.MinValue, once you are taking into account the representation. More generally, you can add *any* value to any other value and get the correct binary representation, regardless of the sign or magnitude of either operand. The worst that can happen is that the result does in fact overflow, in which case this amounts to the most-significant-bit in both operands being 1, causing a carry (this is just like carrying when you add two decimal numbers, except that the carry can only ever be 1). Hence the “carry bit” where this is stored.

          Note also that while the compiler will emit an error if you overflow a constant expression, at runtime it does not *unless* you explicitly opt in with the “checked” keyword, or by changing the default runtime behavior with the /checked compiler option. This is the trade-off the compiler makes between correctness and speed; overflow checking adds a significant cost to runtime computations so the default is for that to be turned off.
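The reinterpretation and roll-over behaviour discussed in this thread can be checked with a small sketch. `BitPatterns` and its helper names are hypothetical; the casts are wrapped in `unchecked` so that the result does not depend on the compiler setting.

```csharp
using System;

public static class BitPatterns
{
    // Reinterpret the same 32 bits as signed or unsigned; no information is lost.
    public static int ToSigned(uint value) { return unchecked((int)value); }
    public static uint ToUnsigned(int value) { return unchecked((uint)value); }

    // Adding 1 to int.MaxValue (0x7FFFFFFF) yields 0x80000000, which is int.MinValue.
    public static int Increment(int value) { return unchecked(value + 1); }

    public static void Main()
    {
        int hr = ToSigned(0x80004005);
        Console.WriteLine(hr);                                       // -2147467259
        Console.WriteLine(ToUnsigned(hr) == 0x80004005);             // True
        Console.WriteLine(Increment(int.MaxValue) == int.MinValue);  // True
    }
}
```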
