Math is hard; let’s go shopping

Posted on February 21, 2014 by ericlippert

The nice people at DevProConnections asked me to write a beginner-level article for them about the perils of arithmetic in C#; of course these same perils apply to C, C++, Java and many, many other languages. Check it out!

16 thoughts on “Math is hard; let’s go shopping”

MR on February 21, 2014 at 9:02 am said:

Thanks for posting this Eric! My sister who just started programming in uni already ran into to multiple of these and will surely appreciate this blog. Perhaps it’s worth including a code sample demonstrating decimals?

Reply ↓
Joker_vD on February 21, 2014 at 9:10 am said:

Man, the fact that dividing an integer by an integer yields again an integer in C-looking languages is really annoying. However, some languages parted from this custom: Ruby, for example.

Though I really couldn’t help laughing when I was reading the Ruby book and this behaviour (5 / 10 results in 0.5) was presented as rather innovative—Pascal had it this way pretty much for as long as C itself exists.

Reply ↓
John Payson on February 21, 2014 at 11:42 am said:

What difficulties would there be in allowing integer-type variables to be declared as “numbers (yield numerically correct answers or throw exceptions)”, “wrapping algebraic rings (define wrapping behavior for out-of-bounds math)”, “do what’s fastest” (assume values won’t go out of bounds, but don’t bother checking), or “legacy behavior”? It seems that many of the difficulties with signed vs. unsigned math in C stem from the fact that C mostly treats unsigned types as members of algebraic rings, but sometimes treats them as numbers. Making clear when things are expected to behave as rings and when they’re expected to behave as numbers would make it easier for compilers to allow things which will behave as expected, while squawking at things which might not. Even though the .NET framework only has one `UInt32` type, that doesn’t mean a compiler couldn’t keep track of which variables are declared which ways (attaching attributes to method parameters, fields, etc.)

Similarly, how hard would it for a compiler to distinguish between “thing I want to behave strictly as an IEEE 32-bit float”, “thing which would ideally be perfectly precise, but I need it to fit in four bytes”. Both types (and the existing `float`) would be seen by .NET as `Single`, but they would vary in the allowed conversions and promotion behavior. Assuming the new types were “ieee float’, “short double” [context-sensitive keywords]

ieee float f1 = 0.1; // Compilation error; no implicit conversions for IEEE float
double d1 = f1; // Compilation error; no implicit conversions for IEEE float
short double f2 = 1.4142; // Implicit cast from double ok
short double f3 = f2*f2-2; // Cast f2 to 64-bit float before the multiply; cast result back to 32 bit

There are times when “float” is specified to employ the exact behaviors of IEEE 32-bit math; in such cases, any implicit cast is a sign of a mistake. There are other times when “float” is simply used as a means of saving space; allowing code to use “short double” in the latter situation would not make assignments like f3 above to be more concise and legible than

float f4 = (float)((double)f2*f2-2);

but would also reduce the likelihood of mistakes (e.g. forgetting to cast f2 to “double” before the multiply to avoid catastrophic calculation in the subtraction).

Reply ↓
- Eric Lippert on February 21, 2014 at 11:55 am said:
  
  I think it would be great if the intention of the type was clearly expressed by the name. We sensibly got away from this idea that you can just use zero and not-zero int as a bool. We can sensibly get away from these old-fashioned ideas that integer numbers are also a collection of bits, and so on.
  
  Frankly, most of the time it should be either (1) the compiler figuring out what the right type for a thing is, or (2) the language removing the need for the type to be expressed in the first place. Why should anyone be writing “for(int i = 0; i < arr.Length; ++i)" in 2014? Raise the level of abstraction; if there is a need to range over the indices and elements of an array then *build that construct into the language* and have the language generate code that assigns the most efficient type to the index.
  
  Reply ↓
  - Joshua on February 23, 2014 at 2:15 pm said:
    
    The best of my very small contributions to the art of obfuscated programming have relied on C++’s attitude that everything is an int at the end of the day.
    
    I once wrote a tree implementation of which I was quite proud that used all sorts of things as indices into an array.
    
    Reply ↓
  - John Payson on February 24, 2014 at 12:08 pm said:
    
    I think we both see the same problem: because it is *sometimes* (though rarely) useful to use numerical operations which are defined on collections of bits (instead of numbers), language designers often feel compelled to define all numerical operations in terms of bitwise behavior. The cleanest way I can see to get around this problem would be to have separate types for “collections of bits” [whose arithmetic operations would behave as wrapping algebraic rings] and numbers; how easy or hard would it be for a compiler to define e.g. `Num32`, `UNum32`, `Num64`, along with `Ring8`, `Ring16`, `Ring32`, and `Ring64` types, with widening conversions from any numeric type to any ring, but no implicit conversions from any ring to anything else; rings would have `AsUnigned` and `AsSigned` properties which would yield either the smallest nonnegative number, or the smallest-magnitude number, which was equivalent to a ring value. That would allow code to e.g. use wrapping arithmetic to compute a hash value without having to be in an unchecked context. Would such an implementation be easy or difficult?
    
    Reply ↓
- voo on February 21, 2014 at 2:16 pm said:
  
  I find the idea of having limited precision integers in a high-level language in 2014 a bad idea to begin with. Well not really (they do have their uses, IO, math heavy code,…), but I think the language should be set up in such a way that you’d use the unlimited precision integers by default and had to make an extra effort to use the n_bit signed/unsigned integers for when you really need them.
  
  Is there an overhead involved in BigIntegers? Sure, but in the common case it’s a very predictable branch against an overflow flag (which any ISA I can think of right now sets for its arithmetic operations anyhow). For 98% of all written code in C# I seriously doubt you’d notice the difference.
  
  Now how to solve the problem for floating points I have no idea – every implementation causes some kind of kinks you have to be aware of, you basically just exchange what you have to deal with.
  
  I definitely agree about your point on C arithmetic though – disallowing implicit conversions from signed unsigned (and specifying two’s complement behavior for signed numbers) would have eliminated I think the vast majority of bugs in this area.
  
  Reply ↓
  - John Payson on February 28, 2014 at 9:06 am said:
    
    In many situations, it’s possible to guarantee that certain values won’t exceed 2^31 or 2^63 unless either there are some major breakthroughs in computer technology, or something is seriously wrong. I find absurd the idea of using general-purpose number types that wrap silently in case of overflow (languages should include types that wrap, since they are very useful for some purposes, and might allow unchecked calculations in places where speed is critical), but there’s nothing wrong with using integers that will throw an exception when a certain limit is reached. If, for example, one has an iterative calculation which is supposed to alternatively multiply and divide a number by values which should keep it below 9 quintillion, but because of unexpected input the number would start growing without bounds, it would be better for the program to throw an exception when the limit of a 64-bit integer is reached than for it to start gobbling up memory and CPU time with computations on million-digit numbers.
    
    Reply ↓
Eugene on February 21, 2014 at 8:36 pm said:

> With our first case, the error in “f” is small enough that the runtime considers 9.2 and the value stored in “f” to be the same number.
It’s not that (there’s no built-in tolerance in FP math), it’s the fact that the right-hand side of comparison also had an error. It was 9.2, which is not representable exactly, so we had 9.2 + error on the right, and because that’s actually the same number as on the left, the errors matched exactly. Now, in the second case we had error1*100 on the left and error2 on the right, so they didn’t match. In fact, error2 was 0 because 920.0 is an integer. For other numbers it could be non-zero and still not the same as error1*100, e.g. 1.11 * 10 != 11.1.

Reply ↓
Joshua on February 23, 2014 at 3:02 pm said:

Your article touches on one common floating-point error that I think could be brought out into its own bullet point at the end, instead of subsumed into general “floating-point representational errors”:

If you ever compare two floating-point values with the == operator, you are walking on very thin ice. You need to know they will be bit-for-bit identical, not just the results of two calculations that you can prove are equal in proper maths. Often, you should replace a == b with your language’s equivalent of abs(a-b)<DELTA, where DELTA is some value that is several orders of magnitude smaller than a or b.

Reply ↓
- John Payson on February 25, 2014 at 4:20 pm said:
  
  Worse than that, there’s no guarantee that a floating-point variable will compare equal to *itself*.
  
  Reply ↓
Ale Miralles on February 28, 2014 at 7:12 am said:

+1 to “Raise the level of abstraction”
If I just wanna loop over an array, what’s better way to do it, non of my business my friend. Give me a higher level construct and let the compiler to figure it out.
(That is a cool thing about dynamic languages).

Reply ↓
Bradley on February 28, 2014 at 8:53 am said:

Eric, why does the order of multiplication matter?

double c = 10.0;
double e = 92.0;
double f = e / c;
double h = c * c * f; // 919.99999999999989
double i = f * c * c; // 920.0
double j = c * f * c; // 920.0

Reply ↓
- Eric Lippert on February 28, 2014 at 10:45 am said:
  
  First off, the C# compiler and the CLR can always choose to do computations in higher precision than is mandated for any reason whatsoever. So sometimes you will get these discrepancies even on computations done in the same order.
  
  Let’s assume that none of those things are happening here.
  
  Multiplications go left-to-right. In the first case we are multiplying exactly 100 by (9.2 + error1) and ending up with 920 + error2, where error2 is up to 100 times larger than error1.
  
  In the second and third cases we are first multiplying exactly 10 by (9.2 + error1), and that produces 92 + error3, where error3 is up to 10 times larger than error1. But it could be smaller! By sheer chance it could be the case that this computation reduces the error to some very small amount. Now when we multiply that thing by ten we end up with 920 + error4, where error4 is up to 10 times larger than error3.
  
  So, still with me? error4 and error2 are both up to 100 times larger than error1, but they need not be the same. error4 could be smaller than error2 if it just so happens that we got lucky and error3 is very small.
  
  Reply ↓
David on May 28, 2014 at 5:12 pm said:

Hi Eric, just wondering why you didn’t mention anything about the 2s complement under the “Integer Addition” section?

Reply ↓
sonergonul on February 22, 2020 at 12:19 am said:

The link seems dead. You can read the article from: https://www.itprotoday.com/microsoft-visual-studio/c-code-writing-tips-eric-lippert-math-hard-let-s-go-shopping

Reply ↓