I’ve talked a lot about floating point math over the years in this blog, but a quick refresher is in order for this episode.

A `double` represents a number of the form `+/- (1 + F / 2^{52}) x 2^{E-1023}`, where `F` is a 52 bit unsigned integer and `E` is an 11 bit unsigned integer; that makes 63 bits and the remaining bit is the sign, zero for positive, one for negative. You’ll note that there is no way to represent zero in this format, so by convention if `F` and `E` are both zero, the value is zero. (And similarly there are other reserved bit patterns for infinities, NaN and denormalized floats which we will not get into today.)
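As a rough illustration of that layout, the sketch below pulls the sign, `E` and `F` fields out of a `double` and rebuilds the value from the formula. The sample value 6.5 and the variable names are my own choices, and the code ignores the reserved bit patterns (zero, infinities, NaN, denormals).

```csharp
using System;

// Top-level statements (C# 9 or later). Sample value chosen for illustration.
double value = 6.5;
long bits = BitConverter.DoubleToInt64Bits(value);

long sign = (bits >> 63) & 1;          // 1 sign bit
long e = (bits >> 52) & 0x7FF;         // E: 11-bit unsigned exponent field
long f = bits & ((1L << 52) - 1);      // F: 52-bit unsigned fraction field

// Rebuild the value from +/- (1 + F / 2^52) x 2^(E - 1023).
double rebuilt = (1.0 + f / (double)(1L << 52)) * Math.Pow(2, e - 1023);
if (sign == 1) rebuilt = -rebuilt;

Console.WriteLine($"sign={sign} E={e} F={f} rebuilt={rebuilt}");
```

For 6.5, which is 1.625 x 2^2, the exponent field is 1025 and the rebuilt value matches the original.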

A `decimal` represents a number in the form `+/- V / 10^{X}` where `V` is a 96 bit unsigned integer and `X` is an integer between 0 and 28.
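One way to see `V` and `X` directly is `decimal.GetBits`, which returns four 32 bit integers: the low, middle and high words of `V`, then a flags word carrying the scale and the sign. The sample value below is an assumption picked so that `V` fits in the low word.

```csharp
using System;

decimal price = 123.45m;                 // illustrative value
int[] parts = decimal.GetBits(price);    // { low, mid, high, flags }

// V is the 96-bit integer spread across the first three elements.
int low = parts[0], mid = parts[1], high = parts[2];

// flags: bits 16-23 hold the scale X, bit 31 holds the sign.
int scale = (parts[3] >> 16) & 0xFF;
bool negative = parts[3] < 0;

Console.WriteLine($"V: low={low} mid={mid} high={high}, X={scale}, negative={negative}");
```

Here `V` is 12345 and `X` is 2, that is, 12345 / 10^2.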

Both are of course “floating point” because the number of bits of precision in each case is fixed, but the position of the decimal point can effectively vary as the exponent changes.

A few things to notice here: first, `double` has enormously larger range. The largest possible `decimal` is a paltry 7.9 x 10^{28}, whereas the largest possible `double` is about 1.8 x 10^{308}, a ridiculously large number. And also because of that big exponent, the size of the smallest positive (non-zero) number that can be represented by a `double` is far, far smaller than the smallest that can be represented by a `decimal`; in that sense, `double` has vastly larger range on the “small” end. Second, `decimal` has an enormously larger precision in terms of the number of significant digits; `double` has 52 bits of precision and `decimal` has 96 bits. (In decimal digits, that’s the difference between 15 digits and 28 digits of precision.)
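The range and precision gap can be poked at with a few lines. This is only a sketch: the 28 digit literal is an arbitrary example, and the comparison relies on the explicit `double` to `decimal` cast to show what the `double` actually retained.

```csharp
using System;

Console.WriteLine(double.MaxValue);    // about 1.8 x 10^308
Console.WriteLine(decimal.MaxValue);   // about 7.9 x 10^28
Console.WriteLine(double.Epsilon);     // smallest positive double

// Precision: the same 28-digit literal stored both ways.
double asDouble = 1.234567890123456789012345678;
decimal asDecimal = 1.234567890123456789012345678m;

// Casting the double back to decimal shows that digits were lost on the way in.
bool doubleKeptEverything = (decimal)asDouble == asDecimal;
Console.WriteLine(doubleKeptEverything);
```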

I am occasionally asked why it is that there is no implicit conversion either from `double` to `decimal` or from `decimal` to `double`. The easy answer is: there cannot be an implicit conversion from `double` to `decimal` because of the range discrepancy; a huge number of `double`s are larger than the largest possible `decimal`, and therefore an implicit conversion would either have to throw or silently lose perhaps an enormous quantity of magnitude, both of which are unacceptable. There could be an implicit conversion from `decimal` to `double` because that would only lose precision, not magnitude. C# already allows an implicit conversion from `long` to `double`, which can lose up to twelve bits of precision. The conversion from `decimal` to `double` would lose far more precision; going from 96 bit precision to 52 bit precision seems like too large a drop to make this implicit.
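A small sketch of both behaviours, using values I picked for illustration: the explicit cast from an out-of-range `double` throws, while the implicit `long` to `double` conversion quietly drops a low-order bit.

```csharp
using System;

// double -> decimal needs a cast, and the cast throws when the value is out of range.
double huge = 1e30;
bool threw = false;
try
{
    _ = (decimal)huge;
}
catch (OverflowException)
{
    threw = true;
}

// long -> double is implicit even though low-order bits can be lost.
long original = 9007199254740993;   // 2^53 + 1
double widened = original;          // no cast required
long back = (long)widened;

Console.WriteLine($"threw={threw} original={original} back={back}");
```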

That’s the easy answer, and that alone would be sufficient. The somewhat less obvious reason to require conversions between `decimal` and `double` to be explicit is because that conversion is fundamentally a strange thing to do and you should think hard before you do it. `decimal` is typically used to represent exact numeric quantities, particularly for financial computations that need to be accurate to a fraction of a penny. `double` is typically used to represent physical quantities such as length, mass and speed, where the precision of the representation is higher than the precision of the measurement, and therefore tiny errors in representation matter less. These are two very different approaches to computation, and so it’s probably a bad idea to allow them to mix without making the code wave a big flag in the form of a cast, calling attention to what is going on.

I would think that the main reason for wanting an implicit conversion from double to decimal is so that you can say things like

decimal apr = 4.3; // needs to be 4.3m or (decimal)4.3

You’d want both conversions if you wanted to type this:

decimal fv = principal * Math.Pow(1 + apr / 12, n);

instead of

decimal fv = principal * (decimal)Math.Pow(1 + (double)apr / 12, n);

You just gave a reason why there *shouldn’t* be implicit conversions from Double to Decimal: the result of the conversion may not always be the best `Decimal` representation of the desired quantity. Although your particular example works out okay, such conversions are in general wrong.

Double PiDouble = 3.141592653589793238462643383279;

Decimal PiBad = (Decimal)3.141592653589793238462643383279;

Decimal PiGood = 3.141592653589793238462643383279m;

The three values will equal:

3.141592653589793116xxxxxxxxxxxx

3.14159265358979000000000000000

3.14159265358979323846264338330

None of them is exactly as specified, but the second one should be considered Just Plain Wrong. Not only is it far from being the best Decimal representation of the specified value–it’s not even as good as the best Double representation of that value!

One problem is that C# uses the wrong data type for constants like 3.141592653589793238462643383279. If the constant can be represented exactly as a decimal then the type of the constant should be decimal instead of double.

Type `Double` is used a lot more than `Decimal`. If one were to write `var a=0.0000000000000000138777878078145, b=13.1;`, what types should `a` and `b` be? Note that if one were to have the compiler internally store literals as `Decimal` but have their type infer as `Double`, rounding the value for `a` to the closest `Decimal` before assigning it to a `Double` would cause a significant loss of precision versus storing it directly to the `Double`.

I suspect that Double is used more than Decimal *incorrectly*, largely for legacy reasons and because if you don’t specify a suffix on a floating point literal, it’s implicitly a Double.

I’ve seen far more examples of Double being used when Decimal should be used than the reverse; if I were designing a language I’d either *force* a suffix for floating point literals or perhaps default to Decimal instead.

First: pi is exactly represented by that decimal? I don’t think so. Second, why is it ever necessary for practical purposes to have pi to more than, say, seven decimal places?

In a calculation about positions on earth?

Pi to seven digits after the decimal place is sufficient precision to do spherical trigonometry that is accurate to on the order of a meter on the surface of the earth.

Continental drift is tearing us apart, and you don’t even notice. 🙂

More than 7 decimal places for pi might be useful when error needs to accumulate as slowly as possible, maybe to avoid graphical glitches in 3D games.

I think it is an elegant design (I don’t remember which language does this) to not force any type onto a number literal, until the code actually uses it.

const var pi = 3.14159265358979323846264338327950288;

double d = pi; // conversion to nearest double

decimal dd = pi; // conversion to nearest decimal

I suppose that means APR calculations should not be performed with Math.Pow().

My question is: why is there no Math.Pow(decimal)?

That’s an interesting point that perhaps I should have addressed in the article. When you are making an APR calculation you are usually going to get a result that is not an exact fraction of a penny in either base two or base ten, and where the operands are unlikely to exceed the precision of a double; not very many people have mortgage balances that require fifteen significant digits when measured in dollars, though I suppose they might in other currencies.

This is a situation where it makes sense to convert a decimal to double, do a calculation in doubles, and convert the result back to decimal.
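A minimal sketch of that round trip, with made-up loan figures and a rounding mode I chose for the example; nothing here is meant as the canonical way to compute a future value.

```csharp
using System;

decimal principal = 250000.00m;   // illustrative values, not from the article
decimal apr = 0.043m;
int n = 360;                      // number of monthly periods

// Do the exponentiation in double, then come back to decimal and round to cents.
double growth = Math.Pow(1 + (double)apr / 12, n);
decimal fv = Math.Round(principal * (decimal)growth, 2, MidpointRounding.AwayFromZero);

Console.WriteLine(fv);
```

Both casts are visible in the source, which is the "big flag" the article argues for.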

That said, I agree that there ought to be a Math.Pow(decimal).

I don’t think interest rate calculations are generally done using the indicated formula; I think they are instead done using discrete multiplications for each compounding period. Partial periods are handled by multiplying the interest that would be charged in a whole period by the length of the partial period and dividing by the length of the whole period. While using a power function may allow one to more quickly compute the value of compounded interest in the absence of intermediate rounding steps, many accounts round things to the nearest penny at certain discrete times, and if one wants everything to balance exactly, things like interest calculations must be performed with all intermediate rounding steps.

In doing calculations where rounding errors are going to be unavoidable, one needs to choose where one wants those errors to be focused. For example, if one is labeling percentages on a pie chart which has three equal subdivisions, one may favor reporting all three percentages as 33.3%, or one might favor labeling two as 33.3% and one as 33.4%. The latter style would be a little bit less precise (by 0.03334 percentage points) on one of the labels, but would leave 100.0% accounted for. The former style would be more precise on all the labels, but the total would be off by 0.1%. Using the power formula and type `double` to compute the interest in 30 years might give a closer approximation to the mathematical answer than would using a fixed-point type and rounding each year to the nearest penny, but the latter style of calculation would match what one would have if the account was in a bank that, in fact, did round each year’s balance to the nearest penny.
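To make the two styles concrete, here is a sketch that compounds yearly with a rounding step each year and compares it with the power formula rounded once at the end. The principal, rate and rounding mode are assumptions for the example; a real bank's rules may differ.

```csharp
using System;

decimal principal = 1000.00m;   // illustrative values
decimal rate = 0.05m;           // 5% per year
int years = 30;

// Style 1: round the balance to the nearest penny after every year.
decimal stepwise = principal;
for (int year = 0; year < years; year++)
{
    stepwise = Math.Round(stepwise * (1 + rate), 2, MidpointRounding.ToEven);
}

// Style 2: power formula in double, rounded once at the end.
decimal closedForm = Math.Round(
    principal * (decimal)Math.Pow(1 + (double)rate, years), 2, MidpointRounding.ToEven);

Console.WriteLine($"stepwise={stepwise} closedForm={closedForm} diff={stepwise - closedForm}");
```

Any difference between the two figures comes from where the rounding happens, not from a bug in either calculation.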

If widening conversions were allowed from Decimal to Double and from Double to Single, but not in the reverse direction in either case, then for any trio of types T,U,V (including integral types as well as the above) such that implicit casts exist from T to U and from U to V, conversion from T to U and then to V would yield the same result as conversion from T to V. Although converting something from e.g. Int64 to Double and from Double to Float may not always yield the best Float representation of the original Int64, a “direct” conversion will fail to do so in the same cases.

Were it not for the widening conversion from Single to Double, all of the implicit conversions would have the useful characteristic that any loss of information (rather than hiding of it) could only be demonstrated by use of a narrowing cast. Further, while some values which are numerically different would be regarded as indistinguishable, those which are recognized as distinct would be properly ranked.

Which should be larger–(float)16777217, or 16777216.000001? How about 1E20f*1E20f versus 1E240? I would posit that floating-point values are expected to often compare as indistinguishable even when the mathematical quantities they would represent aren’t the same, but rankings are generally expected to be correct unless the difference between the arguments is small relative to the precision of the type being compared. A difference of 59 parts per billion is not exactly small relative to the precision of a double, and a difference of 200 orders of magnitude is definitely not small.
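The first of those comparisons can be tried directly. In this sketch the `float` is widened to `double` before the comparison, so the information lost when 16777217 was squeezed into a `float` shows up as a wrong ranking.

```csharp
using System;

int n = 16777217;            // 2^24 + 1, one more than a float can hold exactly
float f = (float)n;          // rounds to 16777216
double d = 16777216.000001;

// Mathematically 16777217 is the larger number, yet the float ranks lower.
bool floatRanksLower = f < d;
Console.WriteLine(floatRanksLower);
```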

If you treat types as objects in a category, and implicit conversions from A to B as morphisms from A to B in this category, then one OBVIOUS requirement is that, well, it must be a category! Which implies, among other things, that if there is an implicit conversion in both directions, then data loss is prohibited. Also, if you can implicitly convert x to type B in more than one way, they all must yield the same result.

Never really looked at decimal, but I’m wondering what the initial ideas behind the design of decimal were? 128 bits, compared to a 64 bit double, could easily be enough to provide more precision for both the mantissa and the exponent, while still being able to hold any double. Also, why use base 10 instead of base 2?

The whole point of decimal is to be base ten; that’s why the type is called *decimal*. It is so that decimal fractions can be represented with 100% accuracy, as is often necessary in currency calculations. The precision and range were chosen to exceed reasonable requirements for precision and range of financial calculations.

I just want to add that decimal being decimal is actually a big deal. double works brilliantly for physics, where you deal with measurements (they are inexact), but in finance when a user writes $100.99 this is exact. However, this can’t be exactly represented as a double. Only fractions of .0, .25, .50 and .75 can be exactly represented with 2 decimals; all other cases introduce unnecessary noise in the calculations. I worked for a vendor of SME software, and some product lines had in the 90s picked double for financial quantities. This choice cost a lot of money each year in terms of support and horrible workarounds. Interestingly, Kahan (a cool guy working with the x87 in the 70s) did suggest to Intel to base floating points around base 10 instead of base 2, but unfortunately the design was too far along. Kahan did see further than most.

PS. The Windows Phone calculator has an interesting feature in that it allows fractional numbers to be converted to hex, oct and binary. If you have a WP, type in 100,99 and convert to binary; you get an “infinite” series.

Doubles do represent integers exactly up to (I believe) 9,007,199,254,740,992, but with a national debt like ours that apparently isn’t enough.

Mårten was talking about fractions in this case. Double cannot represent 0.99 (99 cents). It can only approach it.
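The classic small demonstration of this, as a sketch: tenths cannot be stored exactly in base two, so a sum of two of them misses the literal it "should" equal, while the same sum in `decimal` is exact.

```csharp
using System;

double a = 0.1, b = 0.2;
double dSum = a + b;

decimal x = 0.1m, y = 0.2m;
decimal mSum = x + y;

Console.WriteLine(dSum == 0.3);    // False: none of 0.1, 0.2, 0.3 is exact in base two
Console.WriteLine(mSum == 0.3m);   // True: all three are exact in base ten
```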

@Patrick,

Well, if your units are cents a double represents dollars and cents with 100% accuracy to an awfully big number

I doubt Kahan ever suggested base 10 instead of base 2, because base 10 is less efficient in precision-per-bit. Base 10 has greater wobble (wasted precision when the leading digit is small) and it loses the benefit of the implied one that IEEE floats and doubles have. Base 2 is ideal for most calculations, just not financial ones.

A double can still exactly hold the US national debt in pennies, but that is not as convenient as storing it as dollars with a fractional part for pennies.

See this post for more on wobble:

http://randomascii.wordpress.com/2012/03/08/float-precisionfrom-zero-to-100-digits-2/

Historically it used to be extremely common to express fixed-point numbers in base 10, and base-10 floating-point formats were hardly unknown. Base-2 floating-point formats are better than base-10 floating-point formats at everything but I/O (neither base-10 nor base-2 floating-point formats are really suitable for financial calculations), but decimal-format I/O is much easier with base-10 floating-point than with binary floating-point.

What factors motivated having `Decimal` be a floating-point type rather than simply having it be a 128-bit fixed-point type with e.g. 21 digits to the right of the decimal (maximum value about 170,000,000,000,000,000)? The maximum range wouldn’t have extended up quite as high as the floating-point `Decimal` type, but would have reached far enough for most financial purposes, and such a type would have had more precision than the floating-point `Decimal` for numbers whose integral part is greater than 10,000,000.

That’s a good question that I do not know the answer to. That decision was made before my time. I note that in VB6 there was a 64 bit fixed-point Currency type, which was fixed to four places to the right of the decimal. This type was never ported to .NET.

My guess is that Decimal was added for better support of ANSI SQL type NUMERIC(p, s).

128 bit fixed point would be really cool and very useful. In particular it would be almost as fast as using a double (decimals are significantly slower). Unfortunately .NET does not provide even a 128 bit integer which could be the foundation of such a type.

Where do you propose the decimal point would be in your hypothetical 128-bit fixed-point format?

I’m actually in favour of a language that forbids any implicit conversion that would lose data. I find allowing implicit conversion from a 64-bit integer to a 64-bit floating point is a bad thing. Obviously then it follows that an even larger decimal shouldn’t convert implicitly either. In my language, Leaf, I don’t allow any such lossy implicit conversions.

I’m kind of curious about C# however, as you described ‘decimal’ is also a floating point type, just in base 10 instead of base 2. So simply adding two decimals together can result in precision and range loss. I guess I’m not clear on the purpose of decimal, as opposed to just using a larger floating point type (say 128bit).

What is meant by “information loss”? It is expected and normal for computers to throw out boatloads of information all the time; it’s only a problem if information which was thrown out turns out to be necessary. If some measurements are taken as type `double` and at some later time will be output using a format string of “0.000”, and if nothing else will ever be done with the measurements, converting the data to type float (i.e. Single) won’t lose any information *which isn’t destined to be discarded anyhow*.

As for `Decimal`, it offers none of the guarantees associated with fixed-point types. Its designers may have figured that its floating-point aspects would make it useful by any code which needed any fixed point type with less than 28 digits, but unfortunately that doesn’t quite work. Imagine the following sequence of steps performed three times: once with all variables being Decimal, once with all being fixed-point 9+6, and once with all being fixed-point 4+3:

one = 1;

three = 3;

almostTenThousand = 9999.999;

oneThird = one / three;

bigNumber= oneThird + almostTenThousand ;

oneThirdAgain = bigNumber- almostTenThousand ;

shouldBeZero = oneThirdAgain - oneThird;

The computation of oneThird will necessarily require some form of rounding. With type Decimal, it will be rounded to 28 places. With 9+6, it will be rounded to six places, and with 4+3 to three places. So the fixed-point formats lose more precision at that step. On the other hand, on all the steps involving addition and subtraction, fixed-point formats will either compute the result exactly (e.g. with 9+6) or throw an exception (e.g. with 4+3), so shouldBeZero will be zero. By contrast, just about any floating-point format, even if it has vastly more precision than 8-byte 9+6 fixed point BCD, will end up with a non-zero residue in shouldBeZero.
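The `Decimal` column of that thought experiment can be run as written. This sketch only adds the `m` suffixes and the declarations; the fixed-point variants are not shown because .NET has no built-in fixed-point type.

```csharp
using System;

decimal one = 1m;
decimal three = 3m;
decimal almostTenThousand = 9999.999m;

decimal oneThird = one / three;
decimal bigNumber = oneThird + almostTenThousand;      // low digits of oneThird are rounded away here
decimal oneThirdAgain = bigNumber - almostTenThousand;
decimal shouldBeZero = oneThirdAgain - oneThird;

Console.WriteLine(shouldBeZero);
Console.WriteLine(shouldBeZero == 0m);
```

The residue is tiny, but it is not zero, because the addition had to give up trailing digits to make room for the larger integer part.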

While we’re all talking about different ways of storing real numbers, I wonder if there was any thought to having a rational type, stored as two integers.

rational x = 10.99R; // x.N=1099 , x.D=100

rational y = 1R/3R; // Exactly one third.

rational z = y*42; // Exact number of thirds.

While decimal is good for base 10, rational would be good for all(!) bases.
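C# has no `rational` keyword or `R` suffix, so the snippet above is wishful syntax. A rough approximation with today's types might look like the sketch below, where `Make` and `Mul` are hypothetical helpers of my own and a rational is just a numerator and denominator pair kept in lowest terms.

```csharp
using System;
using System.Numerics;

// Hypothetical helper: builds a fraction in lowest terms. Assumes d is not zero.
static (BigInteger N, BigInteger D) Make(BigInteger n, BigInteger d)
{
    BigInteger g = BigInteger.GreatestCommonDivisor(n, d);
    if (d.Sign < 0) g = -g;           // keep the denominator positive
    return (n / g, d / g);
}

// Hypothetical helper: exact multiplication of two fractions.
static (BigInteger N, BigInteger D) Mul((BigInteger N, BigInteger D) a, (BigInteger N, BigInteger D) b)
    => Make(a.N * b.N, a.D * b.D);

var x = Make(1099, 100);              // 10.99
var y = Make(1, 3);                   // exactly one third
var z = Mul(y, Make(42, 1));          // exactly 14

Console.WriteLine($"{x.N}/{x.D}  {y.N}/{y.D}  {z.N}/{z.D}");
```

Using `BigInteger` sidesteps overflow at the cost of speed; a pair of `long` values would be faster but would run out of range quickly as denominators multiply.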

Eric, you don’t mind going back in time and doing all this work while you were at MS, do you? Thanks. I’ll expect it on my desk by Monday. 🙂

OK you got it. There is an arbitrary-precision-rational library in Microsoft Solver Foundation, which is a free download. I didn’t write it but in an interesting turn of events, I seem to recall that the code in there was written by yet another one of the former C# language designers and compiler implementers.

Get it while it’s available, too! Sadly, MSF looks like another cool bit of technology to come out of MSFT R&D that’s dying on the vine – nothing on the blog in over two years and virtually nothing on the forum.

IronScheme (as required by the R6RS spec) also supports ‘arbitrary-precision-rationals’.

Rational would be good for all bases, but if its based on integers, it wouldn’t have much numerical range.

I would assume that one reason for not implementing it in .NET is that outside the realm of pure maths, there wouldn’t be much call for it. Science rarely deals with rational numbers. There’d probably be some use for it in finance but presumably not huge, since as soon as any calculation gives an irrational number, rational is useless. On the other hand, in many cases you can store prices 100% accurately with integers, by just making your unit 1 cent, or 0.1 cents instead of $1. That doesn’t leave a massive area between those domains where a Rational would be particularly useful!

Oooops, that should’ve been a reply to Bill P.Godfrey’s comment, I accidentally posted it as an independent comment.

Curiously, C# allows an implicit conversion from long to float as well. .NET uses the IEEE floating point standard, so a float can only hold integers up to 2^24 without losing exactness, while a long can hold 2^64 values (signed and unsigned). Hence making the long to float conversion implicit lets it silently lose value for more than half of the long range (even though a float is ‘roughly’ able to represent such a long, i.e. with loss of precision). You mention the long to double implicit conversion, but what could possibly be the rationale for long to float? If we apply the decimal to double logic to it, it seems it should not have been an implicit conversion.

That’s not true. The largest representable 64 bit unsigned long is on the order of 10^19. The largest representable single precision (32 bit) float is on the order of 10^38. Floats cover a much larger range.

I am sorry I was not explicit there. I agree that float has a much larger range, and that is why I say it is able to ‘roughly’ denote any long. When I said that the long to float conversion can silently lose value, I meant that any long greater than 2^24 (i.e. more than half of its range) cannot be represented ‘exactly’ in a float, and the difference between what the float holds and what the long value actually was can be significantly large. For example, if you try

long lval = long.MaxValue;

float fval = lval;

You will see that difference is 36854775807 which is not a small number.
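How large the gap looks depends on how the `float` is turned back into something comparable. In the sketch below (my own, not the commenter's code), widening to `double` is exact and shows the `float` holding 2^63, while the explicit cast to `decimal` rounds the `float` to 7 significant digits, which is one route that produces the figure quoted above.

```csharp
using System;

long lval = long.MaxValue;       // 9223372036854775807
float fval = lval;               // implicit conversion, no cast required

// Widening float -> double is exact, so this shows the value the float really holds.
bool holdsTwoTo63 = (double)fval == 9223372036854775808.0;

// The cast to decimal rounds the float to 7 significant digits first.
decimal viaDecimal = (decimal)fval;
decimal gap = lval - viaDecimal;

Console.WriteLine($"holdsTwoTo63={holdsTwoTo63} viaDecimal={viaDecimal} gap={gap}");
```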

It’s desirable for implicit numeric conversions to obey certain axioms; unfortunately, as a whole, the set of conversions supported in .NET fails to abide by what I would consider four of the most important. I discuss them all at http://supercatnet.blogspot.com/2013/09/axioms-of-implicit-type-conversion.html but the axiom I consider most important would be that for any abstract numerical value x for which some type T has a “good” (if not perfect) representation t, and for any type U to which T is implicitly convertible, (U)t should be a good representation of x in U (either the value closest to x, or one of two essentially-equally-good values). Conversion of any numeric type to `float` abides by that axiom just fine; conversion of `float` to `double` does not.

The only places where I would consider implicit conversion from one numeric type to a “fuzzier” type to be problematic are those where the type to which something is being converted is not obvious, such as when the result of a fuzzy conversion would be passed as the nth argument to a method whose nth parameter may sometimes require and sometimes not require the conversion, based upon other parameter types. If a programmer says `float f=someLong;` the statement will behave in the only fashion the programmer could plausibly intend: `f` will be made to represent the value of someLong as accurately–and only as accurately–as a `float` can. By contrast, if a programmer says `if (someLong == someFloat)` it’s unclear whether the programmer is intending to use the (float, float) overload, the (long, long) overload, the (double, double) overload, or a non-existent (long, float) overload which would return true only if the nominal value of `someFloat` is a whole number that will fit in a `long` and equals `someLong`. I’d consider the actual behavior of the “if” statement astonishing, not because long-to-float conversions are lossy, but rather because it’s far from obvious that such a conversion should take place. The best way to avoid such astonishment would not be to totally disallow conversions which obey the fundamental axiom of implicit type conversion (above) in favor of conversions which don’t, but rather to require that fuzzy conversions only be used in cases where their use is obvious (such as when assigning to a variable of the conversion’s destination type).
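The `==` case can be reproduced with two values that differ by one. This is a sketch with numbers I chose; the result shown is what I would expect on current 64 bit runtimes, where the `long` operand is converted to `float` before the comparison.

```csharp
using System;

long someLong = 16777217;        // 2^24 + 1
float someFloat = 16777216f;     // 2^24

// The long is implicitly converted to float, which rounds it to 16777216.
bool equalAsFloat = someLong == someFloat;

// Spelling out the conversions makes the intent visible and keeps the values apart.
bool equalAsDouble = (double)someLong == (double)someFloat;

Console.WriteLine($"equalAsFloat={equalAsFloat} equalAsDouble={equalAsDouble}");
```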

Incidentally, it would be nice for .NET languages to allow parts of a program to request specific implicit-conversion rules with primitive types. It’s important to allow code to use the same rules it always has, but being able to request rules that were in some cases more restrictive (e.g. disallow use of the `==` operator in many mixed-type scenarios) but in other cases less restrictive (e.g. allowing code which performs computations on `double` to pass the results directly to graphics routines that accept `float`) could make future code easier to write and maintain. I don’t think that should cause any difficulties when mixing modules written with different styles, since implicit conversions are all resolved at compile time.