Category Archives: Integer arithmetic

Last time I said that I was going to choose the “sequence of bits” technique for representing a natural number. At this point I don’t think I need to go into detail about how bits work; you all know that each position represents the addition of a power of two multiplied by the value of the bit at that position, blah blah blah. Let’s instead concentrate on the technical details. Continue reading
Math from scratch, part one
Computers are of course called “computers” because they’re good at math; there are lots of ways to do math in C#. We’ve got signed and unsigned integers and binary and decimal floating point types built into the language. And it is easy to use types in libraries such as BigInteger and BigRational[1. Which can be found in the Microsoft Solver Foundation library.] to represent arbitrary-sized integers and arbitrary-precision rationals.
From a practical perspective, for the vast majority of programmers who are not developing those libraries, this is a solved problem; you can use these tools without really caring very much how they work. Sure, there are lots of pitfalls with binary floats that we’ve talked about many times in this blog over the years, but these are pretty well understood.
So it’s a little bit quixotic of me to start this series, in which I’m going to set myself the challenge of implementing arbitrary-size integer[2. Maybe we’ll go further into rationals as well; we’ll see how long this takes!] arithmetic without using any built-in types other than object and bool.[3. As we’ll see, we’ll be forced to use int and string in one or two places, but they will all be away from the main line of the code. I will also be using the standard exception classes to indicate error conditions such as dividing by zero.]
Why not allow double/decimal implicit conversions?
I’ve talked a lot about floating point math over the years in this blog, but a quick refresher is in order for this episode.
A double represents a number of the form +/- (1 + F / 2^52) x 2^(E - 1023), where F is a 52 bit unsigned integer and E is an 11 bit unsigned integer; that makes 63 bits, and the remaining bit is the sign: zero for positive, one for negative. You’ll note that there is no way to represent zero in this format, so by convention if F and E are both zero, the value is zero. (And similarly there are other reserved bit patterns for infinities, NaN and denormalized floats which we will not get into today.)
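You can see those fields directly by pulling the bits out of a double with BitConverter. This is a quick sketch; the class and method names here are my own illustration, not anything from the post.

```csharp
using System;

static class DoubleFields
{
    // Pull the sign, biased exponent E and fraction F out of a
    // double's 64 bit representation.
    public static (long Sign, long E, long F) Fields(double d)
    {
        long bits = BitConverter.DoubleToInt64Bits(d);
        long sign = (bits >> 63) & 1;       // 1 sign bit
        long e = (bits >> 52) & 0x7FF;      // 11 bit biased exponent
        long f = bits & 0xFFFFFFFFFFFFFL;   // 52 bit fraction
        return (sign, e, f);
    }

    static void Main()
    {
        var (sign, e, f) = Fields(1.5);
        // 1.5 = (1 + 2^51 / 2^52) x 2^(1023 - 1023)
        Console.WriteLine($"{sign} {e} {f}"); // 0 1023 2251799813685248
    }
}
```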
A decimal represents a number in the form +/- V / 10^X, where V is a 96 bit unsigned integer and X is an integer between 0 and 28.
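Those fields are likewise observable via decimal.GetBits, which returns four ints: the low, mid and high 32 bits of the 96 bit V, then a flags word carrying the scale X in bits 16 through 23 and the sign in bit 31. A sketch, with names of my own invention:

```csharp
using System;
using System.Numerics;

static class DecimalFields
{
    // Decompose a decimal into its 96 bit integer V, its scale X
    // and its sign, using the layout documented for decimal.GetBits.
    public static (BigInteger V, int X, bool Negative) Fields(decimal m)
    {
        int[] parts = decimal.GetBits(m);
        BigInteger v = ((BigInteger)(uint)parts[2] << 64)
                     | ((BigInteger)(uint)parts[1] << 32)
                     | (uint)parts[0];
        int x = (parts[3] >> 16) & 0xFF;   // scale lives in bits 16-23
        bool negative = parts[3] < 0;      // sign lives in bit 31
        return (v, x, negative);
    }

    static void Main()
    {
        var (v, x, negative) = Fields(1.25m);
        Console.WriteLine($"V = {v}, X = {x}"); // V = 125, X = 2: 125 / 10^2
    }
}
```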
Both are of course “floating point” because the number of bits of precision in each case is fixed, but the position of the decimal point can effectively vary as the exponent changes.
Continue reading
An integer division identity
Here’s an interesting question that came up on StackOverflow the other day: we know in “real” algebra that the equation (a / b) / c = a / (b * c) is an “identity”; it is true for any values of a, b, and c.[1. Assuming of course that both sides have a well-defined value.] Does this identity hold for integer arithmetic in C#?
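One way to start investigating is a brute-force check over a small range, where no intermediate result can overflow; this sketch (names mine) is just an experiment, not the post's full answer. Note that b * c can overflow for large operands, which is the corner case to watch.

```csharp
using System;

static class DivisionIdentity
{
    // Does (a / b) / c equal a / (b * c) in C#'s truncating
    // integer arithmetic?
    public static bool HoldsFor(int a, int b, int c) =>
        (a / b) / c == a / (b * c);

    static void Main()
    {
        bool allHold = true;
        for (int a = -20; a <= 20; a += 1)
            for (int b = -5; b <= 5; b += 1)
                for (int c = -5; c <= 5; c += 1)
                    if (b != 0 && c != 0)
                        allHold &= HoldsFor(a, b, c);
        // Over this small, overflow-free range the identity holds.
        Console.WriteLine(allHold); // True
    }
}
```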
Integer division that rounds up
Integer arithmetic is surprisingly difficult to get right. There are usually lots of weird corner cases and the temptation to use too-clever bit-twiddling tricks is high. Today I thought I’d give an example of how I approach solving integer arithmetic problems on those rare occasions when I have to. So let’s consider the problem of writing a C# method that divides two 32 bit integers such that the division rounds up if it is inexact. Continue reading
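To make the problem concrete, here is one careful way to write such a method without bit-twiddling tricks: divide, then adjust the truncated quotient. This is a sketch of the general technique, not necessarily the solution the post arrives at.

```csharp
using System;

static class Division
{
    // Divides two 32 bit integers, rounding toward positive infinity
    // when the division is inexact.
    public static int DivRoundUp(int dividend, int divisor)
    {
        if (divisor == 0)
            throw new DivideByZeroException();
        if (divisor == -1 && dividend == int.MinValue)
            throw new OverflowException();
        int quotient = dividend / divisor;  // truncates toward zero
        bool inexact = dividend % divisor != 0;
        bool sameSign = (dividend < 0) == (divisor < 0);
        // Truncation toward zero already rounded up unless the exact
        // result was positive; in that one case, bump the quotient.
        return inexact && sameSign ? quotient + 1 : quotient;
    }

    static void Main()
    {
        Console.WriteLine(Division.DivRoundUp(7, 2));   // 4
        Console.WriteLine(Division.DivRoundUp(-7, 2));  // -3
        Console.WriteLine(Division.DivRoundUp(8, 2));   // 4
    }
}
```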
Precedence vs order redux
Once more I’m revisiting the myth that order of evaluation has any relationship to operator precedence in C#. Here’s a version of this myth that I hear every now and then. Suppose you’ve got a field arr that is an array of ints, and some local variables index and value: Continue reading
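The key fact is that precedence determines how operands group, while operands are always evaluated strictly left to right. A small sketch (the helper name is my own) makes the distinction observable:

```csharp
using System;

static class PrecedenceVsOrder
{
    public static string EvaluationOrder = "";

    static int M(string name, int value)
    {
        EvaluationOrder += name;  // record when each operand is evaluated
        return value;
    }

    // Precedence says this groups as A + (B * C), but the operands
    // are still evaluated left to right: A first, then B, then C.
    public static int Demo()
    {
        EvaluationOrder = "";
        return M("A", 1) + M("B", 2) * M("C", 3);
    }

    static void Main()
    {
        Console.WriteLine(Demo());            // 7
        Console.WriteLine(EvaluationOrder);   // ABC
    }
}
```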
Long division
I was recently asked by a reader why in C#, int divided by long has a result of long, even though it is clear that when an int is divided by a (nonzero) long, the result always fits into an int.
I agree that this is a bit of a head scratcher. After scratching my head for a while, two reasons to not have the proposed behaviour came to mind.
First, why is it even desirable to have the result fit into an int? You’d be saving merely four bytes of memory and probably cheap stack memory at that. You’re already doing the math in longs anyway; it would probably be more expensive in time to truncate the result down to int. Let’s not have a false economy here. Bytes are cheap.
A second and more important reason is illustrated by this case:
long x = whatever;
x = 2000000000 + 2000000000 / x;
Suppose x is one. Should this be two integers, each equal to two billion, added together with unchecked integer arithmetic, resulting in a negative number which is then converted to a long? Or should this be the long 4000000000?
Once you start doing a calculation in longs, odds are good that it is because you want the range of a long in the result and in all the calculations that get you to the result.
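And indeed, under the rules C# actually has, the whole calculation is done in longs, so no overflow occurs. A small sketch wrapping the example above (the method name is my own):

```csharp
using System;

static class LongDivision
{
    // 2000000000 / x is long division -- the int literal is converted
    // to long because x is long -- so the addition is also done in
    // longs and there is no int overflow.
    public static long Compute(long x) => 2000000000 + 2000000000 / x;

    static void Main()
    {
        Console.WriteLine(LongDivision.Compute(1)); // 4000000000
    }
}
```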