Math from scratch, part four: natural multiplication

Last time in this series proper we implemented natural-number addition recursively; today we’ll jump right into multiplication. As before, null is illegal, so we’ll start with a public surface operator that does the null checks:

public static Natural operator *(Natural x, Natural y)
{
    if (ReferenceEquals(x, null))
        throw new ArgumentNullException("x");
    else if (ReferenceEquals(y, null))
        throw new ArgumentNullException("y");
    else
        return Multiply(x, y);
}

So let’s talk about multiplication. Again, we’ll do it recursively. The base case is that Zero multiplied by anything is Zero. It is not strictly necessary to also say that One multiplied by anything is that thing, but reference comparison is cheap, so we’ll include those cases as an easy optimization. If we have two non-zero, non-one operands x and y then there are a lot of different ways to break the problem down into cases. Here’s one way to break it into three cases: [1. It is a good exercise to come up with other ways to break this down into cases.]

First, if the least significant bit of x is 0 then:

  x * y
= { xtail : 0 } * y
= ( 2 * xtail + 0 ) * y
= 2 * xtail * y
= { xtail * y : 0 }

Second, if the least significant bit of y is 0 then:

  x * y
= x * { ytail : 0 }
= x * ( 2 * ytail + 0 )
= 2 * x * ytail
= { x * ytail : 0 }

Third, if the least significant bit of both is 1 then:

  x * y
= x * { ytail : 1 }
= x * ( 2 * ytail + 1 )
= x * 2 * ytail + x
= 2 * x * ytail + x
= { x * ytail : 0 } + x

In every case we recurse on a smaller problem; eventually one of the operands will be reduced to Zero or One and the recursion will bottom out.
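
To see the recursion in action, here is six times five (110 * 101 in binary) reduced step by step by the cases above:

  110 * 101
= { 11 * 101 : 0 }                        (first case)
= { { 11 * 10 : 0 } + 11 : 0 }            (third case)
= { { { 11 * 1 : 0 } : 0 } + 11 : 0 }     (second case)
= { { { 11 : 0 } : 0 } + 11 : 0 }         (One base case)
= { 1100 + 11 : 0 }
= { 1111 : 0 }
= 11110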

We can easily write code from the three cases above:

private static Natural Multiply(Natural x, Natural y)
{
    if (ReferenceEquals(x, Zero))
        return Zero;
    else if (ReferenceEquals(y, Zero))
        return Zero;
    else if (ReferenceEquals(x, One))
        return y;
    else if (ReferenceEquals(y, One))
        return x;
    else if (x.head == ZeroBit)
        return Create(Multiply(x.tail, y), ZeroBit);
    else if (y.head == ZeroBit)
        return Create(Multiply(x, y.tail), ZeroBit);
    else
        return Add(Create(Multiply(x, y.tail), ZeroBit), x);
}

And we’re done:

var n1 = Natural.One;
Console.WriteLine((n1 + n1) * (n1 + n1 + n1) * (n1 + n1 + n1 + n1 + n1));

produces

011110

as you would expect: two times three times five is thirty, which is 11110 in binary with a harmless leading zero. Multiplication is pretty easy.

Now that we have multiplication I can’t resist throwing in this extra bonus operation, because the recursive step is so nice:

  x ^ y
= x ^ ( 2 * ytail + yhead )
= ( x ^ ytail ) * ( x ^ ytail ) * ( x ^ yhead )

The code is straightforward:

public Natural Power(Natural exponent)
{
    if (ReferenceEquals(exponent, null))
        throw new ArgumentNullException("exponent");
    else
        return Power(this, exponent);
}

private static Natural Power(Natural x, Natural y)
{
    if (ReferenceEquals(y, Zero))
        return One;
    else
    {
        Natural p = Power(x, y.tail);
        Natural result = Multiply(p, p);
        if (y.head == OneBit)
            result = Multiply(result, x);
        return result;
    }
}

Notice how much more efficient this is than simply looping to do y multiplications; we’re instead doing only O(log y) multiplications. This technique allows us to compute very large numbers very quickly.[2. Always keeping in mind, of course, that our non-tail-recursive addition and multiplication algorithms will blow the stack after a few hundred bits. Again, this is not intended to be a real-world large number library.]
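
For instance, raising two to the tenth power takes only a handful of multiplications rather than ten:

var n2 = Natural.One + Natural.One;
var n10 = n2 + n2 + n2 + n2 + n2;
Console.WriteLine(n2.Power(n10)); // 1024: a one followed by ten zero bits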

Next time on FAIC: Subtraction is slightly harder than addition and multiplication, but not much.

24 thoughts on “Math from scratch, part four: natural multiplication”

  1. Eric,

    I know you are designing this with Natural numbers in mind, but consider that it might be nice to be able to use this in other Z groups. In those cases the Identity (One) of that group/ring might not be the natural number One.

    Thanks,
    David

  2. It’s interesting to see that while binary decomposition is a nice improvement over the naive version, it’s not necessarily optimal: x^15 takes 6 multiplications using binary decomposition, but by computing (x^3)^5 we can do it in 5.

    As far as I know the only known methods to find the optimal number of multiplications involve an extensive search. Always fascinating when something as “simple” as a power function turns out to not be as simple as thought.
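
    That “extensive search” can be illustrated directly. Below is a minimal breadth-first sketch in C# that looks for a shortest addition chain for a small exponent; it restricts itself to so-called star chains (each new element is the previous largest plus some earlier element), which suffices for small targets like 15. Every element after the initial 1 corresponds to one multiplication.

    using System;
    using System.Collections.Generic;

    static class AdditionChains
    {
        // Breadth-first search over star chains: extend each chain by
        // (largest element + some earlier element) until we hit the target.
        static List<int> Shortest(int target)
        {
            var queue = new Queue<List<int>>();
            queue.Enqueue(new List<int> { 1 });
            while (queue.Count > 0)
            {
                var chain = queue.Dequeue();
                int last = chain[chain.Count - 1];
                if (last == target)
                    return chain; // BFS: the first hit is a shortest star chain
                foreach (int a in chain)
                {
                    int next = last + a;
                    if (next > last && next <= target)
                        queue.Enqueue(new List<int>(chain) { next });
                }
            }
            return null; // unreachable for target >= 1
        }

        static void Main()
        {
            // Prints a six-element chain such as 1, 2, 3, 6, 12, 15:
            // five additions, hence five multiplications for x^15.
            Console.WriteLine(string.Join(", ", Shortest(15)));
        }
    }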

    • To clarify, voo’s point is that my technique computes

      x^2  = x * x
      x^3  = x^2 * x
      x^6  = x^3 * x^3
      x^7  = x^6 * x
      x^14 = x^7 * x^7
      x^15 = x^14 * x

      in six steps, but you can do it in five:

      x^2  = x * x
      x^3  = x^2 * x
      x^6  = x^3 * x^3
      x^12 = x^6 * x^6
      x^15 = x^12 * x^3

      Since my technique does a maximum of two multiplications per step, and each step decreases the number of bits in the exponent by one, and the number of bits is proportional to the log of its magnitude, we can achieve O(log y) multiplications. (Of course, I am not using a very efficient multiplication algorithm here either.) Whether there is an asymptotically better algorithm, I don’t know one way or the other.

      • I don’t see how there could be an asymptotically better algorithm. The algorithm is optimal for any exponent which is a power of two; there’s no way to raise a number to a larger exponent without requiring more multiplications; and the algorithm never requires more than twice as many multiplications as would be required to raise to the next larger power of two. Together, those imply that the number of multiplies required by this algorithm will always be within a constant factor of the number required by any “better” one.

        If multiplies are assumed to have a cost which varies with operand size, then the overall cost of the given algorithm will not be O(log n), with n being the exponent, and use of other algorithms might achieve a better-than-constant speedup, but that would depend a lot upon how the multiplies themselves were performed.

  3. Pingback: The Morning Brew - Chris Alcock » The Morning Brew #1453

  4. I noticed that there are eight ways of doing your last case, based on switching the operands to Add and Multiply, and switching which number you take the tail of. Of the ones I tested, yours had the best performance (fewest total Natural instances created). Was there some way of determining this, or was it just lucky?

    • The Peano axioms that I’ve implemented so far are that zero is a natural, naturals have successors and that zero is not a successor of any natural. Clearly all three of those are true. Most of the rest of the axioms concern equality, which I have not implemented yet.

  5. It’s times like this that make you wish the .NET team had done the work to detect stack overflow and raise an exception. It would be so cool if you could just get an overflow exception when your numbers got too big instead of crashing the whole process.

    • I understand the technical reasons why it would be impractical not to have a stack overflow exception kill the thread upon which it occurs. What I would have liked to see, however, would have been either a means by which recursive code could detect that it was getting dangerously close to a stack overflow (e.g. if stack usage is greater than 90%, throw an `AvoidedStackOverflow` exception), or else a means by which a thread could set up a “stack overflow” delegate; if a stack overflow occurs on a thread where such a delegate is configured, kill just that thread (not the whole program) and restart the thread with that delegate [which could then notify other parts of the system as appropriate].

      The former approach would be somewhat crude and would increase by about 10% the amount of storage required to hold a stack of a given “usable” size, but would allow something like a recursive-descent parser to be somewhat guarded against an input containing a million left parentheses without having to artificially restrict nesting depth. The latter approach would require Framework changes, and could only be safely applied to threads which could either tolerate having their “finally” blocks not run, or else attach the necessary cleanup code to the “stack overflow” delegate, but could still be more elegant than having to kill off an entire appdomain.

      • I wrote the stack overflow detection code for the original VBScript and JScript engines and this was the approach we took.

        The way stacks work in Windows is as follows. (This is a simplification; the actual guarding strategy is a bit more complicated, and the way it is handled in .NET is even more complicated.) First you reserve a block of 1 MB divided up into pages. You mark all the pages except for the first ones as invalid, and you end the valid region with a guard page. When the guard page is touched under normal circumstances it issues a fault which is handled by making the guard page into a valid page, and marking the next page as the new guard page. Thus the stack grows as you use more of it. When the guard page gets to the end of the 1 MB block it changes its behavior. Hitting it for the first time issues an out-of-stack fault that can be handled. Hitting it a second time terminates the process immediately.

        So what we did in VBScript/JScript was to periodically compare the address of the current stack pointer to the known end of the 1 MB block. If it was getting close, within, say, 16 KB, then we would pre-emptively create an out-of-stack error in VBScript, so that whatever code handled that error would have enough stack left to run some code. The script engines actually had protocols in them to decide when there was enough stack unwound that it could safely report the error to the script host; as IE’s error dialogue took up more and more stack over time we had to keep bumping the limit upwards.

        It would be great if the CLR had a similar mechanism; without one it is very difficult to write a managed recursive descent parser. One of the last things I did before leaving Microsoft was convert a bunch of the parser and analysis code from entirely recursive to partially or entirely iterative algorithms, so as to decrease the probability of a fatal out-of-stack error. In many cases what I did was recurse only on the right side of a binary tree if the tree was likely to be heavy to the left, or vice versa; “res = a + b + c + d + e + …” is much more likely than “res = a + (b + (c + (d + ( … ))))”.
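
        A minimal sketch of that transformation, with hypothetical Node and BinaryNode types standing in for the real syntax trees: loop down the left spine and recurse only into right children, so a long chain of left-associative additions consumes constant stack.

        abstract class Node { }
        sealed class Leaf : Node { }
        sealed class BinaryNode : Node
        {
            public Node Left;
            public Node Right;
        }

        static class Analyzer
        {
            static void Analyze(Node node)
            {
                // Iterate down the left spine instead of recursing into it;
                // recursion depth is now bounded by the tree's right-hand
                // depth, which stays small for left-heavy trees like
                // a + b + c + d + e.
                while (node is BinaryNode)
                {
                    var binary = (BinaryNode)node;
                    Analyze(binary.Right);
                    node = binary.Left;
                }
                // node is now a Leaf; analyze it here.
            }
        }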

        • I recognize that handling a stack overflow cleanly without clobbering the thread which causes it is hard. I would think the first approach I described would be pretty cheap and easy, however. Adding stack-check code automatically at the compiler level would in theory be cheap, but making it work usefully could be a little tricky, since a `finally` block which runs as a consequence of a stack overflow might require somewhat more stack than the original method call which overflowed. When using manual checking, such problems could be avoided by simply refraining from doing the manual checks at times where one didn’t want an exception, and by avoiding making deeply-nested calls without checks.
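
          As a sketch of what such a manual check can look like: the Framework does expose RuntimeHelpers.EnsureSufficientExecutionStack, which throws a catchable InsufficientExecutionStackException when stack headroom gets low, rather than letting a StackOverflowException take down the process. Guarding each recursive step with it turns a fatal overflow into an ordinary exception:

          using System;
          using System.Runtime.CompilerServices;

          static class StackGuardDemo
          {
              static int Depth(int n)
              {
                  // Throws InsufficientExecutionStackException while there is
                  // still headroom on the stack, so there is room to unwind.
                  RuntimeHelpers.EnsureSufficientExecutionStack();
                  return n == 0 ? 0 : 1 + Depth(n - 1);
              }

              static void Main()
              {
                  try
                  {
                      Console.WriteLine(Depth(10000000));
                  }
                  catch (InsufficientExecutionStackException)
                  {
                      Console.WriteLine("Recursed too deep; bailed out cleanly.");
                  }
              }
          }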

        • Handling stack overflow and out-of-memory exceptions is tricky, and I personally doubt that it is worth it, because usually it means you are using an algorithm which is quite ineffective for the amount of data you’re trying to process. Most of the time stack overflows happen because of unbounded recursion; turn your “call trace” into a data structure, because the heap is much bigger than 1 MB.

          Got an out-of-memory problem? Stop trying to hold all the data in memory at once: one doesn’t need a 16 GB buffer to receive a 16 GB file over the network and save it to disk.

          And if you really, really need that much memory, it probably means you’re trying to solve a PSPACE-complete problem or something even worse. So estimate some realistic bounds and stick to them.

          • Resources such as memory and, to a lesser extent, stack space exist to be used. Even if one could write a program which could process a 10 MB file in 1024-byte chunks while using less than 64 KB of memory total, and only have it take twice as long as reading everything into memory, processing it, and writing it out, such a design would only be good if one foresaw a need to process files bigger than available memory. Being able to have a file-load routine refuse to load a file which would use almost all available memory (however much memory that happened to be) would seem much cleaner than having to pick a limit which would most likely be needlessly low in some circumstances (e.g. refusing to open a 500 MB file on a system with 64 gigs available) or too high in others (e.g. being willing to open a 499.98 MB file on a system with 499.99 MB available).

            I’m not sure what the best paradigm is for systems to let programs know when resource utilization is high enough that resource-consuming operations which can reasonably be refused should be, but trying to minimize the usage of certain resources even when they’re plentiful doesn’t seem like the best approach (especially if it may increase the usage of other resources like disk bandwidth).

  6. In your static Power method I think you can also add the following checks after checking whether ReferenceEquals(y, Zero):

    if (ReferenceEquals(x, Zero) || ReferenceEquals(x, One))
    {
        return x;
    }

  7. Great series! It is very cool to see the theoretic “cases” approach leading directly to very straightforward implementations.

    After reading your initial post I decided to play around with the Set-Theoretic approach which led to the following implementation: https://github.com/RaphMad/MathFromScratch

    While the concept itself is a bit mind-boggling, the actual implementations of +, *, ^ were astoundingly easy (for each operator, one base case and one recursive case), but of course also way less efficient.

  8. Hi,

    Just curious to know if one could get a better asymptotic complexity for the multiplication algorithm (it would use subtraction) if one were to see it as follows:
    (xtail : xhead) * (ytail : yhead), computing 3 multiplications (how?) and 2 subtractions?

    Thanks,
    Shankar
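
    • What Shankar is describing looks like the Karatsuba decomposition, though it splits each operand in half rather than peeling off a single bit. Writing x = x1 * 2^m + x0 and y = y1 * 2^m + y0:

      z0 = x0 * y0
      z2 = x1 * y1
      z1 = ( x0 + x1 ) * ( y0 + y1 ) - z2 - z0
      x * y = z2 * 2^(2m) + z1 * 2^m + z0

      That is three half-size multiplications and two subtractions (plus some shifts and additions), which recursively yields O(n^1.585) bit operations instead of the schoolbook O(n^2).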
