BigNum arithmetic in Java — Let's outperform BigInteger!

#	User	Rating
1	tourist	3856
2	jiangly	3747
3	orzdevinwang	3706
4	jqdai0815	3682
5	ksun48	3591
6	gamegame	3477
7	Benq	3468
8	Radewoosh	3462
9	ecnerwala	3451
10	heuristica	3431

#	User	Contrib.
1	cry	167
2	-is-this-fft-	162
3	Dominater069	160
4	Um_nik	158
5	atcoder_official	157
6	Qingyu	156
7	adamant	151
7	djm03178	151
7	luogu_official	151
10	awoo	146

Long story short, I've created a class called BigInt purely written in Java, which is pretty neat and outperforms BigInteger. There are most likely (many) bugs, but since it's pretty basic at the moment, it'll be a great opportunity for the Codeforces community to learn how to do arbitrary-precision arithmetic. ^^

Introduction

The standard library BigInteger class in Java is known to be notoriously slow (at least for Java 7 and earlier, and I doubt Java 8 is any different). There are faster alternatives like Gnu Classpath providing support for GMP (which is optimized beyond infinity), but this is probably not a viable option for competitive programming. When it comes to pure Java we also have the Apint class provided by the Apfloat library and the LargeInteger class provided by the JScience library, but these two are probably not only clunky for competitive programming usage, they are also not that much faster than BigInteger (in many cases they're slower, see section comparison below).

As a matter of fact I once had a conversation regarding the Quadratic Sieve with Torbjörn Granlund (the principal author of GMP) where we touched upon the issues of implementing QS in Java since BigInteger is so slow. Upon this he said something like "Yeah BigInteger is really shitty. I can't understand why no one makes something better. I mean it would only take about 4 to 5 hours". This might be true for Torbjörn, but most likely not for the rest of us. However, the thought of creating my own kick-ass BigInt class stayed with me since that day, and now I've finally (kind of) done it.

In my experience, in competitive programming or problem solving like Project Euler, we come in contact with big numbers in the form of cumulative results caused by many operations with operands of small magnitude (e.g. 1337 factorial). So my first design choice with this in mind, and in direct contrast to the BigInteger class, was to make my BigInt class mutable.

Example

Let's have a look at how to calculate 1337! using BigInteger and BigInt respectively.

BigInteger fac = BigInteger.ONE;
for(int i = 1; i<=1337; i++) fac = fac.multiply(BigInteger.valueOf(i));

BigInt myFac = new BigInt(1);
for(int i = 1; i<=1337; i++) myFac.umul(i); //umul is unsigned multiplication

Not only is the ways of BigInt less verbose, it's also way faster. Since BigInteger is immutable it must allocate a new array for the results, convert the int to a BigInteger object, perform the multiplication (using a general algorithm), create a new object and return it. The overhead is massive and unnecessary. BigInt on the other hand just performs the operation directly on the internal array used to store the digits (in base 2³²) and grows the array (similar to ArrayList) should it run out of capacity. (There are better algorithms for calculating big factorials, but that's not the point in this example...)

So BigInteger is not suitable for many small operations. Is it suitable for operations using larger operands? As it turns out, no it's not.

Comparison

The following time measurements are a results of some benchmarks run on my shitty 1.65GHz Dual Core computer. ^^

Parsing: Converting two strings of length 500000 representing numbers in base 10 to the internal representation.
Add: Adding a 100000 decimal digit number to another equally sized number, 100000 times. (MidBig-size cumulative benchmark.)
Sub: Same as the addition experiment, but with subtractions.
MidMul: Multiplying a 300 decimal digit number to a growing product, initially a 300 decimal digit number. (Mid-size cumulative benchmark.)
TinyMul: The naive straightforward way to calculate 50000! (Small-size cumulative benchmark.)
BigMul: Multiplication of two 500000 decimal digit numbers. (Big-size benchmark.)
MidDiv: Dividing a 400000 decimal digit number by a 4000 decimal digit number a 1000 times. (Mid-size cumulative benchmark.)
BigDiv: Dividing a 400000 decimal digit number by a 200000 decimal digit number. (Big-size benchmark.)
toString: Converting the internal representation (having in decimal 213237 digits) to a decimal number string.

Test        BigInteger    BigInt    Apint     LargeInteger
----------------------------------------------------------
Parsing     31.602s       8.497s    0.049s    14.054s
Add         6.394s        3.006s    27.279s   6.322s
Sub         5.618s        2.243s    24.35s    6.026s
MidMul      2.676s        2.259s    40.433s   3.176s
TinyMul     10.683s       1.468s    35.535s   6.879s
BigMul      12.332s       0.655s    0.289s    1.266s
MidDiv      9.936s        7.558s    297.022s  3239.432s
BigDiv      3.647s        2.536s    0.563s    5.335s
toString    16.912s       4.614s    0.029s    15.116s

Comments

Parsing: Apint is by far the fastest parser, this because it internally (probably) uses a power of 10 as base. BigInteger is by far the slowest.
Add: BigInt is fastest due to its mutability. Apint is by far the slowest, since it uses a power of 10 base.
Sub: The same goes here. Apint is slow when it comes to simple arithmetic operations.
MidMul: A bit surprisingly Apint is the slowest here, while BigInt is the fastest.
TinyMul: Not surprisingly BigInt is the fastest (since it's designed around this user case) and Apint the slowest.
BigMul: BigInteger is by far the slowest since it utilizes a naive O(n²) algorithm, whereas BigInt uses (a parallel) Karatsuba, and Apint probably uses FFT making it the fastest.
MidDiv: What the heck happened to LargeInteger!? 50 minutes runtime!? Well that's simply unacceptable. BigInt and BigInetger are the fastest here. Apint (and definitely LargeInteger) probably makes unwise algorithm choices.
BigDiv: LargeInteger is the slowest, but by no means bad. Apint is once again fastest, this is really the area where it shines.
toString: Both BigInteger and LargeInteger are just slow. LargeInteger also uses for some unknown reason a recursive approach, which could cause Stackoverflow if you don't increase your stack size (terrible design choice indeed...). Once again Apint is the fastest due to its internal representation.

This survey is by no means comprehensive, but it gives a hint and its verdict is clear. BigInteger is good if you want a bug free (I guess ^^) and foolproof class. LargeInteger isn't good for anything. Apint is good if your program frequently parses and prints decimal numbers, it's also as of today the fastest implementation when it comes to multiplication and division of big numbers (although that will probably change soon ;D ). Otherwise BigInt really is your choice, sure there's not much else to it than the "4 basic operations" and since it's mutable it's a bit risky (please be careful if start putting it into trees etc.).

The code for BigInt can be found through the link at the top leading to the "Huldra project" (a reference to a forest creature in Scandinavian folklore) and I'll hope you might learn something from it. Most tutorials out there just teach you how to do it in base 10^9 which is a bit noobish.

Klein's blog

Introduction

Example

Comparison

Comments