What is efficient number system conversion for big integers?

#	User	Rating
1	tourist	3856
2	jiangly	3747
3	orzdevinwang	3706
4	jqdai0815	3682
5	ksun48	3591
6	gamegame	3477
7	Benq	3468
8	Radewoosh	3462
9	ecnerwala	3451
10	heuristica	3431

#	User	Contrib.
1	cry	167
2	-is-this-fft-	162
3	Dominater069	160
4	Um_nik	158
5	atcoder_official	157
6	Qingyu	156
7	djm03178	151
7	adamant	151
9	luogu_official	150
10	awoo	147

Number system conversion algorithm is well-known and taught in almost every basic programming course. To get $$$y_b = x_{10}$$$ (number $$$y$$$ written in base $$$b$$$ equal to number $$$x$$$ base $$$10$$$) we have to iteratively chop the lowest digit in base $$$b$$$ from number $$$x$$$ using division and remainder operators:

def naive_conversion(x, base):
    y = ""
    while x > 0:
        y = str(x % base) + y
        x //= base

    return y

This works in $$$\mathcal{O}(\log x)$$$ for all $$$x$$$ that fit into machine word $$$(\log x < w)$$$. However this becomes inacceptably slow when we consider numbers of length $$$n$$$ that is large enough and we cannot use $$$\mathcal{O}(1)$$$ division. Let's define $$$n$$$ as length of converted number $$$x$$$ in some number system $$$(w \ll \log x = n)$$$. Still, base of number systems $$$b$$$ and all digits in it can be stored in usual integer. So we can implement conversion above in $$$\mathcal{O}(n^2)$$$ with some efficient implementation of big-number-by-small-number division.

This operation is commonly required when you are trying to print big integers in base $$$10$$$ that are stored by big integer library in binary notation. To overcome this some competitive programming libraries use base $$$10^9$$$ to print numbers in base $$$10$$$ digit-by-digit without complicated conversion. But this is just a trick for common number presentation. But how to generally approach this problem and be able to quickly convert big integers to any number system?

We are given big integer $$$x_b$$$ and another number system base $$$c$$$. We have to find $$$y$$$ such that $$$y_c = x_b$$$. $$$f(x_b, c) = y_c$$$. Suppose $$$x_b$$$ has length $$$n$$$ and $$$y_c$$$ will have length $$$m$$$.

Best aproach that I can think of is divide and conquer. Main problem with abritrary bases is that many digits in $$$x$$$ can affect many digits in $$$y$$$. To overcome this we can choose $$$p \approx \frac{m}{2}$$$ and represent $$$x_b = L_b \cdot c^p + R_b$$$, where $$$R_b < c^p$$$. In such form we can see that $$$L$$$ part can only affect higher half of digits of $$$y$$$ and $$$R$$$ part can only affect lower half of digits. Now we can safely split our number into two, solve problem recursively obtaining $$$f(L_b, c) = L'_c$$$ and $$$f(R_b, c) = R'_c$$$ and finally concatenate parts $$$y_c = L'_c \cdot c^p + R'_c$$$.

To split $$$x_b$$$ into $$$L_b, R_b$$$ we can use division and remainder. Assuming that big integer divmod can be implemented in $$$\mathcal{O}(n \log n)$$$ whole recursive calculation of $$$f(x_b, c)$$$ can be done in $$$\mathcal{O}(n \log^2 n)$$$.

def fastbase(x, b, order=-1):
    if order == -1:
        order = 1
        while b**(order+1) <= x:
            order *= 2

    if order == 0:
        return str(x)

    sep = b ** order

    hi, lo = divmod(x, sep)

    return fastbase(hi, b, order//2) + fastbase(lo, b, order//2)

In this implementation $$$m$$$ is naively estimated as lowest possible power of two, but then extra leading zeroes must be stripped.

We made some benchmarking in Kotlin and in Python. This algorithm appears to be somewhat competitive with built-in algorithms in Kotlin, however in Python3 this works even faster.

Benchmark code in Python3

This hand-made benchmarking technique is far from ideal (warming up caches and collecting garbage can affect this unpredictably), but changing the order of calculations shows that execution time does not change much from this.

import time, sys

times = [time.time()]


def benchmark_check(*msg):
    times.append(time.time())
    print(*msg, end=" ")
    print("in", times[-1] - times[-2])


def fastbase(x, b, order=-1):
    if order == -1:
        order = 1
        while b**(order+1) <= x:
            order *= 2

    if order == 0:
        return str(x)

    sep = b ** order

    hi, lo = divmod(x, sep)

    return fastbase(hi, b, order//2) + fastbase(lo, b, order//2)


print("Number system conversion test. Python version:", sys.version)

s = 250*1000

n = 1
for i in range(1, s+1):
    n *= i

benchmark_check("Built", s, "factorial")

n10f = fastbase(n, 10).lstrip('0')
benchmark_check("D&C fast conversion to base 10")

n10 = str(n)
benchmark_check("Default conversion to base 10")

n10format = "{0:d}".format(n)
benchmark_check("Format conversion to base 10")

bin2 = bin(n)
benchmark_check("Bin conversion to base 2")

n2 = "{0:b}".format(n)
benchmark_check("Format conversion to base 2")

n8 = "{0:o}".format(n)
benchmark_check("Format conversion to base 8")

n16 = "{0:x}".format(n)
benchmark_check("Format conversion to base 16")

Local testing results

Number system conversion test. Python version: 3.9.7 (tags/v3.9.7:1016ef3, Aug 30 2021, 20:19:38) [MSC v.1929 64 bit (AMD64)]
Built 250000 factorial in 15.586419582366943
D&C fast conversion to base 10 in 15.642351150512695
Default conversion to base 10 in 21.88812279701233
Format conversion to base 10 in 21.88341784477234
Bin conversion to base 2 in 0.006005048751831055
Format conversion to base 2 in 0.006005525588989258
Format conversion to base 8 in 0.0030024051666259766
Format conversion to base 16 in 0.0020012855529785156

Blazing fast conversions to binary formats (and powers of binary) show that numbers are stored in binary.

Are there any well-knows algorithms? What algorithms are used in standard libraries? Are there any better solutions?

Rev.	By	When	Δ	Comment
en5	ATSTNG	2021-09-18 01:20:22	0	(published)
en4	ATSTNG	2021-09-18 01:19:22	7	Tiny change: 'ber system.\n\nWe need are given' -> 'ber system?\n\nWe are given' (saved to drafts)
en3	ATSTNG	2021-09-18 01:16:52	0	(published)
en2	ATSTNG	2021-09-18 01:16:00	3799	Tiny change: 'ulate $c^p}$ and calc' -> 'ulate $c^p$ and calc'
en1	ATSTNG	2021-09-18 00:09:43	1759	Initial revision (saved to drafts)

History