#	User	Rating
1	tourist	3985
2	jiangly	3814
3	jqdai0815	3682
4	Benq	3529
5	orzdevinwang	3526
6	ksun48	3517
7	Radewoosh	3410
8	hos.lyric	3399
9	ecnerwala	3392
9	Um_nik	3392

#	User	Contrib.
1	cry	169
2	maomao90	162
2	Um_nik	162
4	atcoder_official	161
5	djm03178	158
6	-is-this-fft-	157
7	adamant	155
8	awoo	154
8	Dominater069	154
10	luogu_official	150

In this blog you can found an efficient implementation of Karatsuba multiply of two polinomials, that sometimes can be used instead of FFT

$$$\text{ }$$$

Hello everyone!

One man ask me to write a code for calculation of $$$1.000.000$$$ fibonacci number in C++. This broblem has been solved in time less than 1 second with Karatsuba multiply. After it, I tried to use this code in standart problems like "we have two strings of letters $$$\text{ATGC}$$$ and want to calculate optimal cyclic rotation of one of them so number of equal position will be greatest" with Karatsuba multiply, and these problems have been successfully solved. There are a lot of naive implementations, but MrDindows helped to write a most efficient of all simple implementations.

Idea of Karatsuba multiply

We have two polinomials $$$a(x)$$$ and $$$b(x)$$$ with equal length $$$2n$$$ and want multiply them. Let suppose that $$$a(x) = a_0(x) + x^n \cdot a_1(x)$$$ and $$$b(x) = b_0(x) + x^n \cdot b_1(x)$$$. Now we can calculate result of $$$a(x) \cdot b(x)$$$ in next steps:

Calculate $$$E(x) = (a_0(x) + a_1(x)) \cdot (b_0(x) + b_1(x))$$$
Calculate $$$r_0(x)=a_0(x)\cdot b_0(x)$$$
Calculate $$$r_2(x)=a_1(x) \cdot b_1(x)$$$
Now, result is $$$a(x) \cdot b(x) = r_0(x) + x^n \cdot \left(E(x) - r_0(x) - r_2(x)\right) + x^{2n} \cdot r_2(x)$$$

We can see, that instead of $$$4$$$ multiplications we used only $$$3$$$, so we have time $$$O(n^{\log_2(3)})$$$ or $$$O(n^{1.58})$$$

Efficient implementation

#pragma GCC optimize("Ofast,unroll-loops")
#pragma GCC target("avx,avx2,fma") 
namespace {
    template<int n, typename T>
    void mult(const T *__restrict a, const T *__restrict b, T *__restrict res) {
        if (n <= 64) { // if length is small then naive multiplication if faster
            for (int i = 0; i < n; i++) {
                for (int j = 0; j < n; j++) {
                    res[i + j] += a[i] * b[j];
                }
            }
        } else {
            const int mid = n / 2;
            alignas(64) T btmp[n], E[n] = {};
            auto atmp = btmp + mid;
            for (int i = 0; i < mid; i++) {
                atmp[i] = a[i] + a[i + mid]; // atmp(x) - sum of two halfs a(x)
                btmp[i] = b[i] + b[i + mid]; // btmp(x) - sum of two halfs b(x)
            }
            mult<mid>(atmp, btmp, E); // Calculate E(x) = (alow(x) + ahigh(x)) * (blow(x) + bhigh(x))
            mult<mid>(a + 0, b + 0, res); // Calculate rlow(x) = alow(x) * blow(x)
            mult<mid>(a + mid, b + mid, res + n); // Calculate rhigh(x) = ahigh(x) * bhigh(x)
            for (int i = 0; i < mid; i++) { // Then, calculate rmid(x) = E(x) - rlow(x) - rhigh(x) and write in memory
                const auto tmp = res[i + mid];
                res[i + mid] += E[i] - res[i] - res[i + 2 * mid];
                res[i + 2 * mid] += E[i + mid] - tmp - res[i + 3 * mid];
            }
        }
    }
}

Testing

Example of solutions for real problems: 528D. Fuzzy Search and AtCoder Beginner Contest 149 E. Handshake.

Stress-test 512K for 64-bit and 32-bit coefficients

It is possible that this code can be used instead of FFT, but be careful: for multiplication of two polinomials with length $$$2^{19}$$$ Karatsuba multiply requires $$$3.047.478.784$$$ operations. Of cource, it can be executed in 1 second, but for greater length...

Rev.	By	When	Δ	Comment
en2	dmkozyrev	2020-02-24 12:39:36	1	Tiny change: 'l position will be g' -> 'l positions will be g'
ru3	dmkozyrev	2020-02-24 12:38:29	79
en1	dmkozyrev	2020-02-24 12:36:53	5579	Initial revision for English translation
ru2	dmkozyrev	2020-02-24 12:07:38	9	Мелкая правка: '(a_0(x) + b_0(x)) \cdot (a_1(x) + b_1(' -> '(a_0(x) + a_1(x)) \cdot (b_0(x) + b_1('
ru1	dmkozyrev	2020-02-24 12:05:14	5530	Первая редакция (опубликовано)

Idea of Karatsuba multiply

Efficient implementation

Testing

History