Hi guys. I've been trying to learn fast polynomial multiplication. I found that the most famous(probably?) algorithms are FFT and Karatsuba algorithm.
Karatsuba algorithm is pretty easy, but it is slow. I couldn't submit few problems because of TLE. Even if a submission passed, it was on the edge between AC and TLE.
On the other side, FFT is much faster but also much more complex. By complex I mean that it's hard to truly understand and implement it.
I need an advice: what should I do? Should I stick with Karatsuba algorithm and learn how to implement it better? Or should I make one more try to understand FFT? If yes, where is the best place to learn it without much pain?
FFT will be faster than Karatsuba for large instance, regardless on how efficient you implement Karatsuba.
Polynomial multiplication problems are kinda rare in competitive programming, but most of them will require you to use FFT. So if you want to solve those you need to learn FFT. E.g. on https://cp-algorithms.com/algebra/fft.html (shameless self-promotion).
Thank you!
Apart from plain polynomial multiplication, FFT and Karatsuba just have slightly different purposes.
Karatsuba pros:
Works with any ring.
If you need to multiply two polynomials only once, almost always still barely fits in TL when implemented properly.
Karatsuba cons:
FFT pros:
Almost always faster.
Has a broad field of various applications and tricks besides just polynomial multiplication.
FFT cons:
So, in general, there are problems that are solvable by one algorithm, but not another and vice versa, FFT is a bit more common, but knowing Karatsuba may help you in cases when you need to multiply modulo $$$2^{64}$$$, for example. It is doable with FFT, but much harder. There was a problem about that on some TooSimple's contest in PTZ, presumably, in 2018.
Thanks for the detailed answer. Also, the code is very nice.
Historically, we (at least for Chinese) cannot effectively compute the multiplication of two polynomials modulo $$$10^9+7$$$ date back to 2014. In these old days, FFT (with complex) isn't doable because it loss precision for number of magnitude $$$n \times 10^{18}$$$. We should have tried the so-called "three-mods NTT" which is to pick three NTT primes $$$p_1$$$, $$$p_2$$$, $$$p_3$$$ fitting in int32_t and reconstruct the result using Chinese Remainder Theorem.
The "three-mods" approach requires $$$9$$$ calls to NTT, which is slow and hardly useful. In such case, we switched to Karatsuba multiplication,
However, after a clever guy figuring out (who exactly? I may have learned the trick from some CodeChef submission) that we can split a $$$10^9$$$ into something like $$$10^5 \times 65536 + 10^5$$$, and utilize the real parts of complex to save constants, the Karatsuba thing become less useful.
Last note: Also there is cultural shift in competitive programming community. Dating back our favorite prime is $$$10^9+7$$$ (and something $$$10^9+9$$$), you can check old TC tasks for the fact. Today, seemingly everything should be computed under modulo $$$998'244'353$$$ (Thanks to uoj.ac).