Prerequisites: you need to be familiar with both modular arithmetic and Fast Fourier Transform / number theoretic transform. The latter can be a rather advanced topic, but I personally found this article helpful. Note that there are some relatively easy optimizations/improvements that can be done, like precalculating the modular inverse of $$$n$$$, the "bit-reversed-indices" (can be done in $$$O(n)$$$ with DP), as well as the powers of $$$\omega_n$$$. Also useful is modifying it so that the input array length can be any power of two $$$\leq n$$$; some problems require multiplying polynomials of many different lengths, and you'd rather the runtime be $$$O(n \log n)$$$ over the sums of lengths, rather than (number of polynomials * maximum capacity).
Also helpful is knowing how to use NTT to solve problems modulo $$$998244353$$$, like 1096G - Lucky Tickets and 1251F - Red-White Fence. Note that for some problems it's easier to think of FFT not as multiplying polynomials, but of finding multiset $$$C$$$ as the pairwise sums of multisets $$$A$$$ and $$$B$$$, in the form of arrays $$$A[i] =$$$ number of instances of $$$i$$$ in multiset $$$A$$$. This is equivalent to multiplying polynomials of the form $$$\sum _{i=0} ^n A[i]x^i$$$.
Note that $$$\omega_n$$$ can be easily found via the formula $$$g ^ {(m-1) / n} \ \text{ mod } m$$$, provided that:
- $$$m$$$ is prime
- $$$g$$$ is any primitive root modulo $$$m$$$. It is easiest to find this before hand and then hardcode it in the submission. You can either implement the algorithm yourself or use Wolfram Alpha to find it via the query
PrimitiveRoot[m]
. (Spoiler alert, $$$g = 3$$$ works well for $$$998244353$$$) - $$$n$$$ divides $$$m-1$$$ evenly. As $$$n$$$ is typically rounded up to the nearest power of 2 for basic NTT implementations, this is easiest when $$$m$$$ is of the form $$${a \cdot 2^{k} + 1}$$$ where $$$2^k \geq n$$$. This is why $$$998244353$$$ is a commonly-appearing modulus; it's $$$119 \cdot 2^{23} + 1$$$. Note that this modulus also appears in many problems that don't require FFT/NTT; this is a deliberate "crying-wolf" strategy employed by puzzle writers, so that you can't recognize immediately that a problem requires FFT/NTT via the given modulus.
Now onto the main topic, the "ultimate" NTT
Rationale: There are a few problems like 993E - Nikita and Order Statistics that require FFT, however the results aren't output with any modulus, and indeed may exceed the range of a 32-bit integer type. There are several usual solutions for solving this:
Do NTT with two moduli and restoring the result via Chinese Remainder Theorem. This has several prominent disadvantages:
- uuuu
blablabla
- uuuhuh