Welcome to Part 2
Firstly, I am assuming that you have been through the Part 1 of this blog.
Okay so in hindsight I now see the drawbacks there were in my explanation of the roots of unity and how the divide and conquer works in FFT.
So in this blog I would be aiming to give you a visual intuition of what really FFT exploits over the trivial classical DFT. Also I would be covering up NTT and sharing a nice trick that I got to know about while learning NTT.
Visual Intuition of FFT
The part of converting the polynomial from coefficient form to point value form in instead of the trivial N2 would be elaborated upon in this section as I find the dry mathematical explanation to be rigorous but not intuitive. In the notation below I have referred to n complex nth roots of unity as powers of wn1 = e2π·i / n
Assume that you have a 8 terms (7 degree) polynomial you need to convert from coefficient form to point value form. So firstly you choose the powers of the 8th complex root of unity as the xi's that you will be plugging in the polynomial. Initially this might seem random but hopefully by the end of this topic it would be clear to you.
Okay so now we can reduce our polynomial A(x) as Aeven(x2) + x·Aodd(x2) where Aeven(y) is a 4 terms (3 degree) polynomial in y and Aodd(y) is also a 4 terms (3 degree) polynomial in y.
Now solving the original problem of converting A(x) from coefficient form to point value form for each of the power of the 8th root of unity is exactly equivalent to solving this new problem of converting Aeven(x2) and Aodd(x2) for powers the 8th root of unity.
So really this doesn't help us much. But the trick is that when we square (because x2) these powers of the 8th root of unity, they come out to be same pairwise, i.e. the 0th and 4th powers of the 8th root of unity when squared give the same result, similarly, the 1st and 5th powers of 8th root of unity when squared give the same result and so on.
Also note that these results actually turn out to be the powers of the 4th root of unity.
Okay, so I gave the dry mathematical proof for this in the previous part, but that time I did not have a visual understanding of it. Well now I do so refer to the animations and diagrams below to get the idea but first read through the below subtopic of Rotation in Imaginary Plane.
Rotation in Imaginary Plane
Okay so as a rule of thumb remember that if you have a complex number lets say Z and you multiply it with eiθ then it basically rotates it by θ angle in anti clockwise direction with respect to the origin.
So lets say you have 3rd power of 8th root of unity, i.e. wn3 = ei·2π·(3 / 8) and you wish to exponentiate it. Then currently the θ is .
So if you square it you are actually multiplying the θ by 2 so you are basically rotating it by θ degrees in anti clockwise direction with respect to the origin, i.e. it will now be make an angle of with the positive x axis in the anti clockwise direction.
Raising it by power x is (wn3)x = ei·2π·(3 / 8)·x = ei·2π·(3 / 8)·ei·2π·(3 / 8)·(x - 1) which is equivalent to rotating ei·2π·(3 / 8) by in anti clockwise direction with respect to positive x axis.
So given this understanding we can now easily exponentiate roots of unity. So here is the key, when you mark the 8th roots of unity on a graph it is regular octagon whose vertices are making angle degree with respect to positive x axis.
Now when you square each of these, the angles double as shown above, so they become 0 * 2, 45 * 2, 90 * 2, 135 * 2, 180 * 2, 225 * 2, 270 * 2, 315 * 2 = 0, 90, 180, 270, 360, 450, 540, 630 = which are basically the points of the 4th root of unity. Below is a beautiful animation showing the same (You would need to open the link below and play the animation from the top left hand side)
Now you can clearly grasp the complexity of T(n) = T(n / 2) (For Aeven) + T(n / 2) (For Aodd) + O(2·n)(For combining the results as Aeven(x2) + x·Aodd(x2)). So
NTT (Number Theoretic Transform)
The objective of NTT is to multiply 2 polynomials such that the coefficient of the resultant polynomials are calculated under a particular modulo. The benefit of NTT is that there are no precision errors as all the calculations are done in integers. A major drawback of NTT is that generally we are only able to do NTT with a prime modulo of the form 2k·c + 1, where k and c are arbitrary constants. So for doing it for a random mod we would need to use CRT (Chinese Remainder Theorem).
Firstly, nth roots of unity under a primitive field, i.e. are defined as , where P is only considered as prime for simplicity.
Here we are assuming that P = 2k·c + 1, where c, k are positive integers and P is prime.
Note All the calculation done below are done under modulo P including the operations in the exponents.
Example rP + 5 = r5. The modulo has been intentionally skipped sometimes as it makes the notation way too ugly.
So we first find a r such that goes through all the numbers from 1 to P - 1 when x goes from 1 to P - 1. Note that is already a fact using Fermat's Little Theorem which states that if gcd(a, P) = 1
After finding one such r, the (2k)th root of unity under the modulo field of P will be rc.
The powers will be as
Notice that as wnn = 1 for complex roots similarly here
Lemma 1 Here when 1 ≤ x < 2k. this is because r is itself defined as the (P - 1)th root of unity, i.e. any positive power of r less than P - 1 will not be equal to , and as P - 1 = 2k·c, therefore any power of r where c is multiplied by anything less than 2k would not give .
Lemma 2 Another key observation is that (rc)α = (rc)β where α ≠ β and both lie between [0, 2k) can never be true, this observation can be proven by contradiction, assuming that rc·α = rc·β under then which can be generalised to (rc)α + γ = (rc)β + γ where γ is an integer. But we already know for a fact that , so lets put γ = 2k - α, so now where β + γ ≠ 2k because α ≠ β. This is contradictory to the result mentioned in Lemma 1, hence proved.
So, rc is now the (2k)th root of unity under the modulo field of P. Now the only difference between NTT and FFT will be that the nth root of unity changes from w to rc, where r is found by hit and trial method.
Trick to handle more prime modulos
Instead of breaking the sub-problems as powers of 2 (into two halves), we can generalise it to powers of any number.
Then we can basically solve NTT for a prime of the form Ak·c + 1, obviously if A = 2, then it gives the best time complexity and it would be slower when we opt for different and bigger A's. But it would still work which is the key.
Okay so how ?
Let's take an example for 3
So we have a polynomial of lets say 33 = 27 terms, i.e. it is a 26 degree polynomial.
Now A(x) = a0 + a1·x1 + a2·x2 + ... + a26·x26
A(x) = (a0 + a3·x3 + ... + a24·x24) + (a1·x + a4·x4 + ... + a25·x25) + (a2·x2 + a5·x5 + ... + a26·x26) = A0mod3(x3) + x·A1mod3(x3) + x2·A2mod3(x3)
Now the neat trick is that the complex roots not only coincide in one another when squared but also when exponentiated to any other power, in this case when cubed.
Below is an animation showing a regular 9-gon denoting the 9th roots of unity and when cubed, forming an equilateral triangle denoting the 3rd roots of unity.
So basically
Here it is O(2·3·N) as on each step of conquer for the N terms we need to do 3 multiplications i.e. A0mod3(x3)·1, A1mod3(x3)·x, A2mod3(x3)·x2 and 3 additions, i.e. adding all the terms together.
For a general A the time complexity will be
Note The underlying idea behind using roots of unity in any kind of transform is that the roots of unity under any field (complex numbers or under modulo P) form in themselves a cyclic group over multiplication, i.e if any 2 elements of these roots of unity are operated using multiplication then it is guaranteed that their product will be element of this set as well. Also this set has an identity element. So the property of these roots of unity of coinciding into one another when exponentiated is satisfied because they are forming a cyclic group over multiplication and also because the size of this group is of the form Ak
Problems for Practice
To be added.
Thanks to Baba and polingy for assisting me in understanding the concepts mentioned in this blog and to NibNalin for proof reading the blog.
References and resources used Introduction to Algorithms (aka CLRS), Better Explained, Apfloat blog on NTT, Desmos Graphing Calculator
Any feedback would be appreciated. Please notify me about any typos or errors in the comments or in PM.