I was practising FFT with some problems, but I found out that my implementation of FFT seems to take longer than what it takes to fit into the TL.
For the problem 528D. Fuzzy Search (I understand that the problem also has a bitmask solution, but I'd like to practise FFT), I implemented the FFT algorithm (code here) for the string comparison, yet it takes more than 3 seconds to compute 12 FFT(and inverse) of a vector with a size of (2^18).
I wonder how I can improve my implementation's runtime?