Hi everyone!
This is yet another blog that I had drafted for quite some time, but was reluctant to publish. I decided to dig it up and complete to a more or less comprehensive state for the $300 contest.
Essentially, the blog tells how to combine CDQ technique for relaxed polynomial multiplication ("online FFT") with linearization technique from Newton method (similar approach is used in the first example of the ODE blog post by jqdai0815), so that the functions that typically require Newton's method can be computed online as well. I will try to briefly cover the general idea of "online FFT" too and provide some examples, in case you're not well familiar with it. That being said...
Consider the following setting:
There is a differentiable function $$$F(x)$$$ such that $$$F(0)=0$$$ and a polynomial $$$f(x)$$$.
You want to compute first $$$n$$$ coefficients of a formal power series $$$g(x)$$$ such that $$$g(x) = F(f(x))$$$. However, the series $$$f(x)$$$ is not known in advance. Instead, the $$$k$$$-th coefficient of $$$f(x)$$$ is given to us after we compute the $$$k$$$-th coefficient of $$$g(x)$$$.
Looks familiar? No? Ok, let's make a brief detour first.
CDQ convolution
General idea of CDQ technique is described in the following simple scheme: To compute something on the $$$[l,r)$$$ interval,
- Compute it on $$$[l, m)$$$ for $$$m=\frac{l+r}{2}$$$,
- Compute the influence of $$$[l, m)$$$ onto $$$[m,r)$$$,
- Compute everything else in $$$[m, r)$$$ recursively,
- Merge the results.
This approach is very versatile, and In convolution context, it's commonly known as "online FFT". It has the following typical formulation:
Standard formulation
We want to compute a sequence $$$c_0, c_1, \dots, c_n$$$, such that
where $$$a_0, a_1, \dots, a_n$$$ and $$$b_0, b_1, \dots, b_n$$$ are not known in advance, but $$$a_k$$$ and $$$b_k$$$ are revealed to us after we compute $$$c_k$$$.
In a more polynomial-like manner, we may formulate it as
where the $$$k$$$-th coefficient of the polynomials $$$A(x)$$$ and $$$B(x)$$$ is revealed to us as we compute the $$$k$$$-th coefficient of $$$C(x)$$$.
Examples
Polynomial exponent. Assume you want to compute $$$Q(x)=e^{P(x)}$$$ for a given $$$P(x)$$$.
Jetpack. You want to get from $$$(0, 0)$$$ to $$$(n, 0)$$$. On each step, you increase you $$$x$$$-coordinate by $$$1$$$, and your $$$y$$$ coordinate changes to $$$y+1$$$ if you use the jetpack or to $$$\max(0, y-1)$$$ if you don't. At the same time, you can only have $$$y > 0$$$ for at most $$$2k$$$ steps. How many ways are there to get to $$$(n, 0)$$$ under the given constraints?
Gennady Korotkevich Contest 5 — Bin. Find the number of full binary trees with $$$n$$$ leaves such that for every vertex with two children, the number of leaves in its left sub-tree doesn’t exceed the number of leaves in its right sub-tree by more than $$$k$$$.
Solution
Okay, how to solve this? Let's recall the convolution formula
Assume that we want to compute $$$c_k$$$ for $$$k$$$ in $$$[l, r)$$$ and we already know values of $$$c_k$$$, $$$a_i$$$ and $$$b_j$$$ for $$$i, j, k \in [l, m)$$$. Consider the contribution to $$$c_k$$$ for $$$k \in [m, r)$$$ of $$$(i, j)$$$ pairs such that both $$$i$$$ and $$$j$$$ are below $$$m$$$ and at least one of them is above or equal to $$$l$$$. For each such $$$a_i$$$, values of interest for $$$j$$$ range from $$$0$$$ to $$$\min(m, r-l)$$$.
Correspondingly, for each $$$b_j$$$, values of interest for $$$i$$$ range from $$$0$$$ to $$$\min(l, r-l)$$$. In both cases, the contribution may be computed with an appropriate convolution of polynomials of size at most $$$r-l$$$. Note that in the second case we used $$$\min(l, r-l)$$$ instead of $$$\min(m, r-l)$$$ to avoid double-counting pairs in which both $$$i$$$ and $$$j$$$ are $$$l$$$ or above.
We make two recursive calls and use extra $$$O(n \log n)$$$ time to consolidate them, making it for overall $$$O(n \log^2 n)$$$ time.
Generalization
Now back to the beginning. The examples above typically expect that the right-hand side is formulated in a way that is kind of similar to convolutions. But there are a lot of functions, for which it's not really possible to do so. Try the following ones in a similar setting (i.e. the coefficients of $$$f(x)$$$ are given after the coefficients of $$$g(x)$$$ are computed):
Those are the functions that are quite typical in formal power series (for their interpretation, see here). You may probably rephrase them in a convolution-like manner, so that CDQ is applicable, but you would need to do something ad hoc for each individual function. It doesn't seem very convenient, so it only makes sense to try and find some generic enough approach. What is a generic formulation here?
And you may note that what all these functions $$$F(\dots)$$$ have in common is that they're differentiable. Let's use it in a similar way to what we do in Newton's method and rewrite the right hand side in the following way:
This formula generally works for any $$$f_0$$$. In particular, let $$$f_0$$$ be first $$$n$$$ coefficients of $$$f$$$, then $$$(f-f_0)^2$$$ is divisible by $$$x^{2n}$$$. In other words,
In this formula, we still do not know $$$f(x)$$$ completely. But what's important is that it is no longer an argument of some generic function $$$F(\dots)$$$, so the right-hand side is now a linear function of $$$f(x)$$$! This is exactly the formulation for which we learned to apply CDQ convolution above, that is $$$g(x) = A(x) f(x) + B(x)$$$, where
are specific constant polynomials in this context. This sub-problem is solvable in $$$O(n \log^2 n)$$$ with the standard CDQ technique, and since it allows us to double the number of known coefficients at each step, the overall running time is also $$$O(n \log^2 n)$$$, assuming that we're able to compute $$$F(\dots)$$$ and $$$F'(\dots)$$$ in $$$O(n \log^2 n)$$$ as well.
Thanks for the nice blog! For the special case of online multiplication (which is called relaxed multiplication in literature), there is this paper (by one of the authors of the 2021 $$$O(n \log n)$$$ integer multiplication algorithm) that gives asymptotically faster algorithms compared to the one mentioned in this blog. According to the paper, the techniques mentioned also have applications to computation of some formal power series which are incidentally similar to the ones discussed in this blog.
That's an interesting finding. Though the mentioned complexity is
and I wonder if it bears much practical significance ツ
Actually, negiizhao told me that the high-level idea can be explained in several lines.
If we use D&C with $$$2$$$ branches, we have the recursive formula $$$T(n) = 2T(n/2) + O(n\log n)$$$, what happens if we have $$$b$$$ branches? We divide the sequence into $$$b$$$ blocks with length $$$\lceil n/b\rceil$$$, say, $$$a_{[0]}, \dots, a_{[b-1]}$$$, the key observation is that although the contribution between blocks is $$$b^2$$$ convolutions, we can reuse the information of $$$DFT(a_{[i]})$$$, so the time can be reduced to $$$T(n) = bT(n/b) + O(n\log n + nb)$$$. Take $$$b=\log n$$$, we first achieve a time complexity $$$O(n\log^2 n / \log\log n)$$$. This is still implementable and has a good performance.
To achieve a better time complexity, one needs to realize that, the contribution between the blocks $$$DFT(a_{[i]})$$$ is again several relaxed convolution procedures, i.e., for each fixed $$$j$$$, the sequence $$$DFT(a_{[i]})_{j}$$$ is a sequence with length $$$b$$$ that need to be computed through a relaxed convolution. This gives the recursive formula $$$T(n) = bT(n/b) + (2n/b)T(b) + O(n\log n)$$$, take $$$b = \sqrt{n}$$$ we have $$$T(n) = O(n(\log n)^{\log_2 3})$$$. This is not that easy to implement because we need an asynchronous computation order.
To achieve the final time complexity, one needs to realize that the algorithm above is a $$$2$$$-depth structure, and we need to consider an $$$\ell$$$-depth structure. Suppose we have some appropriate block sizes $$$n_1\dots,n_\ell$$$ such that $$$n_1\cdots n_\ell \approx n$$$, we can write each $$$0\leq d < n$$$ into the digits $$$d_1d_2\ldots d_\ell$$$ such that $$$0\leq d_i <n_i$$$, the contribution from index $$$d$$$ to index $$$e$$$ should be computed at their LCA in the trie of these digits. The advantage is that we can let $$$n_i$$$ equally small, and the recursive formula is
The $$$\exp(2\sqrt{\log 2\log \log n})$$$ factor comes from a careful analysis of the choice of $$$\ell$$$ and $$$n_i$$$. (UPD: Roughly speaking, take $$$n_i = n^{1/\ell}$$$ and $$$\ell = \sqrt{\log_2 \log n}$$$ achieves this bound.)
By the way, this idea of blocking also helps us to shave the constant factors on computing elementary operations, e.g., I. S. Sergeev's algorithms on computing inverse with time $$$(1.25+o(1))\mathsf M(n)$$$, here $$$\mathsf M(n)$$$ is the usual time computing the convolution.
UPD at 2023.8.12: The final complexity claimed by that paper is actually with a factor $$$\exp\left(\sqrt{2\log 2\log \log n}\right) \sqrt{\log \log n}$$$, by balancing the chosen $$$\ell$$$ s in the recursion. It's in the section 4 but not mentioned in the abstract.
How is newton?
another problem with online FFT: abc213 pH
You can use this to solve 1782F from the recent round in $$$O(n^2 \log^{2}(n))$$$. You can consider the weighted sum of all bracket sequences with balance better than $$$-b$$$ of length $$$n$$$ to be $$$[x^n]f_b$$$. Now notice these relations
f_b - 1 = pxf_b^2f_{b+1} + (1-p)xf_b^2f_{b-1}
$$$
Since the relations don't form a dag, its not possible to compute the polynomials one by one. However, if we knew the first $$$k$$$ coefficients of all polynomials, we can find the $$$k+1$$$th coefficient of all polynomials. So we basically need to compute polynomials $$$f_b^2f_{b+1}$$$ and $$$f_{b}^2f_{b-1}$$$ online. The above method only solves for multiplication of 2 polynomials, but its easy to generalize for multiple polynomials as a black box even.
If you are calculating $$$f_b$$$ online, you can use it to calculate $$$f_b^2$$$ online as the online product with itself. If you're able to do that, you can calculate $$$f_b^2f_{b+1}$$$ as the online product between $$$f_b^2$$$ and $$$f_{b+1}$$$. We just need to then divide $$$[x^n]f_0$$$ with the number of ways to choose indexes.