adamant's blog

By adamant, history, 3 years ago, In English

Hi everyone!

The task of finding the minimum linear recurrence for the given starting sequence is typically solved with the Berlekamp-Massey algorithm. In this article I would like to highlight another possible approach, with the use of the extended Euclidean algorithm.

Great thanks to nor for the proofreading and all the useful comments that made the article more accessible and rigorous.

Tl;dr.

The procedure below is essentially a formalization of the extended Euclidean algorithm done on $$$F(x)$$$ and $$$x^{m+1}$$$.

If you need to find the minimum linear recurrence for a given sequence $$$F_0, F_1, \dots, F_m$$$, do the following:

Let $$$F(x) = F_m + F_{m-1} x + \dots + F_0 x^m$$$ be the generating function of the reversed $$$F$$$.

Compute the sequence of remainders $$$r_{-2}, r_{-1}, r_0, \dots, r_k$$$ such that $$$r_{-2} = F(x)$$$, $$$r_{-1}=x^{m+1}$$$ and

$$$r_{k} = r_{k-2} \mod r_{k-1}.$$$

Let $$$a_k(x)$$$ be a polynomial such that $$$r_k = r_{k-2} - a_k r_{k-1}$$$.

Compute the auxiliary sequence $$$q_{-2}, q_{-1}, q_0, \dots, q_k$$$ such that $$$q_{-2} = 1$$$, $$$q_{-1} = 0$$$ and

$$$q_{k} = q_{k-2} + a_k q_{k-1}.$$$

Pick $$$k$$$ to be the first index such that $$$\deg r_k < \deg q_k$$$. Let $$$q_k(x) = a_0 x^d - \sum\limits_{j=1}^d a_j x^{d-j}$$$, then it also holds that

$$$ F_n = \sum\limits_{j=1}^d \frac{a_j}{a_0}F_{n-j} $$$

for any $$$n \geq d$$$, and $$$d$$$ is the minimum possible. Thus, $$$q_k(x)$$$ divided by $$$a_0$$$ is the characteristic polynomial of the minimum linear recurrence for $$$F$$$.

More generally, one can say for such $$$k$$$ that

$$$ F(x) \equiv \frac{(-1)^{k}r_k(x)}{q_k(x)} \pmod{x^{m+1}}. $$$


Linear recurrence interpolation

In the previous article on linear recurrences we derived that the generating function of the linear recurrence always looks like

$$$ G(x) = \frac{P(x)}{Q(x)}, $$$

where $$$P(x)$$$ and $$$Q(x)$$$ are polynomials and $$$\deg P < \deg Q$$$. In this representation, $$$Q(x) = 1 - \sum\limits_{k=1}^d a_k x^k$$$ corresponds to

$$$ F_n = \sum\limits_{k=1}^d a_k F_{n-k}. $$$
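For example, for the Fibonacci sequence $$$0, 1, 1, 2, 3, 5, \dots$$$ one has

$$$ G(x) = \frac{x}{1 - x - x^2} = x + x^2 + 2x^3 + 3x^4 + 5x^5 + \dots, $$$

so $$$Q(x) = 1 - x - x^2$$$ encodes the recurrence $$$F_n = F_{n-1} + F_{n-2}$$$, while $$$P(x) = x$$$ with $$$\deg P < \deg Q$$$ captures the initial values.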

A typical competitive programming task of recovering a linear recurrence is formulated as follows:


Library Checker — Find Linear Recurrence. You're given $$$F_0, \dots, F_{m}$$$. Find $$$a_1, \dots, a_d$$$ with minimum $$$d$$$ such that
$$$\begin{gather} F_n = \sum\limits_{k=1}^d a_k F_{n-k}. &(\forall n \geq d) \end{gather}$$$

In formal power series terms, this means that we're given $$$F(x) = F_0 + F_1 x + \dots + F_{m}x^{m}$$$ and we need to find $$$\frac{P(x)}{Q(x)}$$$ such that

$$$ F(x) \equiv \frac{P(x)}{Q(x)} \pmod{x^{m+1}}, $$$

$$$\deg P < d$$$, $$$\deg Q \leq d$$$ and $$$d$$$ is the minimum possible. Note that it is not required for $$$a_d$$$ to be non-zero.

In these terms, as we will see later, what the problem asks us to find is essentially one of the Padé approximants of $$$F(x)$$$.

Padé approximants


Given a formal power series $$$f(x) = \sum\limits_{k=0}^\infty f_k x^k$$$, its Padé approximant of order $$$[p/q]$$$ is a pair of polynomials $$$(P, Q)$$$ such that the degree of $$$P$$$ is at most $$$p$$$, the degree of $$$Q$$$ is at most $$$q$$$, the polynomial $$$Q$$$ is non-zero and

$$$\begin{gather} f(x) Q(x) \equiv P(x) \pmod{x^{m+1}} \end{gather}$$$

where $$$m=p+q$$$. The Padé approximant is denoted $$$[p/q]_{f}$$$.
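To make the definition concrete, here is the classical example $$$f(x) = e^x = 1 + x + \frac{x^2}{2} + \dots$$$ with $$$p = q = 1$$$, so $$$m = 2$$$:

$$$ [1/1]_{e^x} = \left(1 + \frac{x}{2},\; 1 - \frac{x}{2}\right), $$$

and indeed $$$\left(1 + x + \frac{x^2}{2}\right)\left(1 - \frac{x}{2}\right) = 1 + \frac{x}{2} - \frac{x^3}{4} \equiv 1 + \frac{x}{2} \pmod{x^{3}}$$$.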

The Padé approximant is unique in the following sense. Let $$$(P_0, Q_0)$$$ and $$$(P_1, Q_1)$$$ both satisfy the definition of $$$[p/q]_f$$$; then

$$$ \begin{cases} f(x) Q_0(x) \equiv P_0(x) \pmod{x^{m+1}},\newline f(x) Q_1(x) \equiv P_1(x) \pmod{x^{m+1}}. \end{cases} $$$

This, in turn, implies

$$$ P_0(x) Q_1(x) \equiv f(x) Q_0(x) Q_1(x) \equiv P_1(x) Q_0(x) \pmod{x^{m+1}}, $$$

meaning that $$$P_0(x)Q_1(x)-P_1(x)Q_0(x)$$$ is divisible by $$$x^{m+1}$$$. But the degree of the difference is at most $$$m$$$, thus

$$$ P_0(x) Q_1(x) = P_1(x) Q_0(x), $$$

even disregarding the $$$\bmod x^{m+1}$$$ part. Note that if $$$(P_0, Q_0)$$$ is an approximant and $$$(P_1, Q_1)$$$ satisfies the identity above, it does not follow that $$$(P_1, Q_1)$$$ is also an approximant, meaning that we generally can't just impose the $$$\gcd(P, Q)=1$$$ normalization.
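For a concrete example of this caveat, let $$$f(x) = x$$$ and $$$(p, q) = (0, 1)$$$, so $$$m = 1$$$. The pair $$$(P_0, Q_0) = (0, x)$$$ is a valid $$$[0/1]_f$$$, since

$$$ x \cdot x \equiv 0 \pmod{x^2}. $$$

The pair $$$(P_1, Q_1) = (0, 1)$$$ satisfies $$$P_0 Q_1 = P_1 Q_0 = 0$$$, yet it is not an approximant: $$$f(x) \cdot 1 = x$$$ is not congruent to any constant modulo $$$x^2$$$.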

However, we may search for the representative with the minimum possible degree of $$$Q$$$, as the difference $$$\deg Q - \deg P$$$ is the same for all representations (assuming $$$\deg 0 = -\infty$$$).

Note: In scholarly articles, this is often bypassed by requiring $$$\gcd(P, Q)=1$$$ and using the normalization $$$Q(0)=1$$$, thus requiring $$$\frac{P(x)}{Q(x)}$$$ to be a meaningful rational function that coincides with $$$f(x)$$$ in terms up to $$$x^m$$$.


Online Judge — Rational Approximation. Given $$$f(x)$$$, compute $$$p(x)$$$ and $$$q(x)$$$ of degrees at most $$$m-1$$$ and $$$n-1$$$ such that

$$$ f(x)q(x) - p(x) \equiv 0 \pmod{x^{m+n}}. $$$

Extended Euclidean algorithm

Let's look again at the condition

$$$ F(x) Q(x) \equiv P(x) \pmod{x^{m+1}}. $$$

It translates into the following Bézout's identity:

$$$ F(x) Q(x) = P(x) + x^{m+1} K(x), $$$

where $$$K(x)$$$ is a formal power series. When $$$P(x)$$$ is divisible by $$$\gcd(F(x), x^{m+1})$$$, a solution to this equation can be found with the extended Euclidean algorithm. It turns out that the extended Euclidean algorithm can also be used to enumerate all $$$[p/q]_F$$$ with $$$p+q=m$$$.

Formalizing the algorithm

Let's formalize the extended Euclidean algorithm applied to $$$A(x)$$$ and $$$B(x)$$$. Starting with $$$r_{-2} = A$$$ and $$$r_{-1} = B$$$, we compute the sequence

$$$ r_{k} = r_{k-2} \mod r_{k-1} = r_{k-2} - a_{k} r_{k-1}. $$$

If you're familiar with continued fractions, you may recognize that this sequence corresponds to the representation

$$$ \frac{A(x)}{B(x)} = a_0(x) + \frac{1}{a_1(x) + \frac{1}{a_2(x) + \frac{1}{\dots}}} = [a_0(x); a_1(x), a_2(x), \dots]. $$$

As with rational numbers, for such a sequence it is possible to define the sequence of convergents

$$$ \frac{p_k(x)}{q_k(x)} = [a_0(x); a_1(x), \dots, a_k(x)] = \frac{a_{k}(x) p_{k-1}(x) + p_{k-2}(x)}{a_{k}(x) q_{k-1}(x) + q_{k-2}(x)}, $$$

starting with

$$$ \begin{matrix} \frac{p_{-2}(x)}{q_{-2}(x)} = \frac{0}{1}, & \frac{p_{-1}(x)}{q_{-1}(x)} = \frac{1}{0}, & \frac{p_{0}(x)}{q_{0}(x)} = \frac{a_0(x)}{1},& \dots \end{matrix} $$$

Now, one can prove the following identity:

$$$ (-1)^{k}r_k(x) = q_k(x) A(x) - p_k(x) B(x). $$$
Explanation

In other words, we get three sequences $$$r_k$$$, $$$p_k$$$ and $$$q_k$$$ that define a family of solutions to Bézout's identity.

Using $$$A(x) = F(x)$$$ and $$$B(x) = x^{m+1}$$$, we conclude that

$$$ F(x) q_k(x) \equiv (-1)^k r_k(x) \pmod{x^{m+1}}. $$$

Enumerating approximants

Let's estimate the degrees of $$$r_k$$$ and $$$q_k$$$ to understand how they relate to Padé approximants.

It follows from the recurrence that $$$\deg q_{k} - \deg q_{k-1} = \deg a_k = \deg r_{k-2} - \deg r_{k-1}$$$, therefore

$$$ \deg q_k = \deg a_0 + \dots + \deg a_k $$$

and

$$$ \deg r_k = (m+1) - \deg a_0 - \dots - \deg a_{k} - \deg a_{k+1} = (m - \deg q_k) - (\deg a_{k+1}-1). $$$
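As a quick sanity check of this bookkeeping, take the Fibonacci prefix $$$F_0, \dots, F_6 = 0, 1, 1, 2, 3, 5, 8$$$, so $$$m = 6$$$. Here $$$a_0 = 0$$$ (hence $$$\deg q_0 = 0$$$) and $$$\deg a_1 = \deg x^{m+1} - \deg F = 7 - 5 = 2$$$, so

$$$ \deg r_0 = (m - \deg q_0) - (\deg a_1 - 1) = (6 - 0) - (2 - 1) = 5, $$$

which matches $$$r_0 = F(x)$$$ having degree $$$5$$$ (recall that $$$F(x)$$$ is the reversed prefix, whose top coefficient $$$F_0 = 0$$$ vanishes).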

This means that $$$\frac{(-1)^{k}r_k}{q_k}$$$ is the Padé approximant $$$[(m-q)/q]_F$$$ for all $$$q$$$ from $$$\deg q_k$$$ to $$$\deg q_k + \deg a_{k+1}-1 = \deg q_{k+1}-1$$$.

Joining this together over all $$$q_k$$$, we see that all $$$[(m-q)/q]_F$$$ for $$$q$$$ from $$$0$$$ to $$$m$$$ are covered.

Thus, if you find $$$k$$$ such that $$$\deg q_k \leq q$$$ and $$$\deg q_{k+1} > q$$$, assuming $$$p+q=m$$$, it will hold that

$$$ [p/q]_F = \frac{(-1)^k r_k}{q_k}. $$$

Finding the linear recurrence

Let $$$F(x) Q(x) \equiv P(x) \pmod{x^{m+1}}$$$ for some $$$P(x)$$$ and $$$Q(x)$$$ such that $$$\deg P(x) < \deg Q(x)$$$.

If $$$\deg P \leq m - \deg Q$$$ then $$$(P, Q)$$$ satisfies the definition of $$$[(m - \deg Q)/\deg Q]_F$$$. Otherwise, $$$[(m-\deg Q)/\deg Q]_F$$$ would provide another pair of $$$P(x)$$$ and $$$Q(x)$$$ with possibly smaller degree of $$$Q$$$ and certainly smaller degree of $$$P$$$.

To avoid dealing with special cases around $$$Q(0)=0$$$, as well as with finding the needed $$$d$$$ while iterating over approximants, it may prove useful to reverse the sequence for which you search for the recurrence. This way, rather than fixing $$$Q(0)=1$$$ for the linear recurrence, you instead look for the highest coefficient of $$$Q(x)$$$ being equal to $$$1$$$.

The approaches are equivalent in the sense that when $$$Q(x) = 1-b_1 x - \dots - b_d x^d$$$ defines the linear recurrence

$$$ F_n = \sum\limits_{k=1}^d b_k F_{n-k}, $$$

the reversed polynomial $$$x^d Q(x^{-1}) = x^d - b_1 x^{d-1} - \dots - b_d$$$ defines the recurrence

$$$ F_n = \sum\limits_{k=1}^d b_k F_{n+k}. $$$
Formal proof

Thus to find the first kind recurrence for $$$F_0,F_1,\dots,F_m$$$, we could find the second kind recurrence for $$$F_m, F_{m-1}, \dots, F_0$$$ instead.

In this setting, the minimum recurrence is given by the first $$$k$$$ such that $$$\deg r_k < \deg q_k$$$, and its characteristic polynomial is $$$q_k$$$.

I have implemented the $$$O(n^2)$$$ algorithm as a part of my polynomial algorithms library:

// Returns the characteristic polynomial
// of the minimum linear recurrence for
// the first d+1 elements of the sequence
poly min_rec(int d = deg()) const {
    // R1 = reversed (F(x) mod x^{d+1}), R2 = x^{d+1}
    auto R1 = mod_xk(d + 1).reverse(d + 1), R2 = xk(d + 1);
    auto Q1 = poly(T(1)), Q2 = poly(T(0));
    while(!R2.is_zero()) {
        auto [a, nR] = R1.divmod(R2); // R1 = a*R2 + nR, deg nR < deg R2
        tie(R1, R2) = make_tuple(R2, nR);
        tie(Q1, Q2) = make_tuple(Q2, Q1 + a * Q2);
        if(R2.deg() < Q2.deg()) {
            return Q2 / Q2.lead(); // guarantee that the highest coefficient is 1
        }
    }
    assert(0);
}

You can see this Library Checker submission for further details.

Also here is a library-free version, if you prefer it.

Half-GCD algorithm

The notion above provides the basis for constructing an $$$O(n \log^2 n)$$$ divide and conquer algorithm for computing $$$\gcd(P, Q)$$$ of polynomials and finding the minimum linear recurrence. I have a truly marvelous demonstration of this proposition that this margin is, unfortunately, too narrow to contain. I hope I will be able to formalize the process for good and write an article about it sometime...

As a teaser, here's an example of a problem that (probably) requires Half-GCD:


Library Checker — Inv of Polynomials. You're given $$$f(x)$$$ and $$$h(x)$$$. Compute $$$f^{-1}(x)$$$ modulo $$$h(x)$$$.


Let $$$\frac{f(x)}{h(x)} = [a_0; a_1, \dots, a_k]$$$ and $$$\frac{p_{k-1}}{q_{k-1}} = [a_0; a_1, \dots, a_{k-1}]$$$, then $$$f q_{k-1} - h p_{k-1} = (-1)^{k-1} r_{k-1} = (-1)^{k-1} \gcd(f, h)$$$.

Therefore, if $$$r_{k-1} = \gcd(f, h)$$$ is non-constant, the inverse doesn't exist. Otherwise, the inverse is $$$(-1)^{k-1} q_{k-1}(x)$$$ divided by the constant $$$r_{k-1}$$$.

In the problem, $$$n \leq 5 \cdot 10^4$$$, therefore you need to do something better than the $$$O(n^2)$$$ Euclidean algorithm.


»
3 years ago, # |

Neat, now I finally have an excuse not to learn Berlekamp-Massey.

»
3 years ago, # |

Thanks for writing such a good tutorial! If we make use of the ring of reversed formal Laurent series, would it be simpler to explain this algorithm? It's obvious that $$$P(x)/(x^{\deg Q}Q(x^{-1}))=F_0x^{-1}+F_1x^{-2}+\cdots$$$ if we compute the multiplicative inverse of the denominator in $$$R((x^{-1}))$$$ and multiply by the numerator. Actually, I have read about this algorithm in Shoup's book before, but I don't know how to prove it.

Ref: Victor Shoup. A Computational Introduction to Number Theory and Algebra (Version 1). page 429

  • »
    »
    3 years ago, # ^ |

    I wanted to use $$$R((x^{-1}))$$$ at first; I think you can indeed define a continued fraction representation for arbitrary series this way, but I found it inexplicably hard to navigate that area myself. There likely is some simple interpretation in those terms, but I was just unable to properly formalize it, unfortunately.

    Also, I expect that it would be even harder for others to understand what's going on if we digress from $$$R((x))$$$, which everyone is familiar with, and do stuff in $$$R((x^{-1}))$$$...

    Though, when we compute $$$\frac{F(x)}{x^{m+1}}$$$, we in fact work with $$$F^R(x^{-1})$$$, so maybe it's a natural interpretation after all...

    • »
      »
      »
      3 months ago, # ^ |

      Sorry for the late reply after 2 years. I recently implemented this algorithm from the viewpoint of my previous comment.

      I have written pseudocode for the naïve algorithm on UOJ in Chinese, which I think could help with understanding; I briefly translated it below.

      Let's say the sequence $$$\left(a_j\right)_{j\geq 0}$$$ agrees with the coefficients of the rational function $$$P/Q$$$ where $$$P,Q$$$ are polynomials and $$$P/Q=\sum_{j\geq 1}a_{j-1}x^{-j}\in\mathbb{F}\left(\left(x^{-1}\right)\right)$$$. Our goal is to find such $$$P,Q$$$ with $$$\deg Q$$$ minimized. We could find that

      $$$ \frac{\sum_{j=0}^{k-1}a_{k-1-j}x^j}{x^k}=\sum_{j=1}^{k}a_{j-1}x^{-j} $$$

      So the algorithm is as follows:

      $$$ \begin{array}{ll} &\textbf{Algorithm }\operatorname{RFA}(A,B,k)\text{:} \\ &\textbf{Input}\text{: }A,B\in\mathbb{C}\left\lbrack x\right\rbrack, \deg A\lt \deg B, k\in\mathbb{Z}_{\gt 0}\text{.} \\ &\textbf{Output}\text{: }C,D\in\mathbb{C}\left\lbrack x\right\rbrack\text{ such that } \\ &\qquad \qquad \left\lbrack x^{\left\lbrack -k,-1\right\rbrack}\right\rbrack A/B=\left\lbrack x^{\left\lbrack -k,-1\right\rbrack}\right\rbrack C/D, \\ &\qquad \qquad\deg C\lt \deg D,\deg D\text{ is minimized.} \\ 1&\textbf{if }\deg A-\deg B\lt -k\textbf{ then return }(0,1) \\ 2&Q\gets \left\lfloor B/A\right\rfloor,R\gets B\bmod{A} \\ &\text{Now we have }A/B=\cdots +c_{-k}x^{-k}+\cdots +c_{-\deg Q}x^{-\deg Q}\text{,} \\ &\text{If we set }A/B=1/(B/A)=1/(C+D)\text{ where} \\ &\qquad C,D\in\mathbb{C}\left(\left( x^{-1}\right)\right)\text{ and }\deg C=\deg Q\gt \deg D, \\ &\text{then we have }(A/B)\cdot C+(A/B)\cdot D=1\text{.} \\ &\text{If we have }\left\lbrack x^{\left\lbrack -k,-\deg Q\right\rbrack}\right\rbrack 1/C=\left\lbrack x^{\left\lbrack -k,-\deg Q\right\rbrack}\right\rbrack A/B\text{,} \\ &\text{then we can drop }D\text{. Which is saying that} \\ &\qquad (A/B)+(A/B)\cdot D/C=1/C \\ &\implies \deg A-\deg B+\deg D-\deg C\lt -k \\ &\implies \deg D\lt -k+2\deg Q \\ 3&\textbf{if }\deg R-\deg A\lt -k+2\deg Q \textbf{ then return }(1,Q) \\ &\text{If we set }C\gets Q\text{ and now }D=R/A\text{,} \\ &\deg D=\deg R-\deg A\lt -k+2\deg Q\text{,} \\ &\text{then we can just drop }D\text{.} \\ 4&(E,F)\gets \operatorname{RFA}(R,A,k-2\deg Q) \\ &\text{Otherwise we set }C\gets Q+E/F\text{ with }\deg F\text{ minimized.} \\ &\text{Now we have }\frac{1}{Q+E/F}=\frac{F}{QF+E}\text{.} \\ 5&\textbf{return }(F,QF+E) \end{array} $$$

      I called this function "Rational Function Approximation". I didn't give a formal proof, though it seems quite reasonable that it works. If anyone thinks I am wrong, please feel free to point it out.

      The thing I want to show is that, if the Half-GCD procedure really does "half" of the "Full-GCD", we can get the rational function after at most one more adjustment after calling the Half-GCD function (if I am right). This is my submission on Library Checker. To gain more flexibility, I used the implementation from Bernstein's paper, but I think it would work for other implementations such as Yap's and Hoeven's.

»
2 years ago, # |

To avoid having to fiddle around with Padé approximants, I tried to cook up something with continued fractions. More precisely, given a sequence $$$a_0, a_1, a_2, \dots$$$, let

$$$ A = a_0 + a_1 x + a_2 x^2 + \dots $$$

then we can find a unique continued fraction representation

$$$ A = \frac{1}{p_0(1/x) + \frac{1}{p_1(1/x) + \frac{1}{p_2(1/x) + \dots}}} $$$

with polynomials $$$p_1, p_2, \dots$$$ of degree $$$\geq 1$$$ ($$$p_0$$$ might have degree $$$0$$$ if $$$a_0 \neq 0$$$). If $$$A$$$ has a rational generating function (i.e. $$$(a_i)_i$$$ satisfies a linear recurrence), then this continued fraction terminates and allows us to recover the corresponding rational function.

Given "sufficiently many" terms of $$$a$$$, we can compute all the $$$p_i$$$ in $$$O(n^3)$$$ by repeatedly inverting a formal power series. This can be sped up to $$$O(n^2)$$$ by working with a fraction of two power series instead. What I like about this algorithm is that it's very easy to implement from scratch even if the only thing you remember is how the continued fraction representation should look like. This might make it a decent option for no-codebook onsite competitions.

I'm not quite sure how many terms we actually need. I'm hoping $$$2d+1$$$ terms are enough for a linear recurrence of order $$$d$$$, but I'll have to think about a proof. There are some examples where $$$2d$$$ terms aren't enough, for example on $$$1, 1, 2, 3$$$ we find $$$\frac{1-x+x^2}{1-2x+x^2} = 1 + x + 2x^2 + 3x^3 + 4x^4 + 5 x^5 + \dots$$$ instead of the Fibonacci numbers. (My library checker submission only fails the $$$n=2d$$$ case, although I'm surprised it passes the "degenerate" $$$d > n/2$$$ case.)

Ps: It's worth noting that the "simpler" continued fraction of the form

$$$ A = c_0 + \frac{x}{c_1 + \frac{x}{c_2 + \dots}}, c_i \in \mathbb{F}_p $$$

does not work, as the computation of -1 + x/(1 + x^6/(1 + x/(1 + x/(-1 + x^4/(-1 + x/(-1 + x/(1 + x^2/(1+x/1)))))))) needs $$$\Omega(d^2)$$$ terms! The issue is that with this representation, there can be some cancellation that reduces the degrees of the resulting numerator / denominator.