Here's a nice optimization for linear recurrence problems. This mostly works when the you're trying to find the nth term of a sequence of the form $$$f(n) = a_1 \cdot f(n-1) + a_2 \cdot f(n-2) + \dots + a_m \cdot f(n-m)$$$, usually modulo some big number, where n is too large to use the normal dynamic programming solution, usually $$$ n \approx 10^{18}$$$. If $$$m \leq 100$$$, the normal solution involves matrix multiplication in $$$O(m^3 log(n))$$$. But under some conditions we can do better using dynamic programming and divide and conquer.
The idea involves reducing the problem into finding how many arrays with numbers 1 to m and sum n can be constructed. To do this, we will divide the problem of finding $$$f(n)$$$ into finding $$$f(k)$$$, where k is in the neighborhood of $$$\frac n 2$$$.
All problems shown here can be solved using matrix exponentiation, but due to it's cubic nature, if m was larger it would not be possible.
Magic Gems — 1117D
This was the first problem that I solved using this approach, so I will be using to demonstrate it: You have gems that are either worth 1 or M and your objective is to find how many arrays modulo $$$10^9+7$$$ have worth N. As an example, if N = 10 and M = 3, you would try to find in how many ways you could either write 1 or 3 and add up to N. 11111111, 11111113, 131131, 3331, 1333 are 5 of the possible examples.
Let's look at the largest prefix that doesn't exceed $$$\frac n 2$$$. In the example, this subarray can either have 3, 4 or 5 elements. Moreover, if it has 3 or 4 elements, the next one will be a 3, and then will come a suffix with 4 or 3 elements respectively, but if we have 5, the suffix can be any array with sum 5. With this we can derive:
$$$f(10)=f(5)\cdot f(5) + f(4) \cdot f(3) + f(3) \cdot f(4)$$$
More generally, if $$$k = \lfloor {\frac n 2} \rfloor$$$
$$$f(n) = f(k)\cdot f(n-k) + f(k-1) \cdot f(n-k-2) + f(k-2) \cdot f(n-k-1)$$$
And if m wasn't 3, we get
$$$f(n) = f(k) \cdot f(n-k) + \sum _{i=1}^{M-1} f(k+1) \cdot f(n-k-m+1)$$$
It can be proven that the ith iteration of dividing by 2, the range of values for to calculate are never greater than $$$\frac n {2^i} +2$$$ and are all at least $$$\frac n {2^i} - 2\cdot m$$$, meaning that in each one of the $$$log n$$$ iterations, we will need to calculate $$$O(m)$$$ values. Given that the previous terms were already computed, we can compute that next term in $$$O(m)$$$. Meaning the total complexity is $$$O(m^2 \cdot log(n))$$$ by using a hashmap to save the dp, or $$$O(m^2 \cdot (log (n))^2)$$$ by using a treemap. Due to the numbers coming in batches, I will assume that hacking this resulting in multiple hash collitions and TLE is not possible. If you don't trust the analysis, add an extra $$$O(log n)$$$ term to all results from now on.
Matrix solution — AC 452ms
DP solution — AC 78ms
Fibonacci Numbers — CSES 1722
After seeing the last solution, you may have realized that if M=2, the terms were all fibonacci numbers. This is because they follow the same recurrence $$$f(n) = f(n-1) + f(n-2)$$$.
You may be tempted to just return hardcode M=2, and send it as a solution, but this will give WA due to $$$F_0 = 0$$$, while our solution will assume it equals 1. Changing the values on the dp map so that $$$f(0) = 0, f(1) = 1$$$, also won't solve our problem. Our solution assumes that this is a construction and so by definition, we can never construct an array with sum less than 0, and we can only construct an array with sum 0 in one way. This forces us to leave $$$f(0) = 1, f(1) = 1$$$ and then call $$$F_n = f(n-1)$$$
As both solutions are equally as fast, I will only be linking the dp solution here
Throwing Dice — CSES 1096
This problem can be easily translated as adding up to N using only numbers 1, 2, 3, 4, 5 or 6. Let's use the same trick as before; we will grab the longest prefix that doesn't add up to more than half of N.
Let's supposed that N = 100. The halfway mark is 50, so the last prefix could add up to 45, 46, 47, 48, 49 or 50. In the case that it adds up to 45, I know the next dice throw was a 6, and then comes a suffix of 49. If it added up to 46, the next dice throw was either a 5 or a 6, and then would either come a suffix of 48 or 49. The next one over, if it added up to 47, the next throw would either be 4, 5, or 6, followed by a suffix of 47, 48 or 49. So on and so forth. We derive the next equation:
$$$f(n) = f(k-5)\cdot f(n-k-1) + f(k-4) \cdot (f(n-k-2) + f(n-k-1))+\dots$$$
$$$f(n) = \sum {i=1} ^{6} (f(k-6+i) \cdot (\sum{j=1}^i f(n-k-j)))$$$
If the dice had M faces, then the general equation would be:
$$$f(n) = \sum {i=1} ^{M} (f(k-M+i) \cdot (\sum{j=1}^i f(n-k-j)))$$$
Note that we can compute $$$\sum_{j=1}^i f(n-k-j)$$$ on each iteration fast. If we didn't, then computing $$$f(n)$$$ would take $$$O(m^2)$$$, which would make the whole algorithm $O(m^3 \cdot log(n))
Despite there not being a significant time difference, I leave both solutions here with the dice face count(M) being a variable in case someone wants to compare speed.
Generalizations
Let's try to generalize the problem. The linear recurrence is $$$f(n) = a_1 \cdot f(n-1) + a_2 \cdot f(n-2) + \dots + a_m \cdot f(n-m)$$$, can we find some solution in terms of values in the neighborhood of f(n/2)?
The answer is yes, and if you followed the blog, finding it is not that difficult. Say we have $$$a_1$$$ blocks of length 1, $$$a_2$$$ of length 2, and so on, we want to find in how many ways can these blocks form an array of length $$$n$$$. For this, we check when the array fails to pass the half way mark for the last time. This is at least on $$$k-m+1$$$, where again $$$k = \lfloor {\frac n 2} \rfloor$$$, and then, we must use any block of length $$$m$$$, to get a suffix of length $$$n-k-1$$$. The next place where this could happen is one block over, on $$$k-m+2$$$, where you can choose either to place any block of length $$$m$$$ or $$$m-1$$$, leaving a suffix of length $$$n-k-2$$$ or $$$n-k-1$$$ respectively. Thus we get:
$$$f(n) = \sum_{i=1}^m f(k-m+i)\cdot(\sum_{j=1}^i a_{i-j+1} f(n-k-j))$$$
Note that unless we can easily calculate the inner sum(ie calculate all $$$m$$$ terms in $$$O(m)$$$), the algorithm will have a running time of $$$O(m^3\log{n})$$$, with a constant about 5 times worse than matrix exponentiation. In the cases described where most of the $$$a_j$$$ were zeros, or equal among each other, this can be done, but the general case is a bit harder.
As an exercise for the reader, try to find the solution when $$$a_i=i$$$ for all i, or alternatively, in how many ways can we make N as a sum of dice faces, where we have an unlimited amount of dice with 1 face, with 2 faces, with 3 faces, up to M faces.
Our second problem when trying to generalize can be seen in the fibonacci problem. If instead of defining $$$f(0)=f(1)=1$$$, you tried to define $$$f(0)=0$$$ as the problem suggests, the solution will not be correct. As our solution consists building an array of blocks, we have somewhere implied that the empty solution($$$N=0$$$) is unique, and all negative solutions($$$N<0$$$) are non-existent. The solution given was to shift everything by 1 once the dp is over. We can also multiply terms of a sequence by a constant or add two sequences with the same recurrence and the solution will stay the same. As an exercise, try to find the nth Lucas number using this, which has the same linear recurrence than fibonacci, but starts with 2, 1, 3, 4, 7, 11, 18.