The first stage of the National Olympiad in Informatics in Romania is taking place this Saturday, so this week there were hosted some practice contests on the Romanian OJ kilonova.
I participated in one of them and one of the problems basically sounds like this: "You are given 3 numbers $$$A,B,C(A,B,C \le 3000)$$$, count the number of strings that have $$$A$$$ letters a
in them, $$$B$$$ letters b
and $$$C$$$ letters c
and also don't have abc
as a substring".
As I have some skill issue when it comes to combinatorics and dynamic programming, I did not manage to solve the problem for a full score but, my interest for this problem came back to life while talking to Matteo.Verz, who told me that he managed to solve the problem (we both tested and his solution gets a full score) in linear time. I was quite surprised because the constraints allowed solutions with a quadratic time complexity and all of the solutions that I heard about were also quadratic. After explaining me his thought process and showing his code, I realized that his solution uses the same idea that I actually tried in the contest, but with a very small change.
Now here comes the problem: neither me nor Matteo.Verz nor tibinyte2006 nor anyone that I tried to discuss this problem with understands why this solution works, so we decided to ask the amazing community of codeforces hoping to get an answer. There are 2 almost identical solutions for this: here and here.
Let's consider the first solution. There $$$dp_i$$$ is the number of strings that contain exactly $$$i$$$ abc
substrings. Now here come some questions:
- If this is the case, why don't we need to subtract all values of $$$dp_{i + 1}$$$, $$$dp_{i + 2}$$$ and so on from $$$dp_i$$$ to compute it correctly?
- The formula used for initializing $$$dp_i$$$ is quite strange as it counts some strings multiple times, like when having $$$1$$$
abc
group and $$$1$$$ unused character from eacha
,b
andc
, it will count the stringabcabc
twice, but the solution is still giving the correct result.
Let's write
a function which counts the number of strings with $$$A - x$$$ a's, $$$B - x$$$ b's, $$$C - x$$$ c's and $$$x$$$ special characters "?".
Now, if we replace each apparition of "?" with "abc", this counts the strings with at least $$$x$$$ substrings equal to "abc", with one caveat: it counts strings with $$$y \geq x$$$ multiple times. How many times? Well, if a string has $$$y$$$ "abc"s and we choose to replace $$$x$$$ of them with "?", then it counts that string $$$y \choose x$$$ times.
Thus, if the actual count of strings with exactly $$$x$$$ substrings equal to "abc" is $$$s(x)$$$, and for any $$$x > min(A,B,C)$$$ by definition $$$s(x) = 0$$$, we have
Now, the answer is
The factor for $$$s(x)$$$ in $$$S$$$, $$$0 \leq x \leq min(A,B,C)$$$ is
For any $$$x \geq 1$$$, it is known that $$$C_x = 0$$$. Finally,