The first stage of the National Olympiad in Informatics in Romania is taking place this Saturday, so this week there were hosted some practice contests on the Romanian OJ kilonova.
I participated in one of them and one of the problems basically sounds like this: "You are given 3 numbers $$$A,B,C(A,B,C \le 3000)$$$, count the number of strings that have $$$A$$$ letters a
in them, $$$B$$$ letters b
and $$$C$$$ letters c
and also don't have abc
as a substring".
As I have some skill issue when it comes to combinatorics and dynamic programming, I did not manage to solve the problem for a full score but, my interest for this problem came back to life while talking to Matteo.Verz, who told me that he managed to solve the problem (we both tested and his solution gets a full score) in linear time. I was quite surprised because the constraints allowed solutions with a quadratic time complexity and all of the solutions that I heard about were also quadratic. After explaining me his thought process and showing his code, I realized that his solution uses the same idea that I actually tried in the contest, but with a very small change.
Now here comes the problem: neither me nor Matteo.Verz nor tibinyte nor anyone that I tried to discuss this problem with understands why this solution works, so we decided to ask the amazing community of codeforces hoping to get an answer. There are 2 almost identical solutions for this: here and here.
Let's consider the first solution. There $$$dp_i$$$ is the number of strings that contain exactly $$$i$$$ abc
substrings. Now here come some questions:
- If this is the case, why don't we need to subtract all values of $$$dp_{i + 1}$$$, $$$dp_{i + 2}$$$ and so on from $$$dp_i$$$ to compute it correctly?
- The formula used for initializing $$$dp_i$$$ is quite strange as it counts some strings multiple times, like when having $$$1$$$
abc
group and $$$1$$$ unused character from eacha
,b
andc
, it will count the stringabcabc
twice, but the solution is still giving the correct result.