I'd like to ask professional opinion on the following problem — Count String from Hackerrank.
I was able to solve it (at least to produce solution, which passes all testcases). But I am not happy with this solution, because there are testcases where it goes exponential and TLEs. Those testcases are not used to judge the problem, but apparently if it was used in Codeforces or Topcoder round it would have been easily hacked.
Short idea of problem (for more detailed description — follow the link):
- you are given simplified regular expression, not longer then 100 characters
- count number of strings with specific length which match that regular expression. Length can go astronomical (10^9 :)
Solution:
- Because of Length going up to 10^9, the first idea I had was to use Matrix Multiplication
- To use matrix multiplication you need to create DFA graph
- It is easy to build NFA graph from regexp. But you can't use NFA graph to count strings because of epsilon-transitions
- It is easy to convert NFA to DFA getting rid of epsilons
Problem done. Almost. Conversion from NFA to DFA in worst case scenario is exponential. Oops.
Question — is there better approach, which can guarantee polynomial run-time?
There is no editorial for this problem on the site. I don't know if it was used in programming contests.
Автокомментарий: текст был обновлен пользователем zholnin (предыдущая версия, новая версия, сравнить).
Sorry for bringing post back up, I will no try doing it again, if nobody helps I'll assume that this is mystery which has no solution :)
No, there is no polynomial algorithm of conversion from NFA to DFA, because in a worst case scenario the smallest DFA have about 2^n states. Unfortunately, I couldn't prove this. Also, there is no justification of this fact in the Wikipedia
Thanks for breaking the silence! Do you see any solutions to this problem which do not involve NFA -> DFA transition?
I'm sorry, I don't see any other solution. Hope someone more experienced will help you!
As I know, there is no polynomial algorithm. AFAIR on my formal languages classes was given the example when DFA for regular expression has exponential size. (Maybe I mix it up with something else but I don't think so)
Btw, there is interesting algorithm to build DFA directly from regexp, but of course it is exponential too.
Ok, final verdict based on limited review at Codeforces:
Listed problem at Hackerrank is flawed, as it does not have general form solution which runs within time limit on any possible input within input constraints.
Putting this problem in real competition would be a bad idea.
It is possible to get this problem solved at Hackerrank, but only because test input was designed to avoid "exponential explosion".