mouse_wireless's blog

By mouse_wireless, history, 4 years ago, In English

Hello codeforces,

Long time no see.

I'm having a look at problem #211 on leetcode. It's a classic problem: you add strings to a collection, and you have queries which ask if a string is part of the collection, except the query strings can contain wildcards, which can match any character.

The recommended solution is a trie-based one, in which if you encounter a wildcard, you try to continue the query from each of the current node's children (similar to a normal graph traversal).

The problem with this solution, is that it has a worst case complexity of $$$O(Q * N * L)$$$, where Q is the number of search queries, N is the number of added strings, and L is the maximum length of the string. This is achieved when there are N "different enough" strings (randomly generated will do) of max length, and Q queries of L wildcards (to avoid premature termination of the traversal, you can also force the final character to be different). In this case, on each query, you have to traverse the entire trie, which has $$$O(N * L)$$$ nodes.

I've run a lot of solutions I found on leetcode against this test case (code below).

int main() {
  WordDictionary D;
  for (int i = 0; i < 25000; ++i) {
    string word;
    for (int j = 0; j < 499; ++j)
      word.push_back(rand() % 26 + 'a');
    word.push_back('z');
    D.addWord(word);
  }
  int cnt = 0;
  string q(499, '.');
  q.push_back('a');
  for (int i = 0; i < 25000; ++i)
    cnt += D.search(q);
  cout << cnt << endl;
  return 0;
}

And none of them would finish in under 1 minute (there are some solutions which use a different approach which work on this case but "fail" on others).

So I decided to take out "the big guns" and ask codeforces. Is there a better than worst-case $$$O(Q * N * L)$$$ solution?

  • Vote: I like it
  • +31
  • Vote: I do not like it

»
4 years ago, # |
  Vote: I like it +105 Vote: I do not like it

This says everything about such unreliable resouces as leetcode, hackerrank, hackerearth, etc, no one tries their best to add good test cases there. Just avoid these sites.

  • »
    »
    4 years ago, # ^ |
      Vote: I like it +2 Vote: I do not like it

    And it's not even that hard to think of and generate those test cases (just look at how short the code is). I tried hackerrank for a bit but then found a whole lot of bugs and inconsistencies (in one of the programs, the boilerplate code was even broken, using long which happened to be 32-bit when 64-bit was needed). I tried leetcode, but it just wasn't very good. I haven't even tried hackerearth because it has a bad rep in general.

    • »
      »
      »
      4 years ago, # ^ |
      Rev. 2   Vote: I like it +26 Vote: I do not like it

      Just want to clarify, this post isn't really about how weak test cases are, but genuinely asking if anyone can come up with an actual solution for the proposed problem. The leetcode bit is just a sidestory.

    • »
      »
      »
      4 years ago, # ^ |
        Vote: I like it +6 Vote: I do not like it

      long being 32-bit or 64-bit is OS dependent. So it might the case their judge uses the setup where long is 64-bit

      • »
        »
        »
        »
        4 years ago, # ^ |
          Vote: I like it 0 Vote: I do not like it

        No. I had the correct algorithm, got WA on several test cases. I changed it to long long and got AC. All this was in the site, not tested on my computer at all. And there were people in the discussions who had the same problem too.

»
4 years ago, # |
Rev. 4   Vote: I like it +89 Vote: I do not like it

You shouldn't expect anything substantially better than $$$O((N + Q)^2)$$$ under some plausible complexity conjectures.

There's a problem called 2-Orthogonal Vectors (2-OV): given two sets $$$A$$$, $$$B$$$ of $$$n$$$ binary vectors of length $$$d$$$ each, decide if there's a vector $$$\bar{a} \in A$$$ and a vector $$$\bar{b} \in B$$$ which are orthogonal to each other. Here, $$$\bar{a}$$$ is orthogonal to $$$\bar{b}$$$ if their dot product is $$$0$$$, and because all vectors in our setup is binary, it means that on each coordinate, either $$$\bar{a}$$$ or $$$\bar{b}$$$ is $$$0$$$.

For instance, if $$$n = 3$$$, $$$d = 7$$$, and sets $$$A$$$ and $$$B$$$ are as follows:

A:
0101000
1010011
1001011

B:
1011100
0110000
1100001

...then the third vector of $$$A$$$ (1001011) and the second vector of $$$B$$$ (0110000) are orthogonal to each other. 2-OV is trivially solvable in $$$O(n^2 d)$$$ time. But can we do better?

The Orthogonal Vectors Conjecture (OVC) claims that for $$$d \gg \log n$$$, there is no $$$O(n^{2 - \varepsilon})$$$ solution to 2-OV for any $$$\varepsilon > 0$$$. That is, if this conjecture is true, then there's no $$$O(n^{1.999})$$$ solution. Of course it's a conjecture, so it might not be true, but it's somewhat believable: for instance, it follows from the Strong Exponential Time Hypothesis (of course, assuming that this hypothesis holds); and so far, nobody came up with a faster algorithm. You can find a proof that SETH implies OVC e.g. here: https://cseweb.ucsd.edu/~russell/FGCA/lecture20.pdf.


So now, assume that you somehow miraculously found a solution to your problem that runs in $$$O((N + Q)^{1.999} \cdot \mathrm{poly}(L))$$$ time (you can put any exponent instead of $$$1.999$$$, as long as it's smaller than $$$2$$$). Then, we'll show how to solve Orthogonal Vectors faster than quadratically! Take any instance of OV (e.g., the one above). Then, in $$$B$$$, turn all ones into zeroes, and all zeroes into wildcards:

A:
0101000
1010011
1001011

B:
0?000??
?00????
00????0

Why so? Notice that now, a wildcard string from $$$B$$$ matches a string in $$$A$$$ if and only if the original vectors were orthogonal to each other! Thus, you can solve an instance of OV as follows:

  • For each vector of $$$A$$$, turn it into a binary word, and call addWord (so you call addWord $$$n$$$ times)
  • Afterwards, for each vector of $$$B$$$, turn it into a wildcard word as above, and call search (so you call search $$$n$$$ times).
  • If any search call succeeds, you found that there's a pair of orthogonal vectors!

And if you could solve the wildcard matching problem in $$$O((N + Q)^{1.999} \cdot \mathrm{poly}(L))$$$ time, then you'd have an $$$O(n^{1.999})$$$ solution to OV (which is really improbable)!

So, summing up, if you'd have a fast algorithm to this problem, this would falsify the Orthogonal Vectors Conjecture, which in turn would falsify the Strong Exponential Time Hypothesis. (And both of them are rather unlikely to fail.)