Mooncrater's blog

By Mooncrater, history, 5 years ago, In English

EDIT: Got the problem. When inserting a string to the trie, the temp->endi is always changed. But when a duplicate string is inserted into the trie, the temp->endi value is changed for the node. This is bad, because the ancestors of this node would have a temp->index according to the index of the previous string. So, we'll DFS into it, but in the end, find that no endi value satisfies our constraints. That's why there was an infinite loop.

So all I had to do was temp->endi=min(i,temp->endi). Here is the AC'd solution.

Hello,

This is regarding codechef's question: Sheokand And String. I am facing a consistent TLE/WA for a test case, which I think, should not be the case.

Question description:

There are $$$N$$$ strings as input. Then there are $$$Q$$$ queries, which are of the form $$$R$$$,$$$S$$$,where $$$S$$$ is the string to search, and $$$R$$$ is the number of strings to consider. That is, if $$$R=3$$$ then the answer should be amongst strings 1,2 and 3.

Now, we have to find the string amongst $$$s_1,s_2,...,s_R$$$, which matches $$$S$$$ the most. If there are many strings that qualify this criterion, then we have to select the lexicographically smallest string.

Example:

4
abcd
abce
abcdex
abcde
3
3 abcy
3 abcde
4 abcde

OUTPUT:

abcd
abcdex
abcde

Now, for the first query, abc is matched with all the three strings, but y doesn't. So now the lexicographically smallest string amongst the starting $$$R$$$ strings is abcd.

For the second query, abcd is matched with 1st and 3rd strings, but e only matched abcdex.

For the third query, you have abcde too to select, which matches the query exactly.

Solution:

The solution uses Tries. One can read about the full theory of the solution here, which is the question's editorial.

Still, I'll give a small overview of the solution. We can create a Trie of all the $$$N$$$ strings in the input. We can save three (or two if you insist) variables: end, endi and index. Where end is the bool which saves whether any string is ending at the current node, endi is the index of the string, which ended at the current node, and index is the string index, for which this node was created. This is a secret weapon we'll use later.

Now, we get a query, and we traverse the trie. We only go to nodes which match (obviously) the current string character, and has index less than $$$R$$$. When this condition falls short, then all we have to do is find the lexicographically smallest string which has endi<=R. We can easily see that if a node has an index<=R, then it will have a node in it's subtree that will have endi<=R. This is so, because index is set, when the node was created by the indexth string. Either this string ended at this node, or later in the subtree. So, yeah, proof.

Now all we have to do is find the first string in the sub-trie which has endi<=R, which would be the answer.

For that we can create a while loop like:

while (!(temp->end) || temp->endi > l)//This might take some time = 26 x 10 (max depth of trie)
{
	f(i, 0, 26)
	{
		if (temp->a[i] && temp->a[i]->index <= l)
		{
			ans += (i + 'a');
			temp = temp->a[i];
			break;
		}
	}
}

Using this will give you a TLE. You'll have to change it to:

while (!(temp->end) || temp->endi > l)//This might take some time = 26 x 10 (max depth of trie)
{
	bool found = false ;
	f(i, 0, 26)
	{
		if (temp->a[i] && temp->a[i]->index <= l)
		{
			ans += (i + 'a');
			temp = temp->a[i];
			found = true;
			break;
		}
	}
	if(!found) break ;
}

Which changes my answer to a WA. Now, I got really confused by this. A TLE means that the while loop is executed infinitely. Thus, the inner loop's if condition is never satisfied, in some case.

But this doesn't make sense. Let's see why:

The code before this code is:

Node* temp = root;
for(int j=0; j<(int)se.size();j++)
{
	int c = se[j] - 'a';
	if (!(temp->a[c]) || temp->a[c]->index > l) break;
	ans += se[j];
	temp = temp->a[c];
}

This means that matching is done only till the either the next letter doesn't exist, or the it does exist, but it's index value is greater than l. Which means that the current node, at which this loop breaks has it's index<=l. (Here l is $$$R$$$).

So, we come back to the while loop. So temp is a node which has it's index<=l. Good. If temp is an end point for a string, and has endi<=l then the loop ends here, and the answer is output. But if that is not the case, then we go in the while loop. From the earlier proof, we know that there is a node in temp's subtree that has endi<=l. So, there should never be the case when the whole for loop is executed and the if condition is not satisfied, atleast one time.

Therefore, I do not understand when is the case applicable.

Here is my solution.

Many AC'd solutions use this loop exit flag (like found, in this case). I hope that there's something I am missing here.

Any help is appreciated!

Thank you.

  • Vote: I like it
  • -3
  • Vote: I do not like it