perlentaucher's blog

By perlentaucher, history, 5 months ago, In English

Hi there, I was given the following interview question and was wondering, if there's a more simple way to solve it:

Question: Given a text and a regular expression that only contains one wildcard character '*' (which matches any string), return the length of the longest matched substring in the text. Return -1 if no answer can be found. 1 <= |Text|, |Regex| <= 10^6.

Example: text = "programming", regex = "r*in". The matches are "rammin", "rogrammin", thus the answer is 9.

My solution: I make a case distinction based on whether the regex is of the form (BEFORE*AFTER, AFTER, BEFORE, *). I then use the Z-Function on the string to get matches of BEFORE and AFTER and then select the first/last matches to calculate the length.

Time/space complexity is obviously O(n) but I think that my solution is a bit overcomplicated.

  • Vote: I like it
  • +4
  • Vote: I do not like it

»
5 months ago, # |
Rev. 2   Vote: I like it +1 Vote: I do not like it

I believe you can also use two pointers to determine the first occurrence of the regex string prefix before * in text and the last occurrence of the regex string suffix after * in text.

My code for this (if it's incorrect I'm sorry and I'd be happy to see tests that it would give the wrong answer to):

Spoiler
  • »
    »
    5 months ago, # ^ |
      Vote: I like it +1 Vote: I do not like it

    I think your code is failing on testcase text="aabczzzefg", regex="abc*efg". I first also thought about a two-pointer approach but then I was concerned about partial matches.

    • »
      »
      »
      5 months ago, # ^ |
      Rev. 2   Vote: I like it 0 Vote: I do not like it

      Yes, you're right, it doesn't work. I tried to fix it and take into account some other cases (e.g. text = "aaaaaaaa", regex = "r*r"), so here is my current approach (the code only handles the case when we are sure to have *, if we don't have it we can just add a separate check for that case):

      Spoiler
      • »
        »
        »
        »
        5 months ago, # ^ |
          Vote: I like it 0 Vote: I do not like it

        It's still failing last edge test case even after adding the condition you mentioned :)

        I don't think if I will get the next round mail T_T (there were two questions, first one was easy but this one took time and still didn't pass all the test cases)

        • »
          »
          »
          »
          »
          5 months ago, # ^ |
          Rev. 2   Vote: I like it 0 Vote: I do not like it

          Can I see the test?

          Edit: Nvm, I believe that text = "a" (or text = "bab" and similar) and regex = "a*a" give the wrong answer (it shouldn't be 1, it should be -1 in fact). I've already posted too many bad solutions, so I won't post anymore, but I think this is the last edge case you should consider

          • »
            »
            »
            »
            »
            »
            5 months ago, # ^ |
              Vote: I like it +1 Vote: I do not like it

            the test case was hidden so can't be really sure!

            But don't get demotivated, your approach gave more than enough idea about how to approach this problem so thanks for posting the solution of the problem !!

            Thanks for the contribution

»
5 months ago, # |
Rev. 2   Vote: I like it +1 Vote: I do not like it

pointer pL searching for BEFORE in the given string using Horspool algorithm.

pointer pR searching for RETFA (revered the pattern AFTER;) ) using the same algorithm but in the reversed order of the given string.

if we match both the patterns, it is easy to find the length of the matched string.

if we cross pL and pR it will be -1

»
5 months ago, # |
  Vote: I like it +1 Vote: I do not like it

Today, I gave my Amazon OA and I got this question and solved it using the same idea, but using the KMP algorithm, All test cases passed, and I think it is a good approach to follow.

»
5 months ago, # |
  Vote: I like it 0 Vote: I do not like it

I think you can solve this problem using string hashing

»
5 months ago, # |
  Vote: I like it 0 Vote: I do not like it

Can't this be solved using greedy,

The maximum length can occur if we find first occurence of Before and last occurence of After, and then the length can be easily computed.

If the approach is wrong, can you give any test case?

»
5 months ago, # |
  Vote: I like it 0 Vote: I do not like it

Correct me if I am wrong but there are 2 case

1 * is present

2 * is absent

2nd case would be simple kmp algorithm.

And in 1st case replace * with whole text answer would be text.length()