cardcounter's blog

By cardcounter, history, 2 weeks ago, In English

I am thinking about an effient algorithm for wildcard searching with * representing any characters with any length.

aa*c, she*he, *she*he

Example:

  1. caa find aa*c

    return "DOES NOT EXIST"

  2. hesherheshe find she*he

    return 2

    Because sherheshe begins at index 2

  3. sherheshe find she*he

    return 0

    Because the whole string matches

When I am supposed to return, say, the beginning index of the first matching instance.

Say the pattern is of length M and the document is of length N, and the pattern has K '*' signs. I can think of a solution that first uses AC Automation to find all occurences of each chunk in O(N + M), with bitmask.

While converting the bitmask to indexes takes O(N * K)

Then binary search for the last possible beginning positions for each chunk. This could take O (K log N)

So the overall time complexity is still O (N * K), any way to do better?

References

https://codeforces.net/blog/entry/111380

https://codeforces.net/problemset/problem/1023/A

https://codeforces.net/blog/entry/127169

Full text and comments »

  • Vote: I like it
  • 0
  • Vote: I do not like it

By cardcounter, history, 6 weeks ago, In English

What is an efficient algorithm for regex search all occurrences (starting indexes) of a pattern in a long string?

Say I only have special character * in this question, which matches zero or any numbers of the preceeding characters: https://leetcode.com/problems/regular-expression-matching/

a* search in aaa returns [0, 1, 2]

a*b search in abb returns [0]

Here a* and a*b are patterns and aaa and abb are text strings to search within and text strings could be very long.

Borrowing https://leetcode.com/problems/regular-expression-matching/ I can only come up with a solution with O(N * N * M)

References:

https://codeforces.net/blog/entry/73568

https://codeforces.net/blog/entry/111380

Full text and comments »

  • Vote: I like it
  • -8
  • Vote: I do not like it