Блог пользователя bhikkhu

Автор bhikkhu, 10 лет назад, По-английски

(Actually this is a question) So I thought I knew the intuition behind the Manber and Myers algorithm. Here is what I understood.

Suppose the string is "banana"

We first partition the suffixes in terms of similar first character as

a, anana, ana => bucket 1

banana => bucket 2

na, nana => bucket 3

Then to get the partition by the next 2h characters, my algo is:

  1. scan each bucket one by one

  2. take the first bucket

  3. for each suffix in this bucket, find the position of sa + 2h, if we go out of bounds assign position = 0

So picture looks like this:

a = 0, anana = 3, ana = 3 (since a + 1 > n, nana is in 3rd bucket and na is also in third bucket)

  1. Now, sort the assigned indices of the bucket using counting sort.

  2. Scan the new indices one by one and create new partitions, here we get

[a], [anana, ana]

  1. Do this until buckets = n

My problem is in 4th part, where I use counting sort.

First I coded as I had thought that I had understood the algorithm. But then I ran into trouble. As the number of buckets goes on increasing during each iteration, my algorithm approaches O(n^2) (as I assign ranks during counting sort according to the location of s + 2h suffix). So with some modification to the algorithm can I get O(nlogn)? If not what should I do?

Ok. I removed the code. So please answer me now.

  • Проголосовать: нравится
  • -14
  • Проголосовать: не нравится

»
10 лет назад, # |
  Проголосовать: нравится 0 Проголосовать: не нравится

OKAY. why downvote ? If it's due to the format then that's not because of me. I am not joking here.

»
10 лет назад, # |
  Проголосовать: нравится +18 Проголосовать: не нравится

If you downvote, please give the reason too.

»
10 лет назад, # |
  Проголосовать: нравится +1 Проголосовать: не нравится

When you have the current bucket for each suffix, you can compute new ones as follows:

For each i, consider the ordered pair ( bucket[i], bucket[i + (1<<k)] ). (here, bucket[index beyond the end] is a value larger than any valid bucket[i] )

Sort the suffixes with those pairs used as keys. This cannot be done by an ordinary countsort (there are about n^2 possible pairs (x,y)), but it can be done by a two-pass radix sort in O(n), or if you are lazy, by a standard sort in O(n log n). (The second approach then gives you O(n log^2 n) overall time complexity.)

After the sort, relabel the buckets in O(n) and you are ready to start a new iteration.

  • »
    »
    10 лет назад, # ^ |
      Проголосовать: нравится 0 Проголосовать: не нравится

    I was sorting each bucket one after another then appending the buckets together. I had not thought of assigning pairwise ranks. very stupid of me. +1 and Thank you very much sir for your time.