Suffix Array / Manber and Myers Algo

(Actually this is a question) So I thought I knew the intuition behind the Manber and Myers algorithm. Here is what I understood.

Suppose the string is "banana"

We first partition the suffixes by their first character:

a, anana, ana => bucket 1

banana => bucket 2

na, nana => bucket 3
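A minimal sketch of this first partition, assuming 1-based bucket ids as in the list above. The function name `initial_buckets` is made up for illustration.

```python
def initial_buckets(s):
    # Distinct characters in sorted order define the bucket ids (1-based).
    order = {c: k + 1 for k, c in enumerate(sorted(set(s)))}
    # bucket[i] is the bucket of the suffix starting at position i.
    return [order[c] for c in s]


s = "banana"
bucket = initial_buckets(s)
for i in range(len(s)):
    print(s[i:], "=> bucket", bucket[i])
```

For "banana" this puts a, ana, anana in bucket 1, banana in bucket 2, and na, nana in bucket 3.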

Then, to go from the partition by the first h characters to the partition by the first 2h characters, my algorithm is:

scan each bucket one by one
take the first bucket
for each suffix sa in this bucket, find the bucket of the suffix at sa + h; if we go out of bounds, assign position = 0

So picture looks like this:

a = 0, anana = 3, ana = 3 (a + 1 is out of bounds; for anana and ana the shifted suffixes are nana and na, which are both in bucket 3)

Now, sort the assigned indices of the bucket using counting sort.
Scan the sorted indices one by one and start a new partition whenever the index changes. Here we get:

[a], [anana, ana]

Repeat this until the number of buckets equals n.
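One refinement pass, as I read the steps above, might look roughly like the sketch below. It is only an illustration: `refine_buckets` is a hypothetical name, and Python's built-in `sorted` stands in for the counting sort.

```python
def refine_buckets(bucket, h):
    n = len(bucket)

    def key(i):
        # Bucket of the suffix that starts h positions later, 0 if out of bounds.
        return bucket[i + h] if i + h < n else 0

    # Group suffix start positions by their current bucket id.
    groups = {}
    for i in range(n):
        groups.setdefault(bucket[i], []).append(i)

    partitions = []
    for b in sorted(groups):
        # Sort one bucket by the assigned index (stand-in for counting sort).
        members = sorted(groups[b], key=key)
        part = [members[0]]
        for i in members[1:]:
            if key(i) != key(part[-1]):
                # The assigned index changed, so a new partition starts here.
                partitions.append(part)
                part = []
            part.append(i)
        partitions.append(part)
    return partitions


s = "banana"
bucket = [2, 1, 3, 1, 3, 1]  # first-character buckets from above
for part in refine_buckets(bucket, 1):
    print([s[i:] for i in part])
```

For "banana" with h = 1 this prints [a], [anana, ana], [banana], [nana, na], which matches the picture above.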

My problem is in the 4th step, where I use counting sort.

First I coded it up, thinking I had understood the algorithm, but then I ran into trouble. As the number of buckets keeps increasing with each iteration, my algorithm approaches O(n^2) (because I assign ranks during the counting sort according to the location of the suffix at s + h). Can I get O(n log n) with some modification to the algorithm? If not, what should I do?

Ok. I removed the code. So please answer me now.

Comments (5)

bhikkhu

OK, why the downvotes? If it's because of the formatting, that is not my fault. I am not joking here.

Why does the post appear so dirty?

If you downvote, please give the reason too.

misof

When you have the current bucket for each suffix, you can compute new ones as follows:

For each i, consider the ordered pair ( bucket[i], bucket[i + (1<<k)] ). (here, bucket[index beyond the end] is a value larger than any valid bucket[i] )

Sort the suffixes with those pairs used as keys. This cannot be done by an ordinary countsort (there are about n^2 possible pairs (x,y)), but it can be done by a two-pass radix sort in O(n), or if you are lazy, by a standard sort in O(n log n). (The second approach then gives you O(n log^2 n) overall time complexity.)
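The two-pass radix sort could be sketched like this: a stable counting sort by the second component, then a stable counting sort by the first. The helper names are hypothetical. The sketch assumes bucket ids lie in 0..n-1, and it maps positions past the end to key 0, smaller than every real bucket, which is the convention from the question rather than the "larger" one described here. That way a suffix that is a prefix of another one sorts first.

```python
def counting_sort_by(items, key, max_key):
    # Stable counting sort for integer keys in the range 0..max_key.
    count = [0] * (max_key + 2)
    for x in items:
        count[key(x) + 1] += 1
    for k in range(1, len(count)):
        count[k] += count[k - 1]  # count[k] = number of items with key < k
    out = [None] * len(items)
    for x in items:
        out[count[key(x)]] = x
        count[key(x)] += 1
    return out


def sort_by_pairs(bucket, h):
    n = len(bucket)

    def first(i):
        return bucket[i]

    def second(i):
        # Shift real buckets up by 1 so that "past the end" can use key 0.
        return bucket[i + h] + 1 if i + h < n else 0

    order = counting_sort_by(range(n), second, n)  # pass 1: second component
    order = counting_sort_by(order, first, n)      # pass 2: first component
    return order


# "banana" with 0-based first-character buckets (a=0, b=1, n=2)
print(sort_by_pairs([1, 0, 2, 0, 2, 0], 1))  # [5, 1, 3, 0, 2, 4]
```

Each pass is O(n), so one doubling step stays linear. Suffixes 1 and 3 (anana, ana) still share a pair here and get separated in the next iteration.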

After the sort, relabel the buckets in O(n) and you are ready to start a new iteration.
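Putting the steps together, here is a rough sketch of the "lazy" variant with a standard sort, so O(n log^2 n) overall. The name `suffix_array` and the use of -1 for positions past the end are assumptions of this sketch. The smaller sentinel gives the usual lexicographic order, where a suffix that is a prefix of another comes first.

```python
def suffix_array(s):
    n = len(s)
    if n == 0:
        return []
    # Initial labels: any values that preserve the character order will do.
    bucket = [ord(c) for c in s]
    sa = list(range(n))
    h = 1
    while True:
        def key(i):
            # Pair (bucket[i], bucket[i + h]); -1 if i + h is past the end.
            return (bucket[i], bucket[i + h] if i + h < n else -1)

        sa.sort(key=key)

        # Relabel in O(n): equal pairs keep the same bucket id.
        new_bucket = [0] * n
        for j in range(1, n):
            changed = key(sa[j]) != key(sa[j - 1])
            new_bucket[sa[j]] = new_bucket[sa[j - 1]] + (1 if changed else 0)
        bucket = new_bucket

        if bucket[sa[-1]] == n - 1:
            # n distinct buckets: every suffix has its final rank.
            return sa
        h *= 2


print(suffix_array("banana"))  # [5, 3, 1, 0, 4, 2]
```

The whole array is sorted once per iteration, instead of bucket by bucket, and there are at most about log2(n) iterations. Swapping `sa.sort` for a two-pass radix sort would bring each iteration down to O(n).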

I was sorting each bucket one after another and then appending the buckets together. I had not thought of assigning pairwise ranks. Very stupid of me. +1, and thank you very much, sir, for your time.
