Help with Suffix Arrays

megalodon's blog

By megalodon, 12 years ago, In English

Hello. Can anybody please point me to a clean tutorial on how to implement a suffix array in Java? Thanks

suffix array, strings, sortings

-9

Comments (13)

phantom11

12 years ago, # |

Maybe this helps. It has a clean implementation along with lots of practice problems.

megalodon

12 years ago, # ^ |

Thanks, but the suffix array presented there is too slow. You need something at least as fast as O(n log n) for a contest.

dalex

12 years ago, # |

+20

https://sites.google.com/site/indy256/algo/suffix_array — just an implementation.

Also, I don't think that it's time to learn suffix array with your green colour...

Burunduk1

12 years ago, # ^ |

that it's time to learn suffix array with your green colour...

Maybe, it's true. But...

For example, not all of my university students (they got AC on a "Suffix Array" task as part of their practice) were good at olympiads. Even the clever ones. Some of them never even participated :-)

Or is university not a good place/time to learn suffix arrays? Or should people who aren't "true olympiad people" not know about them? :)

dalex

12 years ago, # ^ |

I'm sure that the author wants to learn it for olympiads. But if so, there are many easier and more helpful algorithms.

Your university is a good place to learn suffix arrays. I guess most of my groupmates would be in the army by now if there were a similar course at my university. And I have a feeling that most universities are closer to mine than to yours :D

Zlobober

11 years ago, # ^ |

Re-read your comments. Sounds arrogant, doesn't it?

AlexanderBolshakov

11 years ago, # ^ |

It doesn't matter how it sounds, because the truth looks exactly as dalex said 1.5 years ago.

Just go to a random university and try to explain suffix arrays to a random CS student. You'll be very surprised to find that this student most likely won't understand you. After this you may start to appreciate the fact that you study at MSU and are surrounded by lots of smart people.

And yes, I really know what I'm talking about.

megalodon

12 years ago, # ^ |

It's always time to learn. Thanks for the implementation anyway; it's good enough.

Dixtosa

11 years ago, # ^ |

Incorrect attitude, bro. Studying classical algorithms is also a very productive way of learning how to think.

halfo

12 years ago, # |

Well, I think these two links (1 and 2) provide a good number of resources on suffix arrays. You can also get help from Stack Overflow.

Hope these help you. And sorry for my poor English. Thanks :)

allthecode

11 years ago, # |

A naive implementation is often just fine for a contest.
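
For reference, here is a minimal sketch of what a naive implementation might look like in Java: sort the suffix start positions with a comparator that compares the suffixes character by character. The class and method names (`NaiveSuffixArray`, `build`, `compareSuffixes`) are made up for illustration and are not taken from any of the links in this thread. Each comparison may scan O(n) characters, so the worst case is O(n^2 log n), which is only acceptable for short strings.

```java
import java.util.Arrays;

// Hypothetical helper class; the names are illustrative only.
class NaiveSuffixArray {
    // Returns the start indices of all suffixes of s in lexicographic order.
    static int[] build(String s) {
        int n = s.length();
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) order[i] = i;
        // Each comparison may scan O(n) characters: O(n^2 log n) in the worst case.
        Arrays.sort(order, (a, b) -> compareSuffixes(s, a, b));
        int[] sa = new int[n];
        for (int i = 0; i < n; i++) sa[i] = order[i];
        return sa;
    }

    // Compares the suffixes starting at a and b character by character.
    static int compareSuffixes(String s, int a, int b) {
        int n = s.length();
        while (a < n && b < n) {
            char ca = s.charAt(a), cb = s.charAt(b);
            if (ca != cb) return ca - cb;
            a++;
            b++;
        }
        // The suffix that ran out of characters first is a prefix of the other, so it is smaller.
        return (a == n ? 0 : 1) - (b == n ? 0 : 1);
    }
}
```

For example, `NaiveSuffixArray.build("banana")` gives the start positions of "a", "ana", "anana", "banana", "na", "nana", in that order.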

ltaravilse

11 years ago, # |

I think the question asked for a tutorial on how to implement it, not for an implementation that is already coded. So here's my brief explanation of how you can implement it yourself (you always learn more by implementing an algorithm yourself than by just using it).

You'll group all the suffixes into buckets by comparing their prefixes of length 2^i for i = 0, 1, 2, ... until 2^i >= the length of the string. Let's say you have the string "ABRACADABRA"; then you have these suffixes: "ABRACADABRA", "BRACADABRA", "RACADABRA", "ACADABRA", "CADABRA", "ADABRA", "DABRA", "ABRA", "BRA", "RA" and "A".

First of all you group them into buckets by their first letter, so the buckets will be:

["ABRACADABRA", "ACADABRA", "ADABRA", "ABRA", "A"]

["BRACADABRA", "BRA"]

["CADABRA"]

["DABRA"]

["RACADABRA", "RA"]
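
The first-letter grouping above could be sketched in Java roughly as follows. This is only an illustration: the class name `FirstLetterBuckets` and the method `group` are hypothetical, and each suffix is represented by its start index rather than by the string itself (index 0 is "ABRACADABRA", index 3 is "ACADABRA", and so on).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper class; the names are illustrative only.
class FirstLetterBuckets {
    // Groups suffix start positions by the first character of the suffix.
    // A TreeMap keeps the buckets in character order.
    static Map<Character, List<Integer>> group(String s) {
        Map<Character, List<Integer>> buckets = new TreeMap<>();
        for (int i = 0; i < s.length(); i++) {
            buckets.computeIfAbsent(s.charAt(i), c -> new ArrayList<>()).add(i);
        }
        return buckets;
    }
}
```

For "ABRACADABRA" this gives the bucket [0, 3, 5, 7, 10] for 'A', [1, 8] for 'B', [4] for 'C', [6] for 'D' and [2, 9] for 'R', which matches the five buckets listed above.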

Note that within each bucket the suffixes aren't sorted yet, but any two suffixes that belong to different buckets are already in the correct order relative to each other. This step is O(n log n), as you sort by only one character per suffix (its first character). So now you have the suffixes sorted by their first character, and you want to sort them by their first two characters. What you do now is sort the suffixes within each bucket among themselves, without comparing suffixes from different buckets, as you already know how those compare.

Let's say we want to sort the first bucket ["ABRACADABRA", "ACADABRA", "ADABRA", "ABRA", "A"]. We know that the first letter is the same in all of its suffixes, so we compare the second letter. But how do we compare it? We don't compare characters but suffixes. For example, if we want to compare "ABRACADABRA" with "ACADABRA", we already know how "BRACADABRA" and "CADABRA" compare by their first letter, so we just look up that comparison. Now we have the following buckets for these suffixes:

["A"] (note that the "second letter" of this suffix is the end of the string, which we consider to come before any other letter)

["ABRACADABRA", "ABRA"]

["ACADABRA"]

["ADABRA"]

This is how we do the second step. Now notice that the second bucket (["BRACADABRA", "BRA"]) stayed the same, because "BR" == "BR", so let's see how we sort it in the third step. Now we want to compare the suffixes by their prefixes of length 4 (or less if the suffix has fewer than 4 characters), so "BRA" < "BRAC". Done character by character this takes two comparisons: we know they share the first two characters, so we would have to check "A" == "A" and then end of string < "C". But let's see how we do it in one step. We already know how "A" and "ACADABRA" compare by their prefixes of length at most 2, namely "A" < "ACADABRA", so we look up this single comparison and conclude that "BRA" < "BRACADABRA".

Now let's use a bigger example to make it clearer. Let's say we have the suffixes "ABCDEFGHIABCDEFGIJK" and "ABCDEFGIJK" in the same bucket and we've already compared the first 4 letters of all suffixes. Now we want to compare "ABCDEFGH" with "ABCDEFGI" (the first 8 letters of each suffix). The first 4 letters are equal, and we know that "EFGHIABCDEFGIJK" < "EFGIJK" according to their first 4 letters, so we just look up this comparison and effectively compare 4 letters in a single operation.
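
Putting the steps together, here is a minimal Java sketch of this doubling idea. It is one possible way to code it, not the only one, and it makes a few simplifying assumptions that are not in the explanation above: instead of re-sorting each bucket separately, it re-sorts all suffixes in every round by the pair (rank of the first half, rank of the second half), which yields the same order; the end of the string is represented by rank -1; and the names `DoublingSuffixArray`, `build`, `rank`, `order` and `half` are made up for illustration. With a comparison sort each round costs O(n log n), so the total is O(n log^2 n); replacing the sort with a radix sort would bring it down to O(n log n).

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical helper class; the names are illustrative only.
class DoublingSuffixArray {
    // Returns the start indices of all suffixes of s in lexicographic order.
    static int[] build(String s) {
        final int n = s.length();
        if (n == 0) return new int[0];
        Integer[] order = new Integer[n];
        final int[] rank = new int[n]; // rank[i] = bucket of suffix i by its current prefix
        for (int i = 0; i < n; i++) {
            order[i] = i;
            rank[i] = s.charAt(i); // round 0: bucket by first character
        }
        int[] next = new int[n];
        for (int len = 1; len < n; len *= 2) {
            final int half = len;
            // Compare prefixes of length 2 * half using two rank lookups per suffix.
            Comparator<Integer> cmp = (a, b) -> {
                if (rank[a] != rank[b]) return Integer.compare(rank[a], rank[b]);
                int ra = a + half < n ? rank[a + half] : -1; // -1 = end of string, sorts first
                int rb = b + half < n ? rank[b + half] : -1;
                return Integer.compare(ra, rb);
            };
            Arrays.sort(order, cmp);
            // Assign new bucket numbers: equal pairs share a bucket.
            next[order[0]] = 0;
            for (int i = 1; i < n; i++) {
                int bump = cmp.compare(order[i - 1], order[i]) < 0 ? 1 : 0;
                next[order[i]] = next[order[i - 1]] + bump;
            }
            System.arraycopy(next, 0, rank, 0, n);
            if (rank[order[n - 1]] == n - 1) break; // every suffix is alone in its bucket
        }
        int[] sa = new int[n];
        for (int i = 0; i < n; i++) sa[i] = order[i];
        return sa;
    }
}
```

For "ABRACADABRA", `DoublingSuffixArray.build` returns the start positions of "A", "ABRA", "ABRACADABRA", "ACADABRA", "ADABRA", "BRA", "BRACADABRA", "CADABRA", "DABRA", "RA", "RACADABRA", in that order.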

I hope this is clear enough for you to be able to implement it.

Abinash

11 years ago, # |

Thanks Man !!
