Suffix Array Help

Revision en1, by Shade100, 2017-06-20 04:26:41

Hi! I'm having trouble understanding this O(NlogN) implementation of suffix array construction :

Code

In particular:

for (i = 0; i < n; i++) { // count the frequency of each integer rank
	c[i + k < n ? RA[i + k] : 0]++;
}

0 is used as the rank for an empty string when it is also the rank of the smallest suffix, as seen below.

tempRA[SA[0]] = r = 0; // re-ranking; start from rank r = 0

// compare adjacent suffixes
for (i = 1; i < n; i++) {
	// if same pair => same rank r; otherwise,increase r
	tempRA[SA[i]] = (RA[SA[i]] == RA[SA[i - 1]] && RA[SA[i] + k] == RA[SA[i - 1] + k]) ? r : ++r;
}

SA[i] + k can go out of bounds.

These seem like bugs, but the algorithm works, but strangely only when a sentinel character such as $ is appended to the end. Why?

Thanks in advance!

Tags suffix array, strings, help

History

 
 
 
 
Revisions
 
 
  Rev. Lang. By When Δ Comment
en1 English Shade100 2017-06-20 04:26:41 2848 Initial revision (published)