Блог пользователя Shade100

Автор Shade100, история, 8 лет назад, По-английски

Hi! I'm having trouble understanding this O(NlogN) implementation of suffix array construction :

Code

In particular:

for (i = 0; i < n; i++) { // count the frequency of each integer rank
	c[i + k < n ? RA[i + k] : 0]++;
}

0 is used as the rank for an empty string when it is also the rank of the smallest suffix, as seen below.

tempRA[SA[0]] = r = 0; // re-ranking; start from rank r = 0

// compare adjacent suffixes
for (i = 1; i < n; i++) {
	// if same pair => same rank r; otherwise,increase r
	tempRA[SA[i]] = (RA[SA[i]] == RA[SA[i - 1]] && RA[SA[i] + k] == RA[SA[i - 1] + k]) ? r : ++r;
}

SA[i] + k can go out of bounds.

These seem like bugs, but the algorithm works, but strangely only when a sentinel character such as $ is appended to the end. Why?

Thanks in advance!

  • Проголосовать: нравится
  • 0
  • Проголосовать: не нравится

»
8 лет назад, # |
  Проголосовать: нравится 0 Проголосовать: не нравится

Because practically when you append a special character (which its ASCII value is lower than any letter in the alphabet) at the end of your string, you're implying that this unused suffix will always have the rank 0, therefore your smallest suffix is always in the index 1 rather than 0.