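I am studying a particular O(N log N) implementation of a suffix array, found at this link: https://sites.google.com/site/indy256/algo/suffix_array
I understand the core concept, but I am having trouble fully understanding the implementation.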
    public static int[] suffixArray(CharSequence S) {
        int n = S.length();
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++)
            order[i] = n - 1 - i;

        // stable sort of characters
        Arrays.sort(order, (a, b) -> Character.compare(S.charAt(a), S.charAt(b)));

        int[] sa = new int[n];
        int[] classes = new int[n];
        for (int i = 0; i < n; i++) {
            sa[i] = order[i];
            classes[i] = S.charAt(i);
        }
        // sa[i] - suffix on i'th position after sorting by first len characters
        // classes[i] - equivalence class of the i'th suffix after sorting by first len characters

        for (int len = 1; len < n; len *= 2) {
            int[] c = classes.clone();
            for (int i = 0; i < n; i++) {
                // condition sa[i - 1] + len < n simulates 0-symbol at the end of the string
                // a separate class is created for each suffix followed by simulated 0-symbol
                classes[sa[i]] = i > 0 && c[sa[i - 1]] == c[sa[i]] && sa[i - 1] + len < n
                        && c[sa[i - 1] + len / 2] == c[sa[i] + len / 2] ? classes[sa[i - 1]] : i;
            }
            // Suffixes are already sorted by first len characters
            // Now sort suffixes by first len * 2 characters
            int[] cnt = new int[n];
            for (int i = 0; i < n; i++)
                cnt[i] = i;
            int[] s = sa.clone();
            for (int i = 0; i < n; i++) {
                // s[i] - order of suffixes sorted by first len characters
                // (s[i] - len) - order of suffixes sorted only by second len characters
                int s1 = s[i] - len;
                // sort only suffixes of length > len, others are already sorted
                if (s1 >= 0)
                    sa[cnt[classes[s1]]++] = s1;
            }
        }
        return sa;
    }
I would like to understand what the cnt[] array is used for and where it comes in useful. Any pointers would be helpful.
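My best guess so far, which may well be wrong, is that cnt[] works like the write pointers of a counting sort. Each class id is the index in sa where that class begins, so cnt[k] would be the next free slot of the bucket starting at k. Scanning the suffixes in their current order and appending (s[i] - len) to its own bucket would then order each bucket by the second len characters. Below is a minimal sketch of that reading. The class CntSketch and the helper placeByBuckets are hypothetical names of mine, not part of the linked code, and the input arrays are what I derived by hand for the string "abaab" after the class update with len = 1.

```java
import java.util.Arrays;

public class CntSketch {
    // Hypothetical helper (not in the linked code): the counting pass of one
    // doubling step, pulled out on its own.
    //   sa[i]      - start of the suffix at rank i, comparing only the first len characters
    //   classes[p] - index in sa where the bucket (class) of suffix p begins
    public static int[] placeByBuckets(int[] sa, int[] classes, int len) {
        int n = sa.length;
        int[] cnt = new int[n];
        for (int i = 0; i < n; i++)
            cnt[i] = i;                // the bucket starting at index k has its next free slot at k
        int[] result = sa.clone();     // suffixes starting at n - len or later keep their slots
        for (int i = 0; i < n; i++) {
            int s1 = sa[i] - len;      // the suffix whose second half is the suffix at sa[i]
            if (s1 >= 0)
                result[cnt[classes[s1]]++] = s1;   // append s1 to its own bucket
        }
        return result;
    }

    public static void main(String[] args) {
        // Hand-derived state for "abaab" (assumption: computed on paper, len = 1).
        int[] sa = {3, 2, 0, 4, 1};
        int[] classes = {0, 4, 0, 0, 3};
        System.out.println(Arrays.toString(placeByBuckets(sa, classes, 1)));
        // prints [2, 3, 0, 4, 1]
    }
}
```

If this reading is right, the bucket that starts at index 0 (suffixes 3, 2, 0, all beginning with 'a') gets refilled in the order 2, 3, 0, which is their order by the first two characters ("aa", "ab", "ab").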
Thanks.