Suffix array O(N log N) implementation

Time: 2015-08-25 16:59:51

Tags: java algorithm data-structures suffix-array

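I am studying a particular O(N log N) implementation of a suffix array, found at this link: https://sites.google.com/site/indy256/algo/suffix_array
I understand the core concept, but I am having trouble fully understanding the implementation.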

// Requires: import java.util.Arrays;
public static int[] suffixArray(CharSequence S) {
  int n = S.length();
  Integer[] order = new Integer[n];
  for (int i = 0; i < n; i++)
    order[i] = n - 1 - i;

  // stable sort of characters
  Arrays.sort(order, (a, b) -> Character.compare(S.charAt(a), S.charAt(b)));

  int[] sa = new int[n];
  int[] classes = new int[n];
  for (int i = 0; i < n; i++) {
    sa[i] = order[i];
    classes[i] = S.charAt(i);
  }
  // sa[i] - suffix on i'th position after sorting by first len characters
  // classes[i] - equivalence class of the i'th suffix after sorting by first len characters

  for (int len = 1; len < n; len *= 2) {
    int[] c = classes.clone();
    for (int i = 0; i < n; i++) {
      // condition sa[i - 1] + len < n simulates 0-symbol at the end of the string
      // a separate class is created for each suffix followed by simulated 0-symbol
      classes[sa[i]] = i > 0
          && c[sa[i - 1]] == c[sa[i]]
          && sa[i - 1] + len < n
          && c[sa[i - 1] + len / 2] == c[sa[i] + len / 2]
          ? classes[sa[i - 1]] : i;
    }
    // Suffixes are already sorted by first len characters
    // Now sort suffixes by first len * 2 characters
    int[] cnt = new int[n];
    for (int i = 0; i < n; i++)
      cnt[i] = i;
    int[] s = sa.clone();
    for (int i = 0; i < n; i++) {
      // s[i] - order of suffixes sorted by first len characters
      // (s[i] - len) - order of suffixes sorted only by second len characters
      int s1 = s[i] - len;
      // sort only suffixes of length > len, others are already sorted
      if (s1 >= 0)
        sa[cnt[classes[s1]]++] = s1;
    }
  }
  return sa;
}

I would like to understand what the cnt[] array is used for and why it is useful here. Any pointers would be helpful.
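
One possible reading, offered as a sketch rather than a confirmed explanation: in the listing, every class id is either `i` (the index in sa where a new class begins) or a copy of the previous suffix's id, so a class id is also the start index of that class's bucket in sa. Under that reading, `cnt[i] = i` makes `cnt[k]` the next free slot of the bucket that starts at `k`, and `sa[cnt[classes[s1]]++] = s1` is one stable counting-sort pass: suffixes arrive ordered by their second len characters and are dropped into the bucket of their first len characters. The names `CntDemo`, `placeByClass`, `itemsBySecondKey` and `cls` below are hypothetical and not part of the original code.

```java
import java.util.Arrays;

class CntDemo {
    // itemsBySecondKey: item ids listed in increasing order of their secondary key
    //                   (plays the role of s[i] - len in the listing).
    // cls[x]:           primary-key class of item x, encoded as the index where that
    //                   class's bucket starts in the output (plays the role of classes[]).
    static int[] placeByClass(int[] itemsBySecondKey, int[] cls) {
        int n = cls.length;
        int[] cnt = new int[n];
        for (int i = 0; i < n; i++)
            cnt[i] = i; // cnt[k] = next free slot of the bucket starting at index k
        int[] out = new int[n];
        for (int x : itemsBySecondKey)
            out[cnt[cls[x]]++] = x; // write into the bucket, then advance its pointer
        return out;
    }

    public static void main(String[] args) {
        // Items 0 and 2 share class 0 (slots 0..1); items 1 and 3 share class 2 (slots 2..3).
        int[] cls = {0, 2, 0, 2};
        // Order of the items by secondary key only.
        int[] itemsBySecondKey = {3, 0, 2, 1};
        System.out.println(Arrays.toString(placeByClass(itemsBySecondKey, cls)));
        // prints [0, 2, 3, 1]
    }
}
```

Within each bucket the items keep the order in which they arrived (0 before 2, 3 before 1), which is what makes the result sorted by the pair (primary key, secondary key) without a comparison sort.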

Thanks.

0 answers:

No answers