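I use this algorithm to find the longest common substring of two strings. Please help me do the same thing, but with an array
of common substrings of these strings that my function should ignore.
My Java code: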
public static String longestSubstring(String str1, String str2) {
    StringBuilder sb = new StringBuilder();
    if (str1 == null || str1.isEmpty() || str2 == null || str2.isEmpty()) {
        return "";
    }
    // java initializes them already with 0
    int[][] num = new int[str1.length()][str2.length()];
    int maxlen = 0;
    int lastSubsBegin = 0;
    for (int i = 0; i < str1.length(); i++) {
        for (int j = 0; j < str2.length(); j++) {
            if (str1.charAt(i) == str2.charAt(j)) {
                if ((i == 0) || (j == 0)) {
                    num[i][j] = 1;
                } else {
                    num[i][j] = 1 + num[i - 1][j - 1];
                }
                if (num[i][j] > maxlen) {
                    maxlen = num[i][j];
                    // generate substring from str1 => i
                    int thisSubsBegin = i - num[i][j] + 1;
                    if (lastSubsBegin == thisSubsBegin) {
                        // if the current LCS is the same as the last time this block ran
                        sb.append(str1.charAt(i));
                    } else {
                        // this block resets the string builder if a different LCS is found
                        lastSubsBegin = thisSubsBegin;
                        sb = new StringBuilder();
                        sb.append(str1.substring(lastSubsBegin, i + 1));
                    }
                }
            }
        }
    }
    return sb.toString();
}
So, my function should be:
public static String longestSubstring(String str1, String str2, String[] ignore)
Answer 0 (score: 0)
As I understand it, you have to ignore those substrings that contain at least one of the strings in ignore:
if (str1.charAt(i) == str2.charAt(j)) {
    if ((i == 0) || (j == 0)) {
        num[i][j] = 1;
    } else {
        num[i][j] = 1 + num[i - 1][j - 1];
    }
    // we must rebuild the current substring on every step so that we can compare it with `ignore`
    int thisSubsBegin = i - num[i][j] + 1;
    String current = str1.substring(thisSubsBegin, i + 1);
    // check whether the current substring contains any string from `ignore`,
    // and if it does, find the occurrence that starts furthest to the right
    // (this assumes `ignore` holds no empty strings)
    int biggestIndex = -1;
    for (String s : ignore) {
        int startIndex = current.lastIndexOf(s);
        if (startIndex > biggestIndex) {
            biggestIndex = startIndex;
        }
    }
    // then current.substring(biggestIndex + 1) will not contain strings to be ignored
    current = current.substring(biggestIndex + 1);
    num[i][j] -= (biggestIndex + 1);
    if (num[i][j] > maxlen) {
        maxlen = num[i][j];
        // remember the best substring so far; the function still returns sb.toString()
        sb = new StringBuilder(current);
    }
}
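For reference, here is a minimal, self-contained sketch of how that loop body could sit inside the whole function. The class name `LcsIgnoringContained` and the local variable `best` are made up for illustration, and skipping null or empty entries of `ignore` is an assumption of mine, not something the question specifies.

```java
final class LcsIgnoringContained {
    /**
     * Sketch: longest common substring of str1 and str2 that does not
     * contain any of the strings in ignore.
     */
    static String longestSubstring(String str1, String str2, String[] ignore) {
        if (str1 == null || str1.isEmpty() || str2 == null || str2.isEmpty()) {
            return "";
        }
        if (ignore == null) {
            ignore = new String[0];
        }
        // num[i][j] = length of the longest allowed common run ending at str1[i] / str2[j]
        int[][] num = new int[str1.length()][str2.length()];
        int maxlen = 0;
        String best = "";
        for (int i = 0; i < str1.length(); i++) {
            for (int j = 0; j < str2.length(); j++) {
                if (str1.charAt(i) != str2.charAt(j)) {
                    continue;
                }
                num[i][j] = (i == 0 || j == 0) ? 1 : 1 + num[i - 1][j - 1];
                String current = str1.substring(i - num[i][j] + 1, i + 1);
                // rightmost start of any ignored string inside the current run
                int biggestIndex = -1;
                for (String s : ignore) {
                    if (s == null || s.isEmpty()) {
                        continue; // assumption: such entries are simply skipped
                    }
                    biggestIndex = Math.max(biggestIndex, current.lastIndexOf(s));
                }
                // cut the run just past that start so no ignored string remains
                current = current.substring(biggestIndex + 1);
                num[i][j] -= biggestIndex + 1;
                if (num[i][j] > maxlen) {
                    maxlen = num[i][j];
                    best = current;
                }
            }
        }
        return best;
    }
}
```

Calling `LcsIgnoringContained.longestSubstring("abcdefg", "xabcdefy", new String[] {"cd"})` cuts the common run "abcdef" at "cd" and returns "abc". Shortening `num[i][j]` in the table itself means later cells on the same diagonal continue from the cut point instead of from the start of the run.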
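If instead you have to ignore only the substrings that are exactly equal to one of the strings in ignore,
then, whenever a candidate for the longest common substring is found, iterate over ignore
and check whether the current substring is present in it.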
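A rough sketch of this exact-match variant follows. The class name `LcsIgnoringExact` is hypothetical, and two details are my own assumptions rather than part of the answer: the lookup uses a `HashSet` instead of a plain loop over `ignore`, and when a candidate is ignored the code falls back to its shorter suffixes, since those may still be valid and longer than the best result so far.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

final class LcsIgnoringExact {
    /**
     * Sketch: longest common substring of str1 and str2 that is not
     * exactly equal to any of the strings in ignore.
     */
    static String longestSubstring(String str1, String str2, String[] ignore) {
        if (str1 == null || str1.isEmpty() || str2 == null || str2.isEmpty()) {
            return "";
        }
        Set<String> ignored =
                new HashSet<>(Arrays.asList(ignore == null ? new String[0] : ignore));
        // num[i][j] = length of the common run ending at str1[i] / str2[j]
        int[][] num = new int[str1.length()][str2.length()];
        String best = "";
        for (int i = 0; i < str1.length(); i++) {
            for (int j = 0; j < str2.length(); j++) {
                if (str1.charAt(i) != str2.charAt(j)) {
                    continue;
                }
                num[i][j] = (i == 0 || j == 0) ? 1 : 1 + num[i - 1][j - 1];
                // try the full run first, then its shorter suffixes, but only
                // lengths that would beat the best result found so far
                for (int len = num[i][j]; len > best.length(); len--) {
                    String candidate = str1.substring(i - len + 1, i + 1);
                    if (!ignored.contains(candidate)) {
                        best = candidate;
                        break;
                    }
                }
            }
        }
        return best;
    }
}
```

With `str1 = "abcdefg"`, `str2 = "xabcdefy"` and `ignore = {"abcdef"}`, the full common run is rejected and the sketch returns "abcde". Unlike the "contains" variant, the table `num` is left untouched here, because an ignored candidate can still be part of a longer valid substring.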
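Answer 1 (score: 0)
Build a suffix tree for one of the strings, then run through the second string and see which of its substrings can be found in the suffix tree.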
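To make the idea concrete, here is a simplified sketch. It uses a naive, uncompressed suffix trie rather than a true suffix tree, so building it takes quadratic time and memory; a real suffix tree (built with Ukkonen's algorithm, for example) would be linear. The names `SuffixTrieLcs`, `Node` and `buildSuffixTrie` are hypothetical, and the `ignore` array is left out because this answer does not say how to handle it.

```java
import java.util.HashMap;
import java.util.Map;

final class SuffixTrieLcs {
    /** One trie node; every path from the root spells a substring of the indexed text. */
    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();
    }

    /** Inserts every suffix of text into the trie, one character per edge. */
    private static Node buildSuffixTrie(String text) {
        Node root = new Node();
        for (int start = 0; start < text.length(); start++) {
            Node node = root;
            for (int k = start; k < text.length(); k++) {
                node = node.children.computeIfAbsent(text.charAt(k), c -> new Node());
            }
        }
        return root;
    }

    /** Longest substring of str2 that can be walked from the root of str1's suffix trie. */
    static String longestSubstring(String str1, String str2) {
        if (str1 == null || str1.isEmpty() || str2 == null || str2.isEmpty()) {
            return "";
        }
        Node root = buildSuffixTrie(str1);
        String best = "";
        for (int start = 0; start < str2.length(); start++) {
            Node node = root;
            int end = start;
            // follow str2 from this start position for as long as the trie allows
            while (end < str2.length()) {
                Node next = node.children.get(str2.charAt(end));
                if (next == null) {
                    break;
                }
                node = next;
                end++;
            }
            if (end - start > best.length()) {
                best = str2.substring(start, end);
            }
        }
        return best;
    }
}
```

For example, `SuffixTrieLcs.longestSubstring("banana", "ananas")` walks "anana" from the root before failing on 's', so it returns "anana".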