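I am looking for an algorithm that can efficiently search a query string for words within a given edit distance, ignoring spaces.
For example, if the words I need to index are:
OHIO, WELL
and the query string is:
HELLO HI THERE H E L L O WORLD WE LC OME
then for an edit distance of 1 the output should be:
HELL, O HI T, H E L L, WE LC
To ignore whitespace we could perhaps strip all the spaces, but I cannot find any algorithm that does fuzzy text search in a string that has no spaces.
I have done a lot of research without success. If the question is unclear or needs more information, please let me know.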
Answer 0 (score: 0)
    import java.util.ArrayList;
    import java.util.List;
    import java.util.stream.Collectors;

    // The class name is arbitrary; it only wraps the methods so they compile.
    public class FuzzySearch {

        public static void main(String[] args) {
            System.out.println(getMatches(List.of("OHIO", "WELL"),
                    "HELLO HI THERE H E L L O WORLD WE LC OME", 1));
        }

        // Collects the matches for every word, trimmed and de-duplicated.
        private static List<String> getMatches(List<String> words, String query, int editDistance) {
            return words.stream()
                    .flatMap(w -> getMatches(w, query, editDistance).stream().map(String::trim))
                    .distinct()
                    .collect(Collectors.toList());
        }

        // Tries every start position in the query as the beginning of a candidate.
        private static List<String> getMatches(String word, String query, int editDistance) {
            List<String> matches = new ArrayList<>();
            for (int i = 0; i < query.length(); i++) {
                StringBuilder candidate = new StringBuilder();
                StringBuilder candidateWithoutSpaces = new StringBuilder();
                populateCandidates(word, query, i, candidate, candidateWithoutSpaces);
                if (isMatch(candidateWithoutSpaces, word, editDistance)) {
                    matches.add(candidate.toString());
                }
            }
            return matches;
        }

        // Counts mismatching positions between two equal-length strings, so
        // substitutions are handled but insertions and deletions are not.
        private static boolean isMatch(StringBuilder candidateWithoutSpaces, String word, int editDistance) {
            if (candidateWithoutSpaces.length() != word.length()) return false;
            for (int i = 0; i < candidateWithoutSpaces.length(); i++) {
                if (candidateWithoutSpaces.charAt(i) != word.charAt(i) && --editDistance < 0) return false;
            }
            return true;
        }

        // Reads from position i until the candidate holds as many non-space
        // characters as the word; the spaced version is kept for the output.
        private static void populateCandidates(String word, String query, int i,
                StringBuilder candidate, StringBuilder candidateWithoutSpaces) {
            int j = 0;
            while (candidateWithoutSpaces.length() < word.length() && i + j < query.length()) {
                char c = query.charAt(i + j);
                candidate.append(c);
                if (c != ' ') candidateWithoutSpaces.append(c);
                j++;
            }
        }
    }
Output:

    [O HI T, HELL, H E L L, WE LC]
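
Note that isMatch only accepts candidates with exactly as many non-space characters as the word, so it covers substitutions but not insertions or deletions. If those matter, one possible approach is the approximate-substring variant of the Levenshtein dynamic program (often attributed to Sellers), run over the query with spaces removed while remembering where each character came from. The following is only a sketch under that assumption; the names FuzzySubstringSearch and search are hypothetical and not part of the code above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the original answer.
class FuzzySubstringSearch {

    /**
     * Returns substrings of query (original spacing preserved) whose
     * space-free form is within maxDistance Levenshtein edits of word.
     * One match is reported per end position, so overlapping matches can appear.
     */
    static List<String> search(String word, String query, int maxDistance) {
        // Strip spaces, remembering the original index of every kept character.
        StringBuilder text = new StringBuilder();
        List<Integer> origIndex = new ArrayList<>();
        for (int p = 0; p < query.length(); p++) {
            if (query.charAt(p) != ' ') {
                text.append(query.charAt(p));
                origIndex.add(p);
            }
        }

        int m = word.length();
        // dist[i]: fewest edits between word[0..i) and some substring of the
        // stripped text that ends at the current position.
        // start[i]: index in the stripped text where that substring begins.
        int[] dist = new int[m + 1];
        int[] start = new int[m + 1];
        for (int i = 0; i <= m; i++) {
            dist[i] = i;
            start[i] = 0;
        }

        List<String> matches = new ArrayList<>();
        for (int j = 1; j <= text.length(); j++) {
            int diagDist = dist[0];
            int diagStart = start[0];
            dist[0] = 0;      // a match may begin at any text position
            start[0] = j;
            for (int i = 1; i <= m; i++) {
                int oldDist = dist[i];
                int oldStart = start[i];
                // Match or substitute the current characters.
                int best = diagDist + (word.charAt(i - 1) == text.charAt(j - 1) ? 0 : 1);
                int bestStart = diagStart;
                // Extra character in the text.
                if (oldDist + 1 < best) {
                    best = oldDist + 1;
                    bestStart = oldStart;
                }
                // Character of the word missing from the text.
                if (dist[i - 1] + 1 < best) {
                    best = dist[i - 1] + 1;
                    bestStart = start[i - 1];
                }
                dist[i] = best;
                start[i] = bestStart;
                diagDist = oldDist;
                diagStart = oldStart;
            }
            // Skip empty substrings, which can only qualify when m <= maxDistance.
            if (dist[m] <= maxDistance && start[m] < j) {
                matches.add(query.substring(origIndex.get(start[m]), origIndex.get(j - 1) + 1));
            }
        }
        return matches;
    }
}
```

For example, FuzzySubstringSearch.search("WELL", "W EL", 1) finds "W EL" (one deleted letter), which the equal-length check above would reject. The running time is proportional to the word length times the stripped query length per word, and because every qualifying end position is reported, the result may need de-duplication or filtering of overlapping matches.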