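While browsing, I came across a spell-check program built on Lucene. I'm interested in adding the phonetix add-on from tangentum (specifically metaphone). Is there a way to integrate metaphone into my program, and if so, how?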
package com.lucene.spellcheck;
import java.io.File;
import java.io.IOException;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.spell.Dictionary;
import org.apache.lucene.search.spell.PlainTextDictionary;
import org.apache.lucene.search.spell.SpellChecker;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
public class SimpleSuggestionService {
    private static final String F_WORD = null;

    public static void main(String[] args) throws Exception {
        File dir = new File("e:/spellchecker/");
        Directory directory = FSDirectory.open(dir);
        SpellChecker spellChecker1 = new SpellChecker(directory);
        spellChecker1.indexDictionary(
                new PlainTextDictionary(new File("c:/fulldictionary00.txt")));
        String wordForSuggestions = "noveil";
        int suggestionsNumber = 5;
        String[] suggestions = spellChecker1.
                suggestSimilar(wordForSuggestions, suggestionsNumber);
        if (suggestions != null && suggestions.length > 0) {
            for (String word : suggestions) {
                System.out.println("Did you mean:" + word);
            }
        } else {
            System.out.println("No suggestions found for word:" + wordForSuggestions);
        }
    }
}
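Answer 0 (score: 0)
You can pass in a custom StringDistance implementation that uses the phonetic algorithm you want, either on its own or combined with another similarity algorithm (such as the standard LevensteinDistance). All you need to do is implement the getDistance(String, String) method in your StringDistance implementation. Perhaps something like: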
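import org.apache.lucene.search.spell.StringDistance;
import com.tangentum.phonetix.Metaphone; // package name assumed for the phonetix library

public class MetaphoneDistance implements StringDistance {
    private final Metaphone metaphone = new Metaphone();

    // I'm not really familiar with the library you mentioned, but I assume generateKeys
    // returns the phonetic key(s) for a word (two keys for a double metaphone).
    public float getDistance(String str1, String str2) {
        String[] keys1 = metaphone.generateKeys(str1);
        String[] keys2 = metaphone.generateKeys(str2);
        if (keys1.length == 0) return 0;
        float result = 0;
        // Each key of str1 that equals any key of str2 adds an equal share, so the score stays in [0, 1].
        for (String k1 : keys1) {
            for (String k2 : keys2) {
                if (k1 != null && k1.equals(k2)) { result += 1f / keys1.length; break; }
            }
        }
        return result;
    }
}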
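
If you would rather blend the phonetic match with an edit distance than use it alone, the idea could be sketched roughly as follows. This is only a minimal, self-contained illustration and not the library's API: Similarity is a local stand-in that mirrors the shape of Lucene's StringDistance (1 = identical, 0 = unrelated), PhoneticEncoder is a hypothetical interface, and toyKey is a deliberately crude placeholder that is NOT Metaphone. In a real project you would implement StringDistance directly and delegate the encoding to the phonetix encoder.

```java
/** Sketch: weighted blend of a phonetic match and an edit-distance similarity. */
class CombinedDistanceSketch {

    /** Local stand-in with the same shape as Lucene's StringDistance. */
    interface Similarity {
        float getDistance(String s1, String s2);
    }

    /** Hypothetical encoder; a real one would wrap a Metaphone implementation. */
    interface PhoneticEncoder {
        String encode(String word);
    }

    /** Edit-distance similarity: 1 - levenshtein(s1, s2) / max(length). */
    static final class EditSimilarity implements Similarity {
        public float getDistance(String s1, String s2) {
            int maxLen = Math.max(s1.length(), s2.length());
            if (maxLen == 0) return 1f;
            int[] prev = new int[s2.length() + 1];
            int[] curr = new int[s2.length() + 1];
            for (int j = 0; j <= s2.length(); j++) prev[j] = j;
            for (int i = 1; i <= s1.length(); i++) {
                curr[0] = i;
                for (int j = 1; j <= s2.length(); j++) {
                    int cost = s1.charAt(i - 1) == s2.charAt(j - 1) ? 0 : 1;
                    curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
                }
                int[] tmp = prev; prev = curr; curr = tmp; // prev now holds the latest row
            }
            return 1f - (float) prev[s2.length()] / maxLen;
        }
    }

    /** Phonetic similarity: 1 if both words encode to the same key, otherwise 0. */
    static final class PhoneticSimilarity implements Similarity {
        private final PhoneticEncoder encoder;
        PhoneticSimilarity(PhoneticEncoder encoder) { this.encoder = encoder; }
        public float getDistance(String s1, String s2) {
            return encoder.encode(s1).equals(encoder.encode(s2)) ? 1f : 0f;
        }
    }

    /** Blend of two similarities; phoneticWeight is expected to lie in [0, 1]. */
    static final class Combined implements Similarity {
        private final Similarity phonetic;
        private final Similarity edit;
        private final float phoneticWeight;
        Combined(Similarity phonetic, Similarity edit, float phoneticWeight) {
            this.phonetic = phonetic;
            this.edit = edit;
            this.phoneticWeight = phoneticWeight;
        }
        public float getDistance(String s1, String s2) {
            return phoneticWeight * phonetic.getDistance(s1, s2)
                    + (1f - phoneticWeight) * edit.getDistance(s1, s2);
        }
    }

    /** Toy placeholder key (not Metaphone): lower-case, keep the first letter,
     *  then drop vowels and immediate repeats. */
    static String toyKey(String word) {
        String w = word.toLowerCase();
        StringBuilder key = new StringBuilder();
        for (int i = 0; i < w.length(); i++) {
            char c = w.charAt(i);
            boolean vowel = "aeiou".indexOf(c) >= 0;
            boolean repeat = key.length() > 0 && key.charAt(key.length() - 1) == c;
            if (i == 0 || (!vowel && !repeat)) key.append(c);
        }
        return key.toString();
    }

    public static void main(String[] args) {
        Similarity combined = new Combined(
                new PhoneticSimilarity(CombinedDistanceSketch::toyKey), new EditSimilarity(), 0.5f);
        System.out.println(combined.getDistance("noveil", "novel"));
        System.out.println(combined.getDistance("noveil", "table"));
    }
}
```

With Lucene itself, a class like this would implement StringDistance and be handed to the spell checker, either through the SpellChecker(Directory, StringDistance) constructor or by calling setStringDistance on an existing instance. The 0.5 weight is arbitrary and worth tuning against your own dictionary.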