Question

I recently implemented a SpellChecker using Apache Lucene. My code is as follows:

public void loadDictionary() {
    try {
        File dir = new File("c:/spellchecker/");
        Directory directory = FSDirectory.open(dir);
        spellChecker = new SpellChecker(directory);
        Dictionary dictionary = new PlainTextDictionary(new File("c:/dictionary.txt"));
        IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_40, null);
        spellChecker.indexDictionary(dictionary, config, false);
    } catch (IOException e) {
        e.printStackTrace();
    }
}

public String performSpellCheck(String word) {
    try {
        String[] suggestions = spellChecker.suggestSimilar(word, 1);
        if (suggestions.length > 0) {
            return suggestions[0];
        }
        else {
            return word;
        }
    } catch (Exception e) {
        return "Error";
    }
}

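The code above uses a dictionary of English words, and I am having trouble with accuracy. What I want is for it to suggest similar words only for misspelled words (i.e., words that do not appear in the dictionary being used). However, if I pass the word "post" to the performSpellCheck method, it returns "poet"; that is, it corrects words that do not need correcting (words that are present in the dictionary file).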

Any suggestions on how to improve the results?

Answer 1

I think you should use the SpellChecker.exists() method, and call suggestSimilar only when the word does not exist in the dictionary.
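A minimal sketch of that check, in Java. So that it runs without Lucene on the classpath, it uses a hypothetical `WordSuggester` interface and a toy `InMemorySuggester` (neither is part of Lucene) that only mirror the names of the two `SpellChecker` methods involved, `exists(String)` and `suggestSimilar(String, int)`. The toy ranks words by Levenshtein distance, which is not how Lucene ranks candidates internally. It also assumes, as the "post" → "poet" result suggests, that `suggestSimilar` never returns the query word itself, so asking it about a correctly spelled word yields the nearest *other* word.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Set;

public class SpellCheckSketch {

    /** Hypothetical stand-in mirroring the two SpellChecker methods used here. */
    interface WordSuggester {
        boolean exists(String word);
        String[] suggestSimilar(String word, int numSug);
    }

    /** Toy in-memory suggester (assumption, not Lucene): ranks dictionary words
     *  by Levenshtein distance and skips the query word itself. */
    static final class InMemorySuggester implements WordSuggester {
        private final Set<String> words;

        InMemorySuggester(Set<String> words) {
            this.words = words;
        }

        @Override
        public boolean exists(String word) {
            return words.contains(word);
        }

        @Override
        public String[] suggestSimilar(String word, int numSug) {
            List<String> candidates = new ArrayList<>();
            for (String w : words) {
                if (!w.equals(word)) { // never suggest the input word itself
                    candidates.add(w);
                }
            }
            // Closest first; ties broken alphabetically so the order is deterministic.
            candidates.sort(Comparator.comparingInt((String c) -> levenshtein(word, c))
                    .thenComparing(Comparator.<String>naturalOrder()));
            return candidates.subList(0, Math.min(numSug, candidates.size()))
                    .toArray(new String[0]);
        }
    }

    /** Classic two-row dynamic-programming edit distance. */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** The suggested fix: only ask for suggestions when the word is unknown. */
    static String performSpellCheck(WordSuggester checker, String word) {
        if (checker.exists(word)) {
            return word; // correctly spelled: leave it alone
        }
        String[] suggestions = checker.suggestSimilar(word, 1);
        return suggestions.length > 0 ? suggestions[0] : word;
    }

    public static void main(String[] args) {
        WordSuggester checker = new InMemorySuggester(Set.of("post", "poet", "spell"));
        System.out.println(checker.suggestSimilar("post", 1)[0]); // poet (the reported problem)
        System.out.println(performSpellCheck(checker, "post"));   // post
        System.out.println(performSpellCheck(checker, "spel"));   // spell
    }
}
```

With the real `org.apache.lucene.search.spell.SpellChecker`, the body of `performSpellCheck` would look the same, except that `exists` and `suggestSimilar` both declare `throws IOException`, so the existing try/catch is still needed.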

Apache Lucene - Improving the results of a spell checker
