Question

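I have a file that contains some phrases. Using Lucene's Jaro-Winkler implementation, I expect to get back the phrases from that file that are most similar to my input.

Here is an example of my problem.

We have a file with the following contents: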

//phrases.txt
this is goodd
this is good
this is god

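If my input is "this is good", it should return "this is good" from the file first, since that phrase has the highest similarity score (1). But for some reason it returns only "this is goodd" and "this is god"!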

Here is my code:

try {
    SpellChecker spellChecker = new SpellChecker(new RAMDirectory(), new JaroWinklerDistance());
    Dictionary dictionary = new PlainTextDictionary(new File("src/main/resources/words.txt").toPath());
    IndexWriterConfig iwc=new IndexWriterConfig(new ShingleAnalyzerWrapper());
    spellChecker.indexDictionary(dictionary,iwc,false);

    String wordForSuggestions = "this is good";

    int suggestionsNumber = 5;

    String[] suggestions = spellChecker.suggestSimilar(wordForSuggestions, suggestionsNumber,0.8f);
    if (suggestions!=null && suggestions.length>0) {
        for (String word : suggestions) {
            System.out.println("Did you mean:" + word);
        }
    }
    else {
        System.out.println("No suggestions found for word:"+wordForSuggestions);
    }
} catch (IOException e) {
    e.printStackTrace();
}

Answer 1

suggestSimilar does not return suggestions that are identical to the input. To quote the source code:

// don't suggest a word for itself, that would be silly
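To see why the two other phrases still come back, it may help to compute the scores by hand. The following is a minimal plain-Java sketch of the textbook Jaro-Winkler similarity, not Lucene's own JaroWinklerDistance class, which may differ in details. The class name JaroWinklerSketch, the prefix scale of 0.1, and the prefix cap of 4 are assumptions chosen for illustration.

```java
final class JaroWinklerSketch {
    // Textbook Jaro similarity in [0, 1]; 1 means the strings are identical.
    static double jaro(String s1, String s2) {
        if (s1.equals(s2)) return 1.0;
        int len1 = s1.length(), len2 = s2.length();
        if (len1 == 0 || len2 == 0) return 0.0;
        // Characters count as matching only within this distance of each other.
        int window = Math.max(0, Math.max(len1, len2) / 2 - 1);
        boolean[] matched1 = new boolean[len1];
        boolean[] matched2 = new boolean[len2];
        int matches = 0;
        for (int i = 0; i < len1; i++) {
            int lo = Math.max(0, i - window);
            int hi = Math.min(len2 - 1, i + window);
            for (int j = lo; j <= hi; j++) {
                if (!matched2[j] && s1.charAt(i) == s2.charAt(j)) {
                    matched1[i] = true;
                    matched2[j] = true;
                    matches++;
                    break;
                }
            }
        }
        if (matches == 0) return 0.0;
        // Count matched characters that appear in a different order.
        int k = 0, outOfOrder = 0;
        for (int i = 0; i < len1; i++) {
            if (!matched1[i]) continue;
            while (!matched2[k]) k++;
            if (s1.charAt(i) != s2.charAt(k)) outOfOrder++;
            k++;
        }
        double m = matches;
        return (m / len1 + m / len2 + (m - outOfOrder / 2.0) / m) / 3.0;
    }

    // Jaro-Winkler: boosts the Jaro score for a shared prefix of up to 4 characters.
    static double jaroWinkler(String s1, String s2) {
        double j = jaro(s1, s2);
        int maxPrefix = Math.min(4, Math.min(s1.length(), s2.length()));
        int prefix = 0;
        while (prefix < maxPrefix && s1.charAt(prefix) == s2.charAt(prefix)) prefix++;
        return j + prefix * 0.1 * (1.0 - j);
    }

    public static void main(String[] args) {
        String input = "this is good";
        for (String phrase : new String[] {"this is goodd", "this is good", "this is god"}) {
            System.out.println(phrase + " -> " + jaroWinkler(input, phrase));
        }
    }
}
```

Under this sketch the identical phrase scores exactly 1, and the two near misses score above the 0.8 accuracy passed to suggestSimilar but below 1. That is consistent with what the question observes: the near misses pass the threshold, and the exact match is left out on purpose.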

If you want to know whether wordForSuggestions is in the dictionary, use the exist method, i.e. spellChecker.exist(wordForSuggestions).
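One way to get the behavior the question expects is to check for the exact match yourself and put it in front of the suggestions. The sketch below is plain Java so it runs without Lucene on the classpath. ExactMatchFirst and merge are hypothetical names, and the two Lucene calls that would supply the arguments appear only in comments.

```java
import java.util.ArrayList;
import java.util.List;

final class ExactMatchFirst {
    // Hypothetical helper. With the question's code the arguments would be:
    //   inDictionary = spellChecker.exist(input)
    //   suggestions  = spellChecker.suggestSimilar(input, 5, 0.8f)
    static List<String> merge(String input, boolean inDictionary, String[] suggestions) {
        List<String> result = new ArrayList<>();
        // The exact match has the best possible score, so it goes first.
        if (inDictionary) {
            result.add(input);
        }
        if (suggestions != null) {
            for (String suggestion : suggestions) {
                // Guard against duplicates in case the input ever shows up here too.
                if (!suggestion.equals(input)) {
                    result.add(suggestion);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Values taken from the question; the order of the suggestions is illustrative.
        String[] suggestions = {"this is goodd", "this is god"};
        for (String phrase : merge("this is good", true, suggestions)) {
            System.out.println("Did you mean:" + phrase);
        }
    }
}
```

Keeping the merge step separate from the Lucene calls leaves suggestSimilar untouched and makes the "exact match first" rule easy to test on its own.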
