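I want to use Lucene's suggestion mechanism to help end users find the term they mistyped. Lucene's SpellChecker has a method suggestSimilar, which accepts a SuggestMode flag. With the flag SuggestMode.SUGGEST_MORE_POPULAR, I expect to get only suggestions for words that are more frequent in the current directory than the word I looked up.

The following code seems to contradict this assumption: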

    import org.apache.lucene.analysis.core.SimpleAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.document.TextField;
    import org.apache.lucene.index.DirectoryReader;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.search.spell.LuceneDictionary;
    import org.apache.lucene.search.spell.SpellChecker;
    import org.apache.lucene.search.spell.SuggestMode;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.RAMDirectory;

    import java.io.IOException;
    import java.util.LinkedList;
    import java.util.List;

    public class SuggestTest {
        public static void main(String[] args) throws IOException {
            final String NAME_FIELD = "NAME";

            // Start from an empty in-memory index.
            Directory directory = new RAMDirectory();
            IndexWriter writer = new IndexWriter(directory,
                    new IndexWriterConfig(new SimpleAnalyzer()));
            writer.deleteAll();
            writer.commit();

            // "wafa" is indexed 1000 times, "waffa" only once.
            List<String> list = new LinkedList<>();
            for (int i = 0; i < 1000; i++)
                list.add("wafa");
            list.add("waffa");

            // One document per name.
            for (String name : list) {
                Document doc = new Document();
                doc.add(new TextField(NAME_FIELD, name, Field.Store.YES));
                writer.addDocument(doc);
            }
            writer.close();

            // Build the spell-check index from the terms of NAME_FIELD.
            DirectoryReader directoryReader = DirectoryReader.open(directory);
            LuceneDictionary nameDictionary = new LuceneDictionary(directoryReader, NAME_FIELD);
            IndexWriterConfig config = new IndexWriterConfig(new SimpleAnalyzer());
            SpellChecker spellChecker = new SpellChecker(directory);
            spellChecker.indexDictionary(nameDictionary, config, true);

            // Ask for up to 10 suggestions per word and print them.
            for (String s : new String[]{"wafa", "waffa", "wala"}) {
                String[] suggestions = spellChecker.suggestSimilar(s, 10, null, null, SuggestMode.SUGGEST_MORE_POPULAR);
                System.out.println("Suggestions for " + s);
                for (String suggestion : suggestions)
                    System.out.println(" -" + suggestion);
            }
        }
    }

When I look up wafa (which occurs 1000 times in the directory!), I do not expect the code above to suggest waffa, which occurs only once.
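
To make the expectation concrete, here is a minimal plain-Java sketch of the filtering I assumed SUGGEST_MORE_POPULAR would perform. It does not use Lucene at all: the class MorePopularSketch, the method morePopular and the hard-coded frequency map are made up for illustration, and I am assuming that "more popular" means "occurs in at least as many documents as the word I looked up".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, NOT part of Lucene: it only illustrates the
// filtering I expected from SuggestMode.SUGGEST_MORE_POPULAR.
class MorePopularSketch {

    // Keeps the candidates that differ from the queried word, exist in the
    // index, and occur in at least as many documents as the queried word.
    static List<String> morePopular(String word, List<String> candidates,
                                    Map<String, Integer> docFreq) {
        int goalFreq = docFreq.getOrDefault(word, 0);
        List<String> kept = new ArrayList<>();
        for (String candidate : candidates) {
            int freq = docFreq.getOrDefault(candidate, 0);
            if (!candidate.equals(word) && freq > 0 && freq >= goalFreq) {
                kept.add(candidate);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Same frequencies as in my test index (assumed, hard-coded here).
        Map<String, Integer> docFreq = new HashMap<>();
        docFreq.put("wafa", 1000);
        docFreq.put("waffa", 1);
        List<String> candidates = Arrays.asList("wafa", "waffa");

        for (String s : new String[]{"wafa", "waffa", "wala"}) {
            System.out.println("Suggestions for " + s + ": "
                    + morePopular(s, candidates, docFreq));
        }
    }
}
```

With these frequencies the sketch returns nothing for wafa and only wafa for waffa, which is the behaviour I was hoping to get from suggestSimilar.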