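I'm trying to run a simple query over my documents that tolerates typos.
I achieved this with a BooleanQuery, adding one FuzzyQuery per field. My problem now is the scoring.
As far as I understand, the score of a document is the sum of the scores of all the per-field FuzzyQuery clauses that match it. I would like to keep only the best one instead of adding them up.
For example, if I search for "manticore", I want the best result to come from the name field.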
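To make the difference concrete, here is a minimal sketch in plain Java. It does not use Lucene at all; it only models the two ways of combining per-field scores. The field names `description` and `tags` and all the score values are made up for illustration. The "best plus tie-breaker" rule mirrors how Lucene's DisjunctionMaxQuery is documented to combine its subqueries (maximum score plus a tie-breaker multiplier times the other matching scores), so it may be a direction worth looking at.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustration only: plain Java, no Lucene classes involved.
final class ScoreCombinationSketch {

    // Summation: every matching field contributes its score
    // (the behaviour described for SHOULD clauses above).
    static double sumOfScores(Map<String, Double> perFieldScores) {
        double total = 0.0;
        for (double score : perFieldScores.values()) {
            total += score;
        }
        return total;
    }

    // Best field wins: max + tieBreaker * (sum of the other scores).
    // With tieBreaker == 0 this is a pure maximum.
    // Assumes non-negative scores; returns 0 for an empty map.
    static double bestFieldScore(Map<String, Double> perFieldScores, double tieBreaker) {
        double max = 0.0;
        double sum = 0.0;
        for (double score : perFieldScores.values()) {
            max = Math.max(max, score);
            sum += score;
        }
        return max + tieBreaker * (sum - max);
    }

    public static void main(String[] args) {
        // Hypothetical document A: one strong match on the name field.
        Map<String, Double> docA = new LinkedHashMap<>();
        docA.put("name", 2.0);

        // Hypothetical document B: three weak matches on different fields.
        Map<String, Double> docB = new LinkedHashMap<>();
        docB.put("name", 0.75);
        docB.put("description", 0.75);
        docB.put("tags", 0.75);

        System.out.println("sum:  A=" + sumOfScores(docA) + " B=" + sumOfScores(docB));
        System.out.println("best: A=" + bestFieldScore(docA, 0.0)
                + " B=" + bestFieldScore(docB, 0.0));
    }
}
```

With these made-up numbers, summation ranks document B above A because its three weak matches add up, while the best-field rule ranks A first, which is the behaviour I am after.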
Query:
SearchEngine(final String indexStoragePath) throws IOException {
    MMapDirectory mMapDirectory = new MMapDirectory(Paths.get(indexStoragePath));
    searcherManager = new SearcherManager(mMapDirectory, new SearcherFactory());
    indexSearcher = searcherManager.acquire();
}

List<String> getMatchingIds(final String searchQuery, int maxSearch) throws IOException {
    // One FuzzyQuery per field; FIELDS maps field name -> maximum edit distance.
    BooleanQuery.Builder builder = new BooleanQuery.Builder();
    for (Map.Entry<String, Integer> input : FIELDS.entrySet()) {
        FuzzyQuery query = new FuzzyQuery(new Term(input.getKey(), searchQuery), input.getValue());
        // SHOULD clauses: the scores of all matching clauses are added together.
        builder.add(query, BooleanClause.Occur.SHOULD);
    }
    BooleanQuery resultQuery = builder.build();

    TopDocs docs = indexSearcher.search(resultQuery, maxSearch);
    ScoreDoc[] hits = docs.scoreDocs;

    // Collect the stored ID of every hit, in score order.
    List<String> result = Lists.newArrayList();
    for (ScoreDoc hit : hits) {
        Document document = getDocument(hit);
        result.add(document.get(ID));
    }
    return result;
}
Indexer:
Indexer(final String indexStoragePath) throws IOException {
    MMapDirectory mMapDirectory = new MMapDirectory(Paths.get(indexStoragePath));
    StandardAnalyzer analyzer = new StandardAnalyzer();
    IndexWriterConfig config = new IndexWriterConfig(analyzer);
    config.setOpenMode(IndexWriterConfig.OpenMode.CREATE_OR_APPEND);
    // MultiTermQuery.TopTermsScoringBooleanQueryRewrite
    config.setSimilarity(new BooleanSimilarity());
    writer = new IndexWriter(mMapDirectory, config);
    writer.commit();
}
How can I change this scoring so that only the best-matching field counts?