Question

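I have two documents containing:

doc_1: one two three four five Bingo

doc_2: Bingo one two three four five

I index each document in two fields: the first field contains the first five words and the second field contains the last word.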

// index is the position of the 5th ' ' in content
TextField start_field = new TextField("start_words", content.substring(0, index), Field.Store.NO);
TextField end_field = new TextField("end_words", content.substring(index, content.length() - 1), Field.Store.NO);

To see the effect of boosting more clearly, I implemented the following similarity:

DefaultSimilarity customSimilarity = new DefaultSimilarity() {
     @Override
     public float lengthNorm(FieldInvertState state) {
         return 1; // So length of each field would not matter
     }
};

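Without any boost applied, searching for Bingo gives both documents the same score (as expected). However, when I apply a boost to one of the fields (start_field.setBoost(5)), the two scores stay the same, even though the field of doc_2 that contains Bingo has been boosted.

If I remove customSimilarity, boosting works as expected.

Why is boosting stopped by lengthNorm? How can I make boosting work with the overridden similarity above?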

Answer 1

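The default implementation of lengthNorm() in DefaultSimilarity is state.getBoost() * lengthNorm(numTerms).

Your implementation does not take the boost into account: the field boost is multiplied in inside lengthNorm(), so returning a constant 1 discards it. To make your boost count, you can have your implementation return state.getBoost().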
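The effect can be illustrated without Lucene on the classpath. The sketch below is a plain-Java model, not Lucene code: NormSketch and its three methods are hypothetical names, boost and numTerms stand in for state.getBoost() and the field's term count, and the 1/sqrt(numTerms) length factor is an assumption about what the default length term looks like. It only compares the norm factor for the field that holds Bingo in each document and ignores the rest of the scoring formula. In the actual override from the question, the corresponding change is to return state.getBoost() instead of 1.

```java
public class NormSketch {

    // Models the default behaviour quoted above: boost times a length factor.
    // The 1/sqrt(numTerms) factor is an assumption made for illustration.
    static float defaultNorm(float boost, int numTerms) {
        return boost * (float) (1.0 / Math.sqrt(numTerms));
    }

    // Models the override from the question: always 1, so the boost is lost.
    static float constantNorm(float boost, int numTerms) {
        return 1f;
    }

    // Models the suggested fix: ignore the field length but keep the boost.
    static float boostOnlyNorm(float boost, int numTerms) {
        return boost;
    }

    public static void main(String[] args) {
        // doc_1: Bingo sits in end_words (1 term, no boost, i.e. boost 1).
        // doc_2: Bingo sits in start_words (5 terms, boost 5).
        System.out.println("default:    doc_1=" + defaultNorm(1f, 1)
                + " doc_2=" + defaultNorm(5f, 5));
        System.out.println("constant 1: doc_1=" + constantNorm(1f, 1)
                + " doc_2=" + constantNorm(5f, 5));
        System.out.println("boost only: doc_1=" + boostOnlyNorm(1f, 1)
                + " doc_2=" + boostOnlyNorm(5f, 5));
    }
}
```

With the constant variant both fields get the same factor, which matches the identical scores seen in the question. With the boost-only variant the boosted field gets a larger factor regardless of how many terms it holds.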

How Lucene boosting is affected by the length norm of a Similarity
