I am testing the boost operator in Lucene and ran into some strange behavior.
Example queries (query1, then query2):
"red fox"
"red^1.2 fox"
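When I run the queries against the text:
"wonderful red fox"
query2 gets a lower score than query1, but I expected query2 to win.
The explain output for both queries follows.
Explain for query1: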
{0,4339554 = (MATCH) sum of:
  0,2169777 = (MATCH) weight(content:fox in 0), product of:
    0,7071068 = queryWeight(content:fox), product of:
      0,3068528 = idf(docFreq=1, maxDocs=1)
      2,304384 = queryNorm
    0,3068528 = (MATCH) fieldWeight(content:fox in 0), product of:
      1 = tf(termFreq(content:fox)=1)
      0,3068528 = idf(docFreq=1, maxDocs=1)
      1 = fieldNorm(field=content, doc=0)
  0,2169777 = (MATCH) weight(content:red in 0), product of:
    0,7071068 = queryWeight(content:red), product of:
      0,3068528 = idf(docFreq=1, maxDocs=1)
      2,304384 = queryNorm
    0,3068528 = (MATCH) fieldWeight(content:red in 0), product of:
      1 = tf(termFreq(content:red)=1)
      0,3068528 = idf(docFreq=1, maxDocs=1)
      1 = fieldNorm(field=content, doc=0)
}
Explain for query2:
{0,4313012 = (MATCH) sum of:
  0,2396118 = (MATCH) weight(content:fox^1.25 in 0), product of:
    0,7808688 = queryWeight(content:fox^1.25), product of:
      1,25 = boost
      0,3068528 = idf(docFreq=1, maxDocs=1)
      2,035813 = queryNorm
    0,3068528 = (MATCH) fieldWeight(content:fox in 0), product of:
      1 = tf(termFreq(content:fox)=1)
      0,3068528 = idf(docFreq=1, maxDocs=1)
      1 = fieldNorm(field=content, doc=0)
  0,1916894 = (MATCH) weight(content:red in 0), product of:
    0,6246951 = queryWeight(content:red), product of:
      0,3068528 = idf(docFreq=1, maxDocs=1)
      2,035813 = queryNorm
    0,3068528 = (MATCH) fieldWeight(content:red in 0), product of:
      1 = tf(termFreq(content:red)=1)
      0,3068528 = idf(docFreq=1, maxDocs=1)
      1 = fieldNorm(field=content, doc=0)
}
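Why does the boosted query score lower than the unboosted one?
Answer 0: (score: 1)
This is caused by the query norm (queryNorm). That part of the scoring algorithm tries to make scores roughly comparable from one query to the next. Because it is the same factor for every document matched by a given query, it does not change the ranking within that query.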
It is calculated as follows:
queryNorm = 1 / sumOfSquaredWeights^(1/2)
where:
sumOfSquaredWeights = (query boost)^2 · Σ (idf · term boost)^2
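As a rough check, the formula above can be evaluated by hand for both queries. The sketch below is plain Python arithmetic, not Lucene code; the helper name `query_norm` is made up for illustration, the idf value is copied from the explain output, and the query-level boost is assumed to be 1.

```python
import math

IDF = 0.3068528  # idf(docFreq=1, maxDocs=1), copied from the explain output

def query_norm(term_boosts, query_boost=1.0, idf=IDF):
    """Hypothetical helper: 1 / sqrt(queryBoost^2 * sum((idf * termBoost)^2))."""
    sum_of_squared_weights = query_boost ** 2 * sum(
        (idf * boost) ** 2 for boost in term_boosts
    )
    return 1.0 / math.sqrt(sum_of_squared_weights)

norm_q1 = query_norm([1.0, 1.0])   # fox, red (no boost)
norm_q2 = query_norm([1.25, 1.0])  # fox^1.25, red

print(f"query1 queryNorm: {norm_q1:.4f}")  # explain output reports 2,304384
print(f"query2 queryNorm: {norm_q2:.4f}")  # explain output reports 2,035813
```

The boost increases sumOfSquaredWeights, so queryNorm shrinks, and that smaller factor is applied to every term in query2, including the unboosted one.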
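If you take that factor out of the explain output, simply by dividing the final score by the query norm, you will see that the second query does get the higher score:
query1 -> 0.4339554 / 2.304384 = 0.1883
query2 -> 0.4313012 / 2.035813 = 0.2119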
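The same two numbers can also be rebuilt from the per-term factors in the explain output, leaving queryNorm out entirely. This is only a sketch in plain Python under the values shown above (tf = 1, fieldNorm = 1, one shared idf); `raw_score` is a hypothetical helper, not a Lucene API.

```python
IDF = 0.3068528  # idf(docFreq=1, maxDocs=1), copied from the explain output

def raw_score(term_boosts, idf=IDF, tf=1.0, field_norm=1.0):
    """Hypothetical helper: sum over terms of (boost * idf) * (tf * idf * fieldNorm).

    This is queryWeight * fieldWeight per term with the queryNorm factor omitted.
    """
    return sum((boost * idf) * (tf * idf * field_norm) for boost in term_boosts)

raw_q1 = raw_score([1.0, 1.0])   # fox, red (no boost)
raw_q2 = raw_score([1.25, 1.0])  # fox^1.25, red

print(f"query1 without queryNorm: {raw_q1:.4f}")
print(f"query2 without queryNorm: {raw_q2:.4f}")
print("query2 higher:", raw_q2 > raw_q1)
```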
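The more important point is this: you should not read much into comparing the scores of one query with those of another. Scores are only really meaningful relative to the query that produced them. You can see in the explain output that the boosted term contributes a larger relative share of the score, which is all a boost is really meant to do.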
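To make the "relative share" concrete, here is a small sketch that takes the per-term weights straight from the two explain outputs and computes each term's share of the total. The function name `shares` is just illustrative.

```python
# Per-term weights copied from the two explain outputs
weights_q1 = {"fox": 0.2169777, "red": 0.2169777}
weights_q2 = {"fox": 0.2396118, "red": 0.1916894}

def shares(weights):
    """Return each term's fraction of the summed score."""
    total = sum(weights.values())
    return {term: weight / total for term, weight in weights.items()}

shares_q1 = shares(weights_q1)
shares_q2 = shares(weights_q2)

for name, result in (("query1", shares_q1), ("query2", shares_q2)):
    print(name, {term: f"{share:.1%}" for term, share in result.items()})
```

In query1 both terms carry half of the score; in query2 the boosted term carries roughly 1.25 / 2.25 of it, even though the absolute total is slightly lower.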