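I have run into a problem with how the fieldLength value is computed in Solr 6. I am using BM25 as the similarity. When I index a set of documents, their fieldLength values come out badly wrong. For a title field containing only 9 words, the stored fieldLength is "5.6493154E19", which is clearly wrong. When I re-index a single document, its score is corrected and fieldLength shows as "10.24". But when I re-index the entire corpus, the values break again and fieldLength goes back to "5.6493154E19".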
Explain output with the originally stored fieldLength value:
4.641637E-19 = tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from:
1.0 = termFreq=1.0
1.2 = parameter k1
0.75 = parameter b
10.727212 = avgFieldLength
5.6493154E19 = fieldLength
After re-indexing a single document:
1.0189644 = tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from:
1.0 = termFreq=1.0
1.2 = parameter k1
0.75 = parameter b
10.72807 = avgFieldLength
10.24 = fieldLength
After re-indexing the entire corpus:
4.641637E-19 = tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from:
1.0 = termFreq=1.0
1.2 = parameter k1
0.75 = parameter b
10.727212 = avgFieldLength
5.6493154E19 = fieldLength
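As a sanity check, the tfNorm formula printed in the explain output can be evaluated directly with the reported inputs. The sketch below is only an illustration: the function name `tf_norm` and the variable names are hypothetical and are not part of Solr or Lucene, and it uses ordinary double-precision arithmetic, so the last digits may differ slightly from what Solr prints.

```python
def tf_norm(freq, k1, b, field_length, avg_field_length):
    """BM25 tfNorm, transcribed from the formula in the explain output."""
    length_ratio = field_length / avg_field_length
    return (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * length_ratio))


# Inputs copied from the explain output (full-corpus index vs. single re-indexed document).
broken = tf_norm(1.0, 1.2, 0.75, 5.6493154e19, 10.727212)
fixed = tf_norm(1.0, 1.2, 0.75, 10.24, 10.72807)

print(f"broken: {broken:.6e}")  # compare with the reported 4.641637E-19
print(f"fixed:  {fixed:.7f}")   # compare with the reported 1.0189644
```

If both results agree with the reported tfNorm values, the formula itself is being applied consistently, and the only suspicious input is the fieldLength that is read back from the index.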
Any ideas about what is going wrong?