Lucene.Net, custom scoring

Date: 2017-08-03 13:29:32

Tags: .net lucene

I have the following Lucene explanation:

{1.25 = (MATCH) sum of:

  0.5 = (MATCH) weight(Caption:vrom^0.5 in 0) [MySimilarity], result of:
    0.5 = score(doc=0,freq=1 = termFreq=1
), product of:
      0.5 = queryWeight, product of:
        0.5 = boost
        1 = idf(docFreq=1, maxDocs=4)
        1 = queryNorm
      1 = fieldWeight in 0, product of:
        1 = tf(freq=1), with freq of:
          1 = termFreq=1
        1 = idf(docFreq=1, maxDocs=4)
        1 = fieldNorm(doc=0)

  0.75 = (MATCH) weight(Caption:vroma^0.75 in 0) [MySimilarity], result of:
    0.75 = score(doc=0,freq=1 = termFreq=1
), product of:
      0.75 = queryWeight, product of:
        0.75 = boost
        1 = idf(docFreq=1, maxDocs=4)
        1 = queryNorm
      1 = fieldWeight in 0, product of:
        1 = tf(freq=1), with freq of:
          1 = termFreq=1
        1 = idf(docFreq=1, maxDocs=4)
        1 = fieldNorm(doc=0)
}

I want the score of a match, as computed from the query weights, to be the MAX over the matching clauses rather than their SUM.

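What I need is, for each document, to take the highest score given by any single term. (In this example, I want the match score to be 0.75 rather than 1.25.) Is this possible, and is it even the right approach?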

So far I have created a custom Similarity to change how the scores are calculated, but I am still missing the part that yields MAX instead of SUM.

I am using Lucene.Net version 4.8 (beta).

Thanks in advance!

2 Answers:

Answer 0: (score: 0)

There is no need to modify the Similarity. Use DisjunctionMaxQuery instead of BooleanQuery.
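For reference, DisjunctionMaxQuery scores a document as its best sub-query score plus a tie-breaker fraction of the remaining sub-query scores. The worked example below applies that rule to the two clause scores from the question. It assumes each term is its own disjunct and that the per-clause scores stay at 0.5 and 0.75; the symbols s_i (clause scores) and t (tie-breaker multiplier) are notation introduced here.

```latex
\[
\text{score} = \max_i s_i + t \Bigl( \sum_i s_i - \max_i s_i \Bigr)
\]
\[
t = 0:\quad 0.75 + 0 \cdot (1.25 - 0.75) = 0.75
\qquad
t = 0.01:\quad 0.75 + 0.01 \cdot (1.25 - 0.75) = 0.755
\]
```

So a tie-breaker of 0 gives the pure maximum asked for in the question, while any positive tie-breaker adds a small share of the other matching clauses.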

Answer 1: (score: 0)

Thanks for the solution, but I still have a problem. When I try it, I get the same result as before. My code is below (I used the example from here):

BooleanQuery finalQuery = new BooleanQuery();
DisjunctionMaxQuery q1 = new DisjunctionMaxQuery(0.01f);
Query query = new FuzzyQuery(new Term("Caption", "roma"));
q1.Add(query);
finalQuery.Add(q1, Occur.MUST);

The result I get (the same as in the question):

{1.25 = (MATCH) sum of:

  0.5 = (MATCH) weight(Caption:vrom^0.5 in 0) [MySimilarity], result of:
    0.5 = score(doc=0,freq=1 = termFreq=1
), product of:
      0.5 = queryWeight, product of:
        0.5 = boost
        1 = idf(docFreq=1, maxDocs=4)
        1 = queryNorm
      1 = fieldWeight in 0, product of:
        1 = tf(freq=1), with freq of:
          1 = termFreq=1
        1 = idf(docFreq=1, maxDocs=4)
        1 = fieldNorm(doc=0)

  0.75 = (MATCH) weight(Caption:vroma^0.75 in 0) [MySimilarity], result of:
    0.75 = score(doc=0,freq=1 = termFreq=1
), product of:
      0.75 = queryWeight, product of:
        0.75 = boost
        1 = idf(docFreq=1, maxDocs=4)
        1 = queryNorm
      1 = fieldWeight in 0, product of:
        1 = tf(freq=1), with freq of:
          1 = termFreq=1
        1 = idf(docFreq=1, maxDocs=4)
        1 = fieldNorm(doc=0)
}

I also tried it without the custom Similarity, but the behavior is the same. The result in that case is:

{0.505973 = (MATCH) sum of:

  0.2023892 = (MATCH) weight(Caption:vrom^0.5 in 0) [DefaultSimilarity], result of:
    0.2023892 = score(doc=0,freq=1 = termFreq=1
), product of:
      0.3187582 = queryWeight, product of:
        0.5 = boost
        1.693147 = idf(docFreq=1, maxDocs=4)
        0.3765274 = queryNorm
      0.6349302 = fieldWeight in 0, product of:
        1 = tf(freq=1), with freq of:
          1 = termFreq=1
        1.693147 = idf(docFreq=1, maxDocs=4)
        0.375 = fieldNorm(doc=0)

  0.3035838 = (MATCH) weight(Caption:vroma^0.75 in 0) [DefaultSimilarity], result of:
    0.3035838 = score(doc=0,freq=1 = termFreq=1
), product of:
      0.4781373 = queryWeight, product of:
        0.75 = boost
        1.693147 = idf(docFreq=1, maxDocs=4)
        0.3765274 = queryNorm
      0.6349302 = fieldWeight in 0, product of:
        1 = tf(freq=1), with freq of:
          1 = termFreq=1
        1.693147 = idf(docFreq=1, maxDocs=4)
        0.375 = fieldNorm(doc=0)
}
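Both explanations above still start with "sum of", which suggests that the FuzzyQuery is expanded into a sum over its matching terms (vrom, vroma) inside the single disjunct, leaving the DisjunctionMaxQuery only one clause to take the maximum over. The following is a minimal sketch of one possible workaround, assuming the expanded terms are known in advance and can be added as separate disjuncts. The class and method names are hypothetical, the terms and boosts are copied from the explain output, and the sketch has not been verified against the 4.8 beta.

```csharp
using Lucene.Net.Index;
using Lucene.Net.Search;

// Hypothetical helper, not part of Lucene.Net.
public static class MaxScoreQueryExample
{
    public static Query BuildMaxQuery()
    {
        // A tie-breaker of 0 means only the best-scoring disjunct counts.
        var dismax = new DisjunctionMaxQuery(0.0f);

        // Terms and boosts taken from the explain output (assumed to be
        // the terms the fuzzy query expanded to).
        var vrom = new TermQuery(new Term("Caption", "vrom")) { Boost = 0.5f };
        var vroma = new TermQuery(new Term("Caption", "vroma")) { Boost = 0.75f };

        // Each term is its own disjunct, so the maximum is taken across them.
        dismax.Add(vrom);
        dismax.Add(vroma);

        return dismax;
    }
}
```

If this works as intended, the explanation for a matching document should begin with "max of" rather than "sum of". Hard-coding the terms is only for illustration; in practice the terms would have to be collected from the index for each fuzzy input.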