Hierarchical scoring in Lucene: treatment of OR terms

Posted: 2016-06-08 08:44:57

Tags: java lucene information-retrieval scoring booleanquery

I am trying to convert interest profiles into Lucene queries.

Given title terms and some expansion terms in JSON format, for example

{"title":"Donald Trump", "Expansion":[["republic","republican"],["democratic","democrat"],["campaign"]]}

the corresponding Lucene query can be a BooleanQuery like the following (with a boost of 3.0 on the title terms and 1.0 on the expansion terms):

+(text:donald^3.0 text:trump^3.0 (text:democrat text:democratic) (text:republic text:republican) text:campaign)
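As a minimal sketch of this mapping (plain Java with no Lucene dependency; the class and method names are hypothetical, and the JSON is assumed to be already parsed into Java lists), each title term becomes a clause boosted by 3.0 and each multi-term expansion group becomes a parenthesized OR subquery. Clauses come out in JSON order, which may differ from the order printed above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ProfileQueryString {
    // Hypothetical helper: renders a title/expansion profile as a Lucene
    // query string, boosting title terms and leaving expansions at 1.0.
    static String toQueryString(String field, String title,
                                List<List<String>> expansion, float titleBoost) {
        List<String> clauses = new ArrayList<>();
        // Title terms: lowercased, one boosted term clause each.
        for (String term : title.toLowerCase().split("\\s+")) {
            clauses.add(field + ":" + term + "^" + titleBoost);
        }
        // Expansion groups: a single term stays a plain clause,
        // a multi-term group becomes a parenthesized OR subquery.
        for (List<String> group : expansion) {
            List<String> terms = new ArrayList<>();
            for (String term : group) {
                terms.add(field + ":" + term);
            }
            String joined = String.join(" ", terms);
            clauses.add(group.size() == 1 ? joined : "(" + joined + ")");
        }
        // The whole disjunction is a single required clause.
        return "+(" + String.join(" ", clauses) + ")";
    }

    public static void main(String[] args) {
        List<List<String>> expansion = Arrays.asList(
            Arrays.asList("republic", "republican"),
            Arrays.asList("democratic", "democrat"),
            Arrays.asList("campaign"));
        System.out.println(toQueryString("text", "Donald Trump", expansion, 3.0f));
    }
}
```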

Using IndexSearcher's explain() method, I can see that a matching document such as

I know people just want to find a way to be famous without taking any risks, republic republican Donald Trump Campaign.

gets a score of 9.0:

3.0 = weight(text:donald^3.0 in 0) [TitleExpansionSimilarity], result of:
    3.0 = score(doc=0,freq=1.0), product of:
      3.0 = queryWeight, product of:
        3.0 = boost
        1.0 = idf(docFreq=201, maxDocs=201)
        1.0 = queryNorm
      1.0 = fieldWeight in 0, product of:
        1.0 = tf(freq=1.0), with freq of:
          1.0 = termFreq=1.0
        1.0 = idf(docFreq=201, maxDocs=201)
        1.0 = fieldNorm(doc=0)
  3.0 = weight(text:trump^3.0 in 0) [TitleExpansionSimilarity], result of:
    3.0 = score(doc=0,freq=1.0), product of:
      3.0 = queryWeight, product of:
        3.0 = boost
        1.0 = idf(docFreq=201, maxDocs=201)
        1.0 = queryNorm
      1.0 = fieldWeight in 0, product of:
        1.0 = tf(freq=1.0), with freq of:
          1.0 = termFreq=1.0
        1.0 = idf(docFreq=201, maxDocs=201)
        1.0 = fieldNorm(doc=0)
  2.0 = sum of:
    1.0 = weight(text:republic in 0) [TitleExpansionSimilarity], result of:
      1.0 = fieldWeight in 0, product of:
        1.0 = tf(freq=1.0), with freq of:
          1.0 = termFreq=1.0
        1.0 = idf(docFreq=201, maxDocs=201)
        1.0 = fieldNorm(doc=0)
    1.0 = weight(text:republican in 0) [TitleExpansionSimilarity], result of:
      1.0 = fieldWeight in 0, product of:
        1.0 = tf(freq=1.0), with freq of:
          1.0 = termFreq=1.0
        1.0 = idf(docFreq=201, maxDocs=201)
        1.0 = fieldNorm(doc=0)
  1.0 = weight(text:campaign in 0) [TitleExpansionSimilarity], result of:
    1.0 = fieldWeight in 0, product of:
      1.0 = tf(freq=1.0), with freq of:
        1.0 = termFreq=1.0
      1.0 = idf(docFreq=201, maxDocs=201)
      1.0 = fieldNorm(doc=0)
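The 9.0 can be reproduced by hand from the factors shown. This is a minimal sketch, assuming the classic TF-IDF layout that explain() prints here (score = queryWeight * fieldWeight, with queryWeight = boost * idf * queryNorm and fieldWeight = tf * idf * fieldNorm) and no coord penalty, which the output suggests TitleExpansionSimilarity disables. ExplainArithmetic and termScore are hypothetical names; no Lucene classes are used.

```java
public class ExplainArithmetic {
    // Score of one term clause, following the explain() layout:
    // queryWeight (boost * idf * queryNorm) times fieldWeight (tf * idf * fieldNorm).
    static double termScore(double boost, double idf, double queryNorm,
                            double tf, double fieldNorm) {
        double queryWeight = boost * idf * queryNorm;
        double fieldWeight = tf * idf * fieldNorm;
        return queryWeight * fieldWeight;
    }

    public static void main(String[] args) {
        // Every factor except the boost is 1.0 in the explain() output.
        double donald = termScore(3.0, 1.0, 1.0, 1.0, 1.0);
        double trump = termScore(3.0, 1.0, 1.0, 1.0, 1.0);
        double republic = termScore(1.0, 1.0, 1.0, 1.0, 1.0);
        double republican = termScore(1.0, 1.0, 1.0, 1.0, 1.0);
        double campaign = termScore(1.0, 1.0, 1.0, 1.0, 1.0);
        // The nested (republic republican) BooleanQuery sums its clauses (2.0),
        // and the outer BooleanQuery sums all of its matching clauses.
        double total = donald + trump + (republic + republican) + campaign;
        System.out.println(total); // 9.0
    }
}
```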

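Is there a way to override Lucene's scoring function so that the BooleanQuery (text:republic text:republican), i.e. the cluster ["republic","republican"], is scored as the maximum of the match weight of "republic" and the match weight of "republican", rather than their sum?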

1.0 = MAX(instead of sum) of:
    1.0 = weight(text:republic in 0) [TitleExpansionSimilarity], result of:
      1.0 = fieldWeight in 0, product of:
        1.0 = tf(freq=1.0), with freq of:
          1.0 = termFreq=1.0
        1.0 = idf(docFreq=201, maxDocs=201)
        1.0 = fieldNorm(doc=0)
    1.0 = weight(text:republican in 0) [TitleExpansionSimilarity], result of:
      1.0 = fieldWeight in 0, product of:
        1.0 = tf(freq=1.0), with freq of:
          1.0 = termFreq=1.0
        1.0 = idf(docFreq=201, maxDocs=201)
        1.0 = fieldNorm(doc=0)
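To make the requested change concrete, this sketch recomputes the document's total with each cluster aggregated by max instead of sum, reusing the per-term scores from the explain() output above. It only models the arithmetic (no Lucene calls), assumes non-negative clause scores, and ClusterAggregation, sumOf and maxOf are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;

public class ClusterAggregation {
    // Sum of the matching clause scores in one cluster (current BooleanQuery behaviour).
    static double sumOf(List<Double> scores) {
        double s = 0.0;
        for (double x : scores) s += x;
        return s;
    }

    // Highest matching clause score in one cluster (the behaviour asked for).
    // Starts at 0.0, so a cluster with no matching terms contributes nothing.
    static double maxOf(List<Double> scores) {
        double m = 0.0;
        for (double x : scores) m = Math.max(m, x);
        return m;
    }

    public static void main(String[] args) {
        double titleScore = 3.0 + 3.0;                          // donald^3.0 + trump^3.0
        List<Double> republicCluster = Arrays.asList(1.0, 1.0); // republic, republican
        List<Double> campaignCluster = Arrays.asList(1.0);      // campaign
        double summed = titleScore + sumOf(republicCluster) + sumOf(campaignCluster);
        double maxed = titleScore + maxOf(republicCluster) + maxOf(campaignCluster);
        System.out.println(summed + " vs " + maxed); // 9.0 vs 8.0
    }
}
```

With max aggregation, a document that mentions both "republic" and "republican" no longer outscores one that mentions only one of them.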

1 Answer:

Answer 0 (score: 0)

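Not through Lucene's QueryParser syntax, but you can use a DisjunctionMaxQuery instead of a BooleanQuery. It combines its subqueries and scores a document by the highest score among the matching subqueries, rather than by the sum of their scores.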
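A sketch of the scoring rule, assuming the documented DisjunctionMaxQuery behaviour: a document gets the highest subquery score plus tieBreakerMultiplier times the sum of the other matching subquery scores, so a multiplier of 0.0 gives the pure max asked for in the question. In Lucene 5.x/6.x the cluster itself could be built as new DisjunctionMaxQuery(Arrays.asList(new TermQuery(new Term("text", "republic")), new TermQuery(new Term("text", "republican"))), 0.0f) and added as a SHOULD clause of the outer BooleanQuery. The code below models only the arithmetic; DisMaxScore and disMax are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;

public class DisMaxScore {
    // Max of the matching subquery scores plus tieBreaker times the sum of the
    // others. Assumes non-negative scores from subqueries that matched.
    static double disMax(List<Double> subScores, double tieBreaker) {
        double max = 0.0;
        double sum = 0.0;
        for (double s : subScores) {
            max = Math.max(max, s);
            sum += s;
        }
        return max + tieBreaker * (sum - max);
    }

    public static void main(String[] args) {
        List<Double> cluster = Arrays.asList(1.0, 1.0); // republic, republican
        System.out.println(disMax(cluster, 0.0)); // 1.0: pure max
        System.out.println(disMax(cluster, 1.0)); // 2.0: same as summing
        System.out.println(disMax(cluster, 0.1)); // 1.1: max plus a small bonus
    }
}
```

A small non-zero tie-breaker is a middle ground: the best-matching synonym dominates, but documents matching several synonyms still rank slightly higher.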