How does Lucene combine fields and field boosts to produce a score?

Asked: 2014-02-06 18:13:55

Tags: boost lucene scoring

I followed the way Lucene scores documents as described at this link:

https://lucene.apache.org/core/3_6_2/api/core/org/apache/lucene/search/Similarity.html

But how does Lucene handle multiple fields and field boosts? For example, if I have two fields, f1 and f2, with corresponding field boosts b1 and b2, is the final score:

final score = b1*cosine_similarity(f1) + b2*cosine_similarity(f2)

Thanks in advance!

1 Answer:

Answer 0 (score: 0)

Cosine similarity is a function of two arguments: the first is the query, the second is the document.

In your case, you probably have a Boolean query containing clauses for both fields, e.g. "f1:text OR f2:text". If you look at the scoring formula:

score(q, d) = coord(q,d) * queryNorm(q) * sum[tf(t in d) * idf(t)^2 * t.getBoost() * norm(t,d)]
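The structure of this formula can be sketched in plain Java (the values below are hypothetical; real Lucene derives tf, idf, and norms from the index, this just mirrors the arithmetic):

```java
public class PracticalScore {
    // score(q,d) = coord(q,d) * queryNorm(q) * sum[ tf * idf^2 * termBoost * norm(t,d) ]
    static float score(float coord, float queryNorm,
                       float[] tf, float[] idf, float[] termBoost, float[] norm) {
        float sum = 0f;
        for (int i = 0; i < tf.length; i++) {
            sum += tf[i] * idf[i] * idf[i] * termBoost[i] * norm[i];
        }
        return coord * queryNorm * sum;
    }

    public static void main(String[] args) {
        // two matching terms, all factors 1 except idf = 0.5
        float s = score(1f, 1f,
                new float[]{1f, 1f}, new float[]{0.5f, 0.5f},
                new float[]{1f, 1f}, new float[]{1f, 1f});
        System.out.println(s); // 2 * (1 * 0.25 * 1 * 1) = 0.5
    }
}
```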

you can see the norm(t,d) factor. This function encapsulates the boost and length factors:

norm(t,d) = doc.getBoost() * lengthNorm * prod[f.getBoost()]

where f.getBoost() is the boost of the corresponding field.
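As a plain-Java sketch of the norm formula (illustrative only; real Lucene computes this at index time and encodes it into a single byte, losing some precision), with the common default lengthNorm = 1/sqrt(numTermsInField):

```java
public class NormDemo {
    // norm(t,d) = doc.getBoost() * lengthNorm * prod[f.getBoost()]
    static float norm(float docBoost, int fieldTermCount, float... fieldBoosts) {
        float lengthNorm = (float) (1.0 / Math.sqrt(fieldTermCount)); // default lengthNorm
        float boostProduct = 1.0f;
        for (float b : fieldBoosts) boostProduct *= b; // prod[f.getBoost()]
        return docBoost * lengthNorm * boostProduct;
    }

    public static void main(String[] args) {
        // doc boost 1.0, a 4-term field, field boost 2.0
        System.out.println(norm(1.0f, 4, 2.0f)); // 1.0 * 0.5 * 2.0 = 1.0
    }
}
```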

To summarize, here is the simplified score of the query above (assuming tf = 1 and idf = 1/2 for both terms, and all boosts other than the field boosts equal to 1):

score("f1:text OR f2:text", "{f1: '... text ...', f2: '... text ...'}") = coordAndQueryNormPart * (1 * 1/4 * 1 * f1boost + 1 * 1/4 * 1 * f2boost)
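Plugging in concrete numbers (a hypothetical sketch, not actual Lucene output): since coordAndQueryNormPart is identical for two documents that match both clauses, boosting f1 by b1 while leaving f2 at 1 raises the score by exactly (b1 + 1) / 2:

```java
public class SimplifiedScore {
    // Simplified per-document score from the formula above:
    // sum over the two terms of tf * idf^2 * fieldBoost
    // (the shared coord/queryNorm factor cancels when comparing documents)
    static float score(float tf, float idf, float f1Boost, float f2Boost) {
        return tf * idf * idf * f1Boost + tf * idf * idf * f2Boost;
    }

    public static void main(String[] args) {
        float boosted = score(1f, 0.5f, 2.0f, 1.0f); // f1 boosted by 2
        float plain   = score(1f, 0.5f, 1.0f, 1.0f); // no boosts
        System.out.println(boosted / plain); // (2 + 1) / 2 = 1.5
    }
}
```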

UPD:

I wrote an example that may be useful. Here the Lucene index contains two identical documents, except that in the first one the "f1" field is boosted:

    // Lucene 3.6 imports
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.queryParser.QueryParser;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.TopDocs;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.RAMDirectory;
    import org.apache.lucene.util.Version;

    Directory dir = new RAMDirectory();
    IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_36, new StandardAnalyzer(Version.LUCENE_36));
    IndexWriter writer = new IndexWriter(dir, config);

    float field1Boost = 2.0f;

    Document doc = new Document();
    Field f1 = new Field("f1", "field text", Field.Store.YES, Field.Index.ANALYZED, Field.TermVector.WITH_POSITIONS_OFFSETS);
    f1.setBoost(field1Boost);
    doc.add(f1);
    doc.add(new Field("f2", "another text", Field.Store.YES, Field.Index.ANALYZED, Field.TermVector.WITH_POSITIONS_OFFSETS));
    writer.addDocument(doc);


    doc = new Document();
    doc.add(new Field("f1", "field text", Field.Store.YES, Field.Index.ANALYZED, Field.TermVector.WITH_POSITIONS_OFFSETS));
    doc.add(new Field("f2", "another text", Field.Store.YES, Field.Index.ANALYZED, Field.TermVector.WITH_POSITIONS_OFFSETS));
    writer.addDocument(doc);
    writer.commit();

    writer.close();

    IndexReader indexReader = IndexReader.open(dir);

    IndexSearcher searcher = new IndexSearcher(indexReader);
    QueryParser parser = new QueryParser(Version.LUCENE_36, "f1", new StandardAnalyzer(Version.LUCENE_36));
    Query query = parser.parse("f1: text OR f2: text");
    TopDocs docs = searcher.search(query, 2);

    float score1 = docs.scoreDocs[0].score;
    float score2 = docs.scoreDocs[1].score;

    // 0.26274973... is the per-term contribution observed for this particular
    // index (same for both fields); the f1 boost multiplies f1's contribution
    float score1check = 0.26274973154067993f * field1Boost + 0.26274973154067993f;
    float score2check = 0.26274973154067993f + 0.26274973154067993f;

    if (Math.abs(score1 - score1check) > 0.00001) throw new RuntimeException();
    if (Math.abs(score2 - score2check) > 0.00001) throw new RuntimeException();

    System.out.println("Score 1 = " + score1 + " ; score 2 = " + score2);