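I'm following the description of how Lucene scores documents at this link:
https://lucene.apache.org/core/3_6_2/api/core/org/apache/lucene/search/Similarity.html
But how does Lucene handle multiple fields and field boosts? For example, if I have two fields, f1 and f2, with corresponding field boosts b1 and b2, is the final score:
final score = b1*cosine_similarity(f1) + b2*cosine_similarity(f2)
Thanks in advance!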
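Answer 0 (score: 0)
Cosine similarity is a function of two arguments: the first is the query, the second is the document.
In your case you probably have a Boolean query with one clause for each of the two fields, for example "f1:text OR f2:text". If you look at the scoring formula: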
score(q, d) = coord(q,d) * queryNorm(q) * sum[tf(t in d) * idf(t)^2 * t.getBoost() * norm(t,d)]
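To make the formula concrete, here is a minimal plain-Java sketch (no Lucene dependency) that evaluates the sum for hand-supplied factors. The names `ScoreFormulaSketch`, `termScore` and `score` are hypothetical and not part of the Lucene API, and the input values are picked only for illustration; the sketch mirrors the formula above rather than Lucene's internals.

```java
// Hypothetical helper that mirrors the scoring formula above; this is not Lucene code.
class ScoreFormulaSketch {

    // Contribution of one matching term: tf * idf^2 * termBoost * norm.
    static double termScore(double tf, double idf, double termBoost, double norm) {
        return tf * idf * idf * termBoost * norm;
    }

    // score(q, d) = coord * queryNorm * sum of the per-term contributions.
    static double score(double coord, double queryNorm, double[] termScores) {
        double sum = 0.0;
        for (double s : termScores) {
            sum += s;
        }
        return coord * queryNorm * sum;
    }

    public static void main(String[] args) {
        // Two clauses (f1:text and f2:text) with tf = 1, idf = 1/2, term boost = 1.
        // Illustrative norm values: 2.0 for the f1 clause, 1.0 for the f2 clause.
        double f1Part = termScore(1.0, 0.5, 1.0, 2.0);
        double f2Part = termScore(1.0, 0.5, 1.0, 1.0);
        // coord and queryNorm are set to 1.0 to keep the arithmetic visible.
        double total = score(1.0, 1.0, new double[] {f1Part, f2Part});
        System.out.println("f1 part = " + f1Part + ", f2 part = " + f2Part + ", total = " + total);
    }
}
```

With these inputs the two clauses contribute 0.5 and 0.25, for a total of 0.75. Each clause is scored separately and the results are summed before coord and queryNorm are applied.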
you can see that it contains the factor norm(t,d). This function encapsulates the boost and length factors:
norm(t,d) = doc.getBoost() * lengthNorm * prod[f.getBoost()]
where f.getBoost() is the boost of the corresponding field.
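As a rough illustration of how the field boost enters norm(t,d), here is a plain-Java sketch. The names `NormSketch`, `lengthNorm` and `norm` are hypothetical. The 1/sqrt(numTerms) length normalization is an assumption based on what DefaultSimilarity is understood to use in Lucene 3.x, and the sketch ignores the lossy single-byte encoding Lucene applies when it stores norms in the index.

```java
// Hypothetical helper mirroring norm(t,d); it ignores Lucene's lossy one-byte norm encoding.
class NormSketch {

    // Assumption: length normalization of 1/sqrt(number of terms in the field),
    // as DefaultSimilarity is understood to do in Lucene 3.x.
    static double lengthNorm(int numTerms) {
        return 1.0 / Math.sqrt(numTerms);
    }

    // norm(t,d) = doc.getBoost() * lengthNorm * product of the boosts of
    // every field instance in the document that has the term's field name.
    static double norm(double docBoost, int numTerms, double... fieldBoosts) {
        double product = 1.0;
        for (double b : fieldBoosts) {
            product *= b;
        }
        return docBoost * lengthNorm(numTerms) * product;
    }

    public static void main(String[] args) {
        // A two-term field, once with a field boost of 2.0 and once unboosted.
        double boosted = norm(1.0, 2, 2.0);
        double plain = norm(1.0, 2, 1.0);
        System.out.println("boosted = " + boosted + ", plain = " + plain);
        System.out.println("ratio = " + (boosted / plain));
    }
}
```

Because the field boost is a plain multiplicative factor, doubling it doubles the norm and therefore doubles that clause's contribution to the score, as long as everything else stays the same.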
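To summarize, here is the simplified score for the query above (assuming tf = 1 and idf = 1/2 for both terms, with lengthNorm and every boost other than the field boosts equal to 1):
score("f1: text OR f2: text", "{f1: '... text ...', f2: '... text ...'}") = coordAndQueryNormPart * (1 * 1/4 * 1 * f1boost + 1 * 1/4 * 1 * f2boost)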
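As a worked substitution, write C for coordAndQueryNormPart and b_1, b_2 for f1boost and f2boost. The values b_1 = 2 and b_2 = 1 are picked here only for illustration:

```latex
\begin{aligned}
\text{score} &= C \cdot \left( \tfrac{1}{4}\, b_1 + \tfrac{1}{4}\, b_2 \right) \\
             &= C \cdot \left( \tfrac{1}{4} \cdot 2 + \tfrac{1}{4} \cdot 1 \right) = 0.75\, C
\end{aligned}
```

So the per-field contributions are added, each weighted by its field boost. That matches the shape proposed in the question, except that each per-field term is the tf * idf^2 * norm contribution from the formula rather than a separately normalized cosine similarity.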
UPD:
I wrote an example that may help. Here the Lucene index contains two identical documents, except that the "f1" field is boosted in the first one:
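import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;
public class FieldBoostExample {
    public static void main(String[] args) throws Exception {
        // Build an in-memory index holding two documents with identical content.
        Directory dir = new RAMDirectory();
        IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_36, new StandardAnalyzer(Version.LUCENE_36));
        IndexWriter writer = new IndexWriter(dir, config);
        float field1Boost = 2.0f;
        // Document 1: index-time boost on field "f1".
        Document doc = new Document();
        Field f1 = new Field("f1", "field text", Field.Store.YES, Field.Index.ANALYZED, Field.TermVector.WITH_POSITIONS_OFFSETS);
        f1.setBoost(field1Boost);
        doc.add(f1);
        doc.add(new Field("f2", "another text", Field.Store.YES, Field.Index.ANALYZED, Field.TermVector.WITH_POSITIONS_OFFSETS));
        writer.addDocument(doc);
        // Document 2: same content, no explicit field boost.
        doc = new Document();
        doc.add(new Field("f1", "field text", Field.Store.YES, Field.Index.ANALYZED, Field.TermVector.WITH_POSITIONS_OFFSETS));
        doc.add(new Field("f2", "another text", Field.Store.YES, Field.Index.ANALYZED, Field.TermVector.WITH_POSITIONS_OFFSETS));
        writer.addDocument(doc);
        writer.commit();
        writer.close();
        // Search both fields for the term "text".
        IndexReader indexReader = IndexReader.open(dir);
        IndexSearcher searcher = new IndexSearcher(indexReader);
        QueryParser parser = new QueryParser(Version.LUCENE_36, "f1", new StandardAnalyzer(Version.LUCENE_36));
        Query query = parser.parse("f1: text OR f2: text");
        TopDocs docs = searcher.search(query, 2);
        // Hits are sorted by descending score, so the boosted document comes first.
        float score1 = docs.scoreDocs[0].score;
        float score2 = docs.scoreDocs[1].score;
        // The constant is the per-clause contribution for this particular index; only the f1 clause of document 1 is scaled by the boost.
        float score1check = 0.26274973154067993f * field1Boost + 0.26274973154067993f;
        float score2check = 0.26274973154067993f + 0.26274973154067993f;
        if (Math.abs(score1 - score1check) > 0.00001) throw new RuntimeException();
        if (Math.abs(score2 - score2check) > 0.00001) throw new RuntimeException();
        System.out.println("Score 1 = " + score1 + " ; score 2 = " + score2);
    }
}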