I am searching a field with Lucene 3.5 and want to know how many of my query terms match that field. For example, my field is "JavaServer Faces (JSF) is a Java-based Web application framework intended to simplify development integration of web-based user interfaces." and my query terms are "java / jsf / framework / doesnotexist". I would like the result 3, because only "java", "jsf" and "framework" occur in the field. Here is the simple example I am following:
public void explain(String document, String queryExpr) throws Exception {
    StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_35);
    Directory index = new RAMDirectory();

    // index the single document under the "title" field
    IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_35, analyzer);
    IndexWriter w = new IndexWriter(index, config);
    addDoc(w, document);
    w.close();

    String queryExpression = queryExpr;
    Query q = new QueryParser(Version.LUCENE_35, "title", analyzer).parse(queryExpression);
    System.out.println("Query: " + queryExpression);

    IndexReader reader = IndexReader.open(index);
    IndexSearcher searcher = new IndexSearcher(reader);
    TopDocs topDocs = searcher.search(q, 10);
    for (int i = 0; i < topDocs.totalHits; i++) {
        ScoreDoc match = topDocs.scoreDocs[i];
        System.out.println("match.score: " + match.score);
        Explanation explanation = searcher.explain(q, match.doc); //#1
        System.out.println("----------");
        Document doc = searcher.doc(match.doc);
        System.out.println(doc.get("title"));
        System.out.println(explanation.toString());
    }
    searcher.close();
}
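(The addDoc helper is not shown in the question; presumably it just indexes the text into the "title" field, roughly along these lines — a minimal sketch, assuming a single stored, analyzed field:)

// Assumed sketch of the addDoc helper referenced above (not shown in the
// original): index the text as a stored, analyzed "title" field.
private static void addDoc(IndexWriter w, String value) throws IOException {
    Document doc = new Document();
    doc.add(new Field("title", value, Field.Store.YES, Field.Index.ANALYZED));
    w.addDocument(doc);
}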
The output with the parameters above is:
0.021505041 = (MATCH) product of:
  0.028673388 = (MATCH) sum of:
    0.0064956956 = (MATCH) weight(title:java in 0), product of:
      0.2709602 = queryWeight(title:java), product of:
        0.30685282 = idf(docFreq=1, maxDocs=1)
        0.8830299 = queryNorm
      ...
      0.033902764 = (MATCH) fieldWeight(title:framework in 0), product of:
        1.4142135 = tf(termFreq(title:framework)=2)
        0.30685282 = idf(docFreq=1, maxDocs=1)
        0.078125 = fieldNorm(field=title, doc=0)
  0.75 = coord(3/4)
I would like to get hold of this 3/4.
Regards!
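(Side note: one direct, if brittle, way to get at this 3/4 is to walk the Explanation tree and pick out the coord node by its description text — a minimal sketch, assuming the description always contains "coord(" as in the output above; findCoord is a hypothetical helper:)

static float findCoord(Explanation e) {
    // The coord node's description looks like "coord(3/4)"; its value is 0.75.
    String desc = e.getDescription();
    if (desc != null && desc.contains("coord(")) {
        return e.getValue();
    }
    Explanation[] details = e.getDetails(); // null for leaf nodes
    if (details != null) {
        for (Explanation child : details) {
            float coord = findCoord(child);
            if (coord >= 0f) {
                return coord;
            }
        }
    }
    return -1f; // no coord node found (e.g. single-clause queries)
}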
Answer (score: 7):
You can do this by overriding Lucene's DefaultSimilarity with the method definitions below. That way, the final score of a document is the coord factor (1 / maxOverlap) multiplied by the number of matching terms:
Directory dir = new RAMDirectory();

// Flatten every scoring factor except coord, so that
// score = coord(overlap, maxOverlap) * overlap = overlap / maxOverlap
Similarity similarity = new DefaultSimilarity() {

    @Override
    public float computeNorm(String fld, FieldInvertState state) {
        return state.getBoost();
    }

    @Override
    public float coord(int overlap, int maxOverlap) {
        return 1f / maxOverlap;
    }

    @Override
    public float idf(int docFreq, int numDocs) {
        return 1f;
    }

    @Override
    public float queryNorm(float sumOfSquaredWeights) {
        return 1f;
    }

    @Override
    public float tf(float freq) {
        return freq == 0f ? 0f : 1f;
    }
};

IndexWriterConfig iwConf = new IndexWriterConfig(Version.LUCENE_35,
        new WhitespaceAnalyzer(Version.LUCENE_35));
iwConf.setSimilarity(similarity);
IndexWriter iw = new IndexWriter(dir, iwConf);

Document doc = new Document();
Field field = new Field("text", "", Store.YES, Index.ANALYZED);
doc.add(field);
for (String value : Arrays.asList("a b c", "c d", "a b d", "a c d")) {
    field.setValue(value);
    iw.addDocument(doc);
}
iw.commit();
iw.close();

IndexReader ir = IndexReader.open(dir);
IndexSearcher searcher = new IndexSearcher(ir);
searcher.setSimilarity(similarity); // use the same Similarity at search time

BooleanQuery q = new BooleanQuery();
q.add(new TermQuery(new Term("text", "a")), Occur.SHOULD);
q.add(new TermQuery(new Term("text", "b")), Occur.SHOULD);
q.add(new TermQuery(new Term("text", "d")), Occur.SHOULD);

TopDocs topDocs = searcher.search(q, 100);
System.out.println(topDocs.totalHits + " results");
ScoreDoc[] scoreDocs = topDocs.scoreDocs;
for (int i = 0; i < scoreDocs.length; ++i) {
    int docId = scoreDocs[i].doc;
    float score = scoreDocs[i].score;
    System.out.println(ir.document(docId).get("text") + " -> " + score);
    System.out.println(searcher.explain(q, docId));
}
ir.close();
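With this Similarity the score of each hit is overlap / maxOverlap, so if what you ultimately want is the integer number of matching terms (3 in the original example), you can multiply the score back by the number of query clauses — a small sketch, assuming all clauses are simple SHOULD term clauses as above:

int numClauses = q.getClauses().length; // 3 in this example
for (ScoreDoc sd : topDocs.scoreDocs) {
    // score == matchedTerms / numClauses with the Similarity above,
    // so rounding recovers the match count
    int matchedTerms = Math.round(sd.score * numClauses);
    System.out.println("doc " + sd.doc + " matched " + matchedTerms + " of " + numClauses + " terms");
}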