In a Lucene / Lucene.net search, how do I count the number of hits per document?

Asked: 2010-02-12 02:46:26

Tags: lucene lucene.net

When searching a large set of documents, I can easily find the number of documents that match my search criteria:

Hits hits = Searcher.Search(query);
int DocumentCount = hits.Length();

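How can I determine the total number of hits within the documents? For example, say I search for "congress" and get 2 documents back. How do I get the number of times "congress" occurs in each document? Say "congress" occurs 2 times in document #1 and 3 times in document #2. The result I'm looking for is 5.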
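To make the target number concrete, here is a minimal plain-Java sketch that does not use Lucene at all. The class name, the two sample document strings, and the simple lowercase-and-split tokenization are assumptions made for illustration; a real index applies its own analyzer, so its counts may differ.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

class TotalOccurrencesSketch {
    // Counts the tokens in text that equal term, after lowercasing
    // and splitting on anything that is not a letter a-z.
    static int countInDocument(String text, String term) {
        int count = 0;
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (token.equals(term)) {
                count++;
            }
        }
        return count;
    }

    // Sums the per-document counts; this sum is the number the question asks for.
    static int totalOccurrences(List<String> docs, String term) {
        int total = 0;
        for (String doc : docs) {
            total += countInDocument(doc, term);
        }
        return total;
    }

    public static void main(String[] args) {
        // Hypothetical documents: 2 occurrences in the first, 3 in the second.
        List<String> docs = Arrays.asList(
            "Congress met today. Congress adjourned.",
            "The congress voted; congress debated; congress recessed.");
        System.out.println(totalOccurrences(docs, "congress")); // prints 5
    }
}
```

The number of matching documents (2 here) and the number of occurrences (5 here) are different quantities; the hit count from the search only gives the first.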

2 answers:

Answer 0: (score: 6)

This is Lucene Java, but it should carry over to Lucene.NET:

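List<Integer> docIds = // doc ids for documents that matched the query,
                       // sorted in ascending order

int totalFreq = 0;
TermDocs termDocs = reader.termDocs();
termDocs.seek(new Term("my_field", "congress"));
int current = -1; // doc id the enumerator is currently positioned on
for (int id : docIds) {
    // skipTo always advances at least one entry, so call it only when behind id
    if (current < id) {
        if (!termDocs.skipTo(id)) break; // no more documents contain the term
        current = termDocs.doc();
    }
    // a matched document may not contain the term; count exact positions only
    if (current == id) totalFreq += termDocs.freq();
}
termDocs.close();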
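The requirement that the doc ids be sorted in ascending order follows from the fact that a postings enumerator only moves forward. The following plain-Java sketch models that behavior with sorted arrays; it does not call Lucene, and the class name, the array layout, and the sample numbers are all hypothetical.

```java
class PostingsSumSketch {
    // postingDocs[i] is a doc id that contains the term and postingFreqs[i]
    // is the term's frequency in that doc. Both postingDocs and matchedDocs
    // must be sorted in ascending order.
    static int sumFrequencies(int[] postingDocs, int[] postingFreqs, int[] matchedDocs) {
        int total = 0;
        int cursor = 0; // forward-only position in the postings list
        for (int id : matchedDocs) {
            // Advance to the first posting whose doc id is >= id.
            while (cursor < postingDocs.length && postingDocs[cursor] < id) {
                cursor++;
            }
            if (cursor == postingDocs.length) {
                break; // postings exhausted, nothing left to count
            }
            if (postingDocs[cursor] == id) {
                total += postingFreqs[cursor];
            }
        }
        return total;
    }

    public static void main(String[] args) {
        int[] postingDocs = {1, 4, 7, 9};  // docs containing the term
        int[] postingFreqs = {2, 1, 3, 5}; // occurrences in each of those docs
        int[] matchedDocs = {1, 5, 7};     // docs returned by the query
        System.out.println(sumFrequencies(postingDocs, postingFreqs, matchedDocs)); // prints 5
    }
}
```

Because the cursor never moves backward, an unsorted list of matched ids would silently skip documents, which is why the sort matters.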

Answer 1: (score: 0)

This is also Lucene Java. If your query/search criteria can be written as a SpanQuery, then you can do something like this:

IndexReader indexReader = // define your index reader here
SpanQuery spanQuery = // define your span query here
Spans spans = spanQuery.getSpans(indexReader);
int occurrenceCount = 0;
while (spans.next()) {
    occurrenceCount++;
}
// now occurrenceCount contains the total number of occurrences of the word/phrase/etc across all documents in the index