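I am implementing Lucene in Java. I ran phrase searches such as "software engineering, software development" against the index directory using ShingleFilter(TokenStream input, int minShingleSize, int maxShingleSize).
This works well. The output is: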
Phrase Searching:software engineering software
Found 5 hits.
1. Index Document ID:336 File Name: jucs_243.pdf.txt
2. Index Document ID:506 File Name: jucs_4.pdf.txt
3. Index Document ID:524 File Name: jucs_419.pdf.txt
4. Index Document ID:276 File Name: jucs_189.pdf.txt
5. Index Document ID:340 File Name: jucs_247.pdf.txt
Phrase Searching:software engineering software development
Found 1 hits.
1. Index Document ID:506 File Name: jucs_4.pdf.txt
Phrase Searching:engineering software development
Found 1 hits.
1. Index Document ID:506 File Name: jucs_4.pdf.txt
My question is: how can I count, in Java, how many times a single file appears across these results? My code is:
// display search results
TopDocs topDocs = searcher.search(query, LuceneConstants.MAX_SEARCH);
ScoreDoc[] hits = topDocs.scoreDocs;
System.out.println("Found " + hits.length + " hits.");
for (int i = 0; i < hits.length; ++i) {
    int docId = hits[i].doc;
    // look up the stored document so its file path can be printed
    Document d = searcher.doc(docId);
    System.out.println((i + 1) + ". " + "Index Document ID:" + docId + " File Name: " + d.get(LuceneConstants.FILE_PATH));
}
Answer 0 (score: 0)
I solved this using
static TreeMap<Integer, Integer> Total_Hits = new TreeMap<Integer, Integer>();
My code is:
for (int i = 0; i < hits.length; ++i) {
    int docId = hits[i].doc;
    // count one more hit for this document; start at 1 the first time it is seen
    if (Total_Hits.containsKey(docId)) {
        Total_Hits.put(docId, Total_Hits.get(docId) + 1);
    } else {
        Total_Hits.put(docId, 1);
    }
}
Output:
Document ID:276 No oF Hits : 1 Time
Document ID:336 No oF Hits : 1 Time
Document ID:340 No oF Hits : 1 Time
Document ID:506 No oF Hits : 3 Time
Document ID:524 No oF Hits : 1 Time
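For anyone who wants to try the counting logic without an index, here is a minimal self-contained sketch. It assumes the doc IDs of each search have already been pulled out of the ScoreDoc[] array; the class name HitCounter, the method countHits, and the use of plain int[] arrays are hypothetical stand-ins for illustration, not part of Lucene. The sample IDs are the ones from the three searches in the question. Note that this counts how many of the searches returned each document, not how many times the phrase occurs inside the document.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class HitCounter {

    // Tally how often each document ID appears across several result lists.
    // Each int[] stands in for the doc IDs of one search (an assumption of this sketch).
    static TreeMap<Integer, Integer> countHits(List<int[]> resultsPerQuery) {
        TreeMap<Integer, Integer> totals = new TreeMap<>();
        for (int[] docIds : resultsPerQuery) {
            for (int docId : docIds) {
                // insert 1 on first sight, otherwise add 1 to the existing count
                totals.merge(docId, 1, Integer::sum);
            }
        }
        return totals;
    }

    public static void main(String[] args) {
        // doc IDs taken from the three phrase searches shown in the question
        List<int[]> results = Arrays.asList(
            new int[] {336, 506, 524, 276, 340},
            new int[] {506},
            new int[] {506});
        for (Map.Entry<Integer, Integer> e : countHits(results).entrySet()) {
            System.out.println("Document ID:" + e.getKey() + " hits: " + e.getValue());
        }
    }
}
```

Map.merge (Java 8 and later) replaces the containsKey/put branching, and the TreeMap keeps the report sorted by document ID, which matches the ordering of the output above.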