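The Apache Lucene API seems to change with every release. How can I get the most frequent terms from an IndexReader in Apache Lucene 6.4.0?
I found "Get highest frequency terms from Lucene index", but the answer there does not work with Apache Lucene 6.4.0.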
Answer 0 (score: 1)
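Here is code that works with Lucene 6.4. It finds the single most frequent term across all fields; to find the most frequent term in each field separately, adapt the code to track the maximum per field instead of globally.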
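import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.Fields;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.MultiFields;
import org.apache.lucene.index.Terms;
import org.apache.lucene.index.TermsEnum;
import org.apache.lucene.util.BytesRef;

// dir is assumed to be an already opened org.apache.lucene.store.Directory
try (IndexReader reader = DirectoryReader.open(dir)) {
    final Fields fields = MultiFields.getFields(reader);
    long maxFreq = Long.MIN_VALUE;
    String freqTerm = "";
    // Fields is an Iterable over the names of the indexed fields
    for (final String field : fields) {
        final Terms terms = MultiFields.getTerms(reader, field);
        if (terms == null) {
            continue; // no terms indexed for this field
        }
        final TermsEnum it = terms.iterator();
        BytesRef term;
        while ((term = it.next()) != null) {
            // Occurrences of the term across all documents;
            // may be -1 if the field was indexed without frequencies
            final long freq = it.totalTermFreq();
            if (freq > maxFreq) {
                maxFreq = freq;
                freqTerm = term.utf8ToString();
            }
        }
    }
    System.out.println(freqTerm + " " + maxFreq);
}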
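
The loop above keeps only one winner. If the top N terms are wanted instead, one possible approach is a bounded min-heap that holds the N largest counts seen so far. The sketch below is not Lucene code: the class name TopTerms and the method topN are hypothetical, and a plain Map stands in for the (term, totalTermFreq) pairs that the TermsEnum loop would produce. It only illustrates the selection step, under those assumptions.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical helper: selects the n most frequent terms from term -> count pairs.
public class TopTerms {

    /** Returns up to n entries with the largest counts, highest count first. */
    public static List<Map.Entry<String, Long>> topN(Map<String, Long> freqs, int n) {
        List<Map.Entry<String, Long>> result = new ArrayList<>();
        if (n <= 0) {
            return result;
        }
        // Min-heap by count: the head is always the weakest of the current top n.
        Comparator<Map.Entry<String, Long>> byCount =
                (a, b) -> Long.compare(a.getValue(), b.getValue());
        PriorityQueue<Map.Entry<String, Long>> heap = new PriorityQueue<>(byCount);

        for (Map.Entry<String, Long> e : freqs.entrySet()) {
            // Copy the entry so the result does not depend on the source map.
            heap.offer(new AbstractMap.SimpleImmutableEntry<>(e.getKey(), e.getValue()));
            if (heap.size() > n) {
                heap.poll(); // evict the smallest count
            }
        }

        result.addAll(heap);
        // Heap iteration order is unspecified, so sort descending by count.
        result.sort(byCount.reversed());
        return result;
    }

    public static void main(String[] args) {
        // Stand-in data; with Lucene these pairs would come from
        // term.utf8ToString() and it.totalTermFreq() inside the loop.
        Map<String, Long> freqs = new LinkedHashMap<>();
        freqs.put("lucene", 42L);
        freqs.put("index", 17L);
        freqs.put("term", 8L);
        freqs.put("reader", 23L);

        for (Map.Entry<String, Long> e : topN(freqs, 2)) {
            System.out.println(e.getKey() + " " + e.getValue());
        }
        // prints:
        // lucene 42
        // reader 23
    }
}
```

In the Lucene loop the same heap could be fed directly with each term and its frequency, which avoids building the intermediate map; keeping one heap per field would give the per-field variant mentioned in the answer.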