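We are running a Lucene query over the date range 20000101 to 20070531, but Lucene only returns documents whose publicationDate falls within 20000101-20000701 or 20070101-20070531. Lucene skips several years in between. Running the query with other date ranges gives similar results.
Full insert code: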
Document doc = new Document();
doc.add(new Field("pageNumber", article.getPageNumber(), Field.Store.YES, Field.Index.NOT_ANALYZED));
doc.add(new NumericField("publicationDate", 8, Field.Store.YES, true).setIntValue(Integer.parseInt(article.getPublicationDate())));
doc.add(new Field("headline", article.getHeadline(), Field.Store.YES, Field.Index.ANALYZED));
doc.add(new Field("text", article.getText(), Field.Store.YES, Field.Index.ANALYZED));
doc.add(new Field("fileName", article.getFileName(), Field.Store.YES, Field.Index.NOT_ANALYZED));
doc.add(new Field("mediaType", article.getMediaType(), Field.Store.YES, Field.Index.NOT_ANALYZED));
doc.add(new Field("mediaSource", article.getMediaSource(), Field.Store.YES, Field.Index.NOT_ANALYZED));
doc.add(new Field("overLap", article.getMediaType(), Field.Store.YES, Field.Index.NOT_ANALYZED));
doc.add(new Field("status", article.getMediaType(), Field.Store.YES, Field.Index.NOT_ANALYZED));
indexWriter.addDocument(doc);
Document count code:
StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_36);
Directory index = new SimpleFSDirectory(new File(LUCENE_INDEX_DIRECTORY));
IndexReader reader = IndexReader.open(index);
Query sourceQuery = new TermQuery(new Term("mediaSource", source));
QueryParser queryParser = new QueryParser(Version.LUCENE_36, "text", analyzer);
Query textQuery = queryParser.parse(terms);
Query dateRangeQuery = NumericRangeQuery.newIntRange("publicationDate", startDate, endDate, true, true);
BooleanQuery booleanQuery = new BooleanQuery();
booleanQuery.add(sourceQuery, BooleanClause.Occur.MUST);
booleanQuery.add(textQuery, BooleanClause.Occur.MUST);
booleanQuery.add(dateRangeQuery, BooleanClause.Occur.MUST);
IndexSearcher searcher = new IndexSearcher(reader);
TotalHitCountCollector collector = new TotalHitCountCollector();
searcher.search(booleanQuery, collector);
System.out.println("start: " + startDate);
System.out.println("end: " + endDate);
System.out.println("total: " + collector.getTotalHits());
String hitCount = String.valueOf(collector.getTotalHits());
searcher.close();
reader.close();
analyzer.close();
return hitCount;
Full document listing code:
StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_36);
Directory index = new SimpleFSDirectory(new File(LUCENE_INDEX_DIRECTORY));
IndexReader reader = IndexReader.open(index);
Query sourceQuery = new TermQuery(new Term("mediaSource", source));
QueryParser queryParser = new QueryParser(Version.LUCENE_36, "text", analyzer);
Query textQuery = queryParser.parse(terms);
Query dateRangeQuery = NumericRangeQuery.newIntRange("publicationDate", startDate, endDate, true, true);
BooleanQuery booleanQuery = new BooleanQuery();
booleanQuery.add(sourceQuery, BooleanClause.Occur.MUST);
booleanQuery.add(textQuery, BooleanClause.Occur.MUST);
booleanQuery.add(dateRangeQuery, BooleanClause.Occur.MUST);
IndexSearcher searcher = new IndexSearcher(reader);
TotalHitCountCollector collector = new TotalHitCountCollector();
searcher.search(booleanQuery, collector);
Sort sort = new Sort(new SortField("publicationDate", SortField.INT));
if (collector.getTotalHits() > 0) {
TopDocs topDocs = searcher.search(booleanQuery, collector.getTotalHits(), sort);
int i = 0;
for (ScoreDoc scoreDoc : topDocs.scoreDocs) {
ArrayList<String> resultRow = new ArrayList<String>();
Document doc = searcher.doc(scoreDoc.doc);
resultRow.add(String.valueOf(i));
resultRow.add(doc.get("publicationDate"));
resultRow.add(doc.get("mediaSource"));
resultRow.add(doc.get("fileName"));
resultRow.add(doc.get("headline"));
resultRow.add(doc.get("pageNumber"));
ql.results.put(String.valueOf(i), resultRow);
i++;
}
} else {
ArrayList<String> resultRow = new ArrayList<String>();
resultRow.add("0");
resultRow.add("0");
resultRow.add("0");
resultRow.add("0");
resultRow.add("0");
resultRow.add("0");
ql.results.put("0", resultRow);
}
Truncated results (the last 10 of 2058 documents):
20021231 Iraq Belongs on the Back Burner
20021231 With Missionaries Spreading, Muslims' Anger Is Following
20021231 WHITE HOUSE CUTS ESTIMATE OF COST OF WAR WITH IRAQ
20021231 Bring Back the Draft
20040101 Pakistani Leader's New Tactic: Persuasion
20040101 What We Will Do in 2004
20040101 Ethnic Morass Bogs Down Afghan Talks On Charter
20040101 U.S. Hunts Terror Clues in Case of 2 Brothers
20040101 Giving Up Those Weapons: After Libya, Who Is Next?
20040101 An Odd Sight in Iran as Aid Team Tents Go Up: The U.S. Flag
Answer 0: (score: -1)
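The problem was that the NumericRangeQuery did not return the expected documents. Switching to a RangeQuery over string values (TermRangeQuery in Lucene 3.x) solved the problem.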
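For anyone wondering why a string-based range can stand in for a numeric one here, the following is a minimal sketch in plain Java (no Lucene involved). It assumes every publicationDate is a fixed-width, zero-padded yyyyMMdd string, in which case lexicographic order matches chronological order. The class name DateStringRange and its methods are hypothetical and exist only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Lucene: shows that comparing
// fixed-width yyyyMMdd strings gives the same result as comparing dates.
class DateStringRange {

    // True if date lies in [start, end], compared as plain strings.
    // Assumption: all three values are eight-digit yyyyMMdd strings.
    static boolean inRange(String date, String start, String end) {
        return date.compareTo(start) >= 0 && date.compareTo(end) <= 0;
    }

    // Keeps only the dates that fall inside the inclusive range.
    static List<String> filter(List<String> dates, String start, String end) {
        List<String> hits = new ArrayList<String>();
        for (String d : dates) {
            if (inRange(d, start, end)) {
                hits.add(d);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> dates = Arrays.asList(
                "19991231", "20000101", "20021231",
                "20040101", "20070531", "20070601");
        // prints [20000101, 20021231, 20040101, 20070531]
        System.out.println(filter(dates, "20000101", "20070531"));
    }
}
```

It may also be worth checking precisionStep before giving up on the numeric query. The indexing code builds the NumericField with a precisionStep of 8, while NumericRangeQuery.newIntRange is called without one and so uses the default (4 in Lucene 3.x). The Lucene documentation says the value used at query time should match the one used at index time, so passing 8 as the second argument to newIntRange could fix the numeric version. That suggestion is untested against this index.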