Lucene skips years on NumericRangeQuery dates

Time: 2012-07-14 00:04:06

Tags: java lucene

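We are running a Lucene query over the date range 20000101 to 20070531, but Lucene only returns documents whose publicationDate falls within 20000101-20000701 or 20070101-20070531. Lucene skips the several years in between. Running the query with other date ranges gives similar results.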

Full insertion code:

Document doc = new Document();
doc.add(new Field("pageNumber", article.getPageNumber(), Field.Store.YES, Field.Index.NOT_ANALYZED));
doc.add(new NumericField("publicationDate", 8, Field.Store.YES, true).setIntValue(Integer.parseInt(article.getPublicationDate())));
doc.add(new Field("headline", article.getHeadline(), Field.Store.YES, Field.Index.ANALYZED));
doc.add(new Field("text", article.getText(), Field.Store.YES, Field.Index.ANALYZED));
doc.add(new Field("fileName", article.getFileName(), Field.Store.YES, Field.Index.NOT_ANALYZED));
doc.add(new Field("mediaType", article.getMediaType(), Field.Store.YES, Field.Index.NOT_ANALYZED));
doc.add(new Field("mediaSource", article.getMediaSource(), Field.Store.YES, Field.Index.NOT_ANALYZED));
doc.add(new Field("overLap", article.getMediaType(), Field.Store.YES, Field.Index.NOT_ANALYZED));
doc.add(new Field("status", article.getMediaType(), Field.Store.YES, Field.Index.NOT_ANALYZED));
indexWriter.addDocument(doc);                       

Document count code:

    StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_36);
    Directory index = new SimpleFSDirectory(new File(LUCENE_INDEX_DIRECTORY));
    IndexReader reader = IndexReader.open(index);

    Query sourceQuery = new TermQuery(new Term("mediaSource", source));
    QueryParser queryParser = new QueryParser(Version.LUCENE_36, "text", analyzer);
    Query textQuery = queryParser.parse(terms);
    Query dateRangeQuery = NumericRangeQuery.newIntRange("publicationDate", startDate, endDate, true, true);

    BooleanQuery booleanQuery = new BooleanQuery();
    booleanQuery.add(sourceQuery, BooleanClause.Occur.MUST);
    booleanQuery.add(textQuery, BooleanClause.Occur.MUST);
    booleanQuery.add(dateRangeQuery, BooleanClause.Occur.MUST);

    IndexSearcher searcher = new IndexSearcher(reader);

    TotalHitCountCollector collector = new TotalHitCountCollector();
    searcher.search(booleanQuery, collector);

    System.out.println("start: " + startDate);
    System.out.println("end: " + endDate);
    System.out.println("total: " + collector.getTotalHits());

    String hitCount = String.valueOf(collector.getTotalHits());
    searcher.close();
    reader.close();
    analyzer.close();
    return hitCount;

Full document listing code:

    StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_36);
    Directory index = new SimpleFSDirectory(new File(LUCENE_INDEX_DIRECTORY));
    IndexReader reader = IndexReader.open(index);

    Query sourceQuery = new TermQuery(new Term("mediaSource", source));
    QueryParser queryParser = new QueryParser(Version.LUCENE_36, "text", analyzer);
    Query textQuery = queryParser.parse(terms);
    Query dateRangeQuery = NumericRangeQuery.newIntRange("publicationDate", startDate, endDate, true, true);

    BooleanQuery booleanQuery = new BooleanQuery();
    booleanQuery.add(sourceQuery, BooleanClause.Occur.MUST);
    booleanQuery.add(textQuery, BooleanClause.Occur.MUST);
    booleanQuery.add(dateRangeQuery, BooleanClause.Occur.MUST);

    IndexSearcher searcher = new IndexSearcher(reader);
    TotalHitCountCollector collector = new TotalHitCountCollector();
    searcher.search(booleanQuery, collector);

    Sort sort = new Sort(new SortField("publicationDate", SortField.INT));

    if (collector.getTotalHits() > 0) {
        TopDocs topDocs = searcher.search(booleanQuery, collector.getTotalHits(), sort);

        int i = 0;
        for (ScoreDoc scoreDoc : topDocs.scoreDocs) {
            ArrayList<String> resultRow = new ArrayList<String>();
            Document doc = searcher.doc(scoreDoc.doc);
            resultRow.add(String.valueOf(i));
            resultRow.add(doc.get("publicationDate"));
            resultRow.add(doc.get("mediaSource"));
            resultRow.add(doc.get("fileName"));
            resultRow.add(doc.get("headline"));
            resultRow.add(doc.get("pageNumber"));
            ql.results.put(String.valueOf(i), resultRow);
            i++;
        }
    } else {
        ArrayList<String> resultRow = new ArrayList<String>();
        resultRow.add("0");
        resultRow.add("0");
        resultRow.add("0");
        resultRow.add("0");
        resultRow.add("0");
        resultRow.add("0");
        ql.results.put("0", resultRow);
    }

Truncated results (the last 10 of 2058 documents):

20021231    Iraq Belongs on the Back Burner
20021231    With Missionaries Spreading, Muslims' Anger Is Following
20021231    WHITE HOUSE CUTS ESTIMATE OF COST OF WAR WITH IRAQ
20021231    Bring Back the Draft
20040101    Pakistani Leader's New Tactic: Persuasion
20040101    What We Will Do in 2004
20040101    Ethnic Morass Bogs Down Afghan Talks On Charter
20040101    U.S. Hunts Terror Clues in Case of 2 Brothers
20040101    Giving Up Those Weapons: After Libya, Who Is Next?
20040101    An Odd Sight in Iran as Aid Team Tents Go Up: The U.S. Flag

1 answer:

Answer 0 (score: -1)

The problem is that NumericRangeQueries do not work properly. Using a RangeQuery with string values solves the problem.
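
One detail in the question's code may explain the gaps. The indexing code creates the publicationDate field with a precisionStep of 8, while NumericRangeQuery.newIntRange("publicationDate", startDate, endDate, true, true) is called without one and so falls back to the default, which is 4 in Lucene 3.6. The NumericField documentation asks for a congruent precisionStep at index time and query time, so this mismatch is a plausible cause of the skipped years, although it has not been checked against the asker's index. The sketch below is not Lucene code. It is a simplified, standard-library-only model of how a trie range query is split into sub-ranges, restricted to non-negative ints, and the names PrecisionStepDemo, SubRange, splitRange, matches and countMatches are made up for illustration. It counts how many integers in 20000101..20070531 (all integers, not only valid dates) are found when the index holds terms only at shifts 0, 8, 16 and 24.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of trie-encoded numeric range matching (NOT Lucene code).
// Assumption: values are non-negative ints, so sign handling is left out.
class PrecisionStepDemo {

    // One piece of the split query: matches terms stored at `shift`
    // whose shifted value lies in [minPrefix, maxPrefix].
    static final class SubRange {
        final int shift;
        final long minPrefix;
        final long maxPrefix;

        SubRange(int shift, long min, long max) {
            this.shift = shift;
            this.minPrefix = min >>> shift;
            this.maxPrefix = max >>> shift;
        }
    }

    // Splits [minBound, maxBound] into sub-ranges at shifts 0, step, 2*step, ...
    static List<SubRange> splitRange(int queryStep, long minBound, long maxBound) {
        List<SubRange> out = new ArrayList<SubRange>();
        if (minBound > maxBound) {
            return out;
        }
        for (int shift = 0; ; shift += queryStep) {
            long diff = 1L << (shift + queryStep);
            long mask = ((1L << queryStep) - 1L) << shift;
            boolean hasLower = (minBound & mask) != 0L;
            boolean hasUpper = (maxBound & mask) != mask;
            long nextMin = (hasLower ? minBound + diff : minBound) & ~mask;
            long nextMax = (hasUpper ? maxBound - diff : maxBound) & ~mask;
            if (shift + queryStep >= 32 || nextMin > nextMax) {
                // Coarsest usable level: cover what is left in one sub-range.
                out.add(new SubRange(shift, minBound, maxBound));
                break;
            }
            if (hasLower) out.add(new SubRange(shift, minBound, minBound | mask));
            if (hasUpper) out.add(new SubRange(shift, maxBound & ~mask, maxBound));
            minBound = nextMin;
            maxBound = nextMax;
        }
        return out;
    }

    // A value indexed with `indexStep` only has terms at shifts 0, indexStep, 2*indexStep, ...
    // A sub-range at any other shift finds no terms and therefore matches nothing.
    static boolean matches(long value, int indexStep, List<SubRange> query) {
        for (SubRange r : query) {
            if (r.shift % indexStep != 0) {
                continue; // no term was indexed at this shift
            }
            long prefix = value >>> r.shift;
            if (prefix >= r.minPrefix && prefix <= r.maxPrefix) {
                return true;
            }
        }
        return false;
    }

    // Counts the integers in [min, max] that the modelled query would find.
    static int countMatches(int indexStep, int queryStep, int min, int max) {
        List<SubRange> query = splitRange(queryStep, min, max);
        int found = 0;
        for (long v = min; v <= max; v++) {
            if (matches(v, indexStep, query)) {
                found++;
            }
        }
        return found;
    }

    public static void main(String[] args) {
        int min = 20000101;
        int max = 20070531;
        System.out.println("values in range:            " + (max - min + 1));
        System.out.println("index step 8, query step 8: " + countMatches(8, 8, min, max));
        System.out.println("index step 8, query step 4: " + countMatches(8, 4, min, max));
    }
}
```

In this model the step-8 query finds every value, while the step-4 query loses each sub-range that lands on shift 4, 12, 20 or 28, which leaves gaps of the kind described in the question. If that is what is happening, passing the step explicitly, as in NumericRangeQuery.newIntRange("publicationDate", 8, startDate, endDate, true, true), should close the gaps without switching to strings. The string workaround from the answer also holds up, because zero-padded yyyyMMdd values sort the same way lexicographically as numerically; in Lucene 3.x the string range class is TermRangeQuery.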