Question

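I am working with Lucene's TestRegexpQuery unit test, and everything works fine. But when I add some extra print statements, I don't understand why it does not return the document itself.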

  private int regexQueryNrHits(String regex) throws IOException {
    // RegexpQuery query = new RegexpQuery(newTerm(regex));
    // return searcher.search(query, 5).totalHits;
    RegexpQuery query = new RegexpQuery(newTerm(regex));
    TopDocs result = searcher.search(query, 5);

    // my code to print the result instead of just the counts
    //START
    ScoreDoc[] docs = result.scoreDocs;
    for (ScoreDoc scoreDoc : docs) {
      System.out.println(scoreDoc);
      System.out.println(scoreDoc.doc);
      System.out.println(scoreDoc.score);
      System.out.println(scoreDoc.shardIndex);
      System.out.println(searcher.getIndexReader().document(scoreDoc.doc));
    }
    System.out.println("---------");
    // end
    return result.totalHits;
  }

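This test inserts only a single document, and the output is shown below. I expected it to return the sentence or token that matched the regular expression, but everything looks like an empty document.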

---------
doc=0 score=1.0 shardIndex=0
0
1.0
0
Document<>
---------
doc=0 score=1.0 shardIndex=0
0
1.0
0
Document<>

Can anyone help me understand what these results really mean?

Answer 1

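You need to store the field in order to retrieve it. A field that is indexed but not stored can be searched, but its content is not returned with the results. Many field constructors take an argument that specifies whether the field should be stored: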

doc.add(new TextField("mytext", "some text", Field.Store.YES));
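
To make the indexed-versus-stored distinction concrete, here is a toy model in plain Java. It is not Lucene code and does not reflect how Lucene is implemented; the class TinyIndex and its methods add, search, and document are hypothetical names invented for this sketch. It only shows that a document can be matched through the index while nothing is kept for retrieval.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy model only: NOT Lucene code. It mimics the split between the
// searchable inverted index and the retrievable stored fields.
public class TinyIndex {
    // "field:token" -> ids of the documents that contain that token
    private final Map<String, Set<Integer>> postings = new HashMap<>();
    // doc id -> fields kept for retrieval (empty map if nothing was stored)
    private final List<Map<String, String>> stored = new ArrayList<>();

    // Always indexes the text; keeps the original value only if store is true.
    public int add(String field, String text, boolean store) {
        int docId = stored.size();
        Map<String, String> kept = new LinkedHashMap<>();
        if (store) {
            kept.put(field, text);
        }
        stored.add(kept);
        for (String token : text.toLowerCase().split("\\s+")) {
            postings.computeIfAbsent(field + ":" + token, k -> new TreeSet<>()).add(docId);
        }
        return docId;
    }

    // Returns the ids of documents whose field contains the token.
    public Set<Integer> search(String field, String token) {
        return postings.getOrDefault(field + ":" + token, new TreeSet<>());
    }

    // Returns only what was stored, which may be nothing at all.
    public Map<String, String> document(int docId) {
        return stored.get(docId);
    }

    public static void main(String[] args) {
        TinyIndex index = new TinyIndex();
        int unstored = index.add("mytext", "some text", false);
        int kept = index.add("mytext", "some other text", true);

        System.out.println(index.search("mytext", "some")); // [0, 1]
        System.out.println(index.document(unstored));       // {}
        System.out.println(index.document(kept));           // {mytext=some other text}
    }
}
```

Both documents match the search, but only the one added with store set to true has anything to return, which is the same situation as the Document<> output in the question.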

Answer 2

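Your question is about an "empty" instance of a Lucene Document.

"Empty" in your case means that the toString() method returns Document<>.

This means the document's list of fields is empty, so most likely none of your fields are stored.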

Lucene returns empty results for a regular expression search
