Question

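I am quite new to Lucene indexes, so I apologize in advance if what I want to do is trivial. I have an index in which documents contain (among others) two fields:

documentoId and employeeId.

Each employee can have submitted various documents. The structure is almost identical to the one in the bookstore example. What I want to achieve is to get all the latest documents matching a query, that is, the document with the highest documentoId for each employeeId.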

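In SQL, this would be something like selecting max(documentoId) and employeeId from the documents, filtered with content like 'mySearchValue' and grouped by employeeId.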
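
To pin down the intended result independently of Lucene, here is a minimal plain-Java sketch of the same semantics: keep only documents whose content contains the search value, then retain the highest documentoId per employeeId. The class LatestPerEmployee, the nested Doc type, and the content field values are hypothetical names chosen for illustration; nothing here touches the index.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java model of the desired result; no Lucene involved. */
class LatestPerEmployee {

    /** Hypothetical stand-in for an indexed document. */
    static final class Doc {
        final int documentoId;
        final String employeeId;
        final String content;

        Doc(int documentoId, String employeeId, String content) {
            this.documentoId = documentoId;
            this.employeeId = employeeId;
            this.content = content;
        }
    }

    /** For each employeeId, keeps the matching document with the highest documentoId. */
    static Map<String, Doc> latestPerEmployee(List<Doc> docs, String searchValue) {
        Map<String, Doc> latest = new LinkedHashMap<>();
        for (Doc doc : docs) {
            if (!doc.content.contains(searchValue)) {
                continue; // plays the role of the "content like ..." filter
            }
            Doc current = latest.get(doc.employeeId);
            if (current == null || doc.documentoId > current.documentoId) {
                latest.put(doc.employeeId, doc); // plays the role of max(...) per group
            }
        }
        return latest;
    }

    public static void main(String[] args) {
        List<Doc> docs = new ArrayList<>();
        docs.add(new Doc(1, "foo", "mySearchValue report"));
        docs.add(new Doc(20, "bar", "mySearchValue invoice"));
        docs.add(new Doc(3, "foo", "another mySearchValue report"));
        docs.add(new Doc(42, "foo", "unrelated text")); // filtered out by the search value

        for (Doc doc : latestPerEmployee(docs, "mySearchValue").values()) {
            System.out.println(doc.employeeId + " -> " + doc.documentoId);
        }
    }
}
```

With the sample data in main, the sketch keeps document 3 for foo and document 20 for bar. Document 42 is ignored because it does not match the search value.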

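I do not know whether I should use the facet API, or whether this can be done with a query or the searchAfter method... I am a bit lost in the documentation.

Any help would be highly appreciated! Thanks.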

Answer 1

Custom sorting of the hits will do the trick. Search for the search.sort parameter in Lucene.

Answer 2

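Lucene supports grouped search; what you need to do is define your groups and how they should be sorted. In the example below, I group by documentoId and sort in descending order.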

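import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.IntField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.queryparser.classic.ParseException;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MatchAllDocsQuery;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.SortField;
import org.apache.lucene.search.grouping.GroupingSearch;
import org.apache.lucene.search.grouping.TopGroups;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

public class GroupingExample {
    public static void main(String[] args) throws IOException, ParseException {
        StandardAnalyzer standardAnalyzer = new StandardAnalyzer(Version.LUCENE_46);
        RAMDirectory ramDirectory = new RAMDirectory();

        IndexWriter indexWriter = new IndexWriter(ramDirectory, new IndexWriterConfig(Version.LUCENE_46, standardAnalyzer));

        // Index three sample documents with different documentoId values.
        Document d0 = new Document();
        d0.add(new TextField("employeeId", "foo", Field.Store.YES));
        d0.add(new IntField("documentoId", 1, Field.Store.YES));
        indexWriter.addDocument(d0);

        Document d1 = new Document();
        d1.add(new TextField("employeeId", "bar", Field.Store.YES));
        d1.add(new IntField("documentoId", 20, Field.Store.YES));
        indexWriter.addDocument(d1);

        Document d2 = new Document();
        d2.add(new TextField("employeeId", "baz", Field.Store.YES));
        d2.add(new IntField("documentoId", 3, Field.Store.YES));
        indexWriter.addDocument(d2);

        indexWriter.commit();
        indexWriter.close();

        // Group by documentoId; order both the groups and the hits inside each group.
        GroupingSearch groupingSearch = new GroupingSearch("documentoId");
        Sort groupSort = new Sort(new SortField("documentoId", SortField.Type.INT, true));  // in descending order
        groupingSearch.setGroupSort(groupSort);
        groupingSearch.setSortWithinGroup(groupSort);

        IndexReader reader = DirectoryReader.open(ramDirectory);
        IndexSearcher searcher = new IndexSearcher(reader);

        // Match everything, starting at group offset 0 and returning at most 10 groups.
        TopGroups<?> groups = groupingSearch.search(searcher, new MatchAllDocsQuery(), 0, 10);

        // First document of the first group, i.e. the one with the highest documentoId.
        Document topDocument = reader.document(groups.groups[0].scoreDocs[0].doc);
        System.out.println(
                "Descending order, first document is " +
                        "employeeId:" + topDocument.get("employeeId") + " " +
                        "documentoId:" + topDocument.get("documentoId")
        );
        reader.close();
    }
}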

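The code above picks d1 (the middle document) as the top hit and prints the following:

Descending order, first document is employeeId:bar documentoId:20

The code above does not cover the content like 'mySearchValue' part; you will have to replace MatchAllDocsQuery with a relevant query to do that.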

Answer 3

For those in the same situation: I solved my problem using mindas's answer, modified to group on my own field:

// Group by employee; within each group the highest documentoId comes first.
GroupingSearch groupingSearch = new GroupingSearch("employeeId");
Sort groupSort = new Sort(new SortField("documentoId", SortField.Type.INT, true));  // in descending order
groupingSearch.setGroupSort(groupSort);
groupingSearch.setSortWithinGroup(groupSort);

// "is" is the IndexSearcher and "query" the search query, both defined elsewhere.
int offset = 0;
int limitGroup = 50;
TopGroups<?> groups = groupingSearch.search(is, query, offset, limitGroup);

List<Document> result = new ArrayList<>();
for (int i = 0; i < groups.groups.length; i++) {
    ScoreDoc sdoc = groups.groups[i].scoreDocs[0]; // first result of each group
    Document d = is.doc(sdoc.doc);
    result.add(d);
}

Lucene: get latest documents for a category
