Question

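I am using an Elasticsearch query to fetch records between two dates.

First, I count the records between the two dates to see whether there are more than 10000. If there are, I try to fetch them in batches of 10000.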

    //get count
    var result_count = client.Count<TelegramMessageStructure>(s => s
    .AllTypes()
    .AllIndices()
    .Query(q => q
    .DateRange(r => r
    .Field(f => f.messageDate)
    .GreaterThanOrEquals("2018-06-03 00:00:00.000")
    .LessThan("2018-06-03 00:59:00.000")
    )
    )
    );
    long count = result_count.Count; //count = 27000

It returns 27000, so I want to fetch them 10000 at a time. I use this query to do that:

    int MaxMessageCountPerQuery=10000;

    for (int i = 0; i < count; i += MaxMessageCountPerQuery)
    {
        client = new ElasticClient(connectionSettings);
        // No change whether the client is renewed or not
        var result = client.Search<TelegramMessageStructure>(s => s
           .AllTypes()
           .AllIndices()
           .MatchAll()
           .From(i)
           .Size(MaxMessageCountPerQuery)
           .Sort(ss => ss.Ascending(p => p.id))
           .Query(q => q
               .DateRange(r => r
                   .Field(f => f.messageDate)
                   .GreaterThanOrEquals("2018-06-03 00:00:00.000")
                   .LessThan("2018-06-03 00:59:00.000")
               )
           )
       );
       //when i=0, result.documents contains 10000 records otherwise it has 0

    }

In the first iteration, when i = 0, result.Documents contains 10000 records; in every later iteration it contains 0.

What is wrong here?

Answer 1

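Based on this link: scroll in elastic net-api

Your code should include these steps:

1- Run the search with all the parameters you need plus .Scroll("5m") (I assume From(0) and Size(10000) are set as well), and save the response in the result variable.

2- You now have the first 10000 records (in result.Documents).

3- To receive more records, pass the ScrollId from the response to get further results. (Each call to the code below gives you the next 10000 records.)

    var result_new = client.Scroll<TelegramMessageStructure>("10m", result.ScrollId);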

In practice, your code should look like this:

    int MaxMessageCountPerQuery = 10000;
    client = new ElasticClient(connectionSettings);

    // Initial search: opens the scroll context and returns the first page
    var result = client.Search<TelegramMessageStructure>(s => s
        .AllTypes()
        .AllIndices()
        .From(0)
        .Size(MaxMessageCountPerQuery)
        .Sort(ss => ss.Ascending(p => p.id))
        .Query(q => q
            .DateRange(r => r
                .Field(f => f.messageDate)
                .GreaterThanOrEquals("2018-06-03 00:00:00.000")
                .LessThan("2018-06-03 00:59:00.000")
            )
        )
        .Scroll("5m") // Add this parameter
    );
    // TODO: save and use result.Documents

    string scrollId = result.ScrollId;
    // The first page is already in result, so start counting at the second page
    for (long i = MaxMessageCountPerQuery; i < result.Total; i += MaxMessageCountPerQuery)
    {
        // Add this call to the loop; each iteration returns the next 10000 records
        var result_new = client.Scroll<TelegramMessageStructure>("10m", scrollId);
        scrollId = result_new.ScrollId; // always continue from the latest scroll id
        // TODO: save and use result_new.Documents
    }
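A variation on the same idea, as a minimal sketch rather than a definitive implementation: instead of counting pages from result.Total, keep scrolling until a page comes back empty, and release the scroll context when done. It assumes NEST 6.x (the thread does not state the client version). `ScrollSketch`, `FetchAll` and `handlePage` are hypothetical names, and the `TelegramMessageStructure` class below is only a stand-in whose property types are guessed from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Nest;

// Stand-in for the asker's document type; the property types are assumptions.
public class TelegramMessageStructure
{
    public long id { get; set; }
    public DateTime messageDate { get; set; }
}

public static class ScrollSketch
{
    // Hypothetical helper: hands every matching document to handlePage, one page at a time.
    public static void FetchAll(
        IElasticClient client,
        Action<IReadOnlyCollection<TelegramMessageStructure>> handlePage)
    {
        // Initial search opens the scroll context and returns the first page.
        var response = client.Search<TelegramMessageStructure>(s => s
            .AllTypes()
            .AllIndices()
            .Size(10000)
            .Sort(ss => ss.Ascending(p => p.id))
            .Query(q => q.DateRange(r => r
                .Field(f => f.messageDate)
                .GreaterThanOrEquals("2018-06-03 00:00:00.000")
                .LessThan("2018-06-03 00:59:00.000")))
            .Scroll("5m"));

        string scrollId = response.ScrollId;

        // An empty page (or a failed call) means there is nothing more to read.
        while (response.IsValid && response.Documents.Any())
        {
            handlePage(response.Documents);
            response = client.Scroll<TelegramMessageStructure>("5m", scrollId);
            scrollId = response.ScrollId ?? scrollId; // keep the latest id we have seen
        }

        // Free the server-side scroll context instead of waiting for the timeout.
        if (scrollId != null)
        {
            client.ClearScroll(c => c.ScrollId(scrollId));
        }
    }
}
```

A call such as `ScrollSketch.FetchAll(client, page => allMessages.AddRange(page));` would collect all 27000 records into a list named `allMessages` (also hypothetical) in three pages.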

Answer 2

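Elasticsearch sets index.max_result_window = 10000 by default, so a search whose from + size exceeds 10000 is rejected. The reason is explained in detail here: https://www.elastic.co/guide/en/elasticsearch/guide/current/pagination.html

To understand why deep paging is problematic, let's imagine that we are searching within a single index with five primary shards. When we request the first page of results (results 1 to 10), each shard produces its own top 10 results and returns them to the coordinating node, which then sorts all 50 results in order to select the overall top 10.

Now imagine that we ask for page 1,000: results 10,001 to 10,010. Everything works in the same way except that each shard has to produce its top 10,010 results. The coordinating node then sorts through all 50,050 results and discards 50,040 of them!

You can see that, in a distributed system, the cost of sorting results grows exponentially the deeper we page. There is a good reason that web search engines don't return more than 1,000 results for any query.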
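The numbers in the quoted passage can be reproduced with a few lines of arithmetic. This is only an illustrative sketch: `DeepPagingCost` and its methods are hypothetical names, and the model (every shard returns its own top from + size hits) is the simplified one described in the quote, not a measurement of a real cluster.

```csharp
using System;

static class DeepPagingCost
{
    // Hits each shard must produce for one page: its own top (from + size).
    public static long PerShard(long from, long size) => from + size;

    // Hits the coordinating node has to sort across all shards.
    public static long Sorted(int shards, long from, long size) =>
        shards * PerShard(from, size);

    // Hits thrown away after sorting: everything except the requested page.
    public static long Discarded(int shards, long from, long size) =>
        Sorted(shards, from, size) - size;

    static void Main()
    {
        const int shards = 5;
        Console.WriteLine(Sorted(shards, 0, 10));          // 50: first page
        Console.WriteLine(PerShard(10000, 10));            // 10010: per shard for results 10,001 to 10,010
        Console.WriteLine(Sorted(shards, 10000, 10));      // 50050
        Console.WriteLine(Discarded(shards, 10000, 10));   // 50040
    }
}
```

In this model the work per request is proportional to shards × (from + size), which is why the asker's second batch (from = 10000, size = 10000) falls outside the default window while a scroll does not.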

Elasticsearch: From does not work properly