How do I paginate with the scan search type in Elasticsearch?

Time: 2016-04-23 02:41:04

Tags: elasticsearch

I scan my index with this configuration:

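require 'vendor/autoload.php';      // Composer autoloader (assumes a Composer install)
use Elasticsearch\ClientBuilder;

$client = ClientBuilder::create()->build();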
$params = [
    "search_type" => "scan",    // use search_type=scan
    "scroll" => "30s",          // how long between scroll requests. should be small!
    "size" => 50,               // how many results *per shard* you want back
    "index" => "my_index",
    "body" => [
        "query" => [
            "match_all" => []
        ]
    ]
];

$docs = $client->search($params);   // Execute the search
$scroll_id = $docs['_scroll_id'];   // The response will contain no results, just a _scroll_id

// Now we loop until the scroll "cursors" are exhausted
while (\true) {

    // Execute a Scroll request
    $response = $client->scroll([
            "scroll_id" => $scroll_id,  //...using our previously obtained _scroll_id
            "scroll" => "30s"           // and the same timeout window
        ]
    );

    // Check to see if we got any search hits from the scroll
    if (count($response['hits']['hits']) > 0) {
        // If yes, Do Work Here

        // Get new scroll_id
        // Must always refresh your _scroll_id!  It can change sometimes
        $scroll_id = $response['_scroll_id'];
    } else {
        // No results, scroll cursor is empty.  You've exported all the data
        break;
    }
}

But I want this scan to be sorted by ID, and I want to select the results from 10,000 to 11,000 (my index has 33,000 results). How can I do that?

I tried from and size in the search body, but that does not work.

1 answer:

Answer 0: (score: 0)

Pagination works with scan and scroll, but the data is repeated on every page. So just use from

Click the link: Here is the data for pagination

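Paginating the index this way retrieves only 10,000 records; past 10,000 documents it returns an error. So search the data between two dates to get better results.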
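
The 10,000 limit described here matches Elasticsearch's index.max_result_window setting (default 10,000), which caps from + size. Below is a minimal sketch of a sorted from/size request that checks this limit before it is sent. buildPageParams and the id sort field are hypothetical names chosen for illustration; they are not taken from the question.

```php
<?php
// Hypothetical helper: builds the parameters for one sorted page.
// Assumes the default index.max_result_window of 10000 and a sortable
// document field named "id" (both are assumptions, adjust to your index).
function buildPageParams($index, $from, $size, $maxWindow = 10000)
{
    // Elasticsearch rejects requests where from + size exceeds the window
    if ($from + $size > $maxWindow) {
        throw new InvalidArgumentException(
            "from + size = " . ($from + $size) . " exceeds the result window of $maxWindow"
        );
    }

    return [
        'index' => $index,
        'from'  => $from,
        'size'  => $size,
        'body'  => [
            'query' => ['match_all' => new \stdClass()],   // serializes as {}
            'sort'  => [['id' => ['order' => 'asc']]],
        ],
    ];
}

// Usage with the client from the question (needs a running cluster):
// $response = $client->search(buildPageParams('my_index', 9000, 1000));
```

Under these assumptions a request for hits 9,000 to 10,000 passes the check, while the 10,000 to 11,000 range from the question is rejected unless the window is raised or the result set is narrowed first.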

Example:

$params['body']['query']['filtered']['filter']['bool']['must'][]['range']['meta.datetime']['gte'] = $startdate;
$params['body']['query']['filtered']['filter']['bool']['must'][]['range']['meta.datetime']['lte'] = $enddate;
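
Putting the two lines above into a complete request could look roughly like this: narrow the result set to one date window, sort it, and page inside that window with from and size. This is a sketch, not a tested recipe. It assumes the filtered query (available in Elasticsearch 1.x and 2.x, removed in 5.0), the meta.datetime field from the example, and a hypothetical id field for sorting; buildDateWindowParams is an illustrative helper name.

```php
<?php
// Hypothetical helper: one date window, sorted, paged with from/size.
function buildDateWindowParams($index, $startdate, $enddate, $from = 0, $size = 1000)
{
    $params = [
        'index' => $index,
        'from'  => $from,   // offset inside this date window only
        'size'  => $size,
        'body'  => [
            'sort' => [['id' => ['order' => 'asc']]],   // "id" is an assumed field
        ],
    ];

    // The same two range clauses as in the example above
    $params['body']['query']['filtered']['filter']['bool']['must'][]['range']['meta.datetime']['gte'] = $startdate;
    $params['body']['query']['filtered']['filter']['bool']['must'][]['range']['meta.datetime']['lte'] = $enddate;

    return $params;
}

// Usage with the client from the question (needs a running cluster):
// $response = $client->search(buildDateWindowParams('my_index', '2016-01-01', '2016-01-31'));
```

If each date window holds fewer than 10,000 documents, from + size stays within the limit, and moving the window forward covers the whole index.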