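I'm trying to scroll through my ES index and fetch all documents, but I always seem to be missing the first set of documents returned by the initial scroll request. For example, if my scroll size is 10 and my query reports a total of 100, I only end up with 90 documents after scrolling. Any suggestions on what I'm missing?
Here is my current attempt: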
$json = '{"query":{"bool":{"must":[{"match_all":{}}]}}}';
$params = [
"scroll" => "1m",
"size" => 50,
"index" => "myindex",
"type" => "mytype",
"body" => $json
];
$results = $client->search($params);
$scroll_size = $results['hits']['total']; // returns total docs that match query
$s_id = $results['_scroll_id'];
print " total results: " . $scroll_size;
//scroll
$count = 0;
while ($scroll_size > 0) {
print " SCROLLING...";
$scroll_results = $client->scroll([
'scroll_id' => $s_id,
'scroll' => '1m'
]);
// get number of results returned in the last scroll
$scroll_size = sizeof($scroll_results['hits']['hits']);
print " scroll size: " . $scroll_size;
// do something with results
for ($i=0; $i<$scroll_size; $i++) {
$count++;
}
}
print " total id count: " . $count;
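Answer 0 (score: 3)
The first query you execute returns not only the document count but also documents. That first query establishes the scroll and fetches the first set of documents. Once you have processed that first set of results, use the scroll_id to fetch the next page, and so on.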
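To make the point concrete, here is a minimal sketch of a loop that handles the page returned by the initial search before requesting the next one. `FakeScrollClient` and `countAllHits` are hypothetical names invented for this illustration; the fake client only mimics the response keys used in the question (`_scroll_id` and `hits.hits`) so the example runs without a cluster. With the real client, the same loop shape should apply, assuming the responses look like the ones in the question.

```php
<?php
// Hypothetical stand-in for the Elasticsearch client (an assumption for
// illustration only). It mimics just the response keys used in the question.
class FakeScrollClient
{
    private array $docs;
    private int $size = 0;
    private int $offset = 0;

    public function __construct(int $totalDocs)
    {
        $this->docs = $totalDocs > 0 ? range(1, $totalDocs) : [];
    }

    // The initial search opens the scroll AND returns the first page.
    public function search(array $params): array
    {
        $this->size = $params['size'];
        $this->offset = 0;
        return $this->nextPage();
    }

    // Each scroll call returns the following page; empty once exhausted.
    public function scroll(array $params): array
    {
        return $this->nextPage();
    }

    private function nextPage(): array
    {
        $page = array_slice($this->docs, $this->offset, $this->size);
        $this->offset += count($page);
        return [
            '_scroll_id' => 'scroll-' . $this->offset,
            'hits' => ['total' => count($this->docs), 'hits' => $page],
        ];
    }
}

// Counts every hit, including those delivered by the initial search.
function countAllHits($client, array $params): int
{
    $count = 0;
    $response = $client->search($params);
    // Process the page already in hand before asking for the next one.
    while (count($response['hits']['hits']) > 0) {
        $count += count($response['hits']['hits']);
        $response = $client->scroll([
            'scroll_id' => $response['_scroll_id'],
            'scroll' => '1m',
        ]);
    }
    return $count;
}

$total = countAllHits(new FakeScrollClient(100), ['scroll' => '1m', 'size' => 10]);
print "total id count: " . $total . "\n"; // prints "total id count: 100"
```

Because the loop condition checks the hits of the response currently in hand, the first page and every later page go through the same processing code, and the loop stops on the first empty page.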
Answer 1 (score: 0)
Thanks @Ramdev. Yes, I realized that after some digging. For anyone else, here is what ended up working for me:
$json = '{"query":{"bool":{"must":[{"match_all":{}}]}}}';
$count = 0;
$params = [
"scroll" => "1m",
"size" => 50,
"index" => "myindex",
"type" => "mytype",
"body" => $json
];
$results = $client->search($params);
$scroll_size = $results['hits']['total']; // returns total docs that match query
$s_id = $results['_scroll_id'];
print " total results: " . $scroll_size;
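// first set of scroll results (already returned by the initial search)
$first_batch_size = sizeof($results['hits']['hits']);
for ($i=0; $i<$first_batch_size; $i++) {
$count++;
}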
//scroll
while ($scroll_size > 0) {
print " SCROLLING...";
$scroll_results = $client->scroll([
'scroll_id' => $s_id,
'scroll' => '1m'
]);
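// keep the latest scroll id; it can change between scroll calls
$s_id = $scroll_results['_scroll_id'];
// get number of results returned in the last scroll
$scroll_size = sizeof($scroll_results['hits']['hits']);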
print " scroll size: " . $scroll_size;
// do something with results
for ($i=0; $i<$scroll_size; $i++) {
$count++;
}
}
print " total id count: " . $count;