Question

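I have an Elasticsearch index with 60k elements. I know this from the head plugin, and I get the same count via Sense (the result is in the lower right corner).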


I then want to query the same index from Python in two different ways: through a direct requests call and through the elasticsearch module:

import elasticsearch
import json
import requests

# the requests version
data = {"query": {"match_all": {}}}
r = requests.get('http://elk.example.com:9200/nessus_current/_search', data=json.dumps(data))
print(len(r.json()['hits']['hits']))

# the elasticsearch module version
es = elasticsearch.Elasticsearch(hosts='elk.example.com')
res = es.search(index="nessus_current", body={"query": {"match_all": {}}})
print(len(res['hits']['hits']))

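In both cases the result is 10, far lower than the expected 60k. The results of the query make sense (the content is what I expect); there are just very few of them.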

To close the loop, I took one of these 10 hits and queried its _id with Sense. As expected, it was found.


So it looks like the 10 hits are a subset of the whole index. Why are not all elements reported by the Python versions of the call?

Answer 1

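10 is the default size of the results returned by Elasticsearch. If you want more, specify "size": 100. Note, however, that using size to return all documents is not recommended, since it can bring down the cluster. To get all the results, use scan & scroll.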
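A minimal sketch of the scroll approach, in case it helps. It assumes an elasticsearch-py-style client exposing `search(index=..., body=..., size=..., scroll=...)`, `scroll(scroll_id=..., scroll=...)` and `clear_scroll(scroll_id=...)`; these keyword names match the 7.x client, but check the signatures for your client version (8.x prefers `query=` over `body=`). Newer Elasticsearch releases removed the separate scan search type, so the sketch uses a plain scroll. `scroll_all` and the in-memory `FakeES` stand-in are hypothetical names; `FakeES` exists only so the loop can be run without a cluster.

```python
from typing import Any, Dict, Iterator, List


def scroll_all(es: Any, index: str, query: Dict[str, Any],
               page_size: int = 1000, keep_alive: str = "2m") -> Iterator[Dict[str, Any]]:
    """Yield every hit matching `query` by following the scroll cursor.

    Assumption: `es` behaves like an elasticsearch-py client with
    search(...), scroll(...) and clear_scroll(...) as used below.
    """
    # The first request opens the scroll context and returns the first page.
    resp = es.search(index=index, body={"query": query}, size=page_size, scroll=keep_alive)
    scroll_id = resp.get("_scroll_id")
    try:
        while True:
            hits = resp["hits"]["hits"]
            if not hits:
                break  # an empty page means the cursor is exhausted
            yield from hits
            # Each follow-up call returns the next page and renews the keep-alive.
            resp = es.scroll(scroll_id=scroll_id, scroll=keep_alive)
            scroll_id = resp.get("_scroll_id", scroll_id)
    finally:
        # Free the server-side scroll context instead of waiting for it to expire.
        if scroll_id:
            es.clear_scroll(scroll_id=scroll_id)


class FakeES:
    """Hypothetical in-memory stand-in for a client, used only to exercise scroll_all."""

    def __init__(self, docs: List[Dict[str, Any]]):
        self.docs = docs
        self.cursors: Dict[str, int] = {}  # scroll_id -> offset of the next page
        self.page_size = 10
        self.cleared: List[str] = []

    def _page(self, scroll_id: str) -> Dict[str, Any]:
        start = self.cursors[scroll_id]
        self.cursors[scroll_id] = start + self.page_size
        return {"_scroll_id": scroll_id,
                "hits": {"hits": self.docs[start:start + self.page_size]}}

    def search(self, index: str, body: Dict[str, Any], size: int, scroll: str) -> Dict[str, Any]:
        self.page_size = size
        self.cursors["sid-1"] = 0
        return self._page("sid-1")

    def scroll(self, scroll_id: str, scroll: str) -> Dict[str, Any]:
        return self._page(scroll_id)

    def clear_scroll(self, scroll_id: str) -> None:
        self.cleared.append(scroll_id)


# 2,500 fake documents fetched in pages of 1,000: all of them come back.
fake = FakeES([{"_id": str(i)} for i in range(2500)])
all_hits = list(scroll_all(fake, "nessus_current", {"match_all": {}}, page_size=1000))
print(len(all_hits))
```

With a real client the call would be `scroll_all(es, "nessus_current", {"match_all": {}})`. The client library also ships `elasticsearch.helpers.scan`, which wraps a similar scroll loop and may be preferable to hand-rolling one.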

I think you should use res['hits']['total'] rather than len(res['hits']['hits']) to get the total number of hits.
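In the response body, `hits.hits` holds only the current page, while `hits.total` counts all matches. The shape of `hits.total` depends on the server version: a plain integer before Elasticsearch 7.0, and an object with `value` and `relation` from 7.0 on, where the count is exact only up to 10,000 unless the request sets `"track_total_hits": true`. The helper `total_hits` below is a hypothetical name, and the response dictionaries are made-up examples, not real output from the index in the question.

```python
from typing import Any, Dict, Tuple


def total_hits(response: Dict[str, Any]) -> Tuple[int, bool]:
    """Return (count, is_exact) for hits.total in a search response body."""
    total = response["hits"]["total"]
    if isinstance(total, dict):
        # Elasticsearch 7.0+: {"value": n, "relation": "eq" | "gte"}.
        # "gte" means n is only a lower bound (the default tracking limit is 10,000).
        return int(total["value"]), total["relation"] == "eq"
    # Before 7.0, hits.total is a plain integer holding the exact count.
    return int(total), True


# Hypothetical response bodies: each page holds only 10 hits (the default size),
# while hits.total describes every matching document.
page = [{"_id": str(i)} for i in range(10)]
old_style = {"hits": {"total": 60000, "hits": page}}
tracked = {"hits": {"total": {"value": 60000, "relation": "eq"}, "hits": page}}
capped = {"hits": {"total": {"value": 10000, "relation": "gte"}, "hits": page}}

for name, resp in [("old_style", old_style), ("tracked", tracked), ("capped", capped)]:
    print(name, len(resp["hits"]["hits"]), total_hits(resp))
```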

Why does the number of reported Elasticsearch hits differ depending on the query method?
