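When I query Elasticsearch with the Python Elasticsearch API, I get about 5000 results. Setting the "size" parameter in the search query to a value larger than the number of results causes the following Java OOM error: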
File "MGDFinder.py", line 114, in <module>
res = es.search(index="_all", body=queryMaker(state))
File "/usr/local/lib/python2.7/dist-packages/elasticsearch/client/utils.py", line 68, in _wrapped
return func(*args, params=params, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/elasticsearch/client/__init__.py", line 440, in search
params=params, body=body)
File "/usr/local/lib/python2.7/dist-packages/elasticsearch/transport.py", line 276, in perform_request
status, headers, data = connection.perform_request(method, url, params, body, ignore=ignore, timeout=timeout)
File "/usr/local/lib/python2.7/dist-packages/elasticsearch/connection/http_urllib3.py", line 55, in perform_request
self._raise_error(response.status, raw_data)
File "/usr/local/lib/python2.7/dist-packages/elasticsearch/connection/base.py", line 97, in _raise_error
raise HTTP_EXCEPTIONS.get(status_code, TransportError)(status_code, error_message, additional_info)
elasticsearch.exceptions.TransportError: TransportError(500, u'OutOfMemoryError[Java heap space]')
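I noticed that this happens when size is set to 700. I don't want to increase the Java heap size. Is there a way to run the search in batches of 500?
Answer 0 (score: 0)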
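I don't think you can batch a plain search request without increasing the Java heap space; the server would still hold all 5000 results and return them.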
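I think you can use scroll for this request. scroll retrieves a large result set efficiently, one batch at a time, much like a cursor in a traditional database.
Sample request: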
$ curl -XGET 'localhost:9200/world/test/_search?scroll=1m&pretty' -d '
{
"size": 50,
"query": {
"match_all": {}
}
}'
Sample response:
{
"_scroll_id" : "cXVlcnlUaGVuRmV0Y2g7NTszNjpXZW9lRnJXSFItT0U2YUtIM1hOa0FBOzM3Oldlb2VGcldIUi1PRTZhS0gzWE5rQUE7Mzg6V2VvZUZyV0hSLU9FNmFLSDNYTmtBQTs0MDpXZW9lRnJXSFItT0U2YUtIM1hOa0FBOzM5Oldlb2VGcldIUi1PRT
"took" : 5,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {....
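The response includes a scroll ID (_scroll_id), which can be used to fetch the next batch of hits.
Example scroll request (pass the _scroll_id with -d):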
$ curl -XGET 'localhost:9200/_search/scroll?scroll=1m&pretty' -d 'cXVlcnlUaGVuRmV0Y2g7NTszMTpXZW9lRnJXSFItT0U2YUtIM1hOa0FBOzMyOldlb2VGcldIUi1PRTZhS0gzWE5rQUE7MzM6V2VvZUZyV0hSLU9FNmFLSDNYTmtBQTszNDpXZW9lRnJXSFItT0U2YUtIM1hOa0FBOzM1Oldlb2VGcldIUi1PRTZhS0gzWE5rQUE7MDs='
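Since the question uses the Python client, the same two curl steps can probably be written as a small loop. This is only a sketch: scroll_hits is a made-up helper name, the batch size of 500 and the 1m scroll timeout are taken from the question and the curl example, and it assumes the client's search() accepts scroll and size arguments alongside body and that scroll() accepts scroll_id and scroll, which may differ between client versions.

```python
def scroll_hits(es, index, body, batch_size=500, scroll="1m"):
    """Yield every hit for a query, fetching batch_size hits per request.

    `es` is assumed to be an elasticsearch.Elasticsearch client instance.
    `scroll_hits` itself is a hypothetical helper, not part of the library.
    """
    # The first request opens the scroll context and returns the first batch.
    resp = es.search(index=index, body=body, scroll=scroll, size=batch_size)
    while True:
        hits = resp["hits"]["hits"]
        if not hits:
            # An empty batch means the scroll is exhausted.
            break
        for hit in hits:
            yield hit
        # Each follow-up request passes the latest scroll ID back to the server.
        resp = es.scroll(scroll_id=resp["_scroll_id"], scroll=scroll)


# Usage, assuming a reachable cluster and the asker's queryMaker(state):
#   from elasticsearch import Elasticsearch
#   es = Elasticsearch()
#   for hit in scroll_hits(es, "_all", queryMaker(state)):
#       print(hit["_id"])
```

Because this is a generator, only one batch of hits is held in the Python process at a time. The client library also ships a helper, elasticsearch.helpers.scan, that wraps the same pattern.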
Official documentation: Scroll