I am trying to reindex with the Elasticsearch python client, using https://elasticsearch-py.readthedocs.org/en/master/helpers.html#elasticsearch.helpers.reindex. But I keep getting the following exception: elasticsearch.exceptions.ConnectionTimeout: ConnectionTimeout caused by - ReadTimeout
The stack trace of the error is:
Traceback (most recent call last):
  File "~/es_test.py", line 33, in <module>
    main()
  File "~/es_test.py", line 30, in main
    target_index='users-2')
  File "~/ENV/lib/python2.7/site-packages/elasticsearch/helpers/__init__.py", line 306, in reindex
    chunk_size=chunk_size, **kwargs)
  File "~/ENV/lib/python2.7/site-packages/elasticsearch/helpers/__init__.py", line 182, in bulk
    for ok, item in streaming_bulk(client, actions, **kwargs):
  File "~/ENV/lib/python2.7/site-packages/elasticsearch/helpers/__init__.py", line 124, in streaming_bulk
    raise e
elasticsearch.exceptions.ConnectionTimeout: ConnectionTimeout caused by - ReadTimeout(HTTPSConnectionPool(host='myhost', port=9243): Read timed out. (read timeout=10))
Is there anything I can do to prevent this exception, other than increasing the timeout?
EDIT: python code
from elasticsearch import Elasticsearch, RequestsHttpConnection, helpers
from requests.auth import HTTPBasicAuth  # required for the http_auth argument below

es = Elasticsearch(connection_class=RequestsHttpConnection,
                   host='myhost',
                   port=9243,
                   http_auth=HTTPBasicAuth(username, password),
                   use_ssl=True,
                   verify_certs=True,
                   timeout=600)
helpers.reindex(es, source_index=old_index, target_index=new_index)
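Besides raising the timeout, the elasticsearch-py client can also be told to retry timed-out requests (its constructor accepts `max_retries` and `retry_on_timeout` keyword arguments). As a minimal, self-contained sketch of that retry-on-timeout idea, with a hypothetical `FakeTimeout` and `flaky` callable standing in for a real cluster call (this is not the library's internal code):

```python
import time


class FakeTimeout(Exception):
    """Stand-in for a timeout error raised by a remote call."""
    pass


def with_retries(fn, max_retries=3, backoff=0.0):
    """Call fn, retrying up to max_retries times on FakeTimeout,
    with exponential backoff between attempts."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except FakeTimeout:
            if attempt == max_retries:
                raise  # out of retries, propagate the error
            time.sleep(backoff * (2 ** attempt))


calls = {'n': 0}

def flaky():
    # Fails twice, then succeeds (simulating transient timeouts).
    calls['n'] += 1
    if calls['n'] < 3:
        raise FakeTimeout()
    return 'ok'


print(with_retries(flaky))  # prints 'ok' after two simulated timeouts
```

With the real client you would simply pass `retry_on_timeout=True` (and optionally `max_retries`) to the `Elasticsearch(...)` constructor instead of wrapping calls yourself.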
Answer 0: (score: 0)
This may happen because of an OutOfMemoryError in Java heap space, which means you are not giving Elasticsearch enough memory for what you are trying to do. If there is any such exception, try looking at the Elasticsearch logs.
Answer 1: (score: 0)
I suffered from this problem for a few days; changing the request_timeout parameter to 30 (that is, 30 seconds) did not work. Finally I had to edit the streaming_bulk and reindex APIs in elasticsearch-py, changing the chunk_size parameter from its default of 500 (processing 500 documents per request) to a smaller number of documents per batch. I changed mine to 50, and it worked fine for me; no more read-timeout errors.
def streaming_bulk(client, actions, chunk_size=50, raise_on_error=True,
                   expand_action_callback=expand_action, raise_on_exception=True,
                   **kwargs):

def reindex(client, source_index, target_index, query=None, target_client=None,
            chunk_size=50, scroll='5m', scan_kwargs={}, bulk_kwargs={}):
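Note that chunk_size is an ordinary parameter of these helpers, so rather than editing the library you can usually pass it at the call site, e.g. helpers.reindex(es, source_index=old_index, target_index=new_index, chunk_size=50). To see why a smaller chunk_size helps (each bulk request carries fewer documents, so it finishes within the read timeout), here is a minimal sketch of the batching idea; this is illustrative, not the library's internal code:

```python
def chunked(actions, chunk_size=50):
    """Group an iterable of actions into lists of at most chunk_size items,
    the way a bulk helper batches documents into one request per chunk."""
    batch = []
    for action in actions:
        batch.append(action)
        if len(batch) == chunk_size:
            yield batch
            batch = []
    if batch:  # flush the final, possibly short batch
        yield batch


# 120 pending documents with chunk_size=50 become three requests:
batches = list(chunked(range(120), chunk_size=50))
print([len(b) for b in batches])  # [50, 50, 20]
```

With chunk_size=500 the same 120 documents would go out as a single large request, which is exactly the kind of payload that can exceed a 10-second read timeout on a slow cluster.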