from elasticsearch import Elasticsearch
from elasticsearch import helpers
es_url = '*****.us-east-1.es.amazonaws.com'
# es_conn = Elasticsearch(es_url)
es_conn = Elasticsearch([{'host': es_url, 'port': 443, 'use_ssl': True}])
actions = []
while 1:
    for ....:
        actions.append(....)
        if len(actions) >= 5000:
            helpers.bulk(es_conn, actions)
            actions = []
helpers.bulk(es_conn, actions)
The code above runs on an EC2 instance, and it frequently throws the following error:
    helpers.bulk(es_conn, actions)
  File "/usr/local/lib/python2.7/site-packages/elasticsearch/helpers/__init__.py", line 194, in bulk
    for ok, item in streaming_bulk(client, actions, **kwargs):
  File "/usr/local/lib/python2.7/site-packages/elasticsearch/helpers/__init__.py", line 162, in streaming_bulk
    for result in _process_bulk_chunk(client, bulk_actions, raise_on_exception, raise_on_error, **kwargs):
  File "/usr/local/lib/python2.7/site-packages/elasticsearch/helpers/__init__.py", line 91, in _process_bulk_chunk
    raise e
ConnectionTimeout: ConnectionTimeout caused by - ReadTimeoutError(HTTPSConnectionPool(host='search-shinezoneels-pc3ib5rkhuylqynfoz6rph7gh4.us-east-1.es.amazonaws.com', port=443): Read timed out.)
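Meanwhile, I ran the same code on an EMR instance, and the error never occurred there. Bulk indexing on the EC2 instance is roughly twice as fast as on the EMR instance, but it usually fails with this timeout. How can I fix this?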
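For reference, here is a minimal sketch of one thing I am considering: sending smaller batches and retrying a batch with a growing delay when the request times out. The helper names `batched` and `send_with_retry` and the `send` callable are hypothetical, made up for illustration; in my script `send` would be something like `lambda batch: helpers.bulk(es_conn, batch)`. I have not confirmed that this addresses the root cause, which may simply be that the faster EC2 client is overloading the cluster.

```python
import time


def batched(items, size):
    """Yield lists of at most `size` items from any iterable."""
    batch = []
    for item in items:
        batch.append(item)
        if len(batch) >= size:
            yield batch
            batch = []
    if batch:
        # Flush whatever is left over after the last full batch.
        yield batch


def send_with_retry(send, batch, retries=3, base_delay=1.0,
                    retry_on=(Exception,), sleep=time.sleep):
    """Call send(batch); on a retryable error, wait and try again.

    Waits base_delay, 2 * base_delay, 4 * base_delay, ... between attempts
    and re-raises the last error once `retries` retries are used up.
    `send` is a hypothetical stand-in for the real bulk call.
    """
    for attempt in range(retries + 1):
        try:
            return send(batch)
        except retry_on:
            if attempt == retries:
                raise
            sleep(base_delay * (2 ** attempt))
```

In the real script, `retry_on` would presumably be narrowed to the client's `ConnectionTimeout` exception instead of the catch-all default. One caveat: a batch that timed out on the client side may still have been partly indexed, so retrying it can create duplicate documents unless every action carries an explicit `_id`.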