Python Elasticsearch timeout error when indexing large JSON

Date: 2018-04-27 09:46:59

Tags: python elasticsearch

I'm getting a timeout exception when indexing a huge JSON document.

Sample code:

es.index(index="test", doc_type="test", body=jsonString) 

So I tried increasing the timeout:

es.index(index="test", doc_type="test", body=jsonString, timeout=60)

But is this the only way to solve the problem? My JSON string is sometimes 40 MB to 60 MB in size.
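As a side note (not from the original post): in elasticsearch-py, the `timeout` keyword on `index()` is forwarded to Elasticsearch as a query-string parameter, while the client-side wait is controlled by `request_timeout` (per call) or the `timeout` argument of the `Elasticsearch` constructor. A minimal sketch, with a placeholder host and document:

```python
from elasticsearch import Elasticsearch

# Hypothetical connection details for illustration only.
es = Elasticsearch(["localhost:9200"], timeout=60)  # default client-side timeout for all requests

doc = {"field": "value"}  # placeholder for the large JSON payload

# request_timeout is the per-request client-side timeout;
# the plain `timeout` kwarg would only be sent on to the server.
es.index(index="test", doc_type="test", body=doc, request_timeout=120)
```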

Update

I tried using bulk indexing, but it failed.

helpers.bulk(es, jsonOutput, index="test-las", doc_type="test-las")

Log:

Traceback (most recent call last):
  File "LasioParser.py", line 46, in <module>
    helpers.bulk(es, jsonOutput, index="test-las", doc_type="test-las")
  File "/usr/lib/python2.7/site-packages/elasticsearch/helpers/__init__.py", line 257, in bulk
    for ok, item in streaming_bulk(client, actions, **kwargs):
  File "/usr/lib/python2.7/site-packages/elasticsearch/helpers/__init__.py", line 192, in streaming_bulk
    raise_on_error, **kwargs)
  File "/usr/lib/python2.7/site-packages/elasticsearch/helpers/__init__.py", line 137, in _process_bulk_chunk
    raise BulkIndexError('%i document(s) failed to index.' % len(errors), errors)
elasticsearch.helpers.BulkIndexError: (u'500 document(s) failed to index.', [{
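For context (not part of the original post): `helpers.bulk` expects an iterable of action dictionaries, and the per-document reasons for the failure are in the error list that is truncated at the end of the traceback above. A minimal sketch of how the actions are usually shaped, with hypothetical field names and host:

```python
from elasticsearch import Elasticsearch, helpers

es = Elasticsearch(["localhost:9200"])  # hypothetical host

# Each action carries its own metadata plus the document body.
actions = (
    {
        "_index": "test-las",
        "_type": "test-las",
        "_source": {"curve": name, "values": values},  # hypothetical fields
    }
    for name, values in [("GR", [1.0, 2.0]), ("DT", [3.0, 4.0])]
)

# chunk_size controls how many documents go into a single bulk request,
# which keeps each HTTP body well below the 40-60 MB range.
helpers.bulk(es, actions, chunk_size=500, request_timeout=120)
```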

1 Answer:

Answer 0 (score: 1):

Are you using HTTP compression?

from elasticsearch import Elasticsearch
es = Elasticsearch(hosts, http_compress=True)

https://elasticsearch-py.readthedocs.io/en/master/#compression
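A short usage sketch combining this hint with the question's indexing call (host name and document are placeholders): with `http_compress=True` the client gzips request bodies before sending them, which shrinks 40-60 MB JSON payloads on the wire but does not change what Elasticsearch stores.

```python
from elasticsearch import Elasticsearch

# http_compress=True makes the client gzip request bodies before sending them.
es = Elasticsearch(["localhost:9200"], http_compress=True, timeout=60)

doc = {"field": "value"}  # placeholder for the large JSON document
es.index(index="test", doc_type="test", body=doc)
```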