How can I speed up Elasticsearch indexing?

Time: 2017-03-12 18:50:32

Tags: python python-3.x elasticsearch

I am a beginner with Elasticsearch, and I have to write 1 million random events to an Elasticsearch cluster (hosted in the cloud) from a Python script. This is how the client is set up:

import random
import uuid
from datetime import timedelta

import certifi
import numpy as np
from elasticsearch import Elasticsearch

es = Elasticsearch(
    [host_name],                  # cluster host, defined elsewhere in the script
    port=9243,
    http_auth=("*****", "*******"),
    use_ssl=True,
    verify_certs=True,
    ca_certs=certifi.where(),
    sniff_on_start=True
)

Here is the indexing code:

for i in range(1000000):

    # Pick a weighted source site and a different destination site.
    src_centers = ['data center a', 'data center b', 'data center c',
                   'data center d', 'data center e']
    transfer_src = np.random.choice(src_centers, p=[0.3, 0.175, 0.175, 0.175, 0.175])

    dst_centers = [x for x in src_centers if x != transfer_src]
    transfer_dst = np.random.choice(dst_centers)

    transfer_starttime = generate_timestamp()   # helper defined elsewhere in the script
    file_size = random.choice(range(1024, 10000000000))
    ftp = {
        'event_type': 'transfer-queued',
        'uuid': str(uuid.uuid4()),              # str() keeps the document JSON-serializable
        'src_site': transfer_src,
        'dst_site': transfer_dst,
        'timestamp': transfer_starttime,
        'bytes': file_size
    }
    print(i)   # progress output; printing every document slows the loop noticeably
    es.index(index='ft_initial', id=(i + 1), doc_type='initial_transfer_details', body=ftp)

    # 95% of transfers succeed; failures are reported after a fixed 10 s delay.
    transfer_status = ['transfer-success', 'transfer-failure']
    final_status = np.random.choice(transfer_status, p=[0.95, 0.05])
    ftp['event_type'] = final_status

    if final_status == 'transfer-failure':
        time_delay = 10
    else:
        time_delay = int(transfer_time(file_size))   # helper; ranges roughly from 0-10000 s

    ftp['timestamp'] = transfer_starttime + timedelta(seconds=time_delay)
    es.index(index='ft_final', id=(i + 1), doc_type='final_transfer_details', body=ftp)

Is there any other way to speed this process up?

Any help/pointers would be much appreciated. Thanks.

1 Answer:

Answer 0 (score: 3)

  1. Use bulk requests; otherwise every single request incurs a lot of overhead (see the first sketch after this list): https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-bulk.html
  2. Change the refresh interval, ideally disabling it entirely until you are done (see the second sketch after this list): https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-update-settings.html#bulk
  3. Use monitoring (it is free with the basic license) to see what the actual bottleneck is (IO, memory, CPU): https://www.elastic.co/guide/en/x-pack/current/xpack-monitoring.html
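
For point 1, here is a minimal sketch of what the loop could look like using the `helpers.bulk` function from the elasticsearch-py client, reusing the `es` client from the question. `make_transfer_event` is a hypothetical wrapper around the question's ftp-building code, and `chunk_size=5000` is just a starting point to tune:

from elasticsearch import helpers

def generate_actions(n=1000000):
    # Yield one bulk action per document instead of calling es.index() per document.
    for i in range(n):
        ftp = make_transfer_event(i)  # hypothetical: builds the ftp dict as in the question
        yield {
            '_index': 'ft_initial',
            '_type': 'initial_transfer_details',
            '_id': i + 1,
            '_source': ftp,
        }

# One HTTP round trip per chunk of documents instead of one per document.
helpers.bulk(es, generate_actions(), chunk_size=5000)

Using a generator means the million documents never have to be held in memory at once; the helper slices the stream into chunks and sends each chunk as a single bulk request.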
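For point 2, a minimal sketch of toggling the refresh interval around the load, assuming the ft_initial and ft_final indices already exist and reusing the same `es` client:

# Disable refresh while loading so segments are not rebuilt every second.
es.indices.put_settings(index='ft_initial,ft_final',
                        body={'index': {'refresh_interval': '-1'}})

# ... run the indexing / bulk load here ...

# Restore the default interval and force one refresh so the newly
# indexed documents become searchable.
es.indices.put_settings(index='ft_initial,ft_final',
                        body={'index': {'refresh_interval': '1s'}})
es.indices.refresh(index='ft_initial,ft_final')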