Question

I have a 5-node ES cluster with 32 GB of RAM on each node. I have allocated 20 GB to the ES process. These are the relevant fields in my yml.

discovery.type: ec2
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.timeout: 60s  # changed this, as 10s was not enough
discovery.zen.minimum_master_nodes: 3
script.disable_dynamic: true
bootstrap.mlockall: true
indices.fielddata.cache.size: 50%
indices.breaker.fielddata.limit: 60%
indices.breaker.request.limit: 40%
indices.breaker.total.limit: 70%
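
For orientation, here is a rough sketch of what the percentage settings above come to in absolute terms on the 20 GB heap from the question. It also checks the minimum_master_nodes value against the usual majority formula, assuming all 5 nodes are master-eligible (the question does not say so explicitly).

```python
# Sketch: translate the percentage-based settings into absolute sizes
# for the 20 GB heap described in the question.
HEAP_GB = 20  # heap given to the ES process, per the question

settings_pct = {
    "indices.fielddata.cache.size": 50,
    "indices.breaker.fielddata.limit": 60,
    "indices.breaker.request.limit": 40,
    "indices.breaker.total.limit": 70,
}

# Each limit is a percentage of the JVM heap, not of the machine's 32 GB RAM.
limits_gb = {name: HEAP_GB * pct / 100 for name, pct in settings_pct.items()}

# Majority quorum: floor(master-eligible nodes / 2) + 1.
# Assumption: all 5 nodes are master-eligible.
MASTER_ELIGIBLE_NODES = 5
quorum = MASTER_ELIGIBLE_NODES // 2 + 1

for name, gb in limits_gb.items():
    print(f"{name}: {gb:.0f} GB")
print(f"majority quorum for {MASTER_ELIGIBLE_NODES} nodes: {quorum}")
```

With these values the fielddata cache cap (10 GB) sits below the fielddata breaker limit (12 GB), and the configured minimum_master_nodes of 3 matches the majority of 5.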

Elasticsearch version: 1.3.1

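I index anywhere between 500 and 1000 documents per minute (the structure is more like tweets and social network data). I have 640 million documents in my cluster (not including replicas) and 800 GB of data (including replicas).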
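
To put those figures in perspective, here is a small back-of-the-envelope calculation. It assumes the data is spread evenly across the 5 nodes, which the question does not state.

```python
# Sketch: rough load figures derived from the numbers in the question.
DOCS_PER_MIN = (500, 1000)   # indexing rate range, per the question
TOTAL_DATA_GB = 800          # on-disk data including replicas
NODES = 5

docs_per_sec = tuple(rate / 60 for rate in DOCS_PER_MIN)
docs_per_day = tuple(rate * 60 * 24 for rate in DOCS_PER_MIN)

# Assumption: shards are balanced, so each node holds an equal share.
data_per_node_gb = TOTAL_DATA_GB / NODES

print(f"indexing rate: {docs_per_sec[0]:.1f} to {docs_per_sec[1]:.1f} docs/sec")
print(f"documents per day: {docs_per_day[0]:,} to {docs_per_day[1]:,}")
print(f"data per node: {data_per_node_gb:.0f} GB")
```

Under these assumptions the indexing rate is well under 20 documents per second, and each node holds roughly 160 GB on disk.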

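Recently I have observed that the heap keeps growing until GC pauses and an OOM eventually bring the node down. I think indexing is more of a problem than querying, since the field data cache and filter cache never exceed 3 GB.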

This is the current cluster health:

{
"cluster_name": "name_of_cluster",
"status": "yellow",
"timed_out": false,
"number_of_nodes": 5,
"number_of_data_nodes": 5,
"active_primary_shards": 15,
"active_shards": 26,
"relocating_shards": 0,
"initializing_shards": 4,
"unassigned_shards": 0
}
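
As a quick reading aid, the shard counts in this health output can be cross-checked with a few lines of Python. The replicas-per-primary figure assumes every index uses the same replica count, which the question does not confirm.

```python
import json

# The cluster health response from the question, pasted verbatim.
health = json.loads("""
{
  "cluster_name": "name_of_cluster",
  "status": "yellow",
  "timed_out": false,
  "number_of_nodes": 5,
  "number_of_data_nodes": 5,
  "active_primary_shards": 15,
  "active_shards": 26,
  "relocating_shards": 0,
  "initializing_shards": 4,
  "unassigned_shards": 0
}
""")

primaries = health["active_primary_shards"]
active_replicas = health["active_shards"] - primaries
pending = health["initializing_shards"] + health["unassigned_shards"]
total_shards = health["active_shards"] + pending

# Assumption: a uniform replica count across all indices.
replicas_per_primary = (total_shards - primaries) / primaries

print(f"active replicas: {active_replicas}, still pending: {pending}")
print(f"total shard copies: {total_shards}")
print(f"replicas per primary: {replicas_per_primary:g}")
```

The numbers are consistent with one replica per primary: 11 replicas are active and 4 are still initializing. Yellow status means all primaries are allocated but some replicas are not yet active.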

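I would like to know what improvements I need to make. Should I increase the RAM to 64 GB per node, or something along those lines? I am also considering using doc_values and upgrading ES to the latest version. But I want to understand the root cause of this behavior before taking any action.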

Here is the hot threads output: https://gist.github.com/naryad/abe852c04dbac5e5611a
Here is the output of the node stats API: https://gist.github.com/naryad/06ec0e17c0c02e311e80

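The heap fills up slowly (with old generation objects), and when GC runs, none of the old generation objects are cleared. Old generation objects occupy 90% of the 20 GB heap allocated to ES.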
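
One way to track this figure over time is to compute it from the node stats response. The sketch below assumes the response exposes `jvm.mem.pools.old.used_in_bytes` and `jvm.mem.heap_max_in_bytes` for each node. The sample dictionary is hypothetical and only mirrors the 90% figure described above; it is not taken from the linked gist.

```python
GB = 1024 ** 3

def old_gen_share_of_heap(node_stats: dict) -> dict:
    """Return {node name: old-gen bytes in use / maximum heap bytes}.

    Assumes the node stats layout nodes -> <id> -> jvm -> mem, with
    pools.old.used_in_bytes and heap_max_in_bytes present.
    """
    shares = {}
    for node in node_stats["nodes"].values():
        mem = node["jvm"]["mem"]
        old_used = mem["pools"]["old"]["used_in_bytes"]
        shares[node["name"]] = old_used / mem["heap_max_in_bytes"]
    return shares

# Hypothetical sample: one node with 18 GB of old-gen objects on a 20 GB heap.
sample_stats = {
    "nodes": {
        "hypothetical-node-id": {
            "name": "node-1",
            "jvm": {
                "mem": {
                    "heap_max_in_bytes": 20 * GB,
                    "pools": {"old": {"used_in_bytes": 18 * GB}},
                }
            },
        }
    }
}

for name, share in old_gen_share_of_heap(sample_stats).items():
    print(f"{name}: old gen uses {share:.0%} of the heap")
```

If this share stays near 90% right after a full collection, the old generation is holding live objects rather than garbage, which matches the behavior described.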

Answer 1

The GC problems and the extreme heap consumption were resolved by upgrading Elasticsearch from 1.3.1 to 1.6.0. The upgrade guide is at https://www.elastic.co/guide/en/elasticsearch/reference/1.6/setup-upgrade.html

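In addition to the steps given in that guide, in my case indices.recovery.compress had to be set to false for the upgrade to complete, because the compression algorithm changed between these two versions of Elasticsearch. Once the rolling upgrade is finished, this setting can be reverted to true.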
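
The answer does not say how the setting was applied. One possibility, assuming indices.recovery.compress can be changed at runtime through the cluster update settings API, is sketched below. The `ES_URL` address and the helper name are hypothetical, and the request is only built, not sent. Setting the value in elasticsearch.yml on each node would be another route.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # hypothetical address of any node in the cluster

def recovery_compress_request(enabled: bool) -> urllib.request.Request:
    """Build (but do not send) a cluster settings update for indices.recovery.compress.

    Assumption: the setting is dynamically updatable as a transient cluster setting.
    """
    body = json.dumps(
        {"transient": {"indices.recovery.compress": enabled}}
    ).encode("utf-8")
    return urllib.request.Request(
        url=f"{ES_URL}/_cluster/settings",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )

before_upgrade = recovery_compress_request(False)  # disable before the rolling upgrade
after_upgrade = recovery_compress_request(True)    # restore once every node is upgraded

# urllib.request.urlopen(before_upgrade) would actually send the request.
print(before_upgrade.get_method(), before_upgrade.full_url)
print(before_upgrade.data.decode("utf-8"))
```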

GC pauses and OOM errors when indexing into an 800 GB cluster
