We're having a problem with our production Elasticsearch cluster: Elasticsearch seems to consume all of the RAM on every server over time. Each box has 128GB of RAM, so we run two instances per machine with 30GB allocated to the JVM heap for each, leaving the remaining 68GB for the OS and Lucene. We restarted every server last week, and each Elasticsearch process started out at exactly 24% of RAM. Now, roughly a week later, memory consumption has climbed to about 40% per Elasticsearch instance. I've attached our config files in the hope that someone can help figure out why Elasticsearch is exceeding the limits we set for memory utilization.
We're currently running ES 1.3.2, but will be upgrading to 1.4.2 with our next release next week.
Here is the top output right after the restart (extra fields removed for clarity):
  PID USER      %MEM    TIME+
 2178 elastics  24.1    1:03.49
 2197 elastics  24.3    1:07.32
And here is one from today:
  PID USER      %MEM    TIME+
 2178 elastics  40.5    2927:50
 2197 elastics  40.1    3000:44
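(In case it helps, since top's %MEM reflects each process's total resident memory, here is how I can separate the JVM heap from everything else by pulling heap numbers straight from the node stats API. A rough sketch only, assuming the first instance answers HTTP on localhost:9200 and the PIDs are the ones above:)

# Resident set size of the two ES processes (what top's %MEM is based on)
ps -o pid,rss,vsz,args -p 2178,2197

# Heap actually in use versus the 30g ceiling, per node
curl -s 'http://localhost:9200/_nodes/stats/jvm?pretty' \
  | grep -E 'heap_used_percent|heap_used_in_bytes|heap_max_in_bytes'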
elasticsearch-0.yml:
cluster.name: PROD
node.name: "PROD6-0"
node.master: true
node.data: true
node.rack: PROD6
cluster.routing.allocation.awareness.force.rack.values: PROD4,PROD5,PROD6,PROD7,PROD8,PROD9,PROD10,PROD11,PROD12
cluster.routing.allocation.awareness.attributes: rack
node.max_local_storage_nodes: 2
path.data: /es_data1
path.logs: /var/log/elasticsearch
bootstrap.mlockall: true
transport.tcp.port: 9300
http.port: 9200
http.max_content_length: 400mb
gateway.recover_after_nodes: 17
gateway.recover_after_time: 1m
gateway.expected_nodes: 18
cluster.routing.allocation.node_concurrent_recoveries: 20
indices.recovery.max_bytes_per_sec: 200mb
discovery.zen.minimum_master_nodes: 10
discovery.zen.ping.timeout: 3s
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: XXX
index.search.slowlog.threshold.query.warn: 10s
index.search.slowlog.threshold.query.info: 5s
index.search.slowlog.threshold.query.debug: 2s
index.search.slowlog.threshold.fetch.warn: 1s
index.search.slowlog.threshold.fetch.info: 800ms
index.search.slowlog.threshold.fetch.debug: 500ms
index.indexing.slowlog.threshold.index.warn: 10s
index.indexing.slowlog.threshold.index.info: 5s
index.indexing.slowlog.threshold.index.debug: 2s
monitor.jvm.gc.young.warn: 1000ms
monitor.jvm.gc.young.info: 700ms
monitor.jvm.gc.young.debug: 400ms
monitor.jvm.gc.old.warn: 10s
monitor.jvm.gc.old.info: 5s
monitor.jvm.gc.old.debug: 2s
action.auto_create_index: .marvel-*
action.disable_delete_all_indices: true
indices.cache.filter.size: 10%
index.refresh_interval: -1
threadpool.search.type: fixed
threadpool.search.size: 48
threadpool.search.queue_size: 10000000
cluster.routing.allocation.cluster_concurrent_rebalance: 6
indices.store.throttle.type: none
index.reclaim_deletes_weight: 4.0
index.merge.policy.max_merge_at_once: 5
index.merge.policy.segments_per_tier: 5
marvel.agent.exporter.es.hosts: ["1.1.1.1:9200","1.1.1.1:9200"]
marvel.agent.enabled: true
marvel.agent.interval: 30s
script.disable_dynamic: false
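(One thing I can double-check from this config is whether bootstrap.mlockall actually took effect at startup; if the memory lock silently fails, the heap can get paged out and the numbers in top are harder to interpret. A quick check, assuming the instance answers on localhost:9200:)

# Each node should report "mlockall" : true; false means the lock failed at startup
curl -s 'http://localhost:9200/_nodes/process?pretty' | grep mlockall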
Here is /etc/sysconfig/elasticsearch-0:
# Directory where the Elasticsearch binary distribution resides
ES_HOME=/usr/share/elasticsearch

# Heap Size (defaults to 256m min, 1g max)
ES_HEAP_SIZE=30g

# Heap new generation
#ES_HEAP_NEWSIZE=

# max direct memory
#ES_DIRECT_SIZE=

# Additional Java OPTS
#ES_JAVA_OPTS=

# Maximum number of open files
MAX_OPEN_FILES=65535

# Maximum amount of locked memory
MAX_LOCKED_MEMORY=unlimited

# Maximum number of VMA (Virtual Memory Areas) a process can own
MAX_MAP_COUNT=262144

# Elasticsearch log directory
LOG_DIR=/var/log/elasticsearch

# Elasticsearch data directory
DATA_DIR=/es_data1

# Elasticsearch work directory
WORK_DIR=/tmp/elasticsearch

# Elasticsearch conf directory
CONF_DIR=/etc/elasticsearch

# Elasticsearch configuration file (elasticsearch.yml)
CONF_FILE=/etc/elasticsearch/elasticsearch-0.yml

# User to run as, change this to a specific elasticsearch user if possible
# Also make sure, this user can write into the log directories in case you change them
# This setting only works for the init script, but has to be configured separately for systemd startup
ES_USER=elasticsearch

# Configure restart on package upgrade (true, every other setting will lead to not restarting)
#RESTART_ON_UPGRADE=true
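(And a quick way I can confirm the running processes actually picked these values up, using the PIDs from the top output above:)

# Heap flags the JVMs were really started with (-Xms30g -Xmx30g expected)
ps -o args -p 2178,2197 | tr ' ' '\n' | grep -E '^-Xm[sx]'

# Locked-memory limit as seen by the process ("unlimited" expected)
grep 'Max locked memory' /proc/2178/limits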
If there is any other data I can provide, please let me know. Thanks in advance for your help.
             total       used       free     shared    buffers     cached
Mem:        129022     119372       9650          0        219      46819
-/+ buffers/cache:      72333      56689
Swap:        28603          0      28603
Answer 0 (score 0):
What you are seeing isn't the heap blowing out; the heap will always be constrained by what you set in the configuration. free -m and top report OS-level usage, so the memory consumed there is most likely the OS caching filesystem calls (the page cache).
This will not cause a Java OOM.
If you are hitting Java OOMs that are directly related to running out of Java heap space, then something else is at play. Your logs may provide some information.
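You can already see this in the free output posted above: the -/+ buffers/cache line is the one that reflects memory actually held by processes. A rough breakdown of that output (values in MB, my interpretation):

used (Mem line)                   119372
- buffers                            219
- cached (filesystem page cache)   46819
= used by processes               ~72333   (the -/+ buffers/cache "used" column, after rounding)

That ~72GB is roughly what two 30GB heaps plus normal JVM and off-heap overhead should occupy, while the ~46GB of "cached" is the filesystem cache doing its job. Also, if the indices use an mmap-based store (which I believe is the default on 64-bit Linux in 1.x, though that's an assumption), pages of mapped index files sitting in the page cache get counted in each process's resident size, which would explain top's %MEM climbing past the heap size even though the heap itself stays capped at 30GB.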
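For example (a sketch, using the log directory from your sysconfig; I'm assuming the default naming where the main log file is called after the cluster, here presumably PROD.log):

# A genuine heap exhaustion would show up as OutOfMemoryError in the ES logs
grep -ri 'OutOfMemoryError' /var/log/elasticsearch/

# Heavy GC pressure would show up as [gc] warnings from the monitor.jvm thresholds in your config
grep -i '\[gc\]' /var/log/elasticsearch/PROD.log | tail -20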