Question

Over the past few weeks, our Elasticsearch server has kept going down. I don't know what the problem is, and I'm not sure where to start.

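We can restart the server, and it then runs for a random amount of time. Sometimes it lasts for the rest of the day, sometimes only a few minutes, but it always crashes again eventually.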

Here are some details. I hope someone can work through them and point me in the right direction:

Elasticsearch server info:

{
  "name" : "WIuGVV9",
  "cluster_name" : "elasticsearch",
  "cluster_uuid" : "T2Vvt3hzQhSJa4ZFWtdMKA",
  "version" : {
    "number" : "5.5.1",
    "build_hash" : "19c13d0",
    "build_date" : "2017-07-18T20:44:24.823Z",
    "build_snapshot" : false,
    "lucene_version" : "6.6.0"
  },
  "tagline" : "You Know, for Search"
}

$ cat /etc/centos-release

CentOS Linux release 7.3.1611 (Core)

$ java -version

openjdk version "1.8.0_141"
OpenJDK Runtime Environment (build 1.8.0_141-b16)
OpenJDK 64-Bit Server VM (build 25.141-b16, mixed mode)

$ free

              total        used        free      shared  buff/cache   available
Mem:        7915072     3650556     2827472      378156     1437044     3802152
Swap:             0           0           0

elasticsearch.log:

[2017-08-02T14:14:37,927][WARN ][o.e.b.Natives            ] unable to load JNA native support library, native methods will be disabled.
java.lang.UnsatisfiedLinkError: /tmp/jna--1985354563/jna3117985363123958860.tmp: /tmp/jna--1985354563/jna3117985363123958860.tmp: failed to map segment from shared object: Operation not permitted
    at java.lang.ClassLoader$NativeLibrary.load(Native Method) ~[?:1.8.0_141]
    at java.lang.ClassLoader.loadLibrary0(ClassLoader.java:1941) ~[?:1.8.0_141]
    at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1824) ~[?:1.8.0_141]
    at java.lang.Runtime.load0(Runtime.java:809) ~[?:1.8.0_141]
    at java.lang.System.load(System.java:1086) ~[?:1.8.0_141]
    at com.sun.jna.Native.loadNativeDispatchLibraryFromClasspath(Native.java:947) ~[jna-4.4.0.jar:4.4.0 (b0)]
    at com.sun.jna.Native.loadNativeDispatchLibrary(Native.java:922) ~[jna-4.4.0.jar:4.4.0 (b0)]
    at com.sun.jna.Native.<clinit>(Native.java:190) ~[jna-4.4.0.jar:4.4.0 (b0)]
    at java.lang.Class.forName0(Native Method) ~[?:1.8.0_141]
    at java.lang.Class.forName(Class.java:264) ~[?:1.8.0_141]
    at org.elasticsearch.bootstrap.Natives.<clinit>(Natives.java:45) [elasticsearch-5.5.1.jar:5.5.1]
    at org.elasticsearch.bootstrap.Bootstrap.initializeNatives(Bootstrap.java:105) [elasticsearch-5.5.1.jar:5.5.1]
    at org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:194) [elasticsearch-5.5.1.jar:5.5.1]
    at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:351) [elasticsearch-5.5.1.jar:5.5.1]
    at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:123) [elasticsearch-5.5.1.jar:5.5.1]
    at org.elasticsearch.bootstrap.Elasticsearch.execute(Elasticsearch.java:114) [elasticsearch-5.5.1.jar:5.5.1]
    at org.elasticsearch.cli.EnvironmentAwareCommand.execute(EnvironmentAwareCommand.java:67) [elasticsearch-5.5.1.jar:5.5.1]
    at org.elasticsearch.cli.Command.mainWithoutErrorHandling(Command.java:122) [elasticsearch-5.5.1.jar:5.5.1]
    at org.elasticsearch.cli.Command.main(Command.java:88) [elasticsearch-5.5.1.jar:5.5.1]
    at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:91) [elasticsearch-5.5.1.jar:5.5.1]
    at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:84) [elasticsearch-5.5.1.jar:5.5.1]
[2017-08-02T14:14:37,936][WARN ][o.e.b.Natives            ] cannot check if running as root because JNA is not available
[2017-08-02T14:14:37,937][WARN ][o.e.b.Natives            ] cannot install system call filter because JNA is not available
[2017-08-02T14:14:37,937][WARN ][o.e.b.Natives            ] cannot register console handler because JNA is not available
[2017-08-02T14:14:37,941][WARN ][o.e.b.Natives            ] cannot getrlimit RLIMIT_NPROC because JNA is not available
[2017-08-02T14:14:37,941][WARN ][o.e.b.Natives            ] cannot getrlimit RLIMIT_AS beacuse JNA is not available
[2017-08-02T14:14:38,150][INFO ][o.e.n.Node               ] [] initializing ...
[2017-08-02T14:14:38,341][INFO ][o.e.e.NodeEnvironment    ] [WIuGVV9] using [1] data paths, mounts [[/ (rootfs)]], net usable_space [7.8gb], net total_space [38.7gb], spins? [unknown], types [rootfs]
[2017-08-02T14:14:38,341][INFO ][o.e.e.NodeEnvironment    ] [WIuGVV9] heap size [1.9gb], compressed ordinary object pointers [true]
[2017-08-02T14:14:38,452][INFO ][o.e.n.Node               ] node name [WIuGVV9] derived from node ID [WIuGVV9sS0mlLPTxRVKn0w]; set [node.name] to override
[2017-08-02T14:14:38,453][INFO ][o.e.n.Node               ] version[5.5.1], pid[21555], build[19c13d0/2017-07-18T20:44:24.823Z], OS[Linux/3.10.0-327.4.4.el7.x86_64/amd64], JVM[Oracle Corporation/OpenJDK 64-Bit Server VM/1.8.0_141/25.141-b16]
[2017-08-02T14:14:38,453][INFO ][o.e.n.Node               ] JVM arguments [-Xms2g, -Xmx2g, -XX:+UseConcMarkSweepGC, -XX:CMSInitiatingOccupancyFraction=75, -XX:+UseCMSInitiatingOccupancyOnly, -XX:+AlwaysPreTouch, -Xss1m, -Djava.awt.headless=true, -Dfile.encoding=UTF-8, -Djna.nosys=true, -Djdk.io.permissionsUseCanonicalPath=true, -Dio.netty.noUnsafe=true, -Dio.netty.noKeySetOptimization=true, -Dio.netty.recycler.maxCapacityPerThread=0, -Dlog4j.shutdownHookEnabled=false, -Dlog4j2.disable.jmx=true, -Dlog4j.skipJansi=true, -XX:+HeapDumpOnOutOfMemoryError, -Des.path.home=/usr/share/elasticsearch]
[2017-08-02T14:14:40,409][INFO ][o.e.p.PluginsService     ] [WIuGVV9] loaded module [aggs-matrix-stats]
[2017-08-02T14:14:40,410][INFO ][o.e.p.PluginsService     ] [WIuGVV9] loaded module [ingest-common]
[2017-08-02T14:14:40,410][INFO ][o.e.p.PluginsService     ] [WIuGVV9] loaded module [lang-expression]
[2017-08-02T14:14:40,410][INFO ][o.e.p.PluginsService     ] [WIuGVV9] loaded module [lang-groovy]
[2017-08-02T14:14:40,410][INFO ][o.e.p.PluginsService     ] [WIuGVV9] loaded module [lang-mustache]
[2017-08-02T14:14:40,410][INFO ][o.e.p.PluginsService     ] [WIuGVV9] loaded module [lang-painless]
[2017-08-02T14:14:40,410][INFO ][o.e.p.PluginsService     ] [WIuGVV9] loaded module [parent-join]
[2017-08-02T14:14:40,410][INFO ][o.e.p.PluginsService     ] [WIuGVV9] loaded module [percolator]
[2017-08-02T14:14:40,410][INFO ][o.e.p.PluginsService     ] [WIuGVV9] loaded module [reindex]
[2017-08-02T14:14:40,410][INFO ][o.e.p.PluginsService     ] [WIuGVV9] loaded module [transport-netty3]
[2017-08-02T14:14:40,410][INFO ][o.e.p.PluginsService     ] [WIuGVV9] loaded module [transport-netty4]
[2017-08-02T14:14:40,411][INFO ][o.e.p.PluginsService     ] [WIuGVV9] no plugins loaded
[2017-08-02T14:14:42,797][INFO ][o.e.d.DiscoveryModule    ] [WIuGVV9] using discovery type [zen]
[2017-08-02T14:14:43,759][INFO ][o.e.n.Node               ] initialized
[2017-08-02T14:14:43,760][INFO ][o.e.n.Node               ] [WIuGVV9] starting ...
[2017-08-02T14:14:44,061][INFO ][o.e.t.TransportService   ] [WIuGVV9] publish_address {127.0.0.1:9300}, bound_addresses {127.0.0.1:9300}, {[::1]:9300}
[2017-08-02T14:14:44,088][WARN ][o.e.b.BootstrapChecks    ] [WIuGVV9] system call filters failed to install; check the logs and fix your configuration or disable system call filters at your own risk
[2017-08-02T14:14:47,174][INFO ][o.e.c.s.ClusterService   ] [WIuGVV9] new_master {WIuGVV9}{WIuGVV9sS0mlLPTxRVKn0w}{uZIN-61JT4KLP1xUSVTdLQ}{localhost}{127.0.0.1:9300}, reason: zen-disco-elected-as-master ([0] nodes joined)
[2017-08-02T14:14:47,204][INFO ][o.e.h.n.Netty4HttpServerTransport] [WIuGVV9] publish_address {66.55.80.152:9200}, bound_addresses {[::]:9200}
[2017-08-02T14:14:47,204][INFO ][o.e.n.Node               ] [WIuGVV9] started
[2017-08-02T14:14:47,589][INFO ][o.e.g.GatewayService     ] [WIuGVV9] recovered [3] indices into cluster_state
[2017-08-02T14:14:48,268][INFO ][o.e.c.r.a.AllocationService] [WIuGVV9] Cluster health status changed from [RED] to [YELLOW] (reason: [shards started [[orgs][4], [orgs][0]] ...]).
[2017-08-02T14:18:01,316][INFO ][o.e.c.m.MetaDataDeleteIndexService] [WIuGVV9] [orgs/ZcizOIAsRWqSXZY8ZiR0BA] deleting index

/etc/elasticsearch/elasticsearch.yml:

# ======================== Elasticsearch Configuration =========================
#
# NOTE: Elasticsearch comes with reasonable defaults for most settings.
#       Before you set out to tweak and tune the configuration, make sure you
#       understand what are you trying to accomplish and the consequences.
#
# The primary way of configuring a node is via this file. This template lists
# the most important settings you may want to configure for a production cluster.
#
# Please consult the documentation for further information on configuration options:
# https://www.elastic.co/guide/en/elasticsearch/reference/index.html
#
# ---------------------------------- Cluster -----------------------------------
#
# Use a descriptive name for your cluster:
#
#cluster.name: my-application
#
# ------------------------------------ Node ------------------------------------
#
# Use a descriptive name for the node:
#
#node.name: node-1
#
# Add custom attributes to the node:
#
#node.attr.rack: r1
#
# ----------------------------------- Paths ------------------------------------
#
# Path to directory where to store the data (separate multiple locations by comma):
#
#path.data: /path/to/data
#
# Path to log files:
#
#path.logs: /path/to/logs
#
# ----------------------------------- Memory -----------------------------------
#
# Lock the memory on startup:
#
#bootstrap.memory_lock: true
#
# Make sure that the heap size is set to about half the memory available
# on the system and that the owner of the process is allowed to use this
# limit.
#
# Elasticsearch performs poorly when the system is swapping the memory.
#
# ---------------------------------- Network -----------------------------------
#
# Set the bind address to a specific IP (IPv4 or IPv6):
#
transport.host: localhost
transport.tcp.port: 9300
#network.bind_host: "0.0.0.0"
#network.publish_host: _non_loopback:ipv4_
#network.host: _local_
network.host: 0.0.0.0
#network.bind_host: 
#
# Set a custom port for HTTP:
#
http.port: 9200
#
# For more information, consult the network module documentation.
#
# --------------------------------- Discovery ----------------------------------
#
# Pass an initial list of hosts to perform discovery when new node is started:
# The default list of hosts is ["127.0.0.1", "[::1]"]
#
#discovery.zen.ping.unicast.hosts: ["host1", "host2"]
#
# Prevent the "split brain" by configuring the majority of nodes (total number of master-eligible nodes / 2 + 1):
#
#discovery.zen.minimum_master_nodes: 3
#
# For more information, consult the zen discovery module documentation.
#
# ---------------------------------- Gateway -----------------------------------
#
# Block initial recovery after a full cluster restart until N nodes are started:
#
#gateway.recover_after_nodes: 3
#
# For more information, consult the gateway module documentation.
#
# ---------------------------------- Various -----------------------------------
#
# Require explicit names when deleting indices:
#
#action.destructive_requires_name: true
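The Memory section of this file says the heap should be about half of the memory available on the system. As a rough sketch of that rule of thumb (not an official Elasticsearch formula), the Python below turns the total figure from the free output (in KiB) into a whole-GiB heap size. The ~31 GiB cap, meant to stay under the compressed object pointer threshold, and the helper name suggested_heap_gib are assumptions for illustration. It budgets only for Elasticsearch; any other services on the same machine reduce the real headroom.

```python
# Rule of thumb from the elasticsearch.yml comment: heap ~ half of system RAM,
# leaving the rest to the OS page cache and other processes.
KIB_PER_GIB = 1024 * 1024
COMPRESSED_OOPS_CAP_GIB = 31  # assumption: conservative cap below ~32 GiB


def suggested_heap_gib(mem_total_kib):
    """Return a whole-GiB heap size of roughly half of RAM (at least 1 GiB)."""
    half_gib = mem_total_kib / KIB_PER_GIB / 2
    return max(1, min(int(half_gib), COMPRESSED_OOPS_CAP_GIB))


if __name__ == "__main__":
    mem_total_kib = 7915072  # "total" column of the free output in the question
    print(f"RAM: {mem_total_kib / KIB_PER_GIB:.2f} GiB")
    print(f"suggested -Xms/-Xmx: {suggested_heap_gib(mem_total_kib)}g")
```

On this machine that works out to about 3 GiB before accounting for anything else running on it.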

/etc/elasticsearch/jvm.options:

## JVM configuration

################################################################
## IMPORTANT: JVM heap size
################################################################
##
## You should always set the min and max JVM heap
## size to the same value. For example, to set
## the heap to 4 GB, set:
##
## -Xms4g
## -Xmx4g
##
## See https://www.elastic.co/guide/en/elasticsearch/reference/current/heap-size.html
## for more information
##
################################################################

# Xms represents the initial size of total heap space
# Xmx represents the maximum size of total heap space

-Xms2g
-Xmx2g

################################################################
## Expert settings
################################################################
##
## All settings below this section are considered
## expert settings. Don't tamper with them unless
## you understand what you are doing
##
################################################################

## GC configuration
-XX:+UseConcMarkSweepGC
-XX:CMSInitiatingOccupancyFraction=75
-XX:+UseCMSInitiatingOccupancyOnly

## optimizations

# pre-touch memory pages used by the JVM during initialization
-XX:+AlwaysPreTouch

## basic

# force the server VM (remove on 32-bit client JVMs)
-server

# explicitly set the stack size (reduce to 320k on 32-bit client JVMs)
-Xss1m

# set to headless, just in case
-Djava.awt.headless=true

# ensure UTF-8 encoding by default (e.g. filenames)
-Dfile.encoding=UTF-8

# use our provided JNA always versus the system one
-Djna.nosys=true

# use old-style file permissions on JDK9
-Djdk.io.permissionsUseCanonicalPath=true

# flags to configure Netty
-Dio.netty.noUnsafe=true
-Dio.netty.noKeySetOptimization=true
-Dio.netty.recycler.maxCapacityPerThread=0

# log4j 2
-Dlog4j.shutdownHookEnabled=false
-Dlog4j2.disable.jmx=true
-Dlog4j.skipJansi=true

## heap dumps

# generate a heap dump when an allocation from the Java heap fails
# heap dumps are created in the working directory of the JVM
-XX:+HeapDumpOnOutOfMemoryError

# specify an alternative path for heap dumps
# ensure the directory exists and has sufficient space
#-XX:HeapDumpPath=${heap.dump.path}

## GC logging

#-XX:+PrintGCDetails
#-XX:+PrintGCTimeStamps
#-XX:+PrintGCDateStamps
#-XX:+PrintClassHistogram
#-XX:+PrintTenuringDistribution
#-XX:+PrintGCApplicationStoppedTime

# log GC status to a file with time stamps
# ensure the directory exists
#-Xloggc:${loggc}

# By default, the GC log file will not rotate.
# By uncommenting the lines below, the GC log file
# will be rotated every 128MB at most 32 times.
#-XX:+UseGCLogFileRotation
#-XX:NumberOfGCLogFiles=32
#-XX:GCLogFileSize=128M

# Elasticsearch 5.0.0 will throw an exception on unquoted field names in JSON.
# If documents were already indexed with unquoted fields in a previous version
# of Elasticsearch, some operations may throw errors.
#
# WARNING: This option will be removed in Elasticsearch 6.0.0 and is provided
# only for migration purposes.
#-Delasticsearch.json.allow_unquoted_field_names=true

/etc/sysconfig/elasticsearch:

################################
# Elasticsearch
################################

# Elasticsearch home directory
#ES_HOME=/usr/share/elasticsearch

# Elasticsearch Java path
#JAVA_HOME=

# Elasticsearch configuration directory
#CONF_DIR=/etc/elasticsearch

# Elasticsearch data directory
#DATA_DIR=/var/lib/elasticsearch

# Elasticsearch logs directory
#LOG_DIR=/var/log/elasticsearch

# Elasticsearch PID directory
#PID_DIR=/var/run/elasticsearch

# Additional Java OPTS
#ES_JAVA_OPTS=

# Configure restart on package upgrade (true, every other setting will lead to not restarting)
#RESTART_ON_UPGRADE=true

################################
# Elasticsearch service
################################

# SysV init.d
#
# When executing the init script, this user will be used to run the elasticsearch service.
# The default value is 'elasticsearch' and is declared in the init.d file.
# Note that this setting is only used by the init script. If changed, make sure that
# the configured user can read and write into the data, work, plugins and log directories.
# For systemd service, the user is usually configured in file /usr/lib/systemd/system/elasticsearch.service
#ES_USER=elasticsearch
#ES_GROUP=elasticsearch

# The number of seconds to wait before checking if Elasticsearch started successfully as a daemon process
ES_STARTUP_SLEEP_TIME=5

################################
# System properties
################################

# Specifies the maximum file descriptor number that can be opened by this process
# When using Systemd, this setting is ignored and the LimitNOFILE defined in
# /usr/lib/systemd/system/elasticsearch.service takes precedence
#MAX_OPEN_FILES=65536

# The maximum number of bytes of memory that may be locked into RAM
# Set to "unlimited" if you use the 'bootstrap.memory_lock: true' option
# in elasticsearch.yml.
# When using Systemd, the LimitMEMLOCK property must be set
# in /usr/lib/systemd/system/elasticsearch.service
#MAX_LOCKED_MEMORY=unlimited

# Maximum number of VMA (Virtual Memory Areas) a process can own
# When using Systemd, this setting is ignored and the 'vm.max_map_count'
# property is set at boot time in /usr/lib/sysctl.d/elasticsearch.conf
#MAX_MAP_COUNT=262144

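Can anyone tell what my error is? I can't figure it out. My best guess is a permissions issue or a memory issue. We do have 8 GB of RAM, and it is barely being used.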
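One way to check the "barely used" claim is to read the free output shown earlier. The sketch below (parse_free is a hypothetical helper, assuming free's default KiB units) parses that snapshot: roughly 3.5 GiB of about 7.5 GiB is in use, about 3.6 GiB is available, and there is no swap at all. A single snapshot cannot show a short spike, and with zero swap there is nowhere to page anonymous memory out to once the page cache has been reclaimed.

```python
# Snapshot copied from the free output in the question (values in KiB).
FREE_OUTPUT = """\
              total        used        free      shared  buff/cache   available
Mem:        7915072     3650556     2827472      378156     1437044     3802152
Swap:             0           0           0
"""

KIB_PER_GIB = 1024 * 1024


def parse_free(text):
    """Parse free output into {row_label: {column_name: kib}}."""
    lines = [line for line in text.splitlines() if line.strip()]
    columns = lines[0].split()
    table = {}
    for line in lines[1:]:
        label, *values = line.split()
        # zip() stops early on short rows such as "Swap:", which has 3 values.
        table[label.rstrip(":")] = dict(zip(columns, map(int, values)))
    return table


if __name__ == "__main__":
    table = parse_free(FREE_OUTPUT)
    mem, swap = table["Mem"], table["Swap"]
    for name in ("total", "used", "available"):
        print(f"{name:<10}{mem[name] / KIB_PER_GIB:.2f} GiB")
    print(f"{'swap':<10}{swap['total'] / KIB_PER_GIB:.2f} GiB")
```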

Thoughts?

Many thanks.

[EDIT: new details]:

dmesg produces a lot of output. I'm not sure whether there is a way to get timestamps from it, but I see this over and over:

[10131180.901171] Out of memory: Kill process 13777 (java) score 295 or sacrifice child
[10131180.901186] Killed process 13777 (java) total-vm:5800372kB, anon-rss:2334928kB, file-rss:80kB
[10137088.438235] exim[7581]: segfault at 58 ip 000000000046bee7 sp 00007ffdd0dc63e0 error 4 in exim[400000+fa000]
[10138765.544389] exim[16401]: segfault at 58 ip 000000000046bee7 sp 00007ffede7cc3f0 error 4 in exim[400000+fa000]
[10162265.107101] exim[28217]: segfault at 58 ip 000000000046bee7 sp 00007fff84afb6e0 error 4 in exim[400000+fa000]
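On the timestamp question: by default dmesg prints the number of seconds since boot in brackets, and the util-linux dmesg on CentOS 7 accepts -T to print human-readable times instead (these can drift if the machine has ever been suspended). For saved output, the sketch below does the conversion itself, assuming you know the boot time, which Linux exposes as the btime line of /proc/stat (seconds since the epoch). The regex only matches the "Killed process" format shown above; the helper names and the boot time used in the example are hypothetical.

```python
import re
from datetime import datetime, timedelta, timezone

# Matches lines like:
# [10131180.901186] Killed process 13777 (java) total-vm:5800372kB, anon-rss:2334928kB, file-rss:80kB
KILLED_RE = re.compile(
    r"^\[\s*(?P<uptime>\d+\.\d+)\]\s+Killed process (?P<pid>\d+) \((?P<name>[^)]+)\)"
    r".*?anon-rss:(?P<anon_kb>\d+)kB"
)


def boot_time_from_proc_stat(text):
    """Return the boot time from the 'btime' line of /proc/stat as a UTC datetime."""
    for line in text.splitlines():
        if line.startswith("btime "):
            return datetime.fromtimestamp(int(line.split()[1]), tz=timezone.utc)
    raise ValueError("no btime line found")


def oom_kills(dmesg_text, boot_time):
    """Return (wall_clock_time, pid, name, anon_rss_kb) for each 'Killed process' line."""
    kills = []
    for line in dmesg_text.splitlines():
        m = KILLED_RE.match(line)
        if m:
            when = boot_time + timedelta(seconds=float(m.group("uptime")))
            kills.append((when, int(m.group("pid")), m.group("name"), int(m.group("anon_kb"))))
    return kills


if __name__ == "__main__":
    # On the server, use open("/proc/stat").read() and the saved dmesg output instead.
    sample = (
        "[10131180.901171] Out of memory: Kill process 13777 (java) score 295 or sacrifice child\n"
        "[10131180.901186] Killed process 13777 (java) total-vm:5800372kB, anon-rss:2334928kB, file-rss:80kB\n"
    )
    boot = boot_time_from_proc_stat("btime 1483228800\n")  # hypothetical boot: 2017-01-01T00:00:00Z
    for when, pid, name, anon_kb in oom_kills(sample, boot):
        print(when.isoformat(), pid, name, f"{anon_kb / 1024 / 1024:.2f} GiB anonymous RSS")
```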

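I think this is the root cause of our problem. When the boss comes in, we will look into upgrading the server, since it only has 8 GB of memory.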

[EDIT: even newer details]:

Running this command:

ps -eo pmem,pcpu,vsize,pid,cmd | sort -k 1 -nr | head -5

After the ES server was killed:

6.3  0.0 767376   560 /usr/local/cpanel/3rdparty/bin/clamd
2.7 25.5 310612  5060 dovecot/lmtp
2.4  0.3 860920 24850 /usr/sbin/mysqld --basedir=/usr --datadir=/var/lib/mysql --plugin-dir=/usr/lib64/mysql/plugin --log-error=/var/lib/mysql/gigenet.pwi.com.err --open-files-limit=10000 --pid-file=/var/lib/mysql/gigenet.pwi.com.pid
0.3  0.3  75216 17701 tailwatchd
0.3  0.0 393280  5384 /usr/sbin/named -u named

After restarting the ES server:

9.3 45.5 4834304 6815 /bin/java -Xms2g -Xmx2g -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+AlwaysPreTouch -server -Xss1m -Djava.awt.headless=true -Dfile.encoding=UTF-8 -Djna.nosys=true -Djdk.io.permissionsUseCanonicalPath=true -Dio.netty.noUnsafe=true -Dio.netty.noKeySetOptimization=true -Dio.netty.recycler.maxCapacityPerThread=0 -Dlog4j.shutdownHookEnabled=false -Dlog4j2.disable.jmx=true -Dlog4j.skipJansi=true -XX:+HeapDumpOnOutOfMemoryError -Xms4g -Xmx4g -Des.path.home=/usr/share/elasticsearch -cp /usr/share/elasticsearch/lib/* org.elasticsearch.bootstrap.Elasticsearch -p /var/run/elasticsearch/elasticsearch.pid --quiet -Edefault.path.logs=/var/log/elasticsearch -Edefault.path.data=/var/lib/elasticsearch -Edefault.path.conf=/etc/elasticsearch
6.3  0.0 767376   560 /usr/local/cpanel/3rdparty/bin/clamd
2.4  0.3 860920 24850 /usr/sbin/mysqld --basedir=/usr --datadir=/var/lib/mysql --plugin-dir=/usr/lib64/mysql/plugin --log-error=/var/lib/mysql/gigenet.pwi.com.err --open-files-limit=10000 --pid-file=/var/lib/mysql/gigenet.pwi.com.pid
1.2  1.1 243444 23536 /usr/local/cpanel/3rdparty/perl/524/bin/perl -T -w /usr/local/cpanel/3rdparty/bin/spamd --max-spare=1 --max-children=3 --allowed-ips=127.0.0.1,::1 --pidfile=/var/run/spamd.pid --listen=5 --listen=6
1.2  0.5 244292  6618 spamd child
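Note that the java command line after the restart contains the heap flags twice: -Xms2g -Xmx2g from jvm.options and, later, -Xms4g -Xmx4g (their source is not shown here; the ES_JAVA_OPTS line in /etc/sysconfig/elasticsearch above is commented out). HotSpot applies the last occurrence of a repeated -Xms/-Xmx flag, so this process runs with a 4 GiB heap rather than the 2 GiB set in jvm.options. That is more than half of the roughly 7.5 GiB of RAM on a machine that also runs mysqld, clamd and spamd. The sketch below (effective_heap is a hypothetical helper) reads a command line the same way.

```python
import re

# Heap flags such as -Xms2g, -Xmx4g or -Xmx512m (no unit means bytes).
HEAP_FLAG = re.compile(r"^-Xm([sx])(\d+)([kKmMgG]?)$")
UNIT_BYTES = {"": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def effective_heap(cmdline):
    """Return {'Xms': bytes, 'Xmx': bytes}; a later flag overrides an earlier one."""
    heap = {}
    for arg in cmdline.split():
        m = HEAP_FLAG.match(arg)
        if m:
            kind, amount, unit = m.groups()
            heap["Xm" + kind] = int(amount) * UNIT_BYTES[unit.lower()]
    return heap


if __name__ == "__main__":
    # Shortened copy of the java command line from the ps output above.
    cmd = ("/bin/java -Xms2g -Xmx2g -XX:+UseConcMarkSweepGC -XX:+AlwaysPreTouch -Xss1m "
           "-XX:+HeapDumpOnOutOfMemoryError -Xms4g -Xmx4g "
           "-Des.path.home=/usr/share/elasticsearch")
    for flag, size in effective_heap(cmd).items():
        print(f"-{flag}: {size / 1024 ** 3:.0f} GiB")
```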

Following up on the answer below, I found this post:

Prevent elasticsearch from being killed by OOM killer

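I was trying to figure out why the OOM killer kills Java. I have changed my configuration to match the suggestions in the first answer there, but the server still goes down.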

According to dmesg, the processes being killed are:

Out of memory: Kill process 24532 (java) score 293
Out of memory: Kill process 3408 (clamd) score 60
Out of memory: Kill process 1970 (lmtp) score 28
Out of memory: Kill process 17806 (mysqld) score 27

Answer 1

Is there anything in the Elasticsearch logs that indicates a clean shutdown? Can you paste the lines just before the restart?
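A quick way to answer this from the log itself: assuming Elasticsearch 5.x writes "initializing ..." from the o.e.n.Node logger at startup (as in the excerpt in the question) and "closed" at the end of an orderly shutdown, a startup that is not preceded by "closed" suggests the previous process was killed rather than stopped. The sketch below is a hypothetical helper built on that assumption; the log path in the example comment follows from the default.path.logs setting and the default cluster name shown in the question.

```python
import re

# Assumption: Elasticsearch 5.x logs "initializing ..." (startup) and "closed"
# (end of a clean shutdown) at INFO level from the o.e.n.Node logger.
LINE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\[\s*\w+\s*\]\[o\.e\.n\.Node\s*\]\s+(?:\[[^\]]*\]\s+)?(?P<msg>.*)$"
)


def unclean_restarts(log_lines):
    """Yield the timestamp of each startup whose previous run never logged 'closed'."""
    seen_start = False
    seen_close = False
    for line in log_lines:
        m = LINE_RE.match(line.rstrip("\n"))
        if not m:
            continue
        msg = m.group("msg").strip()
        if msg == "initializing ...":
            if seen_start and not seen_close:
                yield m.group("ts")
            seen_start, seen_close = True, False
        elif msg == "closed":
            seen_close = True


# Example on the server:
# with open("/var/log/elasticsearch/elasticsearch.log") as f:
#     for ts in unclean_restarts(f):
#         print("no clean shutdown before the restart at", ts)
```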

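Can you check dmesg to see whether the kernel OOM killer is stopping Elasticsearch? This happens when the operating system decides it has to free memory at all costs. The OOM killer then usually picks the process using the most memory and kills it, and most of the time that is Elasticsearch.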
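To make "the process using the most memory" concrete: the kernel's heuristic (oom_badness in mm/oom_kill.c) roughly scores each process by its resident memory, plus swap entries and page tables, as a share of RAM plus swap, scaled to 1000 and shifted by /proc/<pid>/oom_score_adj. The simplified sketch below drops the page-table and swap terms. Plugging in java's anon-rss plus file-rss from the dmesg output in the question and the total reported by free reproduces the score of 295 that the kernel printed. approx_oom_score is a hypothetical helper, not a kernel interface.

```python
def approx_oom_score(rss_kib, total_ram_kib, total_swap_kib=0, oom_score_adj=0):
    """Approximate the N in 'Out of memory: Kill process ... score N'.

    Simplified model: resident memory as a share of (RAM + swap), scaled to
    1000, then shifted by oom_score_adj (-1000..1000) and floored at 0. Page
    tables and swapped-out pages, which the kernel also counts, are ignored.
    """
    total_kib = total_ram_kib + total_swap_kib
    score = rss_kib * 1000 // total_kib + oom_score_adj
    return max(score, 0)


if __name__ == "__main__":
    mem_total_kib = 7915072       # "total" from the free output
    java_rss_kib = 2334928 + 80   # anon-rss + file-rss from the "Killed process" line
    print(approx_oom_score(java_rss_kib, mem_total_kib))  # prints 295, matching dmesg
```

Lowering oom_score_adj for the java process only changes which process gets chosen; it does not free any memory.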

Are any other services running on that machine that could be triggering the OOM killer?

Have you been monitoring the system over time? Is memory being eaten up? When the process is killed, do some resources go back to normal?
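For the monitoring question, even a very small sampler can show whether available memory trends down before each kill. The sketch below reads /proc/meminfo at a fixed interval and reports MemAvailable. It assumes the kernel exposes that field; parse_meminfo and sample_available are hypothetical helpers. In practice a monitoring agent, or a cron job that logs the output of free, does the same job.

```python
import time


def parse_meminfo(text):
    """Parse /proc/meminfo text into {field_name: value} (most values are in kB)."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            info[key.strip()] = int(fields[0])
    return info


def sample_available(path="/proc/meminfo", interval_s=60.0, samples=60):
    """Yield (unix_time, MemAvailable_kB) once every interval_s seconds."""
    for i in range(samples):
        if i:
            time.sleep(interval_s)
        with open(path) as f:
            info = parse_meminfo(f.read())
        yield time.time(), info["MemAvailable"]


# Example on the server: one sample per minute for an hour.
# for ts, avail_kb in sample_available():
#     print(time.strftime("%H:%M:%S", time.localtime(ts)), avail_kb // 1024, "MiB available")
```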

Elasticsearch server randomly shuts down throughout the day
