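I am running a 4-node Elasticsearch cluster with one client node and 3 data nodes. The data nodes are also configured as master-eligible nodes.
Elasticsearch version: 2.3.5
Lucene version: 5.5.0
Java: OpenJDK 1.8.0_141
Elasticsearch runs with the following JAVA_OPTS: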
-Xms10g -Xmx10g -Djava.awt.headless=true -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+HeapDumpOnOutOfMemoryError -XX:+DisableExplicitGC
When GC is triggered, the data node leaves the cluster. Here are the GC logs:
[2017-09-28 11:52:52,186][WARN ][monitor.jvm ] [10.0.1.85] [gc][old][504680][614] duration [39.8m], collections [84]/[40.9m], total [39.8m]/[5h], memory [9.9gb]->[9.9gb]/[9.9gb], all_pools {[young] [532.5mb]->[532.4mb]/[532.5mb]}{[survivor] [66.4mb]->[66.4mb]/[66.5mb]}{[old] [9.3gb]->[9.3gb]/[9.3gb]}
[2017-09-28 11:52:53,186][WARN ][monitor.jvm ] [10.0.1.85] [gc][old][504681][615] duration [29.5s], collections [1]/[30.5s], total [29.5s]/[5h], memory [9.9gb]->[6.1gb]/[9.9gb], all_pools {[young] [532.4mb]->[47.1mb]/[532.5mb]}{[survivor] [66.4mb]->[66.4mb]/[66.5mb]}{[old] [9.3gb]->[6gb]/[9.3gb]}
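To make the two GC lines above easier to compare, here is a minimal Python sketch that pulls the pause duration and the heap usage before and after collection out of a `monitor.jvm` warning line. The names `summarize_gc_line` and `to_bytes` are hypothetical helpers written for this post, and the regular expression assumes the exact log layout shown above; it is not part of Elasticsearch.

```python
import re

# Matches the "[gc][old][...][...] duration [...] ... memory [a]->[b]/[c]" part
# of the monitor.jvm warning lines. Assumes the layout shown in the logs above.
GC_LINE = re.compile(
    r"\[gc\]\[(?P<gen>\w+)\]\[\d+\]\[\d+\] "
    r"duration \[(?P<duration>[^\]]+)\].*?"
    r"memory \[(?P<before>[^\]]+)\]->\[(?P<after>[^\]]+)\]/\[(?P<max>[^\]]+)\]"
)

UNITS = {"b": 1, "kb": 1024, "mb": 1024 ** 2, "gb": 1024 ** 3}


def to_bytes(size):
    """Convert a size string such as '9.9gb' or '532.5mb' to bytes."""
    match = re.fullmatch(r"([\d.]+)(b|kb|mb|gb)", size)
    if match is None:
        raise ValueError("unrecognized size: %r" % size)
    return float(match.group(1)) * UNITS[match.group(2)]


def summarize_gc_line(line):
    """Return a summary dict for a GC log line, or None if it does not match."""
    match = GC_LINE.search(line)
    if match is None:
        return None
    before = to_bytes(match.group("before"))
    after = to_bytes(match.group("after"))
    heap_max = to_bytes(match.group("max"))
    return {
        "generation": match.group("gen"),
        "duration": match.group("duration"),
        # Share of the total heap that this collection actually freed.
        "freed_fraction": (before - after) / heap_max,
        # Share of the total heap still in use after the collection.
        "occupancy_after": after / heap_max,
    }
```

Applied to the first line above, `freed_fraction` comes out as zero: the old-generation collections ran for 39.8 minutes without reclaiming any heap. On the second line the heap drops from 9.9gb to 6.1gb, so it is still well over half full after a 29.5s pause.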
During this time the cluster does not process queries, and queries fail:
[2017-09-28 12:34:49,232][DEBUG][action.admin.cluster.health] [10.0.1.85] no known master node, scheduling a retry
[2017-09-28 12:35:19,233][DEBUG][action.admin.cluster.health] [10.0.1.85] timed out while retrying [cluster:monitor/health] after failure (timeout [30s])
[2017-09-28 12:35:19,234][WARN ][rest.suppressed ] path: /_cluster/health, params: {}
MasterNotDiscoveredException[null]
at org.elasticsearch.action.support.master.TransportMasterNodeAction$AsyncSingleAction$5.onTimeout(TransportMasterNodeAction.java:226)
at org.elasticsearch.cluster.ClusterStateObserver$ObserverClusterStateListener.onTimeout(ClusterStateObserver.java:236)
at org.elasticsearch.cluster.service.InternalClusterService$NotifyTimeout.run(InternalClusterService.java:804)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
The data node also cannot discover the master node. Here is the log:
[2017-09-28 12:33:48,614][WARN ][discovery.zen.ping.unicast] [10.0.1.85] failed to send ping to [{#zen_unicast_3#}{10.0.1.89}{10.0.1.89:9300}]
ReceiveTimeoutTransportException[[][10.0.1.89:9300][internal:discovery/zen/unicast] request_id [648779] timed out after [3750ms]]
at org.elasticsearch.transport.TransportService$TimeoutHandler.run(TransportService.java:679)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
We really need advice on this. How can we tune the cluster to avoid these pauses?