Question

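Roughly every 2 hours and 15 minutes we lose a node in our other data center. I can't figure out what the problem might be. Has anyone seen this before or had experience with it?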

[2016-08-11 07:42:14,886][INFO ][cluster.routing.allocation] [node-exp-01] Cluster health status changed from [GREEN] to [YELLOW] (reason: [[{node-cl-01}{xxxxxxxxxxxxxxxxxxx}{192.168.41.100}{192.168.41.100:9300}{rack=century-link}] failed]).
[2016-08-11 07:42:14,886][INFO ][cluster.service          ] [node-exp-01] removed {{node-cl-01}{xxxxxxxxxxxxxxxxxxx}{192.168.41.100}{192.168.41.100:9300}{rack=century-link},}, reason: zen-disco-node_failed({node-cl-01}{xxxxxxxxxxxxxxxxxxx}{192.168.41.100}{192.168.41.100:9300}{rack=century-link}), reason transport disconnected
[2016-08-11 07:42:14,891][INFO ][cluster.routing          ] [node-exp-01] delaying allocation for [6] unassigned shards, next check in [1m]
[2016-08-11 07:42:19,402][INFO ][cluster.service          ] [node-exp-01] added {{node-cl-01}{xxxxxxxxxxxxxxxxxxx}{192.168.41.100}{192.168.41.100:9300}{rack=century-link},}, reason: zen-disco-join(join from node[{node-cl-01}{xxxxxxxxxxxxxxxxxxx}{192.168.41.100}{192.168.41.100:9300}{rack=century-link}])
[2016-08-11 07:42:20,728][INFO ][cluster.routing.allocation] [node-exp-01] Cluster health status changed from [YELLOW] to [GREEN] (reason: [shards started [[recordings][3]] ...]).

Any help is much appreciated.

Thanks!

Answer 1

Yellow means that ES has allocated all primary shards but some of the replicas have not been allocated yet, so it is not that dramatic.

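Finding the reason is not that easy, though. My guess is that for some period there is no traffic between the two locations, and ES needs TCP keepalive messages on its long-lived connections (between nodes) to keep them alive. Check the TCP keepalive timeout of your underlying OS: it should be lowered to, e.g., 600 seconds, after which the first TCP keepalive message is sent. Also consider a lower interval between keepalive messages.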
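To make the keepalive guess concrete, here is a minimal Python sketch that reads the kernel's keepalive settings and estimates how long an idle connection can stay dead before the OS reports it. It assumes the nodes run Linux (the question does not say) and that keepalive is enabled on the transport sockets; the function names are made up for illustration. With the stock Linux values (7200 s idle time, 75 s interval, 9 probes) the estimate comes out at 2h11m15s, which is at least close to the interval in the question, though that alone does not prove the cause.

```python
from pathlib import Path

# Stock Linux values: idle seconds before the first probe, seconds between
# probes, and number of unanswered probes before the connection is dropped.
# Assumption: the nodes run Linux; other operating systems use other knobs.
LINUX_DEFAULTS = {
    "tcp_keepalive_time": 7200,
    "tcp_keepalive_intvl": 75,
    "tcp_keepalive_probes": 9,
}


def read_keepalive_settings(proc_dir="/proc/sys/net/ipv4"):
    """Read the current kernel keepalive settings (Linux only)."""
    return {
        name: int((Path(proc_dir) / name).read_text())
        for name in LINUX_DEFAULTS
    }


def detection_seconds(settings):
    """Worst-case seconds until a silently dead idle connection is reported."""
    return (
        settings["tcp_keepalive_time"]
        + settings["tcp_keepalive_intvl"] * settings["tcp_keepalive_probes"]
    )


def format_hms(seconds):
    """Format a duration in seconds as e.g. '2h11m15s'."""
    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours}h{minutes:02d}m{secs:02d}s"
```

On each node, `format_hms(detection_seconds(read_keepalive_settings()))` shows the current value. If `tcp_keepalive_time` is still 7200, lowering it to something like 600 (for example via `sysctl`) means the first probe goes out after ten minutes of idle time instead of two hours.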

Use the cluster health API and print the result.

https://www.elastic.co/guide/en/elasticsearch/reference/current/cluster-health.html
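As a rough illustration of that health check, the following Python sketch queries the `_cluster/health` endpoint and prints the fields most relevant here. The base URL `http://localhost:9200` is an assumption (point it at one of your own nodes), and the helper names are hypothetical.

```python
import json
from urllib.request import urlopen


def fetch_cluster_health(base_url="http://localhost:9200", timeout=10):
    """Fetch the cluster health document from one node as a dict."""
    with urlopen(f"{base_url}/_cluster/health", timeout=timeout) as resp:
        return json.load(resp)


def summarize_health(health):
    """Reduce the health document to the fields that matter for this issue."""
    fields = (
        "status",
        "number_of_nodes",
        "active_primary_shards",
        "active_shards",
        "unassigned_shards",
    )
    return ", ".join(f"{name}={health.get(name)}" for name in fields)
```

Running `print(summarize_health(fetch_cluster_health()))` right after a drop should show `status=yellow`, one node fewer, and a non-zero `unassigned_shards` count until the node rejoins.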
