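About every 2 hours and 15 minutes, we lose a node that sits in another data center. I can't figure out what the problem might be. Has anyone seen this or had experience with it?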
[2016-08-11 07:42:14,886][INFO ][cluster.routing.allocation] [node-exp-01] Cluster health status changed from [GREEN] to [YELLOW] (reason: [[{node-cl-01}{xxxxxxxxxxxxxxxxxxx}{192.168.41.100}{192.168.41.100:9300}{rack=century-link}] failed]).
[2016-08-11 07:42:14,886][INFO ][cluster.service ] [node-exp-01] removed {{node-cl-01}{xxxxxxxxxxxxxxxxxxx}{192.168.41.100}{192.168.41.100:9300}{rack=century-link},}, reason: zen-disco-node_failed({node-cl-01}{xxxxxxxxxxxxxxxxxxx}{192.168.41.100}{192.168.41.100:9300}{rack=century-link}), reason transport disconnected
[2016-08-11 07:42:14,891][INFO ][cluster.routing ] [node-exp-01] delaying allocation for [6] unassigned shards, next check in [1m]
[2016-08-11 07:42:19,402][INFO ][cluster.service ] [node-exp-01] added {{node-cl-01}{xxxxxxxxxxxxxxxxxxx}{192.168.41.100}{192.168.41.100:9300}{rack=century-link},}, reason: zen-disco-join(join from node[{node-cl-01}{xxxxxxxxxxxxxxxxxxx}{192.168.41.100}{192.168.41.100:9300}{rack=century-link}])
[2016-08-11 07:42:20,728][INFO ][cluster.routing.allocation] [node-exp-01] Cluster health status changed from [YELLOW] to [GREEN] (reason: [shards started [[recordings][3]] ...]).
Any help is much appreciated.
Thanks!
Answer 0 (score: 0)
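Yellow means that ES has allocated all primary shards, but replicas of some shards have not been allocated yet, so it is not that dramatic.
Finding the reason is not that easy, though. My guess is that there is no traffic between the two locations for some period of time, and ES needs TCP keepalive messages on its long-lived connections (between nodes) to keep them alive. Check the TCP keepalive timeout of your underlying OS: it should be lowered to, for example, 600 seconds, after which the first TCP keepalive message is sent. Also consider a lower interval between keepalive messages.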
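To check the current values, something like the following minimal sketch could be run on each node. It assumes a Linux host, where the kernel exposes the keepalive settings under `/proc/sys/net/ipv4`; the function names are made up for illustration. The Linux defaults are 7200 seconds of idle time, a 75 second probe interval and 9 probes, which adds up to roughly 2 hours 11 minutes before a dead connection is noticed. That is close to the interval in the question, but whether it is really the cause here is still a guess.

```python
from pathlib import Path

# Linux-only location of the kernel TCP keepalive settings (assumption: Linux host).
PROC_DIR = Path("/proc/sys/net/ipv4")


def read_keepalive_settings(proc_dir=PROC_DIR):
    """Read idle time, probe interval (both in seconds) and probe count."""
    names = ("tcp_keepalive_time", "tcp_keepalive_intvl", "tcp_keepalive_probes")
    return {name: int((proc_dir / name).read_text().strip()) for name in names}


def seconds_until_dead_peer_detected(idle, interval, probes):
    """Idle time before the first probe plus the time for all probes to go unanswered."""
    return idle + interval * probes


if __name__ == "__main__":
    settings = read_keepalive_settings()
    for name, value in settings.items():
        print(f"{name} = {value}")
    total = seconds_until_dead_peer_detected(
        settings["tcp_keepalive_time"],
        settings["tcp_keepalive_intvl"],
        settings["tcp_keepalive_probes"],
    )
    print(f"a silent peer is detected after about {total} seconds")
```

If `tcp_keepalive_time` still shows 7200, lowering it (for example with `sysctl -w net.ipv4.tcp_keepalive_time=600`) on both sides would be the change suggested above. Note that these kernel settings only affect sockets that have keepalive enabled.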
Use the cluster health API and print the result.
https://www.elastic.co/guide/en/elasticsearch/reference/current/cluster-health.html
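A minimal sketch of polling that API from Python is shown here. The base URL `http://localhost:9200` and the function names are assumptions for illustration; `status`, `number_of_nodes` and `unassigned_shards` are fields of the `_cluster/health` response. Running it in a loop around the time the node drops may show how long the cluster actually stays yellow.

```python
import json
from urllib.request import urlopen


def summarize_health(health):
    """Build a one-line summary from a parsed _cluster/health response."""
    return (
        "status={status} nodes={number_of_nodes} "
        "unassigned_shards={unassigned_shards}".format(**health)
    )


def fetch_cluster_health(base_url="http://localhost:9200"):
    """GET /_cluster/health from one node (base_url is an assumed address)."""
    with urlopen(base_url + "/_cluster/health", timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    print(summarize_health(fetch_cluster_health()))
```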