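I have a small cluster of three machines. When we first deployed the solution, we realized we didn't have enough storage for our expected needs. So, on the first node, we shut down the ES service, attached and configured an EBS volume for the machine, moved the data to the new volume, updated the config file to point to the new data directory, and restarted the service. All good. The cluster quickly went green. We did the same for the next two machines. Then we realized we had not updated the config file on the second node. So we stopped the service, updated the config file, and started the service again. Since then, our cluster health has stayed yellow.

Interestingly, I found a discrepancy between the number of nodes reported by the /_cluster/health endpoint and by the /_nodes endpoint. Also of interest: since we're in AWS, we're using unicast discovery and explicitly pointing at the other nodes. I can see it is trying to initialize shards, but I don't know where, how, or why it's trying to do that. Any ideas?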
/_cluster/health
{
"cluster_name" : "the_name_of_my_cluster",
"status" : "yellow",
"timed_out" : false,
"number_of_nodes" : 5,
"number_of_data_nodes" : 5,
"active_primary_shards" : 20,
"active_shards" : 33,
"relocating_shards" : 0,
"initializing_shards" : 4,
"unassigned_shards" : 3
}
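The shard counters in this yellow output do add up: 33 active + 4 initializing + 3 unassigned = 40 shard copies, which is consistent with 20 primaries carrying one replica each (the replica count is inferred from the numbers, not stated in the post). A minimal Python sketch of that bookkeeping; the function name is hypothetical, and it assumes `relocating_shards` is 0 (true in both outputs here), all primaries are active, and every index has the same replica count.

```python
def shard_accounting(health):
    """Account for every shard copy reported by GET /_cluster/health.

    Assumptions (not from the post): relocating_shards is 0, all primaries
    are active (true for yellow/green), and all indices share one replica count.
    Returns (total_shard_copies, implied_replicas_per_primary).
    """
    if health.get("relocating_shards", 0):
        raise ValueError("relocation in progress; counters may overlap")
    primaries = health["active_primary_shards"]
    # At any instant each copy is active, initializing, or unassigned.
    total = (health["active_shards"]
             + health["initializing_shards"]
             + health["unassigned_shards"])
    if total % primaries:
        raise ValueError("replica count does not look uniform across indices")
    # total = primaries * (1 + replicas)
    return total, total // primaries - 1
```

Fed the yellow output above, this returns 40 copies and 1 replica; the green output later in the post gives the same totals, just with nothing left initializing or unassigned.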
/_nodes
{
"ok" : true,
"cluster_name" : "abc_cluster",
"nodes" : {
"identifier" : {
"name" : "node3",
"transport_address" : "inet[/ip3:9300]",
"version" : "0.90.5",
"http_address" : "inet[/ip3:9200]"
},
"identifier" : {
"name" : "node2",
"transport_address" : "inet[/ip2:9300]",
"version" : "0.90.5",
"http_address" : "inet[/ip2:9200]"
},
"identifier" : {
"name" : "node1",
"transport_address" : "inet[/ip1:9300]",
"version" : "0.90.5",
"http_address" : "inet[/ip1:9200]"
}
}
}
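The mismatch between the two endpoints (5 nodes in /_cluster/health, 3 entries in /_nodes) can be checked programmatically by fetching both and comparing `number_of_nodes` with the number of entries under `nodes`. A minimal sketch using only the Python standard library; the base URL `http://localhost:9200` and the helper names are assumptions. Note that the /_nodes output above has its node IDs anonymized to the repeated key `"identifier"`, so parsing that paste literally would collapse to one entry; real responses key each node by its unique node ID.

```python
import json
from urllib.request import urlopen

BASE_URL = "http://localhost:9200"  # assumption: HTTP endpoint of any node

def fetch(path, base_url=BASE_URL):
    """GET a JSON endpoint such as /_cluster/health or /_nodes."""
    with urlopen(base_url + path, timeout=10) as resp:
        return json.load(resp)

def node_counts(health, nodes):
    """Return (count from /_cluster/health, count from /_nodes, mismatch?)."""
    from_health = health["number_of_nodes"]
    from_nodes = len(nodes["nodes"])  # one entry per unique node ID
    return from_health, from_nodes, from_health != from_nodes
```

Usage against a live node would be `node_counts(fetch("/_cluster/health"), fetch("/_nodes"))`; with the outputs in this post it would report `(5, 3, True)`.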
I started tailing the ES log on the second node. It turns out it was trying to reach our third node on ports 9301 and 9302:
[2014-04-24 20:08:37,005][WARN ][cluster.service ] [node2] failed to reconnect to node [node3][identifying_info][inet[/ip3:9302]]
org.elasticsearch.transport.ConnectTransportException: [node3][inet[/ip3:9302]] connect_timeout[30s]
at org.elasticsearch.transport.netty.NettyTransport.connectToChannels(NettyTransport.java:675)
at org.elasticsearch.transport.netty.NettyTransport.connectToNode(NettyTransport.java:608)
at org.elasticsearch.transport.netty.NettyTransport.connectToNode(NettyTransport.java:576)
at org.elasticsearch.transport.TransportService.connectToNode(TransportService.java:129)
at org.elasticsearch.cluster.service.InternalClusterService$ReconnectToNodes.run(InternalClusterService.java:475)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:724)
Caused by: org.elasticsearch.common.netty.channel.ConnectTimeoutException: connection timed out: /ip3:9302
at org.elasticsearch.common.netty.channel.socket.nio.NioClientBoss.processConnectTimeout(NioClientBoss.java:137)
at org.elasticsearch.common.netty.channel.socket.nio.NioClientBoss.process(NioClientBoss.java:83)
at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:312)
at org.elasticsearch.common.netty.channel.socket.nio.NioClientBoss.run(NioClientBoss.java:42)
at org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
at org.elasticsearch.common.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
... 3 more
[2014-04-24 20:09:17,042][WARN ][cluster.service ] [node2] failed to reconnect to node [node3][identifying_info][inet[/ip3:9301]]
org.elasticsearch.transport.ConnectTransportException: [node3][inet[/ip3:9301]] connect_timeout[30s]
at org.elasticsearch.transport.netty.NettyTransport.connectToChannels(NettyTransport.java:675)
at org.elasticsearch.transport.netty.NettyTransport.connectToNode(NettyTransport.java:608)
at org.elasticsearch.transport.netty.NettyTransport.connectToNode(NettyTransport.java:576)
at org.elasticsearch.transport.TransportService.connectToNode(TransportService.java:129)
at org.elasticsearch.cluster.service.InternalClusterService$ReconnectToNodes.run(InternalClusterService.java:475)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:724)
Caused by: org.elasticsearch.common.netty.channel.ConnectTimeoutException: connection timed out: /ip3:9301
at org.elasticsearch.common.netty.channel.socket.nio.NioClientBoss.processConnectTimeout(NioClientBoss.java:137)
at org.elasticsearch.common.netty.channel.socket.nio.NioClientBoss.process(NioClientBoss.java:83)
at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:312)
at org.elasticsearch.common.netty.channel.socket.nio.NioClientBoss.run(NioClientBoss.java:42)
at org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
at org.elasticsearch.common.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
... 3 more
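The useful part of these traces is in the WARN lines: which node is reconnecting, to which node, and at which host:port. A small Python sketch that extracts those tuples from a log excerpt with a regular expression; the pattern is written against the exact line format pasted here (0.90.5, node ID anonymized as `identifying_info`) and is an assumption that may need adjusting for other versions or log layouts.

```python
import re

# Matches WARN lines like:
# [...] [node2] failed to reconnect to node [node3][<node id>][inet[/ip3:9302]]
RECONNECT_RE = re.compile(
    r"\[(?P<src>[^\]]+)\] failed to reconnect to node "
    r"\[(?P<dst>[^\]]+)\]\[[^\]]*\]\[inet\[/(?P<host>[^:\]]+):(?P<port>\d+)\]\]"
)

def reconnect_targets(log_text):
    """Return sorted unique (source, target, host, port) tuples from the log."""
    found = set()
    for m in RECONNECT_RE.finditer(log_text):
        found.add((m["src"], m["dst"], m["host"], int(m["port"])))
    return sorted(found)
```

Run over the two WARN lines above, this yields node2 dialling node3 at `ip3` on 9301 and 9302, i.e. ports other than the default 9300.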
So we opened the 9300-9303 range in the ES security group.
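Whether each transport port is actually reachable can be probed from the node that logs the timeouts. A minimal sketch with Python's standard `socket` module; the function name is hypothetical and `ip3` below is the placeholder from the logs. A timeout is what a port silently dropped by a security group typically looks like (matching the `connect_timeout[30s]` in the traces), while "refused" means the host answered but nothing is listening on that port.

```python
import socket

def probe_ports(host, ports, timeout=2.0):
    """Map each TCP port to 'open', 'refused', or 'timeout/unreachable'."""
    status = {}
    for port in ports:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                status[port] = "open"
        except ConnectionRefusedError:
            # Host replied with a reset: reachable, but no listener on this port.
            status[port] = "refused"
        except OSError:
            # Includes socket.timeout, e.g. packets dropped by a firewall/security group.
            status[port] = "timeout/unreachable"
    return status
```

For example, `probe_ports("ip3", range(9300, 9304))` run on node2 before and after the security-group change would show which of 9300-9303 it could reach.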
Voila. We have a green cluster.
/_cluster/health
{
"cluster_name" : "the_name_of_my_cluster",
"status" : "green",
"timed_out" : false,
"number_of_nodes" : 6,
"number_of_data_nodes" : 6,
"active_primary_shards" : 20,
"active_shards" : 40,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 0
}
But we're now showing 6 nodes instead of the 3 I expected. What gives?
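One way to see where the six nodes live is to group the /_nodes entries by host. Elasticsearch binds its transport to the first free port in its configured range (9300-9400 by default), so a host that appears with more than one transport port is running more than one node. A hedged Python sketch; the function names are hypothetical, and the parsing assumes the 0.90.x `transport_address` format shown in the /_nodes output above (`inet[/ip:port]`, optionally with a hostname before the slash).

```python
from collections import defaultdict

def nodes_per_host(nodes_response):
    """Group GET /_nodes entries by host, listing (node name, transport port)."""
    by_host = defaultdict(list)
    for node_id, info in nodes_response["nodes"].items():
        # e.g. "inet[/10.0.0.3:9300]" or "inet[host/10.0.0.3:9300]":
        # keep the "ip:port" part between the slash and the closing bracket.
        addr = info["transport_address"]
        host_port = addr[addr.index("/") + 1 : addr.rindex("]")]
        host, _, port = host_port.rpartition(":")
        by_host[host].append((info.get("name", node_id), int(port)))
    return {h: sorted(entries, key=lambda e: e[1]) for h, entries in by_host.items()}

def hosts_with_several_nodes(nodes_response):
    """Hosts listed more than once, i.e. running more than one ES node."""
    grouped = nodes_per_host(nodes_response)
    return {h: entries for h, entries in grouped.items() if len(entries) > 1}
```

Running `hosts_with_several_nodes` on the live /_nodes response (with its real, unique node IDs) would list any host carrying extra nodes together with the ports they bound.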