Elasticsearch /_cluster/health reports a higher node count than /_nodes

Time: 2014-04-24 18:08:33

Tags: elasticsearch

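I have a small cluster of three machines. When we first deployed the solution, we realized we didn't have enough storage for our expected needs. So we shut down the ES service on the first node, attached and configured an EBS volume on the machine, moved the data to the new volume, updated the config file to point to the new data directory, and restarted the service. All was well, and the cluster went green quickly. We did the same for the next two machines. Then we realized we had not updated the config file on the second node, so we stopped the service, updated the config file, and started the service again.

Since then, our cluster health has stayed yellow. Interestingly, I found a discrepancy between the number of nodes reported by the /_cluster/health endpoint and by the /_nodes endpoint. Also of interest: because we are in AWS, we are using unicast discovery and explicitly pointing at the other nodes. I can see that it is trying to initialize shards, but I don't know where, how, or why it is trying to do so. Any ideas?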

/_cluster/health

{
  "cluster_name" : "the_name_of_my_cluster",
  "status" : "yellow",
  "timed_out" : false,
  "number_of_nodes" : 5,
  "number_of_data_nodes" : 5,
  "active_primary_shards" : 20,
  "active_shards" : 33,
  "relocating_shards" : 0,
  "initializing_shards" : 4,
  "unassigned_shards" : 3
}

/_nodes

{
  "ok" : true,
  "cluster_name" : "abc_cluster",
  "nodes" : {
    "identifier" : {
      "name" : "node3",
      "transport_address" : "inet[/ip3:9300]",
      "version" : "0.90.5",
      "http_address" : "inet[/ip3:9200]"
    },
    "identifier" : {
      "name" : "node2",
      "transport_address" : "inet[/ip2:9300]",
      "version" : "0.90.5",
      "http_address" : "inet[/ip2:9200]"
    },
    "identifier" : {
      "name" : "node1",
      "transport_address" : "inet[/ip1:9300]",
      "version" : "0.90.5",
      "http_address" : "inet[/ip1:9200]"
    }
  }
}
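
The mismatch between the two responses above can also be checked programmatically. The following is a minimal sketch, not part of the original post: the helper names `node_count_gap`, `fetch_json`, and `report_gap` are hypothetical, and the base URL passed to `report_gap` is an assumption about where the HTTP API is reachable. Note that in a real /_nodes response each key under "nodes" is a unique node ID; the repeated "identifier" keys above are placeholders.

```python
import json
from urllib.request import urlopen


def node_count_gap(health: dict, nodes: dict) -> int:
    """Return number_of_nodes from /_cluster/health minus the number of
    entries listed under "nodes" in the /_nodes response."""
    return health["number_of_nodes"] - len(nodes.get("nodes", {}))


def fetch_json(url: str) -> dict:
    """Fetch a URL and decode the body as JSON."""
    with urlopen(url, timeout=10) as response:
        return json.load(response)


def report_gap(base_url: str) -> int:
    """Query both endpoints on one node and return the difference.

    base_url is an assumption, e.g. "http://localhost:9200".
    """
    health = fetch_json(base_url + "/_cluster/health")
    nodes = fetch_json(base_url + "/_nodes")
    return node_count_gap(health, nodes)
```

With the figures shown above (5 nodes in the health response, 3 entries in /_nodes), `node_count_gap` would return 2. Calling `report_gap("http://localhost:9200")` on each machine in turn would show whether every node sees the same gap.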

Edit

I started tailing the ES logs on the second node. It turns out it was trying to reach our third node on ports 9301 and 9302:

[2014-04-24 20:08:37,005][WARN ][cluster.service          ] [node2] failed to reconnect to node [node3][identifying_info][inet[/ip3:9302]]
org.elasticsearch.transport.ConnectTransportException: [node3][inet[/ip3:9302]] connect_timeout[30s]
    at org.elasticsearch.transport.netty.NettyTransport.connectToChannels(NettyTransport.java:675)
    at org.elasticsearch.transport.netty.NettyTransport.connectToNode(NettyTransport.java:608)
    at org.elasticsearch.transport.netty.NettyTransport.connectToNode(NettyTransport.java:576)
    at org.elasticsearch.transport.TransportService.connectToNode(TransportService.java:129)
    at org.elasticsearch.cluster.service.InternalClusterService$ReconnectToNodes.run(InternalClusterService.java:475)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:724)
Caused by: org.elasticsearch.common.netty.channel.ConnectTimeoutException: connection timed out: /ip3:9302
    at org.elasticsearch.common.netty.channel.socket.nio.NioClientBoss.processConnectTimeout(NioClientBoss.java:137)
    at org.elasticsearch.common.netty.channel.socket.nio.NioClientBoss.process(NioClientBoss.java:83)
    at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:312)
    at org.elasticsearch.common.netty.channel.socket.nio.NioClientBoss.run(NioClientBoss.java:42)
    at org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
    at org.elasticsearch.common.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
    ... 3 more
[2014-04-24 20:09:17,042][WARN ][cluster.service          ] [node2] failed to reconnect to node [node3][identifying_info][inet[/ip3:9301]]
org.elasticsearch.transport.ConnectTransportException: [node3][inet[/ip3:9301]] connect_timeout[30s]
    at org.elasticsearch.transport.netty.NettyTransport.connectToChannels(NettyTransport.java:675)
    at org.elasticsearch.transport.netty.NettyTransport.connectToNode(NettyTransport.java:608)
    at org.elasticsearch.transport.netty.NettyTransport.connectToNode(NettyTransport.java:576)
    at org.elasticsearch.transport.TransportService.connectToNode(TransportService.java:129)
    at org.elasticsearch.cluster.service.InternalClusterService$ReconnectToNodes.run(InternalClusterService.java:475)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:724)
Caused by: org.elasticsearch.common.netty.channel.ConnectTimeoutException: connection timed out: /ip3:9301
    at org.elasticsearch.common.netty.channel.socket.nio.NioClientBoss.processConnectTimeout(NioClientBoss.java:137)
    at org.elasticsearch.common.netty.channel.socket.nio.NioClientBoss.process(NioClientBoss.java:83)
    at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:312)
    at org.elasticsearch.common.netty.channel.socket.nio.NioClientBoss.run(NioClientBoss.java:42)
    at org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
    at org.elasticsearch.common.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
    ... 3 more

So we opened the port range 9300-9303 in the ES security group.
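
A quick way to confirm whether those transport ports are reachable from another machine is a plain TCP connect test. This is only a sketch under stated assumptions: the function names are hypothetical, the host you pass in is whatever address your nodes use, and a successful connect only shows that something is listening and the firewall allows it, not that an Elasticsearch node is behind the port.

```python
import socket


def port_is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


def reachable_ports(host: str, ports, timeout: float = 3.0) -> dict:
    """Map each port in ports to whether a TCP connect to it succeeded."""
    return {port: port_is_reachable(host, port, timeout) for port in ports}
```

For example, `reachable_ports("ip3", range(9300, 9304))` run from node2 would show which ports in the 9300-9303 range accept connections on the third node.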

Voilà. We have a green cluster.

/_cluster/health

{
  "cluster_name" : "the_name_of_my_cluster",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 6,
  "number_of_data_nodes" : 6,
  "active_primary_shards" : 20,
  "active_shards" : 40,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0
}

But we are now showing 6 nodes rather than the 3 I expected. What gives?
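
Since the log entries refer to node3 at both ip3:9301 and ip3:9302, one thing that may help narrow this down is listing which transport ports each host reports. The sketch below groups the entries of a /_nodes response by host. The function names are hypothetical, and the regular expression assumes addresses in the `inet[/host:port]` form shown in the output above.

```python
import re
from collections import defaultdict

# Matches e.g. "inet[/ip3:9300]" or "inet[hostname/10.0.0.1:9300]".
_TRANSPORT_ADDRESS = re.compile(r"inet\[[^/\]]*/([^\]]+):(\d+)\]")


def parse_transport_address(address: str):
    """Split a transport_address string into (host, port)."""
    match = _TRANSPORT_ADDRESS.fullmatch(address)
    if match is None:
        raise ValueError("unrecognized transport address: " + address)
    return match.group(1), int(match.group(2))


def nodes_by_host(nodes_response: dict) -> dict:
    """Group (node name, transport port) pairs by host."""
    grouped = defaultdict(list)
    for node_id, info in nodes_response.get("nodes", {}).items():
        host, port = parse_transport_address(info["transport_address"])
        # Fall back to the node ID if the entry has no "name" field.
        grouped[host].append((info.get("name", node_id), port))
    return {host: sorted(entries) for host, entries in grouped.items()}
```

A host that appears with more than one transport port in the result would be worth a closer look.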

0 answers:

No answers yet