ElasticSearch node cannot see the other nodes after a network outage

Time: 2015-08-12 14:21:13

Tags: elasticsearch failover

We are currently running failover tests on our ElasticSearch setup. Here is the setup we use:

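We have 4 ElasticSearch machines running. Let's call them ES1, ES2, ES3 and ES4. We have several indices, each with 5 shards and 1 replica, so 10 shards per index. Everything is well distributed across the nodes, so if one node goes down, everything should still work.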
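As a quick check of the shard arithmetic above, here is a minimal sketch. The function name `total_shards` is only illustrative and is not part of any ElasticSearch API.

```python
def total_shards(primaries: int, replicas: int) -> int:
    """Total shard copies for one index: each primary plus its replica copies."""
    return primaries * (1 + replicas)


# 5 primary shards with 1 replica each, as in the setup described above.
print(total_shards(5, 1))
```

With 4 nodes, that is 10 shard copies per index to spread around, and a primary and its replica are never placed on the same node, which is why losing a single node should not lose data.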

The 4 nodes run on Windows 7 64-bit with 8 GB of RAM. The nodes discover each other using the cluster name.

I unplugged the ES1 machine to see whether everything still worked, and everything was fine, hurray!

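But here is where it gets strange: when we plug ES1 back in, it does not rejoin the cluster (named wc2014, FYI). Instead, it seems to sit alone in its own cluster, also named wc2014.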

Here is some information I found in the logs:

When we unplug the machine (this looks normal to me):

org.elasticsearch.transport.NodeDisconnectedException: [IPDIRECTOR-119][inet[/10.194.1.119:9300]][cluster:monitor/nodes/info[n]] disconnected
[2015-08-12 11:27:04,619][DEBUG][action.admin.indices.stats] [IPDIRECTOR-118] [wc2014_nearline][4], node[fxTcr9-FR52jecm5a2adRg], [P], s[STARTED]: failed to execute [org.elasticsearch.action.admin.indices.stats.IndicesStatsRequest@1b999e6c]
org.elasticsearch.transport.NodeDisconnectedException: [IPDIRECTOR-119][inet[/10.194.1.119:9300]][indices:monitor/stats[s]] disconnected
[2015-08-12 11:27:04,619][DEBUG][action.admin.indices.stats] [IPDIRECTOR-118] [wc2014_mediaresource][4], node[fxTcr9-FR52jecm5a2adRg], [P], s[STARTED]: failed to execute [org.elasticsearch.action.admin.indices.stats.IndicesStatsRequest@1b999e6c]
org.elasticsearch.transport.NodeDisconnectedException: [IPDIRECTOR-119][inet[/10.194.1.119:9300]][indices:monitor/stats[s]] disconnected
[2015-08-12 11:27:04,619][DEBUG][action.admin.indices.stats] [IPDIRECTOR-118] [wc2014_edit][4], node[fxTcr9-FR52jecm5a2adRg], [P], s[STARTED]: failed to execute [org.elasticsearch.action.admin.indices.stats.IndicesStatsRequest@1b999e6c]
org.elasticsearch.transport.NodeDisconnectedException: [IPDIRECTOR-119][inet[/10.194.1.119:9300]][indices:monitor/stats[s]] disconnected
[2015-08-12 11:27:04,619][DEBUG][action.admin.indices.stats] [IPDIRECTOR-118] [wc2014_log][4], node[fxTcr9-FR52jecm5a2adRg], [P], s[STARTED]: failed to execute [org.elasticsearch.action.admin.indices.stats.IndicesStatsRequest@1b999e6c]
org.elasticsearch.transport.NodeDisconnectedException: [IPDIRECTOR-119][inet[/10.194.1.119:9300]][indices:monitor/stats[s]] disconnected
[2015-08-12 11:27:04,619][DEBUG][action.admin.indices.stats] [IPDIRECTOR-118] [wc2014_metadata][4], node[fxTcr9-FR52jecm5a2adRg], [R], s[STARTED]: failed to execute [org.elasticsearch.action.admin.indices.stats.IndicesStatsRequest@1b999e6c]
org.elasticsearch.transport.NodeDisconnectedException: [IPDIRECTOR-119][inet[/10.194.1.119:9300]][indices:monitor/stats[s]] disconnected
[2015-08-12 11:27:04,619][DEBUG][action.admin.indices.stats] [IPDIRECTOR-118] [wc2014_ipwsedit][4], node[fxTcr9-FR52jecm5a2adRg], [P], s[STARTED]: failed to execute [org.elasticsearch.action.admin.indices.stats.IndicesStatsRequest@1b999e6c]

Then I get different errors:

[2015-08-12 11:27:09,797][DEBUG][action.admin.cluster.node.info] [IPDIRECTOR-118] failed to execute on node [fxTcr9-FR52jecm5a2adRg]
org.elasticsearch.transport.SendRequestTransportException: [IPDIRECTOR-119][inet[/10.194.1.119:9300]][cluster:monitor/nodes/info[n]]
    at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:286)
    at org.elasticsearch.action.support.nodes.TransportNodesOperationAction$AsyncAction.start(TransportNodesOperationAction.java:165)
    at org.elasticsearch.action.support.nodes.TransportNodesOperationAction$AsyncAction.access$300(TransportNodesOperationAction.java:97)
    at org.elasticsearch.action.support.nodes.TransportNodesOperationAction.doExecute(TransportNodesOperationAction.java:70)
    at org.elasticsearch.action.support.nodes.TransportNodesOperationAction.doExecute(TransportNodesOperationAction.java:43)
    at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:75)
    at org.elasticsearch.client.node.NodeClusterAdminClient.execute(NodeClusterAdminClient.java:77)
    at org.elasticsearch.client.FilterClient$ClusterAdmin.execute(FilterClient.java:161)
    at org.elasticsearch.rest.BaseRestHandler$HeadersAndContextCopyClient$ClusterAdmin.execute(BaseRestHandler.java:125)
    at org.elasticsearch.client.support.AbstractClusterAdminClient.nodesInfo(AbstractClusterAdminClient.java:187)
    at org.elasticsearch.rest.action.admin.cluster.node.info.RestNodesInfoAction.handleRequest(RestNodesInfoAction.java:102)
    at org.elasticsearch.rest.BaseRestHandler.handleRequest(BaseRestHandler.java:53)
    at org.elasticsearch.rest.RestController.executeHandler(RestController.java:225)
    at org.elasticsearch.rest.RestController.dispatchRequest(RestController.java:170)
    at org.elasticsearch.http.HttpServer.internalDispatchRequest(HttpServer.java:121)
    at org.elasticsearch.http.HttpServer$Dispatcher.dispatchRequest(HttpServer.java:83)
    at org.elasticsearch.http.netty.NettyHttpServerTransport.dispatchRequest(NettyHttpServerTransport.java:329)
    at org.elasticsearch.http.netty.HttpRequestHandler.messageReceived(HttpRequestHandler.java:63)
    at org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
    at org.elasticsearch.http.netty.pipelining.HttpPipeliningHandler.messageReceived(HttpPipeliningHandler.java:60)
    at org.elasticsearch.common.netty.channel.SimpleChannelHandler.handleUpstream(SimpleChannelHandler.java:88)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
    at org.elasticsearch.common.netty.handler.codec.http.HttpContentEncoder.messageReceived(HttpContentEncoder.java:82)
    at org.elasticsearch.common.netty.channel.SimpleChannelHandler.handleUpstream(SimpleChannelHandler.java:88)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
    at org.elasticsearch.common.netty.handler.codec.http.HttpChunkAggregator.messageReceived(HttpChunkAggregator.java:145)
    at org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
    at org.elasticsearch.common.netty.handler.codec.http.HttpContentDecoder.messageReceived(HttpContentDecoder.java:108)
    at org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
    at org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:296)
    at org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:459)
    at org.elasticsearch.common.netty.handler.codec.replay.ReplayingDecoder.callDecode(ReplayingDecoder.java:536)
    at org.elasticsearch.common.netty.handler.codec.replay.ReplayingDecoder.messageReceived(ReplayingDecoder.java:435)
    at org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
    at org.elasticsearch.common.netty.OpenChannelsHandler.handleUpstream(OpenChannelsHandler.java:74)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
    at org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:268)
    at org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:255)
    at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
    at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108)
    at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337)
    at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
    at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
    at org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
    at org.elasticsearch.common.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
Caused by: org.elasticsearch.transport.NodeNotConnectedException: [IPDIRECTOR-119][inet[/10.194.1.119:9300]] Node not connected
    at org.elasticsearch.transport.netty.NettyTransport.nodeChannel(NettyTransport.java:936)
    at org.elasticsearch.transport.netty.NettyTransport.sendRequest(NettyTransport.java:629)
    at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:276)
    ... 58 more

When we plug the node back in:

[2015-08-12 11:39:59,177][INFO ][cluster.service          ] [IPDIRECTOR-118] added {[IPDIRECTOR-119][3kybxeb7TMm30Pzh7rrmhA][Ipdirector-119][inet[/10.194.1.119:9300]],}, reason: zen-disco-receive(from master [[IPDIRECTOR-121][BX8BT6OgRjWM5YEhlxt9mQ][Ipdirector-121][inet[/10.194.1.121:9300]]])
[2015-08-12 11:48:07,768][INFO ][discovery.zen            ] [IPDIRECTOR-118] master_left [[IPDIRECTOR-121][BX8BT6OgRjWM5YEhlxt9mQ][Ipdirector-121][inet[/10.194.1.121:9300]]], reason [transport disconnected]
[2015-08-12 11:48:07,769][WARN ][discovery.zen            ] [IPDIRECTOR-118] master left (reason = transport disconnected), current nodes: {[IPDIRECTOR-118][Z9UA4kJxTIa6B3tY4F-_vw][Ipdirector-118][inet[/10.194.1.118:9300]],[IPDIRECTOR-119][3kybxeb7TMm30Pzh7rrmhA][Ipdirector-119][inet[/10.194.1.119:9300]],[IPDIRECTOR-120][EQzx7BprQa6EVOT3V6zlqQ][Ipdirector-120][inet[/10.194.1.120:9300]],}
[2015-08-12 11:48:07,769][INFO ][cluster.service          ] [IPDIRECTOR-118] removed {[IPDIRECTOR-121][BX8BT6OgRjWM5YEhlxt9mQ][Ipdirector-121][inet[/10.194.1.121:9300]],}, reason: zen-disco-master_failed ([IPDIRECTOR-121][BX8BT6OgRjWM5YEhlxt9mQ][Ipdirector-121][inet[/10.194.1.121:9300]])
[2015-08-12 11:48:11,541][WARN ][discovery.zen.ping.unicast] [IPDIRECTOR-118] failed to send ping to [[IPDIRECTOR-119][3kybxeb7TMm30Pzh7rrmhA][Ipdirector-119][inet[/10.194.1.119:9300]]]
org.elasticsearch.transport.ReceiveTimeoutTransportException: [IPDIRECTOR-119][inet[/10.194.1.119:9300]][internal:discovery/zen/unicast] request_id [124460] timed out after [3750ms]
    at org.elasticsearch.transport.TransportService$TimeoutHandler.run(TransportService.java:529)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
[2015-08-12 11:48:11,541][WARN ][discovery.zen.ping.unicast] [IPDIRECTOR-118] failed to send ping to [[IPDIRECTOR-120][EQzx7BprQa6EVOT3V6zlqQ][Ipdirector-120][inet[/10.194.1.120:9300]]]
org.elasticsearch.transport.ReceiveTimeoutTransportException: [IPDIRECTOR-120][inet[/10.194.1.120:9300]][internal:discovery/zen/unicast] request_id [124461] timed out after [3750ms]
    at org.elasticsearch.transport.TransportService$TimeoutHandler.run(TransportService.java:529)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)

More timeouts, then a lot of errors:

[2015-08-12 11:48:26,677][WARN ][gateway.local            ] [IPDIRECTOR-118] [wc2014_clip][4]: failed to list shard stores on node [EQzx7BprQa6EVOT3V6zlqQ]
org.elasticsearch.action.FailedNodeException: Failed node [EQzx7BprQa6EVOT3V6zlqQ]
    at org.elasticsearch.action.support.nodes.TransportNodesOperationAction$AsyncAction.onFailure(TransportNodesOperationAction.java:206)
    at org.elasticsearch.action.support.nodes.TransportNodesOperationAction$AsyncAction.access$1000(TransportNodesOperationAction.java:97)
    at org.elasticsearch.action.support.nodes.TransportNodesOperationAction$AsyncAction$4.handleException(TransportNodesOperationAction.java:178)
    at org.elasticsearch.transport.TransportService$Adapter$3.run(TransportService.java:468)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
Caused by: org.elasticsearch.transport.NodeDisconnectedException: [IPDIRECTOR-120][inet[/10.194.1.120:9300]][internal:cluster/nodes/indices/shard/store[n]] disconnected
[2015-08-12 11:48:26,677][WARN ][gateway.local            ] [IPDIRECTOR-118] [wc2014_clip][4]: failed to list shard stores on node [3kybxeb7TMm30Pzh7rrmhA]
org.elasticsearch.action.FailedNodeException: Failed node [3kybxeb7TMm30Pzh7rrmhA]
    at org.elasticsearch.action.support.nodes.TransportNodesOperationAction$AsyncAction.onFailure(TransportNodesOperationAction.java:206)
    at org.elasticsearch.action.support.nodes.TransportNodesOperationAction$AsyncAction.access$1000(TransportNodesOperationAction.java:97)
    at org.elasticsearch.action.support.nodes.TransportNodesOperationAction$AsyncAction$4.handleException(TransportNodesOperationAction.java:178)
    at org.elasticsearch.transport.TransportService$Adapter$3.run(TransportService.java:468)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
Caused by: org.elasticsearch.transport.NodeDisconnectedException: [IPDIRECTOR-119][inet[/10.194.1.119:9300]][internal:cluster/nodes/indices/shard/store[n]] disconnected
[2015-08-12 11:48:27,081][WARN ][gateway.local            ] [IPDIRECTOR-118] [wc2014_clip][3]: failed to list shard stores on node [EQzx7BprQa6EVOT3V6zlqQ]
org.elasticsearch.action.FailedNodeException: Failed node [EQzx7BprQa6EVOT3V6zlqQ]
    at org.elasticsearch.action.support.nodes.TransportNodesOperationAction$AsyncAction.onFailure(TransportNodesOperationAction.java:206)
    at org.elasticsearch.action.support.nodes.TransportNodesOperationAction$AsyncAction.access$1000(TransportNodesOperationAction.java:97)
    at org.elasticsearch.action.support.nodes.TransportNodesOperationAction$AsyncAction$4.handleException(TransportNodesOperationAction.java:178)
    at org.elasticsearch.transport.TransportService$3.run(TransportService.java:290)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
Caused by: org.elasticsearch.transport.SendRequestTransportException: [IPDIRECTOR-120][inet[/10.194.1.120:9300]][internal:cluster/nodes/indices/shard/store[n]]
    at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:286)
    at org.elasticsearch.action.support.nodes.TransportNodesOperationAction$AsyncAction.start(TransportNodesOperationAction.java:165)
    at org.elasticsearch.action.support.nodes.TransportNodesOperationAction$AsyncAction.access$300(TransportNodesOperationAction.java:97)
    at org.elasticsearch.action.support.nodes.TransportNodesOperationAction.doExecute(TransportNodesOperationAction.java:70)
    at org.elasticsearch.action.support.nodes.TransportNodesOperationAction.doExecute(TransportNodesOperationAction.java:43)
    at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:75)
    at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:55)
    at org.elasticsearch.indices.store.TransportNodesListShardStoreMetaData.list(TransportNodesListShardStoreMetaData.java:79)
    at org.elasticsearch.gateway.local.LocalGatewayAllocator.buildShardStores(LocalGatewayAllocator.java:458)
    at org.elasticsearch.gateway.local.LocalGatewayAllocator.allocateUnassigned(LocalGatewayAllocator.java:292)
    at org.elasticsearch.cluster.routing.allocation.allocator.ShardsAllocators.allocateUnassigned(ShardsAllocators.java:74)
    at org.elasticsearch.cluster.routing.allocation.AllocationService.reroute(AllocationService.java:219)
    at org.elasticsearch.cluster.routing.allocation.AllocationService.reroute(AllocationService.java:162)
    at org.elasticsearch.cluster.routing.allocation.AllocationService.reroute(AllocationService.java:148)
    at org.elasticsearch.discovery.zen.ZenDiscovery$3.execute(ZenDiscovery.java:387)
    at org.elasticsearch.cluster.service.InternalClusterService$UpdateTask.run(InternalClusterService.java:365)
    at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:188)
    at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:158)
    ... 3 more
Caused by: org.elasticsearch.transport.NodeNotConnectedException: [IPDIRECTOR-120][inet[/10.194.1.120:9300]] Node not connected
    at org.elasticsearch.transport.netty.NettyTransport.nodeChannel(NettyTransport.java:936)
    at org.elasticsearch.transport.netty.NettyTransport.sendRequest(NettyTransport.java:629)
    at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:276)

To fix this, I have to restart the node manually, and then everything goes back to normal.

Shouldn't the node automatically talk to ES2, ES3 and ES4 and rejoin the cluster, without any manual intervention?

Thanks, 马提亚.

2 answers:

Answer 0: (score: 0)

Check the elasticsearch.yml file:

/etc/elasticsearch/elasticsearch.yml

You need to verify that the discovery type matches the environment you are running in, e.g. ec2.

Answer 1: (score: 0)

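OK, we were able to resolve the issue. We have 4 ElasticSearch machines, but only one was set in the master node setting, so when the network went down, 2 clusters started living side by side.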
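To illustrate why this leads to two clusters, here is a minimal sketch. It assumes the setting in question is `discovery.zen.minimum_master_nodes` (the logs above show zen discovery) and that all four nodes are master-eligible; neither is stated explicitly in the thread. The helper names `quorum` and `can_elect_master` are hypothetical. The commonly recommended value for that setting is a majority of the master-eligible nodes, i.e. (N / 2) + 1.

```python
def quorum(master_eligible: int) -> int:
    """Majority of master-eligible nodes: floor(n / 2) + 1."""
    return master_eligible // 2 + 1


def can_elect_master(partition_size: int, minimum_master_nodes: int) -> bool:
    """A partition may elect a master only if it sees enough master-eligible nodes."""
    return partition_size >= minimum_master_nodes


NODES = 4  # ES1..ES4, assumed to all be master-eligible

# Compare a setting of 1 with the majority value for 4 nodes.
for setting in (1, quorum(NODES)):
    isolated = can_elect_master(1, setting)      # the unplugged node on its own
    rest = can_elect_master(NODES - 1, setting)  # the three nodes still connected
    print(f"minimum_master_nodes={setting}: "
          f"isolated node elects a master={isolated}, "
          f"remaining three elect a master={rest}")
```

With a value of 1, the isolated node is allowed to elect itself master, so it forms its own wc2014 cluster next to the original one. With a value of 3, the isolated node cannot elect a master on its own and should rejoin the existing cluster once the network is back.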