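We are currently running failover tests on our ElasticSearch setup. Here is the setup we use:
We have 4 ElasticSearch machines running. Let's call them ES1, ES2, ES3 and ES4. We have several indices, each with 5 shards and 1 replica, so 10 shards per index. Everything is well distributed across the nodes, so if one node fails, everything should still work.
The 4 nodes run on Windows 7 64-bit with 8 GB of RAM. The nodes discover each other using the cluster name.
I unplugged the ES1 machine to see whether everything still worked, and it did, hooray!
But here is the strange part: when we plug ES1 back in, it does not rejoin the cluster (named wc2014, FYI). It seems to sit alone in its own cluster, which is also named wc2014.
Here is some information I found in the logs:
When we unplugged the node (this looks normal to me):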
org.elasticsearch.transport.NodeDisconnectedException: [IPDIRECTOR-119][inet[/10.194.1.119:9300]][cluster:monitor/nodes/info[n]] disconnected
[2015-08-12 11:27:04,619][DEBUG][action.admin.indices.stats] [IPDIRECTOR-118] [wc2014_nearline][4], node[fxTcr9-FR52jecm5a2adRg], [P], s[STARTED]: failed to execute [org.elasticsearch.action.admin.indices.stats.IndicesStatsRequest@1b999e6c]
org.elasticsearch.transport.NodeDisconnectedException: [IPDIRECTOR-119][inet[/10.194.1.119:9300]][indices:monitor/stats[s]] disconnected
[2015-08-12 11:27:04,619][DEBUG][action.admin.indices.stats] [IPDIRECTOR-118] [wc2014_mediaresource][4], node[fxTcr9-FR52jecm5a2adRg], [P], s[STARTED]: failed to execute [org.elasticsearch.action.admin.indices.stats.IndicesStatsRequest@1b999e6c]
org.elasticsearch.transport.NodeDisconnectedException: [IPDIRECTOR-119][inet[/10.194.1.119:9300]][indices:monitor/stats[s]] disconnected
[2015-08-12 11:27:04,619][DEBUG][action.admin.indices.stats] [IPDIRECTOR-118] [wc2014_edit][4], node[fxTcr9-FR52jecm5a2adRg], [P], s[STARTED]: failed to execute [org.elasticsearch.action.admin.indices.stats.IndicesStatsRequest@1b999e6c]
org.elasticsearch.transport.NodeDisconnectedException: [IPDIRECTOR-119][inet[/10.194.1.119:9300]][indices:monitor/stats[s]] disconnected
[2015-08-12 11:27:04,619][DEBUG][action.admin.indices.stats] [IPDIRECTOR-118] [wc2014_log][4], node[fxTcr9-FR52jecm5a2adRg], [P], s[STARTED]: failed to execute [org.elasticsearch.action.admin.indices.stats.IndicesStatsRequest@1b999e6c]
org.elasticsearch.transport.NodeDisconnectedException: [IPDIRECTOR-119][inet[/10.194.1.119:9300]][indices:monitor/stats[s]] disconnected
[2015-08-12 11:27:04,619][DEBUG][action.admin.indices.stats] [IPDIRECTOR-118] [wc2014_metadata][4], node[fxTcr9-FR52jecm5a2adRg], [R], s[STARTED]: failed to execute [org.elasticsearch.action.admin.indices.stats.IndicesStatsRequest@1b999e6c]
org.elasticsearch.transport.NodeDisconnectedException: [IPDIRECTOR-119][inet[/10.194.1.119:9300]][indices:monitor/stats[s]] disconnected
[2015-08-12 11:27:04,619][DEBUG][action.admin.indices.stats] [IPDIRECTOR-118] [wc2014_ipwsedit][4], node[fxTcr9-FR52jecm5a2adRg], [P], s[STARTED]: failed to execute [org.elasticsearch.action.admin.indices.stats.IndicesStatsRequest@1b999e6c]
Then I get a different error:
[2015-08-12 11:27:09,797][DEBUG][action.admin.cluster.node.info] [IPDIRECTOR-118] failed to execute on node [fxTcr9-FR52jecm5a2adRg]
org.elasticsearch.transport.SendRequestTransportException: [IPDIRECTOR-119][inet[/10.194.1.119:9300]][cluster:monitor/nodes/info[n]]
at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:286)
at org.elasticsearch.action.support.nodes.TransportNodesOperationAction$AsyncAction.start(TransportNodesOperationAction.java:165)
at org.elasticsearch.action.support.nodes.TransportNodesOperationAction$AsyncAction.access$300(TransportNodesOperationAction.java:97)
at org.elasticsearch.action.support.nodes.TransportNodesOperationAction.doExecute(TransportNodesOperationAction.java:70)
at org.elasticsearch.action.support.nodes.TransportNodesOperationAction.doExecute(TransportNodesOperationAction.java:43)
at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:75)
at org.elasticsearch.client.node.NodeClusterAdminClient.execute(NodeClusterAdminClient.java:77)
at org.elasticsearch.client.FilterClient$ClusterAdmin.execute(FilterClient.java:161)
at org.elasticsearch.rest.BaseRestHandler$HeadersAndContextCopyClient$ClusterAdmin.execute(BaseRestHandler.java:125)
at org.elasticsearch.client.support.AbstractClusterAdminClient.nodesInfo(AbstractClusterAdminClient.java:187)
at org.elasticsearch.rest.action.admin.cluster.node.info.RestNodesInfoAction.handleRequest(RestNodesInfoAction.java:102)
at org.elasticsearch.rest.BaseRestHandler.handleRequest(BaseRestHandler.java:53)
at org.elasticsearch.rest.RestController.executeHandler(RestController.java:225)
at org.elasticsearch.rest.RestController.dispatchRequest(RestController.java:170)
at org.elasticsearch.http.HttpServer.internalDispatchRequest(HttpServer.java:121)
at org.elasticsearch.http.HttpServer$Dispatcher.dispatchRequest(HttpServer.java:83)
at org.elasticsearch.http.netty.NettyHttpServerTransport.dispatchRequest(NettyHttpServerTransport.java:329)
at org.elasticsearch.http.netty.HttpRequestHandler.messageReceived(HttpRequestHandler.java:63)
at org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
at org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
at org.elasticsearch.http.netty.pipelining.HttpPipeliningHandler.messageReceived(HttpPipeliningHandler.java:60)
at org.elasticsearch.common.netty.channel.SimpleChannelHandler.handleUpstream(SimpleChannelHandler.java:88)
at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
at org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
at org.elasticsearch.common.netty.handler.codec.http.HttpContentEncoder.messageReceived(HttpContentEncoder.java:82)
at org.elasticsearch.common.netty.channel.SimpleChannelHandler.handleUpstream(SimpleChannelHandler.java:88)
at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
at org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
at org.elasticsearch.common.netty.handler.codec.http.HttpChunkAggregator.messageReceived(HttpChunkAggregator.java:145)
at org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
at org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
at org.elasticsearch.common.netty.handler.codec.http.HttpContentDecoder.messageReceived(HttpContentDecoder.java:108)
at org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
at org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
at org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:296)
at org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:459)
at org.elasticsearch.common.netty.handler.codec.replay.ReplayingDecoder.callDecode(ReplayingDecoder.java:536)
at org.elasticsearch.common.netty.handler.codec.replay.ReplayingDecoder.messageReceived(ReplayingDecoder.java:435)
at org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
at org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
at org.elasticsearch.common.netty.OpenChannelsHandler.handleUpstream(OpenChannelsHandler.java:74)
at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
at org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:268)
at org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:255)
at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108)
at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337)
at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
at org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
at org.elasticsearch.common.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
Caused by: org.elasticsearch.transport.NodeNotConnectedException: [IPDIRECTOR-119][inet[/10.194.1.119:9300]] Node not connected
at org.elasticsearch.transport.netty.NettyTransport.nodeChannel(NettyTransport.java:936)
at org.elasticsearch.transport.netty.NettyTransport.sendRequest(NettyTransport.java:629)
at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:276)
... 58 more
When we plug the node back in:
[2015-08-12 11:39:59,177][INFO ][cluster.service ] [IPDIRECTOR-118] added {[IPDIRECTOR-119][3kybxeb7TMm30Pzh7rrmhA][Ipdirector-119][inet[/10.194.1.119:9300]],}, reason: zen-disco-receive(from master [[IPDIRECTOR-121][BX8BT6OgRjWM5YEhlxt9mQ][Ipdirector-121][inet[/10.194.1.121:9300]]])
[2015-08-12 11:48:07,768][INFO ][discovery.zen ] [IPDIRECTOR-118] master_left [[IPDIRECTOR-121][BX8BT6OgRjWM5YEhlxt9mQ][Ipdirector-121][inet[/10.194.1.121:9300]]], reason [transport disconnected]
[2015-08-12 11:48:07,769][WARN ][discovery.zen ] [IPDIRECTOR-118] master left (reason = transport disconnected), current nodes: {[IPDIRECTOR-118][Z9UA4kJxTIa6B3tY4F-_vw][Ipdirector-118][inet[/10.194.1.118:9300]],[IPDIRECTOR-119][3kybxeb7TMm30Pzh7rrmhA][Ipdirector-119][inet[/10.194.1.119:9300]],[IPDIRECTOR-120][EQzx7BprQa6EVOT3V6zlqQ][Ipdirector-120][inet[/10.194.1.120:9300]],}
[2015-08-12 11:48:07,769][INFO ][cluster.service ] [IPDIRECTOR-118] removed {[IPDIRECTOR-121][BX8BT6OgRjWM5YEhlxt9mQ][Ipdirector-121][inet[/10.194.1.121:9300]],}, reason: zen-disco-master_failed ([IPDIRECTOR-121][BX8BT6OgRjWM5YEhlxt9mQ][Ipdirector-121][inet[/10.194.1.121:9300]])
[2015-08-12 11:48:11,541][WARN ][discovery.zen.ping.unicast] [IPDIRECTOR-118] failed to send ping to [[IPDIRECTOR-119][3kybxeb7TMm30Pzh7rrmhA][Ipdirector-119][inet[/10.194.1.119:9300]]]
org.elasticsearch.transport.ReceiveTimeoutTransportException: [IPDIRECTOR-119][inet[/10.194.1.119:9300]][internal:discovery/zen/unicast] request_id [124460] timed out after [3750ms]
at org.elasticsearch.transport.TransportService$TimeoutHandler.run(TransportService.java:529)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
[2015-08-12 11:48:11,541][WARN ][discovery.zen.ping.unicast] [IPDIRECTOR-118] failed to send ping to [[IPDIRECTOR-120][EQzx7BprQa6EVOT3V6zlqQ][Ipdirector-120][inet[/10.194.1.120:9300]]]
org.elasticsearch.transport.ReceiveTimeoutTransportException: [IPDIRECTOR-120][inet[/10.194.1.120:9300]][internal:discovery/zen/unicast] request_id [124461] timed out after [3750ms]
at org.elasticsearch.transport.TransportService$TimeoutHandler.run(TransportService.java:529)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
More timeouts follow, and then a lot of errors:
[2015-08-12 11:48:26,677][WARN ][gateway.local ] [IPDIRECTOR-118] [wc2014_clip][4]: failed to list shard stores on node [EQzx7BprQa6EVOT3V6zlqQ]
org.elasticsearch.action.FailedNodeException: Failed node [EQzx7BprQa6EVOT3V6zlqQ]
at org.elasticsearch.action.support.nodes.TransportNodesOperationAction$AsyncAction.onFailure(TransportNodesOperationAction.java:206)
at org.elasticsearch.action.support.nodes.TransportNodesOperationAction$AsyncAction.access$1000(TransportNodesOperationAction.java:97)
at org.elasticsearch.action.support.nodes.TransportNodesOperationAction$AsyncAction$4.handleException(TransportNodesOperationAction.java:178)
at org.elasticsearch.transport.TransportService$Adapter$3.run(TransportService.java:468)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
Caused by: org.elasticsearch.transport.NodeDisconnectedException: [IPDIRECTOR-120][inet[/10.194.1.120:9300]][internal:cluster/nodes/indices/shard/store[n]] disconnected
[2015-08-12 11:48:26,677][WARN ][gateway.local ] [IPDIRECTOR-118] [wc2014_clip][4]: failed to list shard stores on node [3kybxeb7TMm30Pzh7rrmhA]
org.elasticsearch.action.FailedNodeException: Failed node [3kybxeb7TMm30Pzh7rrmhA]
at org.elasticsearch.action.support.nodes.TransportNodesOperationAction$AsyncAction.onFailure(TransportNodesOperationAction.java:206)
at org.elasticsearch.action.support.nodes.TransportNodesOperationAction$AsyncAction.access$1000(TransportNodesOperationAction.java:97)
at org.elasticsearch.action.support.nodes.TransportNodesOperationAction$AsyncAction$4.handleException(TransportNodesOperationAction.java:178)
at org.elasticsearch.transport.TransportService$Adapter$3.run(TransportService.java:468)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
Caused by: org.elasticsearch.transport.NodeDisconnectedException: [IPDIRECTOR-119][inet[/10.194.1.119:9300]][internal:cluster/nodes/indices/shard/store[n]] disconnected
[2015-08-12 11:48:27,081][WARN ][gateway.local ] [IPDIRECTOR-118] [wc2014_clip][3]: failed to list shard stores on node [EQzx7BprQa6EVOT3V6zlqQ]
org.elasticsearch.action.FailedNodeException: Failed node [EQzx7BprQa6EVOT3V6zlqQ]
at org.elasticsearch.action.support.nodes.TransportNodesOperationAction$AsyncAction.onFailure(TransportNodesOperationAction.java:206)
at org.elasticsearch.action.support.nodes.TransportNodesOperationAction$AsyncAction.access$1000(TransportNodesOperationAction.java:97)
at org.elasticsearch.action.support.nodes.TransportNodesOperationAction$AsyncAction$4.handleException(TransportNodesOperationAction.java:178)
at org.elasticsearch.transport.TransportService$3.run(TransportService.java:290)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
Caused by: org.elasticsearch.transport.SendRequestTransportException: [IPDIRECTOR-120][inet[/10.194.1.120:9300]][internal:cluster/nodes/indices/shard/store[n]]
at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:286)
at org.elasticsearch.action.support.nodes.TransportNodesOperationAction$AsyncAction.start(TransportNodesOperationAction.java:165)
at org.elasticsearch.action.support.nodes.TransportNodesOperationAction$AsyncAction.access$300(TransportNodesOperationAction.java:97)
at org.elasticsearch.action.support.nodes.TransportNodesOperationAction.doExecute(TransportNodesOperationAction.java:70)
at org.elasticsearch.action.support.nodes.TransportNodesOperationAction.doExecute(TransportNodesOperationAction.java:43)
at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:75)
at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:55)
at org.elasticsearch.indices.store.TransportNodesListShardStoreMetaData.list(TransportNodesListShardStoreMetaData.java:79)
at org.elasticsearch.gateway.local.LocalGatewayAllocator.buildShardStores(LocalGatewayAllocator.java:458)
at org.elasticsearch.gateway.local.LocalGatewayAllocator.allocateUnassigned(LocalGatewayAllocator.java:292)
at org.elasticsearch.cluster.routing.allocation.allocator.ShardsAllocators.allocateUnassigned(ShardsAllocators.java:74)
at org.elasticsearch.cluster.routing.allocation.AllocationService.reroute(AllocationService.java:219)
at org.elasticsearch.cluster.routing.allocation.AllocationService.reroute(AllocationService.java:162)
at org.elasticsearch.cluster.routing.allocation.AllocationService.reroute(AllocationService.java:148)
at org.elasticsearch.discovery.zen.ZenDiscovery$3.execute(ZenDiscovery.java:387)
at org.elasticsearch.cluster.service.InternalClusterService$UpdateTask.run(InternalClusterService.java:365)
at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:188)
at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:158)
... 3 more
Caused by: org.elasticsearch.transport.NodeNotConnectedException: [IPDIRECTOR-120][inet[/10.194.1.120:9300]] Node not connected
at org.elasticsearch.transport.netty.NettyTransport.nodeChannel(NettyTransport.java:936)
at org.elasticsearch.transport.netty.NettyTransport.sendRequest(NettyTransport.java:629)
at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:276)
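To fix this, I have to restart the node manually, and then everything is back to normal.
Shouldn't the node automatically talk to ES2, ES3 and ES4 and rejoin the cluster without any manual intervention?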
Thanks, Mathias.
Answer 0 (score: 0)
Check the elasticsearch.yml file:
/etc/elasticsearch/elasticsearch.yml
You need to verify that the discovery type matches the environment you are running in, e.g. ec2.
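Answer 1 (score: 0)
OK, we were able to solve the problem we ran into. We had 4 ElasticSearch machines, but only one was set in the master nodes configuration, so when the network went down, 2 clusters started living side by side.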
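For anyone hitting the same thing, here is a rough sketch of the quorum arithmetic behind this kind of split. It assumes the relevant setting is `discovery.zen.minimum_master_nodes` (the usual guard against two clusters forming in zen discovery) and that all four nodes are made master-eligible; neither is spelled out above, so treat both as assumptions. The function names are purely illustrative and not part of any ElasticSearch API.

```python
def minimum_master_nodes(master_eligible: int) -> int:
    """Return the majority quorum for a number of master-eligible nodes.

    Uses the commonly recommended formula floor(n / 2) + 1.
    """
    if master_eligible < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible // 2 + 1


def can_elect_master(visible_master_eligible: int, quorum: int) -> bool:
    """A network partition may elect a master only if it sees a quorum."""
    return visible_master_eligible >= quorum


if __name__ == "__main__":
    total = 4  # ES1..ES4, assumed to all be master-eligible
    quorum = minimum_master_nodes(total)
    print(f"discovery.zen.minimum_master_nodes: {quorum}")

    # ES1 is unplugged: it sees only itself, the other three see each other.
    print("isolated node may elect a master:", can_elect_master(1, quorum))
    print("remaining nodes may elect a master:", can_elect_master(3, quorum))
```

With a quorum below the majority, an isolated node can elect itself master and form its own cluster under the same name, which would match the behaviour described in the question. With a majority quorum, the isolated node should keep pinging until it can see enough master-eligible nodes again.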