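In short, I have a standalone ES master instance and a client node that is created inside my Java application. If the standalone ES instance is started before the client node, the client node discovers it correctly.
The problem I am facing: if for some reason the client node starts before the standalone ES instance, I see a "MasterNotDiscoveredException", which is expected. However, I keep seeing the same exception even after the standalone ES instance has been started. Is there some configuration I should change to fix this?
I am using ES 1.7.1 with unicast discovery.
Edit:
Cluster info: the standalone ES instance and the client node together form the cluster.
Client node stack trace: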
11:29:35,634 INFO http [496648366, id=7BCBFQLCTWOO2, ide=tcp://172.17.78.80:61616] [Squidboy] bound_address {inet[/0:0:0:0:0:0:0:0:9200]}, publish_address {inet[/172.17.78.80:9200]}
11:29:35,635 INFO node [496648366, id=7BCBFQLCTWOO2, ide=tcp://172.17.78.80:61616] [Squidboy] started
11:30:10,279 ERROR ApplicationLifeCycle [299961584] System startup not complete after 120 seconds ...
11:30:14,706 WARN ElasticSearchStatus [278792216] An Exception occurred during cluster health status update - java.util.concurrent.ExecutionException: org.elasticsearch.discovery.MasterNotDiscoveredException: waited for [30s]
at org.elasticsearch.common.util.concurrent.BaseFuture$Sync.getValue(BaseFuture.java:292)
at org.elasticsearch.common.util.concurrent.BaseFuture$Sync.get(BaseFuture.java:279)
at org.elasticsearch.common.util.concurrent.BaseFuture.get(BaseFuture.java:117)
at com.harry.elastic.node.ElasticSearchStatus.updateClusterHealth(ElasticSearchStatus.java:90)
at com.harry.elastic.node.ElasticSearchStatus.access$000(ElasticSearchStatus.java:37)
at com.harry.elastic.node.ElasticSearchStatus$1.run(ElasticSearchStatus.java:62)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.elasticsearch.discovery.MasterNotDiscoveredException: waited for [30s]
at org.elasticsearch.action.support.master.TransportMasterNodeOperationAction$4.onTimeout(TransportMasterNodeOperationAction.java:164)
at org.elasticsearch.cluster.ClusterStateObserver$ObserverClusterStateListener.onTimeout(ClusterStateObserver.java:231)
at org.elasticsearch.cluster.service.InternalClusterService$NotifyTimeout.run(InternalClusterService.java:560)
... 3 more
Client creation code:
    private Node createEmbeddedClientNode() {
        ImmutableSettings.Builder settingsBuilder = ImmutableSettings.settingsBuilder()
                .put("discovery.zen.ping.multicast.enabled", false)
                .put("discovery.zen.ping.unicast.hosts", "localhost[9300-9400]");
        return nodeBuilder().settings(settingsBuilder).clusterName("harryService")
                .client(true).data(false).node();
    }
Master discovery configuration:
    "discovery": {
        "zen": {
            "ping": {
                "multicast": {
                    "enabled": false
                }
            }
        }
    }
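Answer 0 (score: 2)
By default, your client node will retry pinging the master node 3 times, waiting 30 seconds each time, and then give up. So if the master node is started after that time has elapsed, the client node will not discover it.
Try increasing the number of retries and/or the timeout, which should help: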
.put("discovery.zen.fd.ping_timeout", "1m")
.put("discovery.zen.fd.ping_retries", 5)
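With these settings, your client node will keep trying for 5 minutes instead of just 1.5 minutes. That said, your master node should really already be up by the time you start your application.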
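The 1.5-minute and 5-minute figures above are simply retries multiplied by the per-attempt timeout. A minimal sketch of that arithmetic, assuming every attempt waits for the full timeout; `RetryWindow` is a hypothetical helper written for illustration and is not part of Elasticsearch:

```java
import java.time.Duration;

// Hypothetical helper for illustration only; not an Elasticsearch class.
final class RetryWindow {

    // Total time spent retrying if every attempt runs into its full timeout.
    static Duration total(int retries, Duration timeoutPerAttempt) {
        return timeoutPerAttempt.multipliedBy(retries);
    }

    public static void main(String[] args) {
        // Defaults as described in this answer: 3 retries of 30 seconds each.
        System.out.println(total(3, Duration.ofSeconds(30))); // prints PT1M30S
        // Suggested settings: 5 retries of 1 minute each.
        System.out.println(total(5, Duration.ofMinutes(1)));  // prints PT5M
    }
}
```

The same calculation can be used to pick values that cover the longest delay you expect between starting the application and starting the master.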
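Another setting that might help is the one below. It defaults to true, which makes the master ignore pings from client nodes during master election. With a single master node it probably makes no difference, but it is still worth a try: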
.put("discovery.zen.master_election.filter_client", false)
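Since this answer notes that the master should already be up when the application starts, one option is to have the application wait for the master's transport port before it creates the client node. The following is only a sketch under assumptions: `MasterPortWaiter` and `waitForPort` are hypothetical names, it uses plain JDK sockets rather than any Elasticsearch API, and an open port only shows that something is listening, not that a master has been elected.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper for illustration only; not an Elasticsearch class.
final class MasterPortWaiter {

    // Polls host:port until a TCP connection succeeds or maxWaitMillis elapses.
    // Returns true if the port accepted a connection, false on timeout.
    static boolean waitForPort(String host, int port, long maxWaitMillis, long pollMillis)
            throws InterruptedException {
        long deadline = System.nanoTime() + maxWaitMillis * 1_000_000L;
        do {
            try (Socket socket = new Socket()) {
                // Give each connection attempt at most one second.
                socket.connect(new InetSocketAddress(host, port), 1000);
                return true;
            } catch (IOException e) {
                // Nothing is listening yet; fall through and try again.
            }
            Thread.sleep(pollMillis);
        } while (System.nanoTime() - deadline < 0);
        return false;
    }
}
```

In the setup from the question, this could be called as `MasterPortWaiter.waitForPort("localhost", 9300, 300_000, 1_000)` before `createEmbeddedClientNode()`, assuming the master uses port 9300, the first port in the `localhost[9300-9400]` range from the question.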
Answer 1 (score: 1)
I solved this by explicitly adding the unicast configuration on the master node as well:
    "discovery": {
        "zen": {
            "ping": {
                "multicast": {
                    "enabled": false
                },
                "unicast": {
                    "hosts": "localhost[9300-9400]"
                }
            }
        }
    }