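I am building a cluster of nodes. Two of them work fine (they have joined a cluster). I am trying to add a third one (called eu5), but when it starts it does not join the cluster: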
[root@eu5:/etc/elasticsearch]# curl eu5:9200
{
  "status" : 503,
  "name" : "eu5",
  "cluster_name" : "security",
  "version" : {
    "number" : "1.4.2",
    "build_hash" : "927caff6f05403e936c20bf4529f144f0c89fd8c",
    "build_timestamp" : "2014-12-16T14:11:12Z",
    "build_snapshot" : false,
    "lucene_version" : "4.10.2"
  },
  "tagline" : "You Know, for Search"
}
The logs point to a discovery problem:
[2015-01-09 15:35:23,399][INFO ][node ] [eu5] starting ...
[2015-01-09 15:35:23,468][INFO ][transport ] [eu5] bound_address {inet[/10.81.147.186:9300]}, publish_address {inet[/10.81.147.186:9300]}
[2015-01-09 15:35:23,475][INFO ][discovery ] [eu5] security/FdjfWCWgT-mQtipLdi9BFA
[2015-01-09 15:35:53,476][WARN ][discovery ] [eu5] waited for 30s and no initial state was set by the discovery
[2015-01-09 15:35:53,493][INFO ][http ] [eu5] bound_address {inet[/10.81.147.186:9200]}, publish_address {inet[/10.81.147.186:9200]}
[2015-01-09 15:35:53,494][INFO ][node ] [eu5] started
The configuration forces unicast discovery:
cluster.name: security
node.name: eu5
network.host: 10.81.147.186
discovery.zen.minimum_master_nodes: 2
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast: ["elk.example.com"]
and the host listed in the unicast setting is reachable from the server I want to add:
[root@eu5:/etc/elasticsearch]# curl elk.example.com:9200
{
  "status" : 200,
  "name" : "eu4",
  "cluster_name" : "security",
  "version" : {
    "number" : "1.4.2",
    "build_hash" : "927caff6f05403e936c20bf4529f144f0c89fd8c",
    "build_timestamp" : "2014-12-16T14:11:12Z",
    "build_snapshot" : false,
    "lucene_version" : "4.10.2"
  },
  "tagline" : "You Know, for Search"
}
Ports 9200 and 9300 are open in both directions. From the server I want to add:
[root@eu5:/etc/elasticsearch]# nmap -p9200,9300 elk.example.com
(...)
PORT STATE SERVICE
9200/tcp open wap-wsp
9300/tcp open vrace
and from the master (eu4) to that server:
[root@eu4:/etc/elasticsearch]# nmap -p9200,9300 eu5.example.com
(...)
PORT STATE SERVICE
9200/tcp open wap-wsp
9300/tcp open vrace
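For a scriptable version of the same reachability test, here is a minimal Python sketch (the function names port_open and check_ports are hypothetical, not part of any Elasticsearch tooling). Like the nmap check, it only proves that something completes a TCP handshake on 9200/9300; it says nothing about whether Elasticsearch discovery is actually configured to contact that host.

```python
import socket


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS resolution failures.
        return False


def check_ports(host: str, ports=(9200, 9300), timeout: float = 3.0) -> dict:
    """Map each port (HTTP 9200, transport 9300 by default) to its reachability."""
    return {port: port_open(host, port, timeout) for port in ports}


# Hypothetical usage, mirroring the nmap runs:
#   on eu5: check_ports("elk.example.com")
#   on eu4: check_ports("eu5.example.com")
```

A result of {9200: True, 9300: True} from both sides would match the nmap output, which is why a clean network test does not rule out a configuration error.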
Is there anything else I should check?
UPDATE: after Andrei Stefan's comment, I switched logging to DEBUG. I get entries such as
[2015-01-12 11:14:41,609][DEBUG][discovery.zen ] [eu5] filtered ping responses: (filter_client[true], filter_data[false]) {none}
[2015-01-12 11:14:44,615][DEBUG][discovery.zen ] [eu5] filtered ping responses: (filter_client[true], filter_data[false]) {none}
during the discovery phase (which times out after 30 seconds). A quick look at the code (I don't know Java) suggests that {none} means the pings failed.
The tests above show that, from the operating system's point of view, connectivity is fine.
UPDATE 2: below is the tcpdump capture corresponding to the events above (eu5, the machine that wants to join, is 10.81.144.186).
Full image: http://i.stack.imgur.com/vLi7r.png
UPDATE 3: I filed a bug report.
Answer (score: 1)
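There is an error in the configuration. The setting should be
discovery.zen.ping.unicast.hosts
i.e. the hosts suffix is missing:
discovery.zen.ping.unicast.hosts: ["elk.example.com"]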
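The node started without complaining about the misspelled key, so a quick config check can catch this kind of mistake. Below is a minimal Python sketch, assuming the settings file uses flat dotted keys as in the question (nested YAML blocks are not handled). parse_flat_settings and check_discovery are hypothetical helper names, not an Elasticsearch API.

```python
from typing import Dict, List


def parse_flat_settings(text: str) -> Dict[str, str]:
    """Parse 'dotted.key: value' lines, skipping blanks and '#' comments.

    Assumption: flat dotted keys only; nested YAML mappings are not supported.
    """
    settings: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line or ":" not in line:
            continue
        key, value = line.split(":", 1)
        settings[key.strip()] = value.strip()
    return settings


def check_discovery(text: str) -> List[str]:
    """Return human-readable problems with the unicast discovery settings."""
    s = parse_flat_settings(text)
    problems: List[str] = []
    # The exact mistake from the question: the host list set on the parent key.
    if "discovery.zen.ping.unicast" in s:
        problems.append(
            "'discovery.zen.ping.unicast' is set; did you mean "
            "'discovery.zen.ping.unicast.hosts'?"
        )
    # With multicast off, a missing host list leaves the node nobody to ping.
    multicast_off = s.get("discovery.zen.ping.multicast.enabled", "").lower() == "false"
    if multicast_off and "discovery.zen.ping.unicast.hosts" not in s:
        problems.append(
            "multicast is disabled but discovery.zen.ping.unicast.hosts is not set"
        )
    return problems
```

Run against the configuration from the question, this reports both problems; after renaming the key to discovery.zen.ping.unicast.hosts it reports none.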