Etcd cluster setup failure

Time: 2016-06-16 12:20:27

Tags: docker etcd

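I am trying to set up a 3-node etcd cluster on Ubuntu machines as the data store for Docker networking. I successfully created an etcd cluster using the etcd Docker image. Now, when I try to replicate it, the steps fail on one node. Even after removing the failing node from the setup, the cluster is still looking for the removed node. I get the same error when I use the etcd binary.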

I used the following command, changing the IP accordingly on each node:

docker run -d -v /usr/share/ca-certificates/:/etc/ssl/certs -p 4001:4001 -p 2380:2380 -p 2379:2379 \
 --name etcd quay.io/coreos/etcd \
 -name etcd0 \
 -advertise-client-urls http://172.27.59.141:2379,http://172.27.59.141:4001 \
 -listen-client-urls http://0.0.0.0:2379,http://0.0.0.0:4001 \
 -initial-advertise-peer-urls http://172.27.59.141:2380 \
 -listen-peer-urls http://0.0.0.0:2380 \
 -initial-cluster-token etcd-cluster-1 \
 -initial-cluster etcd0=http://172.27.59.141:2380,etcd1=http://172.27.59.244:2380,etcd2=http://172.27.59.232:2380 \
 -initial-cluster-state new

Two nodes connect properly, but the service on the third node stops. Below are the logs from the third node.

2016-06-16 17:16:34.293248 I | etcdmain: etcd Version: 2.3.6
2016-06-16 17:16:34.294368 I | etcdmain: Git SHA: 128344c
2016-06-16 17:16:34.294584 I | etcdmain: Go Version: go1.6.2
2016-06-16 17:16:34.294781 I | etcdmain: Go OS/Arch: linux/amd64
2016-06-16 17:16:34.294962 I | etcdmain: setting maximum number of CPUs to 2, total number of available CPUs is 2
2016-06-16 17:16:34.295142 W | etcdmain: no data-dir provided, using default data-dir ./node2.etcd
2016-06-16 17:16:34.295438 I | etcdmain: listening for peers on http://0.0.0.0:2380
2016-06-16 17:16:34.295654 I | etcdmain: listening for client requests on http://0.0.0.0:2379
2016-06-16 17:16:34.295846 I | etcdmain: listening for client requests on http://0.0.0.0:4001
2016-06-16 17:16:34.296193 I | etcdmain: stopping listening for client requests on http://0.0.0.0:4001
2016-06-16 17:16:34.301139 I | etcdmain: stopping listening for client requests on http://0.0.0.0:2379
2016-06-16 17:16:34.301454 I | etcdmain: stopping listening for peers on http://0.0.0.0:2380
2016-06-16 17:16:34.301718 I | etcdmain: --initial-cluster must include node2=http://172.27.59.232:2380 given --initial-advertise-peer-urls=http://172.27.59.232:2380
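
The final log line names the constraint that failed: the member's own name (here `node2`, which also appears in the default data-dir `./node2.etcd`) together with its advertised peer URL has to appear as an entry in `-initial-cluster`. A minimal sketch of a pre-flight check in shell follows. The function name `check_initial_cluster` is hypothetical and not part of etcd, the cluster string is the one from the posted command, and the check handles only a single advertised peer URL.

```shell
#!/bin/sh
# Hypothetical helper (not part of etcd): succeeds only if "<name>=<peer-url>"
# is one of the comma-separated entries in the -initial-cluster value.
check_initial_cluster() {
    name=$1
    peer_url=$2
    initial_cluster=$3
    # Wrap the list in commas so every entry is delimited on both sides.
    case ",${initial_cluster}," in
        *",${name}=${peer_url},"*) return 0 ;;
        *) return 1 ;;
    esac
}

# -initial-cluster value as written in the posted command.
CLUSTER="etcd0=http://172.27.59.141:2380,etcd1=http://172.27.59.244:2380,etcd2=http://172.27.59.232:2380"

# The failing node ran under the name "node2", which this list does not contain.
if check_initial_cluster node2 http://172.27.59.232:2380 "$CLUSTER"; then
    echo "node2: ok"
else
    echo "node2: missing from -initial-cluster"
fi
```

Under these assumptions, either starting the member as `etcd2` or listing it as `node2=http://172.27.59.232:2380` on every node would make the name and the list agree.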

Even after removing the failed node, I can see the two remaining nodes waiting for the third node to connect.

2016-06-16 17:16:12.063893 N | etcdserver: added member 17879927ec74147b [http://172.27.59.232:238] to cluster ba4424e006edb53e
2016-06-16 17:16:12.064431 N | etcdserver: added local member 24d9feabb7e2f26f [http://172.27.59.244:2380] to cluster ba4424e006edb53e
2016-06-16 17:16:12.065229 N | etcdserver: added member 2bda70be57138cfe [http://172.27.59.141:2380] to cluster ba4424e006edb53e
2016-06-16 17:16:12.218560 I | raft: 24d9feabb7e2f26f [term: 1] received a MsgVote message with higher term from 2bda70be57138cfe [term: 29]
2016-06-16 17:16:12.218964 I | raft: 24d9feabb7e2f26f became follower at term 29
2016-06-16 17:16:12.219276 I | raft: 24d9feabb7e2f26f [logterm: 1, index: 3, vote: 0] voted for 2bda70be57138cfe [logterm: 1, index: 3] at term 29
2016-06-16 17:16:12.222667 I | raft: raft.node: 24d9feabb7e2f26f elected leader 2bda70be57138cfe at term 29
2016-06-16 17:16:12.335904 I | etcdserver: published {Name:node1 ClientURLs:[http://172.27.59.244:2379 http://172.27.59.244:4001]} to cluster ba4424e006edb53e
2016-06-16 17:16:12.336459 N | etcdserver: set the initial cluster version to 2.2
2016-06-16 17:16:42.059177 W | rafthttp: the connection to peer 17879927ec74147b is unhealthy
2016-06-16 17:17:12.060313 W | rafthttp: the connection to peer 17879927ec74147b is unhealthy
2016-06-16 17:17:42.060986 W | rafthttp: the connection to peer 17879927ec74147b is unhealthy

As the logs show, even though the cluster was started with two nodes, it is still looking for the third node.

Is there a location on the local disk where the data is saved, so that old data is picked up even though I did not provide any?

Please suggest what I am missing.

1 Answer:

Answer 0: (score: 3)

  

Is there a location on the local disk where the data is saved, so that old data is picked up even though I did not provide any?

Yes, the membership information is already stored in node0.etcd and node1.etcd.
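
The third node's log shows where such a directory comes from (`no data-dir provided, using default data-dir ./node2.etcd`): without `-data-dir`, etcd uses `<name>.etcd` under its working directory. A small sketch for listing leftover directories of that kind before starting a fresh cluster, assuming etcd was launched from the directory being inspected; `list_etcd_data_dirs` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: print every "<name>.etcd" directory directly under
# the given directory (default: current directory).
list_etcd_data_dirs() {
    dir=${1:-.}
    for d in "$dir"/*.etcd; do
        # Skip the unexpanded glob and anything that is not a directory.
        [ -d "$d" ] && echo "$d"
    done
    return 0
}

list_etcd_data_dirs .
```

With the Docker image, the directory would presumably sit inside the container's filesystem rather than on the host, unless a volume is mounted at that path.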

You should see the following message in the log, which indicates that the server already belongs to a cluster:

etcdmain: the server is already initialized as member before, starting as etcd member...

To run a new cluster with two members, just add another argument to the command:

--data-dir bak
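
As a sketch of how that flag could fit into the command from the question for a two-member cluster: the IPs, ports, and cluster token are copied from the question, the directory name `bak` comes from the line above, and the member names `node0` and `node1` are an assumption based on the data directory names. `etcd_args` is a hypothetical helper, and the script only prints the flags so they can be reviewed before anything is started.

```shell
#!/bin/sh
# Hypothetical helper: print the etcd flags for one member of a two-member
# cluster that starts from a fresh data directory.
etcd_args() {
    name=$1      # member name; must match an entry in -initial-cluster
    ip=$2        # this member's IP address
    data_dir=$3  # fresh directory, so no old membership data is reused

    # Assumed member names (node0, node1); adjust to the names actually used.
    cluster="node0=http://172.27.59.141:2380,node1=http://172.27.59.244:2380"

    args="-name $name -data-dir $data_dir"
    args="$args -advertise-client-urls http://$ip:2379,http://$ip:4001"
    args="$args -listen-client-urls http://0.0.0.0:2379,http://0.0.0.0:4001"
    args="$args -initial-advertise-peer-urls http://$ip:2380"
    args="$args -listen-peer-urls http://0.0.0.0:2380"
    args="$args -initial-cluster-token etcd-cluster-1"
    args="$args -initial-cluster $cluster"
    args="$args -initial-cluster-state new"
    printf '%s\n' "$args"
}

# Dry run: show the command line for the first member instead of executing it.
echo "etcd $(etcd_args node0 172.27.59.141 bak)"
```

Deleting the old `node0.etcd` and `node1.etcd` directories should have a similar effect, at the cost of discarding whatever they contain.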