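I have been following Kelsey Hightower's kubernetes-the-hard-way repo and successfully created a cluster with 3 master nodes and 3 worker nodes. Below is the problem I ran into when removing one of the etcd members and then adding it back, along with every step I used:
3 master nodes:
10.240.0.10 controller-0
10.240.0.11 controller-1
10.240.0.12 controller-2
Step 1: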
isaac@controller-0:~$ sudo ETCDCTL_API=3 etcdctl member list --endpoints=https://127.0.0.1:2379 --cacert=/etc/etcd/ca.pem --cert=/etc/etcd/kubernetes.pem --key=/etc/etcd/kubernetes-key.pem
Result:
b28b52253c9d447e, started, controller-2, https://10.240.0.12:2380, https://10.240.0.12:2379
f98dc20bce6225a0, started, controller-0, https://10.240.0.10:2380, https://10.240.0.10:2379
ffed16798470cab5, started, controller-1, https://10.240.0.11:2380, https://10.240.0.11:2379
Step 2 (remove the etcd member for controller-2):
isaac@controller-0:~$ sudo ETCDCTL_API=3 etcdctl member remove b28b52253c9d447e --endpoints=https://127.0.0.1:2379 --cacert=/etc/etcd/ca.pem --cert=/etc/etcd/kubernetes.pem --key=/etc/etcd/kubernetes-key.pem
Step 3 (add the member back):
isaac@controller-0:~$ sudo ETCDCTL_API=3 etcdctl member add controller-2 --peer-urls=https://10.240.0.12:2380 --endpoints=https://127.0.0.1:2379 --cacert=/etc/etcd/ca.pem --cert=/etc/etcd/kubernetes.pem --key=/etc/etcd/kubernetes-key.pem
Result:
Member 66d450d03498eb5c added to cluster 3e7cc799faffb625
ETCD_NAME="controller-2"
ETCD_INITIAL_CLUSTER="controller-2=https://10.240.0.12:2380,controller-0=https://10.240.0.10:2380,controller-1=https://10.240.0.11:2380"
ETCD_INITIAL_ADVERTISE_PEER_URLS="https://10.240.0.12:2380"
ETCD_INITIAL_CLUSTER_STATE="existing"
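The ETCD_INITIAL_CLUSTER value printed by member add lists controller-2 first, whereas a hand-typed --initial-cluster flag often starts with controller-0. As far as I know etcd does not care about the order of the entries, but a plain string comparison does. The sketch below sorts the entries before comparing; normalize_initial_cluster is a hypothetical helper name, not an etcd tool.

```shell
# Sort the comma-separated name=url entries so that two values can be
# compared regardless of the order in which the members are listed.
# normalize_initial_cluster is a hypothetical helper, not part of etcd.
normalize_initial_cluster() {
  printf '%s\n' "$1" | tr ',' '\n' | sort | paste -sd, -
}

# Value as printed by `etcdctl member add` in step 3.
from_member_add='controller-2=https://10.240.0.12:2380,controller-0=https://10.240.0.10:2380,controller-1=https://10.240.0.11:2380'
# The same members in the order one would usually type them.
typed_by_hand='controller-0=https://10.240.0.10:2380,controller-1=https://10.240.0.11:2380,controller-2=https://10.240.0.12:2380'

if [ "$(normalize_initial_cluster "$from_member_add")" = "$(normalize_initial_cluster "$typed_by_hand")" ]; then
  echo "same member set"
else
  echo "member sets differ"
fi
```

The same comparison can be applied to the value in a unit file or start script to check that it names the same members and URLs.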
Step 4 (run the member list command):
isaac@controller-0:~$ sudo ETCDCTL_API=3 etcdctl member list --endpoints=https://127.0.0.1:2379 --cacert=/etc/etcd/ca.pem --cert=/etc/etcd/kubernetes.pem --key=/etc/etcd/kubernetes-key.pem
Result:
66d450d03498eb5c, unstarted, , https://10.240.0.12:2380,
f98dc20bce6225a0, started, controller-0, https://10.240.0.10:2380, https://10.240.0.10:2379
ffed16798470cab5, started, controller-1, https://10.240.0.11:2380, https://10.240.0.11:2379
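A member that has been added but has not yet joined shows up with the status unstarted and an empty name. A minimal sketch for picking such members out of the list is shown here; it assumes the default comma-separated output format, and unstarted_members is a hypothetical helper name.

```shell
# Print the ID of every member whose status column is "unstarted".
# Reads `etcdctl member list` output (default comma-separated format) on stdin.
# unstarted_members is a hypothetical helper, not part of etcdctl.
unstarted_members() {
  awk -F',' '{
    status = $2
    gsub(/^[ \t]+|[ \t]+$/, "", status)   # trim whitespace around the status field
    if (status == "unstarted") print $1
  }'
}

# Sample input in the format etcdctl prints for the cluster in this question.
sample='66d450d03498eb5c, unstarted, , https://10.240.0.12:2380,
f98dc20bce6225a0, started, controller-0, https://10.240.0.10:2380, https://10.240.0.10:2379
ffed16798470cab5, started, controller-1, https://10.240.0.11:2380, https://10.240.0.11:2379'

printf '%s\n' "$sample" | unstarted_members
```

On a live cluster the function would be fed by piping the member list command from step 4 into it instead of the sample text.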
Step 5 (run the command to start etcd on controller-2):
isaac@controller-2:~$ sudo etcd --name controller-2 --listen-client-urls https://10.240.0.12:2379,http://127.0.0.1:2379 --advertise-client-urls https://10.240.0.12:2379 --listen-peer-urls https://10.240.0.12:2380 --initial-advertise-peer-urls https://10.240.0.12:2380 --initial-cluster-state existing --initial-cluster controller-0=http://10.240.0.10:2380,controller-1=http://10.240.0.11:2380,controller-2=http://10.240.0.12:2380 --ca-file /etc/etcd/ca.pem --cert-file /etc/etcd/kubernetes.pem --key-file /etc/etcd/kubernetes-key.pem
Result:
2019-06-09 13:10:14.958799 I | etcdmain: etcd Version: 3.3.9
2019-06-09 13:10:14.959022 I | etcdmain: Git SHA: fca8add78
2019-06-09 13:10:14.959106 I | etcdmain: Go Version: go1.10.3
2019-06-09 13:10:14.959177 I | etcdmain: Go OS/Arch: linux/amd64
2019-06-09 13:10:14.959237 I | etcdmain: setting maximum number of CPUs to 1, total number of available CPUs is 1
2019-06-09 13:10:14.959312 W | etcdmain: no data-dir provided, using default data-dir ./controller-2.etcd
2019-06-09 13:10:14.959435 N | etcdmain: the server is already initialized as member before, starting as etcd member...
2019-06-09 13:10:14.959575 C | etcdmain: cannot listen on TLS for 10.240.0.12:2380: KeyFile and CertFile are not presented
Clearly the etcd service did not start as expected, so I did the following troubleshooting:
isaac@controller-2:~$ sudo systemctl status etcd
Result:
● etcd.service - etcd
   Loaded: loaded (/etc/systemd/system/etcd.service; enabled; vendor preset: enabled)
   Active: inactive (dead) since Sun 2019-06-09 13:06:55 UTC; 29min ago
     Docs: https://github.com/coreos
  Process: 1876 ExecStart=/usr/local/bin/etcd --name controller-2 --cert-file=/etc/etcd/kubernetes.pem --key-file=/etc/etcd/kubernetes-key.pem --peer-cert-file=/etc/etcd/kubernetes.pem --peer-key-file=/etc/etcd/kube
 Main PID: 1876 (code=exited, status=0/SUCCESS)
Jun 09 13:06:55 controller-2 etcd[1876]: stopped peer f98dc20bce6225a0
Jun 09 13:06:55 controller-2 etcd[1876]: stopping peer ffed16798470cab5...
Jun 09 13:06:55 controller-2 etcd[1876]: stopped streaming with peer ffed16798470cab5 (writer)
Jun 09 13:06:55 controller-2 etcd[1876]: stopped streaming with peer ffed16798470cab5 (writer)
Jun 09 13:06:55 controller-2 etcd[1876]: stopped HTTP pipelining with peer ffed16798470cab5
Jun 09 13:06:55 controller-2 etcd[1876]: stopped streaming with peer ffed16798470cab5 (stream MsgApp v2 reader)
Jun 09 13:06:55 controller-2 etcd[1876]: stopped streaming with peer ffed16798470cab5 (stream Message reader)
Jun 09 13:06:55 controller-2 etcd[1876]: stopped peer ffed16798470cab5
Jun 09 13:06:55 controller-2 etcd[1876]: failed to find member f98dc20bce6225a0 in cluster 3e7cc799faffb625
Jun 09 13:06:55 controller-2 etcd[1876]: forgot to set Type=notify in systemd service file?
I did try starting the etcd member with several different commands, but the etcd member for controller-2 still seems to be stuck in the unstarted state. What is the reason? Any pointers would be greatly appreciated. Thanks.
Answer 0 (score: 1)
It turned out I solved the problem as follows (credit to Matthew):
1. Delete the contents of the etcd data directory:
rm -rf /var/lib/etcd/*
2. To fix the error
cannot listen on TLS for 10.240.0.12:2380: KeyFile and CertFile are not presented
I revised the command that starts etcd as follows:
sudo etcd --name controller-2 --listen-client-urls https://10.240.0.12:2379,http://127.0.0.1:2379 --advertise-client-urls https://10.240.0.12:2379 --listen-peer-urls https://10.240.0.12:2380 --initial-advertise-peer-urls https://10.240.0.12:2380 --initial-cluster-state existing --initial-cluster controller-0=https://10.240.0.10:2380,controller-1=https://10.240.0.11:2380,controller-2=https://10.240.0.12:2380 --peer-trusted-ca-file /etc/etcd/ca.pem --cert-file /etc/etcd/kubernetes.pem --key-file /etc/etcd/kubernetes-key.pem --peer-cert-file /etc/etcd/kubernetes.pem --peer-key-file /etc/etcd/kubernetes-key.pem --data-dir /var/lib/etcd
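One difference between the command in the question and the revised one is easy to miss: the question passed http:// peer URLs to --initial-cluster, while the revised command uses https://, matching the peer URLs that member list reports. A small sketch for catching this before starting etcd follows; check_initial_cluster_scheme is a hypothetical helper name, and the check only covers the URL scheme.

```shell
# Report every entry of an --initial-cluster value whose peer URL is not https.
# Exit status is 0 when all entries use https, 1 otherwise.
# check_initial_cluster_scheme is a hypothetical helper, not part of etcd.
check_initial_cluster_scheme() (
  set -f            # disable glob expansion while splitting the value
  IFS=','           # split on commas; the subshell body keeps this local
  bad=0
  for entry in $1; do
    url="${entry#*=}"          # drop the "name=" prefix
    case "$url" in
      https://*) ;;
      *) printf 'non-https peer URL: %s\n' "$entry"; bad=1 ;;
    esac
  done
  exit "$bad"
)

# Value from the failing command in the question.
question_value='controller-0=http://10.240.0.10:2380,controller-1=http://10.240.0.11:2380,controller-2=http://10.240.0.12:2380'
# Value from the revised command in this answer.
revised_value='controller-0=https://10.240.0.10:2380,controller-1=https://10.240.0.11:2380,controller-2=https://10.240.0.12:2380'

check_initial_cluster_scheme "$question_value" || echo "question value needs fixing"
check_initial_cluster_scheme "$revised_value" && echo "revised value uses https throughout"
```

The function body runs in a subshell so that the changed IFS and the disabled globbing do not leak into the calling shell.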
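A few points to note here:
--cert-file and --key-file provide the key and certificate required for controller-2. The newly added --peer-cert-file and --peer-key-file provide the same pair for the TLS peer listener on port 2380, which is what the error above complained about.
--peer-trusted-ca-file is given so that the x509 certificates presented by controller-0 and controller-1 can be verified as signed by a known CA. If it is missing, you may hit the error etcdserver: could not get cluster response from https://10.240.0.11:2380: Get https://10.240.0.11:2380/members: x509: certificate signed by unknown authority
The value given to --initial-cluster must be consistent with the one in the systemd unit file.
Answer 1 (score: 0)
If you are re-adding the member, a simpler solution is: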
rm -rf /var/lib/etcd/*
kubeadm join phase control-plane-join etcd --control-plane