Adding a removed etcd member back in a Kubernetes master

Date: 2019-06-09 13:42:37

Tags: kubernetes etcd

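I have been following Kelsey Hightower's kubernetes-the-hard-way repo and successfully created a cluster with 3 master nodes and 3 worker nodes. Below is the problem I ran into when removing one of the etcd members and then adding it back, along with every step I used: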

3 master nodes:
10.240.0.10 controller-0
10.240.0.11 controller-1
10.240.0.12 controller-2

Step 1:

isaac@controller-0:~$ sudo ETCDCTL_API=3 etcdctl member list   --endpoints=https://127.0.0.1:2379   --cacert=/etc/etcd/ca.pem   --cert=/etc/etcd/kubernetes.pem   --key=/etc/etcd/kubernetes-key.pem

Result:

  

b28b52253c9d447e, started, controller-2, https://10.240.0.12:2380, https://10.240.0.12:2379
f98dc20bce6225a0, started, controller-0, https://10.240.0.10:2380, https://10.240.0.10:2379
ffed16798470cab5, started, controller-1, https://10.240.0.11:2380, https://10.240.0.11:2379
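
The next step needs the member ID from this listing. A minimal sketch for looking it up by name, assuming the comma-separated output format shown here; `member_id_by_name` is a hypothetical helper, not part of etcdctl:

```shell
#!/bin/sh
# Hypothetical helper (not an etcdctl feature): print the ID of the member
# whose name equals $1. Expects `etcdctl member list` output on stdin, with
# fields: ID, status, name, peer URLs, client URLs.
member_id_by_name() {
    awk -F', *' -v name="$1" '
        $3 == name {
            sub(/^[ \t]+/, "", $1)   # drop any leading indentation
            print $1
        }'
}

# Possible usage (same TLS flags as in the question):
#   sudo ETCDCTL_API=3 etcdctl member list --endpoints=... | member_id_by_name controller-2
```

An unstarted member has an empty name field, so it is never matched by name; its ID has to be read from the listing directly.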

Step 2 (remove the etcd member for controller-2):

isaac@controller-0:~$ sudo ETCDCTL_API=3 etcdctl member remove b28b52253c9d447e   --endpoints=https://127.0.0.1:2379   --cacert=/etc/etcd/ca.pem   --cert=/etc/etcd/kubernetes.pem   --key=/etc/etcd/kubernetes-key.pem

Step 3 (add the member back):

isaac@controller-0:~$ sudo ETCDCTL_API=3 etcdctl member add controller-2 --peer-urls=https://10.240.0.12:2380  --endpoints=https://127.0.0.1:2379   --cacert=/etc/etcd/ca.pem   --cert=/etc/etcd/kubernetes.pem   --key=/etc/etcd/kubernetes-key.pem

Result:

  

Member 66d450d03498eb5c added to cluster 3e7cc799faffb625

ETCD_NAME="controller-2"
ETCD_INITIAL_CLUSTER="controller-2=https://10.240.0.12:2380,controller-0=https://10.240.0.10:2380,controller-1=https://10.240.0.11:2380"
ETCD_INITIAL_ADVERTISE_PEER_URLS="https://10.240.0.12:2380"
ETCD_INITIAL_CLUSTER_STATE="existing"
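
The variables printed by `member add` correspond one-to-one to etcd command-line flags. As a rough sketch of that mapping, with the values copied from the output above (`etcd_join_flags` is a hypothetical helper introduced only for illustration):

```shell
#!/bin/sh
# Values copied from the `member add` output in this step.
ETCD_NAME="controller-2"
ETCD_INITIAL_CLUSTER="controller-2=https://10.240.0.12:2380,controller-0=https://10.240.0.10:2380,controller-1=https://10.240.0.11:2380"
ETCD_INITIAL_ADVERTISE_PEER_URLS="https://10.240.0.12:2380"
ETCD_INITIAL_CLUSTER_STATE="existing"

# Hypothetical helper: print the etcd flags equivalent to the variables above.
etcd_join_flags() {
    printf '%s %s %s %s\n' \
        "--name $ETCD_NAME" \
        "--initial-cluster $ETCD_INITIAL_CLUSTER" \
        "--initial-advertise-peer-urls $ETCD_INITIAL_ADVERTISE_PEER_URLS" \
        "--initial-cluster-state $ETCD_INITIAL_CLUSTER_STATE"
}

etcd_join_flags
```

Every peer URL in this output uses https, so the `--initial-cluster` value passed to etcd on controller-2 should use the same scheme.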

Step 4 (run the member list command):

isaac@controller-0:~$ sudo ETCDCTL_API=3 etcdctl member list   --endpoints=https://127.0.0.1:2379   --cacert=/etc/etcd/ca.pem   --cert=/etc/etcd/kubernetes.pem   --key=/etc/etcd/kubernetes-key.pem

Result:

  

66d450d03498eb5c, unstarted, , https://10.240.0.12:2380
f98dc20bce6225a0, started, controller-0, https://10.240.0.10:2380, https://10.240.0.10:2379
ffed16798470cab5, started, controller-1, https://10.240.0.11:2380, https://10.240.0.11:2379

Step 5 (run the command to start etcd on controller-2):

isaac@controller-2:~$ sudo etcd --name controller-2 --listen-client-urls https://10.240.0.12:2379,http://127.0.0.1:2379 --advertise-client-urls https://10.240.0.12:2379 --listen-peer-urls https://10.240.0.12:2380 --initial-advertise-peer-urls https://10.240.0.12:2380 --initial-cluster-state existing --initial-cluster controller-0=http://10.240.0.10:2380,controller-1=http://10.240.0.11:2380,controller-2=http://10.240.0.12:2380 --ca-file /etc/etcd/ca.pem --cert-file /etc/etcd/kubernetes.pem --key-file /etc/etcd/kubernetes-key.pem

Result:

  

2019-06-09 13:10:14.958799 I | etcdmain: etcd Version: 3.3.9
2019-06-09 13:10:14.959022 I | etcdmain: Git SHA: fca8add78
2019-06-09 13:10:14.959106 I | etcdmain: Go Version: go1.10.3
2019-06-09 13:10:14.959177 I | etcdmain: Go OS/Arch: linux/amd64
2019-06-09 13:10:14.959237 I | etcdmain: setting maximum number of CPUs to 1, total number of available CPUs is 1
2019-06-09 13:10:14.959312 W | etcdmain: no data-dir provided, using default data-dir ./controller-2.etcd
2019-06-09 13:10:14.959435 N | etcdmain: the server is already initialized as member before, starting as etcd member...
2019-06-09 13:10:14.959575 C | etcdmain: cannot listen on TLS for 10.240.0.12:2380: KeyFile and CertFile are not presented

Clearly the etcd service did not start as expected, so I troubleshot it as follows:

isaac@controller-2:~$ sudo systemctl status etcd

Result:

  

● etcd.service - etcd
   Loaded: loaded (/etc/systemd/system/etcd.service; enabled; vendor preset: enabled)
   Active: inactive (dead) since Sun 2019-06-09 13:06:55 UTC; 29min ago
     Docs: https://github.com/coreos
  Process: 1876 ExecStart=/usr/local/bin/etcd --name controller-2 --cert-file=/etc/etcd/kubernetes.pem --key-file=/etc/etcd/kubernetes-key.pem --peer-cert-file=/etc/etcd/kubernetes.pem --peer-key-file=/etc/etcd/kube
 Main PID: 1876 (code=exited, status=0/SUCCESS)

Jun 09 13:06:55 controller-2 etcd[1876]: stopped peer f98dc20bce6225a0
Jun 09 13:06:55 controller-2 etcd[1876]: stopping peer ffed16798470cab5...
Jun 09 13:06:55 controller-2 etcd[1876]: stopped streaming with peer ffed16798470cab5 (writer)
Jun 09 13:06:55 controller-2 etcd[1876]: stopped streaming with peer ffed16798470cab5 (writer)
Jun 09 13:06:55 controller-2 etcd[1876]: stopped HTTP pipelining with peer ffed16798470cab5
Jun 09 13:06:55 controller-2 etcd[1876]: stopped streaming with peer ffed16798470cab5 (stream MsgApp v2 reader)
Jun 09 13:06:55 controller-2 etcd[1876]: stopped streaming with peer ffed16798470cab5 (stream Message reader)
Jun 09 13:06:55 controller-2 etcd[1876]: stopped peer ffed16798470cab5
Jun 09 13:06:55 controller-2 etcd[1876]: failed to find member f98dc20bce6225a0 in cluster 3e7cc799faffb625
Jun 09 13:06:55 controller-2 etcd[1876]: forgot to set Type=notify in systemd service file?

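I did try starting the etcd member with different commands, but etcd on controller-2 still seems to be stuck in the unstarted state. What is the reason? Any pointers would be highly appreciated. Thanks.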

2 Answers:

Answer 0: (score: 1)

It turned out that I solved the problem as follows (credit to Matthew):

  1. Delete the etcd data directory with the following command:
rm -rf  /var/lib/etcd/*
  2. To fix the message cannot listen on TLS for 10.240.0.12:2380: KeyFile and CertFile are not presented, I revised the command that starts etcd as follows:
sudo etcd --name controller-2 --listen-client-urls https://10.240.0.12:2379,http://127.0.0.1:2379 --advertise-client-urls https://10.240.0.12:2379 --listen-peer-urls https://10.240.0.12:2380 --initial-advertise-peer-urls https://10.240.0.12:2380 --initial-cluster-state existing --initial-cluster controller-0=https://10.240.0.10:2380,controller-1=https://10.240.0.11:2380,controller-2=https://10.240.0.12:2380 --peer-trusted-ca-file  /etc/etcd/ca.pem --cert-file /etc/etcd/kubernetes.pem --key-file /etc/etcd/kubernetes-key.pem --peer-cert-file /etc/etcd/kubernetes.pem --peer-key-file /etc/etcd/kubernetes-key.pem --data-dir /var/lib/etcd

A few points to note here:

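  1. The newly added arguments --peer-cert-file and --peer-key-file present the certificate and key that controller-2 needs for its TLS peer listener on port 2380. The original command passed only --cert-file and --key-file, which cover client traffic.
  2. The argument --peer-trusted-ca-file is also provided so that the x509 certificates presented by controller-0 and controller-1 can be verified as signed by a known CA. Without it, you may hit the error etcdserver: could not get cluster response from https://10.240.0.11:2380: Get https://10.240.0.11:2380/members: x509: certificate signed by unknown authority
  3. The value given for --initial-cluster must be consistent with the value in the systemd unit file.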
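
The third point can be checked mechanically. A minimal sketch, assuming the unit file passes the flag in the space-separated form `--initial-cluster VALUE` (the `--initial-cluster=VALUE` form is not handled); all three function names are hypothetical helpers:

```shell
#!/bin/sh
# Hypothetical helper: print the value following --initial-cluster in file $1.
# Exact field match, so --initial-cluster-state and similar flags are skipped.
initial_cluster_of() {
    awk '{
        for (i = 1; i < NF; i++)
            if ($i == "--initial-cluster") { print $(i + 1); exit }
    }' "$1"
}

# Sort the comma-separated name=url pairs so member order does not matter.
normalize_cluster() {
    printf '%s\n' "$1" | tr ',' '\n' | sort | paste -s -d, -
}

# Succeed if the unit file $1 and the command-line value $2 list the same
# members with the same peer URLs.
same_initial_cluster() {
    [ "$(normalize_cluster "$(initial_cluster_of "$1")")" = "$(normalize_cluster "$2")" ]
}

# Possible usage (unit file path as shown in the systemctl status output):
#   same_initial_cluster /etc/systemd/system/etcd.service "$VALUE" || echo "mismatch"
```

Because the comparison includes the URL scheme, it also flags a command that uses http:// peer URLs while the unit file uses https://.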

Answer 1: (score: 0)

If you are re-adding the member, a simpler solution is:

rm -rf  /var/lib/etcd/*
kubeadm join phase control-plane-join etcd --control-plane