I created a 3-node Kubernetes cluster (v1.10.4) with the flannel network plugin (v0.10.0) on CentOS 7.5.1804, running in VMware Workstation v14. It worked fine for several days, but today, after I started the machines, the two worker nodes no longer create the cni0 virtual bridge on boot.
I tried deleting the nodes, re-joining them, and rebooting; none of it fixed the problem. Only creating cni0 by hand (sketched below) works as a temporary fix, but the next reboot wipes that setup out again.
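For reference, this is roughly the manual workaround, assuming worker1's podCIDR is 10.244.1.0/24 (the real value should be read from the node object; the address here is my assumption):
# ip link add cni0 type bridge
# ip addr add 10.244.1.1/24 dev cni0
# ip link set cni0 up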
Even with cni0 missing, kubectl -n kube-system get pods reports everything as healthy, yet actual inter-node pod communication over 10.244.0.0/16 does not work.
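The failure is easy to reproduce: for example, pinging the kube-dns pod on the master (its IP, 10.244.0.8, appears in the pod listing below) from worker1 gets no replies:
[root@worker1 ~]# ping -c 3 10.244.0.8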
The cni package (v0.6.0) is installed on all nodes:
# rpm -qa | grep cni
kubernetes-cni-0.6.0-0.x86_64
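Besides the rpm itself, the plugin binaries and the network config that kubelet reads live in the standard paths (the conflist filename is my assumption; it varies across flannel versions):
# ls /opt/cni/bin
# cat /etc/cni/net.d/10-flannel.conflist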
Here are the commands that initialized the cluster:
# kubeadm init --apiserver-advertise-address 192.168.238.7 --kubernetes-version 1.10.4 --pod-network-cidr=10.244.0.0/16
# kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/v0.10.0/Documentation/kube-flannel.yml
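As a sanity check (not part of the original setup), each node's podCIDR allocation from 10.244.0.0/16 can be listed with:
# kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.podCIDR}{"\n"}{end}'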
# after worker1 re-joined cluster:
# kubectl -n kube-system get pods -o wide
NAME                             READY     STATUS    RESTARTS   AGE       IP              NODE
etcd-master                      1/1       Running   0          2h        192.168.238.7   master
kube-apiserver-master            1/1       Running   0          2h        192.168.238.7   master
kube-controller-manager-master   1/1       Running   0          2h        192.168.238.7   master
kube-dns-86f4d74b45-cc2ph        3/3       Running   0          2h        10.244.0.8      master
kube-flannel-ds-fn6fx            1/1       Running   0          2h        192.168.238.7   master
kube-flannel-ds-h9qlf            1/1       Running   0          10m       192.168.238.8   worker1
kube-proxy-vjszc                 1/1       Running   0          2h        192.168.238.7   master
kube-proxy-z2bcp                 1/1       Running   0          10m       192.168.238.8   worker1
kube-scheduler-master            1/1       Running   0          2h        192.168.238.7   master
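The flannel pod on worker1 reports Running, but its logs would show whether it actually acquired a subnet lease; flannel also writes that lease to /run/flannel/subnet.env on the node, which the CNI plugin consumes when it sets up cni0:
# kubectl -n kube-system logs kube-flannel-ds-h9qlf
[root@worker1 ~]# cat /run/flannel/subnet.env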
On worker1:
$ ip link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000
    link/ether 00:0c:29:36:b0:02 brd ff:ff:ff:ff:ff:ff
3: ens34: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000
    link/ether 00:0c:29:36:b0:0c brd ff:ff:ff:ff:ff:ff
4: virbr0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN mode DEFAULT group default qlen 1000
    link/ether 52:54:00:ef:0d:68 brd ff:ff:ff:ff:ff:ff
5: virbr0-nic: <BROADCAST,MULTICAST> mtu 1500 qdisc pfifo_fast master virbr0 state DOWN mode DEFAULT group default qlen 1000
    link/ether 52:54:00:ef:0d:68 brd ff:ff:ff:ff:ff:ff
6: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN mode DEFAULT group default
    link/ether 02:42:f0:ec:c8:cd brd ff:ff:ff:ff:ff:ff
7: flannel.1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN mode DEFAULT group default
    link/ether 02:ec:81:b6:ef:4d brd ff:ff:ff:ff:ff:ff
The routing table on worker1:
[root@worker1 ~]# ip ro
default via 192.168.64.2 dev ens34
10.244.0.0/24 via 10.244.0.0 dev flannel.1 onlink
169.254.0.0/16 dev ens33 scope link metric 1002
169.254.0.0/16 dev ens34 scope link metric 1003
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1
192.168.64.0/24 dev ens34 proto kernel scope link src 192.168.64.135
192.168.122.0/24 dev virbr0 proto kernel scope link src 192.168.122.1
192.168.238.0/24 dev ens33 proto kernel scope link src 192.168.238.8
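For comparison, on a healthy worker this table would also contain a kernel route for the node's own pod subnet via cni0, something like (illustrative; the subnet is assumed):
10.244.1.0/24 dev cni0 proto kernel scope link src 10.244.1.1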
The docker version is v18.03, and iptables rule management is disabled for the docker daemon:
# cat /etc/docker/daemon.json
{
    "iptables": false
}
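(Note that a daemon.json change only takes effect after the daemon is restarted, i.e. systemctl restart docker.)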
So the question is: how does cni0 go missing, and why is it never re-created by a reboot or by re-joining the Kubernetes cluster? Is there anywhere else I should check?
One possibly related detail: the cluster runs in VMs, so I have to power them on and off from time to time. But the Kubernetes documentation does not describe any procedure for shutting a cluster down, short of tearing it down entirely. Is there a more graceful way to stop a cluster that avoids potential damage to its integrity?
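The closest thing I have found is draining a node before powering it off and uncordoning it afterwards, but I am not sure this is the intended shutdown procedure:
# kubectl drain worker1 --ignore-daemonsets
(power the VM off, later back on)
# kubectl uncordon worker1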