Question

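I created a local 3-node Kubernetes cluster in GNOME Boxes using the CentOS minimal ISO. The goal is to test a custom install on a client-provisioned machine. Everything went smoothly, and the cluster even ran fine for a few days. However, I needed to restart my server, so I shut the k8s cluster down by running shutdown now on every node in the cluster. When I brought everything back up, the cluster did not come back up as expected. The logs told me there was a problem bringing up the apiserver and etcd images. The docker logs for the apiserver show the following: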

Flag --insecure-port has been deprecated, This flag will be removed in a future version.
I0919 03:05:10.238042       1 server.go:703] external host was not specified, using 192.168.122.2
I0919 03:05:10.238160       1 server.go:145] Version: v1.11.3
Error: unable to load server certificate: open /etc/kubernetes/pki/apiserver.crt: permission denied
...[cli params for kube-apiserver]
    error: unable to load server certificate: open /etc/kubernetes/pki/apiserver.crt: permission denied

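When I checked the permissions, the file is set to 644 and it is definitely there. My real question is: why does the cluster work when I initialize it with kubeadm, but then fail to restart properly?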
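A mode of 644 alongside "permission denied" suggests the denial may come from something other than the mode bits. The sketch below prints the mode and owner of the certificate and, where the tools are present, its SELinux label and the current enforcement mode. The helper name file_mode_and_owner is hypothetical, and SELinux being involved is an assumption, not something the logs above confirm.

```shell
# Hypothetical helper: print the octal mode and owner:group of a file (GNU stat).
file_mode_and_owner() {
    stat -c '%a %U:%G' "$1"
}

# Path taken from the apiserver log above; the checks are skipped if it is absent.
CERT=/etc/kubernetes/pki/apiserver.crt
if [ -e "$CERT" ]; then
    file_mode_and_owner "$CERT"
    ls -Z "$CERT"    # shows the SELinux label, if the system has one
    # Prints Enforcing, Permissive or Disabled when SELinux tools are installed.
    command -v getenforce >/dev/null && getenforce
fi
```

If getenforce reports Enforcing after the reboot, the setenforce 0 from the setup script is no longer in effect.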

Here are the steps I used to initialize the cluster:

# NOTE: this file needs to be run as root
#  1: install kubelet, kubeadm, kubectl, and docker
cat <<EOF > /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://packages.cloud.google.com/yum/repos/kubernetes-el7-x86_64
enabled=1
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://packages.cloud.google.com/yum/doc/yum-key.gpg https://packages.cloud.google.com/yum/doc/rpm-package-key.gpg
exclude=kube*
EOF
yum install -y kubelet kubeadm kubectl docker --disableexcludes=kubernetes
systemctl enable --now kubelet
systemctl enable --now docker

# 2: disable enforcement of SELinux policies (k8s has own policies)
sed -i 's/SELINUX=enforcing/SELINUX=permissive/g' /etc/sysconfig/selinux
setenforce 0

# 3: make sure the network can function properly
sysctl net.bridge.bridge-nf-call-iptables=1

# 4. insert all necessary modules
modprobe --all ip_vs_wrr ip_vs_sh ip_vs ip_vs_rr
cat <<EOF > /etc/modules-load.d/ifvs.conf
ip_vs_wrr
ip_vs_sh
ip_vs
ip_vs_rr
EOF
systemctl disable --now firewalld

# 5: initialize the cluster. this should happen only on the master node. This will print out instructions and a command that should be run on each supporting node.
kubeadm init --pod-network-cidr=10.244.0.0/16 

# 6: run the kubeadm join command from result of step 5 on all the other nodes
kubeadm join 192.168.122.91:6443 --token jvr7dh.ymoahxxhu3nig8kl --discovery-token-ca-cert-hash sha256:7cc1211aa882c535f371e2cf6706072600f2cc47b7da18b1d242945c2d8cab65

#################################
# the cluster is  all setup to be accessed via API. use kubectl on your local machine from here on out!
# to access the cluster via kubectl, you need to merge the contents of <master_node>:/etc/kubernetes/admin.conf with your local ~/.kube/config
#################################

# 7: to allow the master to run pods: 
kubectl taint nodes --all node-role.kubernetes.io/master-

# 8: install the networking node: 
kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/v0.10.0/Documentation/kube-flannel.yml

# 9: setup dashboard
kubectl apply -f https://raw.githubusercontent.com/kubernetes/dashboard/master/src/deploy/recommended/kubernetes-dashboard.yaml

# 10: set admin user (for dashboard)
kubectl apply -f deploy/admin-user.yaml
# capture the admin-user token (value only, without the "token:" label)
TOKEN=$(kubectl -n kube-system describe secret $(kubectl -n kube-system get secret | grep admin-user | awk '{print $1}') | awk '/^token:/ {print $2}')

# start proxy on local machine to cluster
kubectl proxy &

# go to the dashboard in your browser
open http://localhost:8001/api/v1/namespaces/kube-system/services/https:kubernetes-dashboard:/proxy/

# paste the token into the login:
echo "$TOKEN"
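Step 3 of the script sets net.bridge.bridge-nf-call-iptables with a plain sysctl call, which changes the running kernel only and is lost at shutdown. A minimal sketch of making it persistent with a drop-in file follows. The helper name write_bridge_sysctl and the file name k8s.conf are hypothetical, and this setting is not claimed to be the cause of the certificate error above.

```shell
# Hypothetical helper: write a sysctl drop-in into the given directory.
# On a real node the directory would be /etc/sysctl.d and this needs root.
write_bridge_sysctl() {
    dir=$1
    mkdir -p "$dir"
    printf 'net.bridge.bridge-nf-call-iptables = 1\n' > "$dir/k8s.conf"
}

# Dry run into a scratch directory so the sketch is safe to execute anywhere.
scratch=$(mktemp -d)
write_bridge_sysctl "$scratch"
cat "$scratch/k8s.conf"
```

On a node, running write_bridge_sysctl /etc/sysctl.d as root followed by sysctl --system would apply the setting immediately and again on every boot.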

Answer 1

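After creating my single-master Kubernetes cluster with kubeadm on a CentOS 7 VirtualBox VM, I ran into exactly the same problem, and I ended up opening an issue against kubeadm.

You may want to follow some or all of the steps mentioned there by me and by the people who helped me while debugging the issue. In short, what worked for me was setting the hostname to localhost or something similar, then creating the cluster again with kubeadm init. (See this link, my last comment on the issue, for the exact steps that solved my problem.) With that change I have been able to run my Kubernetes cluster and also join other nodes to it successfully. Good luck.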

Answer 2

I think I may have found a solution. I had to grant write permission on the pki directory on the master node.

chmod a+rw -R /etc/kubernetes/pki

I still don't understand why it works, but it seems to work repeatably.
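One possible explanation, not confirmed anywhere in this thread: step 2 of the question runs sed -i against /etc/sysconfig/selinux, which on CentOS 7 is normally a symlink to /etc/selinux/config. GNU sed -i without --follow-symlinks replaces the symlink with a regular file and leaves the link target untouched, so the real config would still say enforcing and the effect of setenforce 0 would end at the next reboot. The sketch below reproduces that sed behaviour in a scratch directory; the file names stand in for the real paths.

```shell
# Reproduce the symlink behaviour in a scratch directory (no root needed).
demo=$(mktemp -d)
printf 'SELINUX=enforcing\n' > "$demo/config"   # stands in for /etc/selinux/config
ln -s config "$demo/selinux"                    # stands in for /etc/sysconfig/selinux

# Same edit as step 2 of the question, aimed at the symlink.
sed -i 's/SELINUX=enforcing/SELINUX=permissive/g' "$demo/selinux"

real_after_plain=$(cat "$demo/config")          # content of the link target
if [ -L "$demo/selinux" ]; then link_kept=yes; else link_kept=no; fi

# Restore the symlink and repeat with --follow-symlinks (GNU sed only).
rm -f "$demo/selinux"
ln -s config "$demo/selinux"
sed -i --follow-symlinks 's/SELINUX=enforcing/SELINUX=permissive/g' "$demo/selinux"
real_after_follow=$(cat "$demo/config")

echo "plain sed -i:      $real_after_plain (symlink kept: $link_kept)"
echo "--follow-symlinks: $real_after_follow"
rm -rf "$demo"
```

On an affected node, ls -l /etc/sysconfig/selinux and getenforce would show whether this applies. If it does, editing /etc/selinux/config directly would make the permissive setting survive a reboot.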

How do I correctly start up a previously shut down Kubernetes cluster?