Kubernetes master stuck in NotReady after a failed upgrade

Time: 2020-03-25 21:38:17

Tags: kubernetes

My K8s cluster is at version 1.13.2 and I want to upgrade it to 1.17.x (the latest 1.17 release).

I looked at the official instructions: https://kubernetes.io/docs/tasks/administer-cluster/kubeadm/kubeadm-upgrade/, which state that I need to upgrade one minor version at a time: 1.14, then 1.15, then 1.16, and only then 1.17.
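The one-minor-at-a-time rule can be sketched as a small loop. The `upgrade_path` helper below is hypothetical (it is not part of kubeadm) and assumes both arguments are given as `MAJOR.MINOR` with the same major version:

```shell
#!/bin/sh
# upgrade_path FROM TO: print every minor release after FROM up to TO,
# one per line. Hypothetical helper; expects MAJOR.MINOR arguments.
upgrade_path() {
    major=${1%%.*}        # "1.13" -> "1"
    from_minor=${1#*.}    # "1.13" -> "13"
    to_minor=${2#*.}      # "1.17" -> "17"
    next=$((from_minor + 1))
    while [ "$next" -le "$to_minor" ]; do
        echo "$major.$next"
        next=$((next + 1))
    done
}

upgrade_path 1.13 1.17
# prints:
# 1.14
# 1.15
# 1.16
# 1.17
```

Each line of the output is one full upgrade cycle to complete before moving to the next.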

I made all the preparations (disabled swap), ran everything as documented, and determined that the latest 1.14 release is 1.14.10.

When I ran:

apt-mark unhold kubeadm kubelet && \
 apt-get update && apt-get install -y kubeadm=1.14.10-00 && \
apt-mark hold kubeadm

for some reason it seems that kubectl v1.18 was downloaded as well.

I went ahead and tried to run sudo kubeadm upgrade plan, but it failed with the following error:

[preflight] Running pre-flight checks.
[upgrade] Making sure the cluster is healthy:
[upgrade/health] FATAL: [preflight] Some fatal errors occurred:
    [ERROR ControlPlaneNodesReady]: there are Notready control-planes in the cluster: [<name of master>]
[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`

Running kubectl get nodes shows that the master is indeed NotReady and that its VERSION is v1.18.0, while the workers are of course still at v1.13.2 and Ready (unchanged).

How do I fix the cluster?

What did I do wrong when attempting the upgrade?

1 answer:

Answer 0: (score: 4)

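I reproduced your problem in my lab. What happened is that you accidentally upgraded more than you intended: specifically, you upgraded the kubelet package on the master node (control plane).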

Here is my healthy cluster at version 1.13.2:

$ kubectl get nodes
NAME            STATUS   ROLES    AGE     VERSION
kubeadm-lab-0   Ready    master   9m25s   v1.13.2
kubeadm-lab-1   Ready    <none>   6m17s   v1.13.2
kubeadm-lab-2   Ready    <none>   6m9s    v1.13.2

Now I will unhold kubeadm and kubelet, as you did:

$ sudo apt-mark unhold kubeadm kubelet
Canceled hold on kubeadm.
Canceled hold on kubelet.

Finally, I upgrade kubeadm to 1.14.10:

$ sudo apt-get install kubeadm=1.14.10-00
Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following additional packages will be installed:
  conntrack kubelet kubernetes-cni
The following NEW packages will be installed:
  conntrack
The following packages will be upgraded:
  kubeadm kubelet kubernetes-cni
3 upgraded, 1 newly installed, 0 to remove and 8 not upgraded.
Need to get 34.1 MB of archives.
After this operation, 7,766 kB of additional disk space will be used.
Do you want to continue? [Y/n] y
Get:2 http://deb.debian.org/debian stretch/main amd64 conntrack amd64 1:1.4.4+snapshot20161117-5 [32.9 kB]
Get:1 https://packages.cloud.google.com/apt kubernetes-xenial/main amd64 kubelet amd64 1.18.0-00 [19.4 MB]
Get:3 https://packages.cloud.google.com/apt kubernetes-xenial/main amd64 kubeadm amd64 1.14.10-00 [8,155 kB]
Get:4 https://packages.cloud.google.com/apt kubernetes-xenial/main amd64 kubernetes-cni amd64 0.7.5-00 [6,473 kB]
Fetched 34.1 MB in 2s (13.6 MB/s)         
Selecting previously unselected package conntrack.
(Reading database ... 97656 files and directories currently installed.)
Preparing to unpack .../conntrack_1%3a1.4.4+snapshot20161117-5_amd64.deb ...
Unpacking conntrack (1:1.4.4+snapshot20161117-5) ...
Preparing to unpack .../kubelet_1.18.0-00_amd64.deb ...
Unpacking kubelet (1.18.0-00) over (1.13.2-00) ...
Preparing to unpack .../kubeadm_1.14.10-00_amd64.deb ...
Unpacking kubeadm (1.14.10-00) over (1.13.2-00) ...
Preparing to unpack .../kubernetes-cni_0.7.5-00_amd64.deb ...
Unpacking kubernetes-cni (0.7.5-00) over (0.6.0-00) ...
Setting up conntrack (1:1.4.4+snapshot20161117-5) ...
Setting up kubernetes-cni (0.7.5-00) ...
Setting up kubelet (1.18.0-00) ...
Processing triggers for man-db (2.7.6.1-2) ...
Setting up kubeadm (1.14.10-00) ...

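As this output shows, kubelet was updated to the latest version (1.18.0) because it is a dependency of kubeadm and no version was requested for it. Now my master node is NotReady, just like yours: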

$ kubectl get nodes
NAME            STATUS     ROLES    AGE     VERSION
kubeadm-lab-0   NotReady   master   7m      v1.18.0
kubeadm-lab-1   Ready      <none>   3m52s   v1.13.2
kubeadm-lab-2   Ready      <none>   3m44s   v1.13.2

How to fix it?

To recover from this situation, you have to downgrade the packages that were upgraded by mistake:

$ sudo apt-get install -y \
--allow-downgrades \
--allow-change-held-packages \
kubelet=1.13.2-00 \
kubeadm=1.13.2-00 \
kubectl=1.13.2-00 \
kubernetes-cni=0.6.0-00

After running this command, wait a moment and check your nodes:

$ kubectl get nodes
NAME            STATUS   ROLES    AGE     VERSION
kubeadm-lab-0   Ready    master   9m25s   v1.13.2
kubeadm-lab-1   Ready    <none>   6m17s   v1.13.2
kubeadm-lab-2   Ready    <none>   6m9s    v1.13.2
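Once the node is Ready again, it may be worth confirming that the packages are held, so that a later apt-get run cannot move them unexpectedly. The sketch below assumes that `apt-mark showhold` prints one held package name per line; `missing_holds` is a hypothetical helper, not an apt command:

```shell
#!/bin/sh
# missing_holds PKG...: read `apt-mark showhold` output on stdin and
# print every PKG that is NOT currently held. Hypothetical helper.
missing_holds() {
    held=$(cat)
    for pkg in "$@"; do
        # -x matches the whole line, -F treats the name as a fixed string
        printf '%s\n' "$held" | grep -qxF -- "$pkg" || echo "$pkg"
    done
}

# Intended usage on a node:
#   apt-mark showhold | missing_holds kubeadm kubelet kubectl
```

Any package it prints can be held again with sudo apt-mark hold followed by the package name.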

How to upgrade successfully?

Before running apt-get install, you must check carefully what it will do and make sure every package will be upgraded to the version you want.
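One way to do that check is apt-get's simulation mode (-s / --simulate), which prints the planned actions without changing anything. The sketch below assumes the simulated actions appear as lines of the form `Inst <package> [<old version>] (<new version> ...)`; `check_sim` is a hypothetical helper that fails when a package would land on an unexpected version:

```shell
#!/bin/sh
# check_sim PKG VERSION: read `apt-get install -s ...` output on stdin.
# Exit non-zero if PKG would be installed at any version other than
# VERSION. Hypothetical helper; assumes the "Inst ..." line format.
check_sim() {
    awk -v pkg="$1" -v want="$2" '
        $1 == "Inst" && $2 == pkg {
            # The candidate version is the first field starting with "("
            for (i = 3; i <= NF; i++) {
                if (substr($i, 1, 1) == "(") {
                    got = substr($i, 2)
                    if (got != want) {
                        printf "%s would be installed at %s, expected %s\n", pkg, got, want
                        bad = 1
                    }
                    break
                }
            }
        }
        END { exit bad + 0 }
    '
}

# Intended usage on a node:
#   sudo apt-get install -s kubeadm=1.14.10-00 | check_sim kubelet 1.14.10-00
```

If the package does not appear in the simulation at all, the helper exits successfully, because nothing would change for it.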

In my cluster, I upgraded the master node with the following command:

$ sudo apt-mark unhold kubeadm kubelet && \
sudo apt-get update && \
sudo apt-get install -y kubeadm=1.14.10-00 kubelet=1.14.10-00 && \
sudo apt-mark hold kubeadm kubelet

My master node was upgraded to the desired version:

$ kubectl get nodes
NAME            STATUS   ROLES    AGE   VERSION
kubeadm-lab-0   Ready    master   58m   v1.14.10
kubeadm-lab-1   Ready    <none>   55m   v1.13.2
kubeadm-lab-2   Ready    <none>   55m   v1.13.2

Now, if you run sudo kubeadm upgrade plan, you get the following output:

$ sudo kubeadm upgrade plan
[preflight] Running pre-flight checks.
[upgrade] Making sure the cluster is healthy:
[upgrade/config] Making sure the configuration is correct:
[upgrade/config] Reading configuration from the cluster...
[upgrade/config] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
[upgrade] Fetching available versions to upgrade to
[upgrade/versions] Cluster version: v1.13.12
[upgrade/versions] kubeadm version: v1.14.10
I0326 10:08:44.926849   21406 version.go:240] remote version is much newer: v1.18.0; falling back to: stable-1.14
[upgrade/versions] Latest stable version: v1.14.10
[upgrade/versions] Latest version in the v1.13 series: v1.13.12

Components that must be upgraded manually after you have upgraded the control plane with 'kubeadm upgrade apply':
COMPONENT   CURRENT        AVAILABLE
Kubelet     2 x v1.13.2    v1.14.10
            1 x v1.14.10   v1.14.10

Upgrade to the latest stable version:

COMPONENT            CURRENT    AVAILABLE
API Server           v1.13.12   v1.14.10
Controller Manager   v1.13.12   v1.14.10
Scheduler            v1.13.12   v1.14.10
Kube Proxy           v1.13.12   v1.14.10
CoreDNS              1.2.6      1.3.1
Etcd                 3.2.24     3.3.10

You can now apply the upgrade by executing the following command:

    kubeadm upgrade apply v1.14.10

_____________________________________________________________________

As the message shows, kubelet has to be upgraded on all nodes, so I ran the following command on the other 2 nodes:

$ sudo apt-mark unhold kubeadm kubelet kubernetes-cni && \
sudo apt-get update && \
sudo apt-get install -y kubeadm=1.14.10-00 kubelet=1.14.10-00 && \
sudo apt-mark hold kubeadm kubelet kubernetes-cni

Finally, I proceeded:

$ sudo kubeadm upgrade apply v1.14.10
[upgrade/successful] SUCCESS! Your cluster was upgraded to "v1.14.10". Enjoy!

[upgrade/kubelet] Now that your control plane is upgraded, please proceed with upgrading your kubelets if you haven't already done so.
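To see which kubelets still need attention, the kubectl get nodes output can be filtered by its VERSION column. A minimal sketch, assuming the default output layout in which VERSION is the last column; `nodes_not_at` is a hypothetical helper, and the sample input is the node listing shown earlier in this answer:

```shell
#!/bin/sh
# nodes_not_at VERSION: read `kubectl get nodes` output on stdin and
# print the name of every node whose VERSION column differs.
# Hypothetical helper; skips the header line.
nodes_not_at() {
    awk -v want="$1" 'NR > 1 && $NF != want { print $1 }'
}

# Sample input copied from the listing above; on a real cluster use:
#   kubectl get nodes | nodes_not_at v1.14.10
nodes_not_at v1.14.10 <<'EOF'
NAME            STATUS   ROLES    AGE   VERSION
kubeadm-lab-0   Ready    master   58m   v1.14.10
kubeadm-lab-1   Ready    <none>   55m   v1.13.2
kubeadm-lab-2   Ready    <none>   55m   v1.13.2
EOF
# prints:
# kubeadm-lab-1
# kubeadm-lab-2
```

An empty result means every node reports the target version.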