How to fix the "The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)" error

Date: 2019-01-29 15:21:35

Tags: kubernetes

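I am setting up a new Kubernetes cluster on a local PC with the specs below. I ran into a problem while trying to bring the cluster up and would appreciate your input.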

OS version: Linux server.cent.com 3.10.0-123.el7.x86_64 #1 SMP Mon Jun 30 12:09:22 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

Docker version: Docker version 1.13.1, build 07f3374/1.13.1

[root@server ~]# rpm -qa |grep -i kube
kubectl-1.13.2-0.x86_64
kubernetes-cni-0.6.0-0.x86_64
kubeadm-1.13.2-0.x86_64
kubelet-1.13.2-0.x86_64

The problem I am facing:

[root@server ~]# kubeadm init --apiserver-advertise-address=192.168.203.154 --pod-network-cidr=10.244.0.0/16
[kubelet-check] Initial timeout of 40s passed.

Unfortunately, an error has occurred:
        timed out waiting for the condition

This error is likely caused by:
        - The kubelet is not running
        - The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)

If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
        - 'systemctl status kubelet'
        - 'journalctl -xeu kubelet'

Additionally, a control plane component may have crashed or exited when started by the container runtime.
To troubleshoot, list all containers using your preferred container runtimes CLI, e.g. docker.
Here is one example how you may list all Kubernetes containers running in docker:
        - 'docker ps -a | grep kube | grep -v pause'
        Once you have found the failing container, you can inspect its logs with:
        - 'docker logs CONTAINERID'
error execution phase wait-control-plane: couldn't initialize a Kubernetes cluster

Kubelet status:

Jan 29 09:34:09 server.cent.com kubelet[10994]: E0129 09:34:09.354902   10994 kubelet.go:2266] node "server.cent.com" not found
Jan 29 09:34:09 server.cent.com kubelet[10994]: E0129 09:34:09.456166   10994 kubelet.go:2266] node "server.cent.com" not found
Jan 29 09:34:09 server.cent.com kubelet[10994]: E0129 09:34:09.558500   10994 kubelet.go:2266] node "server.cent.com" not found
Jan 29 09:34:09 server.cent.com kubelet[10994]: E0129 09:34:09.660833   10994 kubelet.go:2266] node "server.cent.com" not found
Jan 29 09:34:09 server.cent.com kubelet[10994]: E0129 09:34:09.763840   10994 kubelet.go:2266] node "server.cent.com" not found
Jan 29 09:34:09 server.cent.com kubelet[10994]: E0129 09:34:09.867118   10994 kubelet.go:2266] node "server.cent.com" not found
Jan 29 09:34:09 server.cent.com kubelet[10994]: E0129 09:34:09.968783   10994 kubelet.go:2266] node "server.cent.com" not found
Jan 29 09:34:10 server.cent.com kubelet[10994]: E0129 09:34:10.071722   10994 kubelet.go:2266] node "server.cent.com" not found
Jan 29 09:34:10 server.cent.com kubelet[10994]: E0129 09:34:10.173396   10994 kubelet.go:2266] node "server.cent.com" not found
Jan 29 09:34:10 server.cent.com kubelet[10994]: E0129 09:34:10.274892   10994 kubelet.go:2266] node "server.cent.com" not found
Jan 29 09:34:10 server.cent.com kubelet[10994]: E0129 09:34:10.292021   10994 reflector.go:134] k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:47: Failed to list *v1.Pod: Get https://192
Jan 29 09:34:10 server.cent.com kubelet[10994]: E0129 09:34:10.328447   10994 reflector.go:134] k8s.io/kubernetes/pkg/kubelet/kubelet.go:453: Failed to list *v1.Node: Get https://192.168.20?
Jan 29 09:34:10 server.cent.com kubelet[10994]: E0129 09:34:10.329742   10994 reflector.go:134] k8s.io/kubernetes/pkg/kubelet/kubelet.go:444: Failed to list *v1.Service: Get https://192.168
Jan 29 09:34:10 server.cent.com kubelet[10994]: E0129 09:34:10.376238   10994 kubelet.go:2266] node "server.cent.com" not found

I tried the same steps with each of these versions and hit the same problem every time: 1.13.2, 1.12.0, 1.11.0, 1.10.0, and 1.9.0.

2 answers:

Answer 0: (score: 0)

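Based on your output, the kubelet service cannot establish a connection to the Kubernetes API server, so it fails the health check during installation. There can be several causes, but I suggest wiping your current kubeadm setup and installing from scratch. A good tutorial is linked from a similar case, or you can follow the official Kubernetes kubeadm installation guidelines.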

For investigation purposes, you can use the kubeadm troubleshooting guide.
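
The wipe-and-retry suggestion could look roughly like the sketch below. It reuses the `kubeadm init` arguments from the question; the `run` and `reset_and_init` helpers and the `DRY_RUN` switch are hypothetical names introduced here for illustration, not part of kubeadm. With `DRY_RUN=1` (the default in this sketch) the commands are only printed.

```shell
#!/bin/sh
# Sketch only: wipe a failed kubeadm attempt and run init again.
# DRY_RUN, run and reset_and_init are illustrative names, not kubeadm features.
DRY_RUN="${DRY_RUN:-1}"

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# $1: address the API server advertises, $2: pod network CIDR.
reset_and_init() {
    # Undo what the previous "kubeadm init" changed on this host.
    run kubeadm reset
    # Retry with the same arguments as the original attempt.
    run kubeadm init \
        --apiserver-advertise-address="$1" \
        --pod-network-cidr="$2"
}
```

For example, `reset_and_init 192.168.203.154 10.244.0.0/16` prints the two commands; setting `DRY_RUN=0` and running as root executes them. Depending on the kubeadm version, `kubeadm reset` may ask for confirmation first.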

If you have questions about the installation steps or anything else related, just leave a comment below this answer.

Answer 1: (score: 0)

I ran into this problem while installing k8s on Fedora Core OS. This is what I did:

cat > /etc/docker/daemon.json <<EOF
{
  "exec-opts": ["native.cgroupdriver=systemd"],
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "100m"
  },
  "storage-driver": "overlay2",
  "storage-opts": [
    "overlay2.override_kernel_check=true"
  ]
}
EOF

mkdir -p /etc/systemd/system/docker.service.d

systemctl daemon-reload
systemctl restart docker

See: https://kubernetes.io/docs/setup/production-environment/container-runtimes/

The Docker restart then failed. I got past that by creating a new file, /etc/systemd/system/docker.service.d/docker.conf, with the following content:

[Service]
ExecStart=
ExecStart=/usr/bin/dockerd

See: https://docs.docker.com/config/daemon/
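
The empty `ExecStart=` line matters: in a systemd drop-in it clears the command inherited from the packaged unit, and the second line sets a new one without extra flags. A likely reason the restart failed (an assumption, since the error message is not shown) is that the packaged unit passed options on the `dockerd` command line that `daemon.json` now also sets, which `dockerd` rejects. A minimal sketch of creating the drop-in from a script follows; `write_docker_dropin` is a hypothetical helper name, and the directory argument exists only so the function can be tried against a scratch directory.

```shell
#!/bin/sh
# Sketch only: write the docker.conf drop-in shown above.
# write_docker_dropin is an illustrative name, not a Docker or systemd tool.
# $1: target directory (defaults to the real systemd drop-in location).
write_docker_dropin() {
    dropin_dir="${1:-/etc/systemd/system/docker.service.d}"
    mkdir -p "$dropin_dir"
    # First ExecStart= resets the inherited command, second one replaces it.
    printf '[Service]\nExecStart=\nExecStart=/usr/bin/dockerd\n' \
        > "$dropin_dir/docker.conf"
}
```

On the real host this would be followed by `systemctl daemon-reload` and `systemctl restart docker`, as in the steps earlier in this answer.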

After that everything worked, and I was able to set up the k8s cluster.
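
To confirm that the `daemon.json` change took effect, one option is to look at the `Cgroup Driver` line that `docker info` prints. The sketch below assumes that line format; `cgroup_driver_from_info` and `check_cgroup_driver` are hypothetical helper names.

```shell
#!/bin/sh
# Sketch only: check which cgroup driver Docker reports.
# Both function names are illustrative, not part of Docker.

# Reads "docker info"-style text on stdin and prints the driver name.
cgroup_driver_from_info() {
    sed -n 's/^ *Cgroup Driver: *//p'
}

# $1: expected driver, e.g. systemd. Reads "docker info" text on stdin.
check_cgroup_driver() {
    expected="$1"
    actual="$(cgroup_driver_from_info)"
    if [ "$actual" = "$expected" ]; then
        echo "ok: docker uses $actual"
    else
        echo "mismatch: docker uses '${actual:-unknown}', expected '$expected'"
        return 1
    fi
}
```

Usage: `docker info 2>/dev/null | check_cgroup_driver systemd`. The kubelet needs to be configured with the same driver that Docker reports.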