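I want to create a K8s cluster (1 master node and 2 worker nodes) with Vagrant on Windows 10.
I have a problem when starting the master node.
I run sudo kubeadm init to start the master node, but the command fails.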
"/etc/kubernetes/manifests/etcd.yaml"
[init] waiting for the kubelet to boot up the control plane as Static Pods from directory "/etc/kubernetes/manifests"
[init] this might take a minute or longer if the control plane images have to be pulled
Unfortunately, an error has occurred:
timed out waiting for the condition
This error is likely caused by:
- The kubelet is not running
- The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)
If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
- 'systemctl status kubelet'
- 'journalctl -xeu kubelet'
Additionally, a control plane component may have crashed or exited when started by the container runtime. To troubleshoot, list all containers using your preferred container runtimes CLI, e.g. docker. Here is one example how you may list all Kubernetes containers running in docker:
- 'docker ps -a | grep kube | grep -v pause'
Once you have found the failing container, you can inspect its logs with:
- 'docker logs CONTAINERID'
couldn't initialize a Kubernetes cluster
I checked whether the kubelet is running with systemctl status kubelet:
● kubelet.service - kubelet: The Kubernetes Node Agent
   Loaded: loaded (/lib/systemd/system/kubelet.service; enabled; vendor preset: enabled)
  Drop-In: /etc/systemd/system/kubelet.service.d
           └─10-kubeadm.conf
   Active: active (running) since Mon 2018-11-05 13:55:48 UTC; 36min ago
     Docs: https://kubernetes.io/docs/home/
 Main PID: 24683 (kubelet)
    Tasks: 18 (limit: 1135)
   CGroup: /system.slice/kubelet.service
           └─24683 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml --cgroup-dr
Nov 05 14:32:07 master-node kubelet[24683]: E1105 14:32:07.605330 24683 kubelet.go:2236] node "master-node" not found
Nov 05 14:32:07 master-node kubelet[24683]: E1105 14:32:07.710945 24683 kubelet.go:2236] node "master-node" not found
Nov 05 14:32:07 master-node kubelet[24683]: W1105 14:32:07.801125 24683 cni.go:188] Unable to update cni config: No networks found in /etc/cni/net.d
Nov 05 14:32:07 master-node kubelet[24683]: E1105 14:32:07.804756 24683 kubelet.go:2167] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docke
Nov 05 14:32:07 master-node kubelet[24683]: E1105 14:32:07.813349 24683 kubelet.go:2236] node "master-node" not found
Nov 05 14:32:07 master-node kubelet[24683]: E1105 14:32:07.916319 24683 kubelet.go:2236] node "master-node" not found
Nov 05 14:32:08 master-node kubelet[24683]: E1105 14:32:08.030146 24683 kubelet.go:2236] node "master-node" not found
Nov 05 14:32:08 master-node kubelet[24683]: E1105 14:32:08.136622 24683 kubelet.go:2236] node "master-node" not found
Nov 05 14:32:08 master-node kubelet[24683]: E1105 14:32:08.238376 24683 kubelet.go:2236] node "master-node" not found
Nov 05 14:32:08 master-node kubelet[24683]: E1105 14:32:08.340852 24683 kubelet.go:2236] node "master-node" not found
And this is what I see when I check the logs with journalctl -xeu kubelet:
Nov 05 14:32:39 master-node kubelet[24683]: E1105 14:32:39.328035 24683 kubelet.go:2236] node "master-node" not found
Nov 05 14:32:39 master-node kubelet[24683]: E1105 14:32:39.632382 24683 reflector.go:134] k8s.io/kubernetes/pkg/kubelet/kubelet.go:442: Failed to list *v1.Service: Get https://10.0.2.15:6
Nov 05 14:32:39 master-node kubelet[24683]: E1105 14:32:39.657289 24683 reflector.go:134] k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:47: Failed to list *v1.Pod: Get https://10.0.2.
Nov 05 14:32:39 master-node kubelet[24683]: E1105 14:32:39.752441 24683 reflector.go:134] k8s.io/kubernetes/pkg/kubelet/kubelet.go:451: Failed to list *v1.Node: Get https://10.0.2.15:6443
Nov 05 14:32:39 master-node kubelet[24683]: I1105 14:32:39.804026 24683 kubelet_node_status.go:276] Setting node annotation to enable volume controller attach/detach
Nov 05 14:32:39 master-node kubelet[24683]: I1105 14:32:39.835423 24683 kubelet_node_status.go:70] Attempting to register node master-node
Nov 05 14:32:41 master-node kubelet[24683]: I1105 14:32:41.859955 24683 kubelet_node_status.go:276] Setting node annotation to enable volume controller attach/detach
Nov 05 14:32:41 master-node kubelet[24683]: E1105 14:32:41.881897 24683 pod_workers.go:186] Error syncing pod e808f2bea99d167c3e91a819362a586b ("kube-apiserver-master-node_kube-system(e80
I don't understand this error. Should I deploy a CNI plugin (such as Weave) before starting the master node?
Here is my Vagrantfile; maybe I forgot something:
Vagrant.configure("2") do |config|
  config.vm.box = "bento/ubuntu-18.04"
  config.vm.box_check_update = true
  config.vm.network "public_network"
  config.vm.hostname = "master-node"
  config.vm.provider :virtualbox do |vb|
    vb.name = "master-node"
  end
config.vm.provision "shell", inline: <<-SHELL
echo "UPDATE"
apt-get -y update
echo "INSTALL PREREQUISITES"
apt-get install -y apt-transport-https ca-certificates curl software-properties-common
echo "START INSTALL DOCKER"
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu bionic stable"
apt-get -y update
apt-get install -y docker-ce
systemctl start docker
systemctl enable docker
usermod -aG docker vagrant
curl -L "https://github.com/docker/compose/releases/download/1.22.0/docker-compose-$(uname -s)-$(uname -m)" -o /usr/local/bin/docker-compose
chmod +x /usr/local/bin/docker-compose
chown vagrant /var/run/docker.sock
docker-compose --version
docker --version
echo "END INSTALL DOCKER"
echo "START INSTALL KUBERNETES"
curl -s "https://packages.cloud.google.com/apt/doc/apt-key.gpg" | apt-key add -
echo "deb http://apt.kubernetes.io/ kubernetes-xenial main" >> /etc/apt/sources.list.d/kubernetes.list
apt-get -y update
swapoff -a
sed -i '/ swap / s/^\(.*\)$/#\1/g' /etc/fstab
apt-get install -y kubelet kubeadm kubectl
systemctl enable kubelet
systemctl start kubelet
echo "END INSTALL KUBERNETES"
kubeadm config images pull #pre-download kubeadm config FOR MASTER ONLY
IPADDR=`hostname -I`
echo "This VM has IP address $IPADDR"
SHELL
end
If I run docker ps -a after the error occurs, I can see two kube-apiserver containers: one is up and the other has exited.
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
befac2364452 51a9c329b7c5 "kube-apiserver --au…" 45 seconds ago Up 42 seconds k8s_kube-apiserver_kube-apiserver-master-node_kube-system_de7285496ca374bf069328c290f65db8_2
dab8889cada8 51a9c329b7c5 "kube-apiserver --au…" 3 minutes ago Exited (137) 46 seconds ago k8s_kube-apiserver_kube-apiserver-master-node_kube-system_de7285496ca374bf069328c290f65db8_1
87d74bdeb62b 3cab8e1b9802 "etcd --advertise-cl…" 5 minutes ago Up 5 minutes k8s_etcd_etcd-master-node_kube-system_2dba96180d17235a902e739497ef2f50_0
4d869d0be44f 15548c720a70 "kube-controller-man…" 5 minutes ago Up 5 minutes k8s_kube-controller-manager_kube-controller-manager-master-node_kube-system_7c81d10c743d19c292e161476cf2b945_0
1f72b9b636b4 d6d57c76136c "kube-scheduler --ad…" 5 minutes ago Up 5 minutes k8s_kube-scheduler_kube-scheduler-master-node_kube-system_ee7b1077c61516320f4273309e9b4690_0
6116a35a7ec7 k8s.gcr.io/pause:3.1 "/pause" 5 minutes ago Up 5 minutes k8s_POD_etcd-master-node_kube-system_2dba96180d17235a902e739497ef2f50_0
5de762296ece k8s.gcr.io/pause:3.1 "/pause" 5 minutes ago Up 5 minutes k8s_POD_kube-controller-manager-master-node_kube-system_7c81d10c743d19c292e161476cf2b945_0
156544886f28 k8s.gcr.io/pause:3.1 "/pause" 5 minutes ago Up 5 minutes k8s_POD_kube-scheduler-master-node_kube-system_ee7b1077c61516320f4273309e9b4690_0
1f6c396fc6e0 k8s.gcr.io/pause:3.1 "/pause" 5 minutes ago Up 5 minutes k8s_POD_kube-apiserver-master-node_kube-system_de7285496ca374bf069328c290f65db8_0
Edit: if I look at the logs of the exited k8s_kube-apiserver container, I see:
Flag --insecure-port has been deprecated, This flag will be removed in a future version.
I1107 10:35:23.236063 1 server.go:681] external host was not specified, using 192.168.1.49
I1107 10:35:23.237046 1 server.go:152] Version: v1.12.2
I1107 10:35:42.690715 1 plugins.go:158] Loaded 8 mutating admission controller(s) successfully in the following order: NamespaceLifecycle,LimitRanger,ServiceAccount,NodeRestriction,Priority,DefaultTolerationSeconds,DefaultStorageClass,MutatingAdmissionWebhook.
I1107 10:35:42.691369 1 plugins.go:161] Loaded 6 validating admission controller(s) successfully in the following order: LimitRanger,ServiceAccount,Priority,PersistentVolumeClaimResize,ValidatingAdmissionWebhook,ResourceQuota.
I1107 10:35:42.705302 1 plugins.go:158] Loaded 8 mutating admission controller(s) successfully in the following order: NamespaceLifecycle,LimitRanger,ServiceAccount,NodeRestriction,Priority,DefaultTolerationSeconds,DefaultStorageClass,MutatingAdmissionWebhook.
I1107 10:35:42.709912 1 plugins.go:161] Loaded 6 validating admission controller(s) successfully in the following order: LimitRanger,ServiceAccount,Priority,PersistentVolumeClaimResize,ValidatingAdmissionWebhook,ResourceQuota.
I1107 10:35:59.955297 1 master.go:240] Using reconciler: lease
W1107 10:36:31.566656 1 genericapiserver.go:325] Skipping API batch/v2alpha1 because it has no resources.
W1107 10:36:41.454087 1 genericapiserver.go:325] Skipping API rbac.authorization.k8s.io/v1alpha1 because it has no resources.
W1107 10:36:41.655602 1 genericapiserver.go:325] Skipping API scheduling.k8s.io/v1alpha1 because it has no resources.
W1107 10:36:42.148577 1 genericapiserver.go:325] Skipping API storage.k8s.io/v1alpha1 because it has no resources.
W1107 10:36:59.451535 1 genericapiserver.go:325] Skipping API admissionregistration.k8s.io/v1alpha1 because it has no resources.
[restful] 2018/11/07 10:37:00 log.go:33: [restful/swagger] listing is available at https://192.168.1.49:6443/swaggerapi
[restful] 2018/11/07 10:37:00 log.go:33: [restful/swagger] https://192.168.1.49:6443/swaggerui/ is mapped to folder /swagger-ui/
[restful] 2018/11/07 10:37:37 log.go:33: [restful/swagger] listing is available at https://192.168.1.49:6443/swaggerapi
[restful] 2018/11/07 10:37:37 log.go:33: [restful/swagger] https://192.168.1.49:6443/swaggerui/ is mapped to folder /swagger-ui/
I1107 10:37:38.920238 1 plugins.go:158] Loaded 8 mutating admission controller(s) successfully in the following order: NamespaceLifecycle,LimitRanger,ServiceAccount,NodeRestriction,Priority,DefaultTolerationSeconds,DefaultStorageClass,MutatingAdmissionWebhook.
I1107 10:37:38.920985 1 plugins.go:161] Loaded 6 validating admission controller(s) successfully in the following order: LimitRanger,ServiceAccount,Priority,PersistentVolumeClaimResize,ValidatingAdmissionWebhook,ResourceQuota.
I also noticed that the k8s_kube-apiserver container keeps starting and exiting in a loop.
Thanks a lot!
Answer 0 (score: 1)
Your kubelet is running, but it looks like it cannot talk to the API server.
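One quick way to confirm this from inside the VM is to probe the API server's health endpoint. The sketch below is only an illustration: apiserver_healthy is a hypothetical helper name, the default address 10.0.2.15:6443 is taken from the kubelet log lines in the question, and it assumes anonymous requests to /healthz are allowed (which I believe is the kubeadm default).

```shell
# Hypothetical helper: succeeds only if the API server answers /healthz with "ok".
# Defaults are assumptions copied from the kubelet logs in the question.
apiserver_healthy() {
  local host port
  host="${1:-10.0.2.15}"
  port="${2:-6443}"
  # -k skips certificate verification, -s silences progress output,
  # --max-time keeps the probe from hanging when the port is unreachable.
  [ "$(curl -sk --max-time 5 "https://${host}:${port}/healthz")" = "ok" ]
}

# Usage (run on the master VM):
#   apiserver_healthy 10.0.2.15 6443 && echo "API server is up" || echo "API server is not answering"
```

If the probe fails while kubelet is active, the problem is on the API server side rather than in the kubelet itself.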
I would check on the VM:
docker ps | grep apiserver
You should get something like this:
$ docker ps | grep api
2f15a11f65f4 dcb029b5e3ad "kube-apiserver --au…" 2 weeks ago Up 2 weeks k8s_kube-apiserver_kube-apiserver-xxxx.internal_kube-system_acd8011fdf93688f6391aaca470a1fe8_2
8a1a5ce855aa k8s.gcr.io/pause:3.1 "/pause" 2 weeks ago Up 2 weeks k8s_POD_kube-apiserver-xxxx.internal_kube-system_acd8011fdf93688f6391aaca470a1fe8_2
Then look at the logs to see whether anything is failing:
$ docker logs 2f15a11f65f4
If you don't see a kube-apiserver container, you may want to try docker ps -a; if it only shows up there, it crashed at some point.
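To make that check repeatable, here is a rough sketch that collects the exited kube-apiserver containers, reports how each one ended, and prints the tail of its logs. The function names are hypothetical, and the name filter assumes the k8s_kube-apiserver_... container naming visible in the docker ps -a output. The "Exited (137)" status in the question fits the usual 128 + signal convention, so it would correspond to signal 9 (SIGKILL); that often means something killed the container from outside (for example the kubelet restarting it, or the kernel under memory pressure), but treat that as a hint to investigate, not a diagnosis.

```shell
# Hypothetical helpers; none of these names are part of docker or kubeadm.

# Print the IDs of kube-apiserver containers that have exited.
exited_apiserver_ids() {
  docker ps -a --filter "name=k8s_kube-apiserver" --filter "status=exited" --format '{{.ID}}'
}

# Turn an exit status into a short description.
# By convention, a status above 128 means "terminated by signal (status - 128)".
describe_exit_status() {
  local status
  status="$1"
  if [ "$status" -eq 0 ]; then
    echo "exited normally"
  elif [ "$status" -gt 128 ]; then
    echo "killed by signal $((status - 128))"
  else
    echo "exited with error status $status"
  fi
}

# For each exited kube-apiserver container, print how it ended and its last log lines.
show_exited_apiserver_logs() {
  local id status
  for id in $(exited_apiserver_ids); do
    status="$(docker inspect --format '{{.State.ExitCode}}' "$id")"
    echo "== $id: $(describe_exit_status "$status")"
    docker logs --tail 20 "$id" 2>&1
  done
}

# Usage (run on the master VM):
#   show_exited_apiserver_logs
```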
Hope that helps.