K8s node went into "Not Ready" state

Date: 2018-06-08 08:52:02

Tags: kubernetes

I noticed that one of my nodes failed and went into the Not Ready state after 2 days.

I have a 3-node k8s cluster:

dc2-k8s-01   Ready     master,node   2d        v1.10.2
dc2-k8s-02   Ready     master,node   2d        v1.10.2
dc2-k8s-03   Ready     node          2d        v1.10.2

The dc2-k8s-02 node is the one that failed.

When the node failed, I saw many pods go into Evicted and Unknown status. Can anyone help me find out why this happened? Below I have attached the pod list and some syslog from the failed node. I would add more logs, but my reputation is too low.

kubectl get pod --all-namespaces
NAMESPACE       NAME                                    READY     STATUS     RESTARTS   AGE
client-ci        rabbitmq-client-ci-01-0                  1/1       Unknown    0          18h
client-dev       rabbitmq-client-dev-01-0                 1/1       Unknown    4          18h
client-sandbox   rabbitmq-client-sandbox-01-0             1/1       Unknown    4          18h
client-test      rabbitmq-client-test-01-0                1/1       Unknown    5          18h
default         es-master-6c6d546445-2hn2h              0/1       Evicted    0          7h
default         es-master-6c6d546445-55pjj              1/1       Running    0          18h
default         es-master-6c6d546445-7cv66              0/1       Evicted    0          18h
default         es-master-6c6d546445-b4hr2              0/1       Evicted    0          7h
default         es-master-6c6d546445-dk765              1/1       Running    0          18h
default         es-master-6c6d546445-dljwk              0/1       Evicted    0          7h
default         es-master-6c6d546445-dmks2              0/1       Evicted    0          7h
default         es-master-6c6d546445-gftj9              1/1       Unknown    0          18h
default         es-master-6c6d546445-gpx2t              0/1       Evicted    0          18h
default         es-master-6c6d546445-hqd65              0/1       Evicted    0          18h
default         es-master-6c6d546445-j4m7c              0/1       Evicted    0          7h
default         es-master-6c6d546445-jtg6m              0/1       Evicted    0          18h
default         es-master-6c6d546445-jthb5              0/1       Evicted    0          7h
default         es-master-6c6d546445-jz5h9              0/1       Evicted    0          7h
default         es-master-6c6d546445-k22tq              0/1       Evicted    0          18h
default         es-master-6c6d546445-k59d9              0/1       Evicted    0          18h
default         es-master-6c6d546445-njfq6              0/1       Evicted    0          18h
default         es-master-6c6d546445-njr2r              0/1       Evicted    0          7h
default         es-master-6c6d546445-p528v              0/1       Evicted    0          7h
default         es-master-6c6d546445-p8t9w              0/1       Evicted    0          7h
default         es-master-6c6d546445-qg56b              0/1       Evicted    0          7h
default         es-master-6c6d546445-r7rmv              0/1       Evicted    0          7h
default         es-master-6c6d546445-twf76              0/1       Evicted    0          7h
default         es-master-6c6d546445-v7cp7              0/1       Evicted    0          7h
default         es-master-6c6d546445-x6sk7              0/1       Evicted    0          7h
default         es-master-6c6d546445-x7n6k              0/1       Evicted    0          7h
default         es-master-6c6d546445-xqtm7              0/1       Evicted    0          7h
default         es-master-6c6d546445-z4mwr              1/1       Running    0          7h
kube-system     calico-node-2hg88                       1/1       Running    0          7h
kube-system     calico-node-48cm9                       1/1       NodeLost   1          2d
kube-system     calico-node-j8hcj                       1/1       Running    2          2d
kube-system     kube-apiserver-dc2-k8s-01               1/1       Running    2          2d
kube-system     kube-apiserver-dc2-k8s-02               1/1       Unknown    0          18h
kube-system     kube-controller-manager-dc2-k8s-01      1/1       Running    2          2d
kube-system     kube-controller-manager-dc2-k8s-02      1/1       Unknown    1          18h
kube-system     kube-dns-7bd4d5fbb6-bhfh4               3/3       Running    3          2d
kube-system     kube-dns-7bd4d5fbb6-fz6xd               3/3       Running    3          2d
kube-system     kube-proxy-dc2-k8s-01                   1/1       Running    2          2d
kube-system     kube-proxy-dc2-k8s-02                   1/1       Unknown    1          18h
kube-system     kube-proxy-dc2-k8s-03                   1/1       Running    1          2d
kube-system     kube-scheduler-dc2-k8s-01               1/1       Running    2          2d
kube-system     kube-scheduler-dc2-k8s-02               1/1       Unknown    1          18h
kube-system     kubedns-autoscaler-679b8b455-ltkrd      1/1       Running    0          19h
kube-system     kubernetes-dashboard-55fdfd74b4-57lrn   1/1       Running    0          19h
kube-system     nginx-proxy-dc2-k8s-03                  1/1       Running    1          2d
kube-system     tiller-deploy-5c688d5f9b-fwnfc          1/1       Running    0          1d
Jun  8 01:31:08 dc2-k8s-02 kubelet[12922]: I0608 01:31:08.501110   12922 kubelet_node_status.go:434] Using node IP: "10.19.10.81"
Jun  8 01:31:18 dc2-k8s-02 kubelet[12922]: I0608 01:31:18.513356   12922 kubelet_node_status.go:434] Using node IP: "10.19.10.81"
Jun  8 01:31:19 dc2-k8s-02 systemd[1]: Starting Cleanup of Temporary Directories...
Jun  8 01:31:19 dc2-k8s-02 systemd-tmpfiles[55561]: [/usr/lib/tmpfiles.d/var.conf:14] Duplicate line for path "/var/log", ignoring.
Jun  8 01:31:19 dc2-k8s-02 systemd[1]: Started Cleanup of Temporary Directories.
Jun  8 01:31:28 dc2-k8s-02 kubelet[12922]: I0608 01:31:28.522696   12922 kubelet_node_status.go:434] Using node IP: "10.19.10.81"
Jun  8 01:31:33 dc2-k8s-02 kubelet[12922]: I0608 01:31:33.753754   12922 container_manager_linux.go:427] [ContainerManager]: Discovered runtime cgroups name: /system.slice/docker.service
Jun  8 01:31:38 dc2-k8s-02 kubelet[12922]: I0608 01:31:38.536068   12922 kubelet_node_status.go:434] Using node IP: "10.19.10.81"
Jun  8 01:31:48 dc2-k8s-02 kubelet[12922]: I0608 01:31:48.546811   12922 kubelet_node_status.go:434] Using node IP: "10.19.10.81"
Jun  8 01:31:58 dc2-k8s-02 kubelet[12922]: I0608 01:31:58.565057   12922 kubelet_node_status.go:434] Using node IP: "10.19.10.81"
Jun  8 01:32:08 dc2-k8s-02 kubelet[12922]: I0608 01:32:08.592797   12922 kubelet_node_status.go:434] Using node IP: "10.19.10.81"
Jun  8 01:32:18 dc2-k8s-02 kubelet[12922]: I0608 01:32:18.603399   12922 kubelet_node_status.go:434] Using node IP: "10.19.10.81"
Jun  8 01:32:28 dc2-k8s-02 kubelet[12922]: I0608 01:32:28.613465   12922 kubelet_node_status.go:434] Using node IP: "10.19.10.81"
Jun  8 01:32:38 dc2-k8s-02 kubelet[12922]: I0608 01:32:38.623522   12922 kubelet_node_status.go:434] Using node IP: "10.19.10.81"
Jun  8 01:32:48 dc2-k8s-02 etcd[1844]: 2018-06-07 23:32:48.285141 I | wal: segmented wal file /var/lib/etcd/member/wal/0000000000000006-000000000005a524.wal is created
Jun  8 01:32:48 dc2-k8s-02 kubelet[12922]: I0608 01:32:48.632236   12922 kubelet_node_status.go:434] Using node IP: "10.19.10.81"
Jun  8 01:32:53 dc2-k8s-02 kubelet[12922]: W0608 01:32:53.296114   12922 eviction_manager.go:343] eviction manager: attempting to reclaim imagefs
Jun  8 01:32:53 dc2-k8s-02 kubelet[12922]: I0608 01:32:53.296145   12922 container_gc.go:85] attempting to delete unused containers
Jun  8 01:32:53 dc2-k8s-02 kubelet[12922]: I0608 01:32:53.303862   12922 image_gc_manager.go:317] attempting to delete unused images
Jun  8 01:32:53 dc2-k8s-02 kubelet[12922]: I0608 01:32:53.311032   12922 image_gc_manager.go:371] [imageGCManager]: Removing image "sha256:1b3607c7eda67632301ee13b353b416a91e62ae72df7ac617e84c8a586915772" to free 37293884 bytes
Jun  8 01:32:53 dc2-k8s-02 dockerd[1443]: time="2018-06-08T01:32:53.312129151+02:00" level=error msg="Handler for DELETE /v1.27/images/sha256:1b3607c7eda67632301ee13b353b416a91e62ae72df7ac617e84c8a586915772 returned error: conflict: unable to delete 1b3607c7eda6 (cannot be forced) - image is being used by running container 45a067628306"
Jun  8 01:32:53 dc2-k8s-02 kubelet[12922]: E0608 01:32:53.312300   12922 remote_image.go:130] RemoveImage "sha256:1b3607c7eda67632301ee13b353b416a91e62ae72df7ac617e84c8a586915772" from image service failed: rpc error: code = Unknown desc = Error response from daemon: conflict: unable to delete 1b3607c7eda6 (cannot be forced) - image is being used by running container 45a067628306
Jun  8 01:32:53 dc2-k8s-02 kubelet[12922]: E0608 01:32:53.312332   12922 kuberuntime_image.go:122] Remove image "sha256:1b3607c7eda67632301ee13b353b416a91e62ae72df7ac617e84c8a586915772" failed: rpc error: code = Unknown desc = Error response from daemon: conflict: unable to delete 1b3607c7eda6 (cannot be forced) - image is being used by running container 45a067628306
Jun  8 01:32:53 dc2-k8s-02 kubelet[12922]: W0608 01:32:53.312347   12922 eviction_manager.go:445] eviction manager: unexpected error when attempting to reduce imagefs pressure: wanted to free 9223372036854775807 bytes, but freed 0 bytes space with errors in image deletion: rpc error: code = Unknown desc = Error response from daemon: conflict: unable to delete 1b3607c7eda6 (cannot be forced) - image is being used by running container 45a067628306
Jun  8 01:32:53 dc2-k8s-02 kubelet[12922]: I0608 01:32:53.319390   12922 eviction_manager.go:357] eviction manager: must evict pod(s) to reclaim imagefs
Jun  8 01:32:53 dc2-k8s-02 kubelet[12922]: I0608 01:32:53.319705   12922 eviction_manager.go:375] eviction manager: pods ranked for eviction: kube-apiserver-dc2-k8s-02_kube-system(724c3c291ba23d6b4714f3172ed25cd7), es-master-6c6d546445-gftj9_default(f9fc1fd8-6a55-11e8-9eb0-005056011c88), calico-node-48cm9_kube-system(96896bf6-690e-11e8-bfc3-005056011c89), rabbitmq-client-test-01-0_client-test(771cf50c-6a4f-11e8-9eb0-005056011c88), rabbitmq-client-ci-01-0_client-ci(0289b86d-6a53-11e8-9eb0-005056011c88), rabbitmq-client-sandbox-01-0_client-sandbox(77d424e2-6a4f-11e8-9eb0-005056011c88), rabbitmq-client-dev-01-0_client-dev(76912724-6a4f-11e8-9eb0-005056011c88), kube-controller-manager-dc2-k8s-02_kube-system(2a11d956c9572dd61b88505193e7645b), kube-scheduler-dc2-k8s-02_kube-system(ecf642f13297d7d4ed8452208442bdf1), kube-proxy-dc2-k8s-02_kube-system(df3867f1089ebd71c0ffd15708a31f0a)
Jun  8 01:32:53 dc2-k8s-02 kubelet[12922]: I0608 01:32:53.319821   12922 kuberuntime_container.go:547] Killing container "docker://488f9e9f3a84d66be96dd324f73eb84445e0086f1a0ee54c0885e76e5d87e931" with 30 second grace period
Jun  8 01:32:53 dc2-k8s-02 dockerd[1443]: time="2018-06-08T01:32:53.320576513+02:00" level=info msg="Container 488f9e9f3a84d66be96dd324f73eb84445e0086f1a0ee54c0885e76e5d87e931 failed to exit within 0 seconds of signal 15 - using the force"
Jun  8 01:32:53 dc2-k8s-02 kubelet[12922]: I0608 01:32:53.368565   12922 kubelet_pods.go:1121] Killing unwanted pod "kube-apiserver-dc2-k8s-02"
Jun  8 01:32:53 dc2-k8s-02 kubelet[12922]: I0608 01:32:53.371237   12922 kuberuntime_container.go:547] Killing container "docker://488f9e9f3a84d66be96dd324f73eb84445e0086f1a0ee54c0885e76e5d87e931" with 30 second grace period
Jun  8 01:32:53 dc2-k8s-02 kubelet[12922]: W0608 01:32:53.385514   12922 status_manager.go:461] Failed to get status for pod "kube-apiserver-dc2-k8s-02_kube-system(724c3c291ba23d6b4714f3172ed25cd7)": Get https://127.0.0.1:6443/api/v1/namespaces/kube-system/pods/kube-apiserver-dc2-k8s-02: unexpected EOF
Jun  8 01:32:53 dc2-k8s-02 kubelet[12922]: I0608 01:32:53.385546   12922 streamwatcher.go:103] Unexpected EOF during watch stream event decoding: unexpected EOF
Jun  8 01:32:53 dc2-k8s-02 kubelet[12922]: E0608 01:32:53.386504   12922 reflector.go:322] k8s.io/kubernetes/pkg/kubelet/kubelet.go:460: Failed to watch *v1.Node: Get https://127.0.0.1:6443/api/v1/nodes?fieldSelector=metadata.name%3Ddc2-k8s-02&resourceVersion=290288&timeoutSeconds=507&watch=true: dial tcp 127.0.0.1:6443: getsockopt: connection refused
Jun  8 01:32:53 dc2-k8s-02 kubelet[12922]: E0608 01:32:53.386540   12922 event.go:209] Unable to write event: 'Post https://127.0.0.1:6443/api/v1/namespaces/kube-system/events: unexpected EOF; some request body already written' (may retry after sleeping)
Jun  8 01:32:53 dc2-k8s-02 kubelet[12922]: I0608 01:32:53.386553   12922 streamwatcher.go:103] Unexpected EOF during watch stream event decoding: unexpected EOF
Jun  8 01:32:53 dc2-k8s-02 kubelet[12922]: I0608 01:32:53.386681   12922 streamwatcher.go:103] Unexpected EOF during watch stream event decoding: unexpected EOF
Jun  8 01:32:53 dc2-k8s-02 kubelet[12922]: E0608 01:32:53.386920   12922 reflector.go:322] k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:47: Failed to watch *v1.Pod: Get https://127.0.0.1:6443/api/v1/pods?fieldSelector=spec.nodeName%3Ddc2-k8s-02&resourceVersion=234367&timeoutSeconds=471&watch=true: dial tcp 127.0.0.1:6443: getsockopt: connection refused
Jun  8 01:32:53 dc2-k8s-02 kubelet[12922]: E0608 01:32:53.386955   12922 reflector.go:322] k8s.io/kubernetes/pkg/kubelet/kubelet.go:451: Failed to watch *v1.Service: Get https://127.0.0.1:6443/api/v1/services?resourceVersion=229986&timeoutSeconds=586&watch=true: dial tcp 127.0.0.1:6443: getsockopt: connection refused
Jun  8 01:32:53 dc2-k8s-02 kubelet[12922]: E0608 01:32:53.426374   12922 kuberuntime_container.go:65] Can't make a ref to pod "kube-apiserver-dc2-k8s-02_kube-system(724c3c291ba23d6b4714f3172ed25cd7)", container kube-apiserver: selfLink was empty, can't make reference
Jun  8 01:32:53 dc2-k8s-02 kubelet[12922]: I0608 01:32:53.521670   12922 eviction_manager.go:157] eviction manager: pods kube-apiserver-dc2-k8s-02_kube-system(724c3c291ba23d6b4714f3172ed25cd7) evicted, waiting for pod to be cleaned up
Jun  8 01:32:53 dc2-k8s-02 kubelet[12922]: I0608 01:32:53.818261   12922 kubelet.go:1901] SyncLoop (PLEG): "kube-apiserver-dc2-k8s-02_kube-system(724c3c291ba23d6b4714f3172ed25cd7)", event: &pleg.PodLifecycleEvent{ID:"724c3c291ba23d6b4714f3172ed25cd7", Type:"ContainerDied", Data:"488f9e9f3a84d66be96dd324f73eb84445e0086f1a0ee54c0885e76e5d87e931"}
Jun  8 01:32:53 dc2-k8s-02 kubelet[12922]: I0608 01:32:53.818336   12922 kubelet.go:1901] SyncLoop (PLEG): "kube-apiserver-dc2-k8s-02_kube-system(724c3c291ba23d6b4714f3172ed25cd7)", event: &pleg.PodLifecycleEvent{ID:"724c3c291ba23d6b4714f3172ed25cd7", Type:"ContainerDied", Data:"5b03e2687193cd797045fbb212d32373e1017eac2d8a984aedad243da1f4012c"}
Jun  8 01:32:53 dc2-k8s-02 kubelet[12922]: W0608 01:32:53.818365   12922 pod_container_deletor.go:77] Container "5b03e2687193cd797045fbb212d32373e1017eac2d8a984aedad243da1f4012c" not found in pod's containers
Jun  8 01:32:54 dc2-k8s-02 kubelet[12922]: E0608 01:32:54.386844   12922 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/kubelet.go:460: Failed to list *v1.Node: Get https://127.0.0.1:6443/api/v1/nodes?fieldSelector=metadata.name%3Ddc2-k8s-02&limit=500&resourceVersion=0: dial tcp 

1 Answer:

Answer 0 (score: 0)

  "Can anyone help find out why this happened?"

You definitely have a failed node. There can be several reasons; based on your question, here are just a few of the usual suspects we have run into:

  • CPU / memory / disk pressure on the node, which makes the kubelet's eviction manager evict pods; your syslog shows it trying to reclaim imagefs (see the quick checks sketched after this list)
  • Swap issues (I assume it is disabled); when overloaded, the node freezes and then restarts
  • Network problems, so the node is unreachable for some time
  • A misconfigured HA cluster / taints (you have two masters)
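
A minimal sketch of how you could check the first two points, assuming shell access to dc2-k8s-02 (the filesystem paths and the --fail-swap-on remark are common defaults, not something confirmed from your setup):

# On dc2-k8s-02: check free space on the filesystems backing Docker and the kubelet,
# since your syslog shows the eviction manager trying to reclaim imagefs
df -h /var/lib/docker /var/lib/kubelet

# Confirm swap is really off; kubelet 1.8+ refuses to start with swap enabled
# unless it was launched with --fail-swap-on=false
free -h
swapon --show

# From a working master: check the node's pressure conditions and taints
kubectl describe node dc2-k8s-02 | grep -A10 Conditions
kubectl describe node dc2-k8s-02 | grep Taints

If those numbers confirm disk pressure, cleaning up unused images (docker image prune) or enlarging the Docker partition should stop the eviction loop; raising the kubelet eviction thresholds only postpones it.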

To be more specific, you would need to provide more information about your system, such as taints and node sizes, ideally the output of kubectl describe node dc2-k8s-02, and perhaps logs from the scheduler to better diagnose your problem: kubectl -n kube-system logs kube-scheduler-dc2-k8s-02 (and kubectl -n kube-system logs kube-scheduler-dc2-k8s-01).
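
Collected in one place (the journalctl call assumes the kubelet runs as a systemd unit named kubelet, which I have not verified for your install):

kubectl describe node dc2-k8s-02
kubectl -n kube-system logs kube-scheduler-dc2-k8s-01
kubectl -n kube-system logs kube-scheduler-dc2-k8s-02

# On dc2-k8s-02 itself; it is the kubelet, not the scheduler, that performs the
# evictions, so its journal around the time of the incident is the most useful log
journalctl -u kubelet --since "2018-06-08 01:30"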