I have a cluster where the available memory on the nodes has recently dropped to 5%. When that happens, the node CPU (load) spikes while it tries to free memory from caches/buffers. One result of the high load and low memory is that I sometimes end up with pods that go into an Error state or get stuck in Terminating. These pods sit there until I manually intervene, which further aggravates the low-memory problem that caused them in the first place.
My question is: why does Kubernetes leave these pods in this state? My gut feeling is that Kubernetes never got the right feedback from the Docker daemon and never tries again. I need to know how to clean up or fix pods stuck in Error or Terminating. Any ideas?
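For context, this is roughly how I spot the affected pods (the grep pattern is just illustrative; any equivalent filter works):
# list pods across all namespaces that are stuck in Error or Terminating
kubectl get pods --all-namespaces | grep -E 'Error|Terminating'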
I'm currently running:
~ # kubectl version
Client Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.2", GitCommit:"bb9ffb1654d4a729bb4cec18ff088eacc153c239", GitTreeState:"clean", BuildDate:"2018-08-07T23:17:28Z", GoVersion:"go1.10.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.4", GitCommit:"5ca598b4ba5abb89bb773071ce452e33fb66339d", GitTreeState:"clean", BuildDate:"2018-06-06T08:00:59Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}
Update: Here are some of the events listed on the pods. You can see that some of them have been sitting there for days. You will also notice that one shows a Warning while the others show Normal.
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedKillPod 25m kubelet, k8s-node-0 error killing pod: failed to "KillContainer" for "kubectl" with KillContainerError: "rpc error: code = Unknown desc = operation timeout: context deadline exceeded"
Normal Killing 20m (x2482 over 3d) kubelet, k8s-node-0 Killing container with id docker://docker:Need to kill Pod
Normal Killing 15m (x2484 over 3d) kubelet, k8s-node-0 Killing container with id docker://maven:Need to kill Pod
Normal Killing 8m (x2487 over 3d) kubelet, k8s-node-0 Killing container with id docker://node:Need to kill Pod
Normal Killing 4m (x2489 over 3d) kubelet, k8s-node-0 Killing container with id docker://jnlp:Need to kill Pod
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Killing 56m (x125 over 5h) kubelet, k8s-node-2 Killing container with id docker://owasp-zap:Need to kill Pod
Normal Killing 47m (x129 over 5h) kubelet, k8s-node-2 Killing container with id docker://jnlp:Need to kill Pod
Normal Killing 38m (x133 over 5h) kubelet, k8s-node-2 Killing container with id docker://dind:Need to kill Pod
Normal Killing 13m (x144 over 5h) kubelet, k8s-node-2 Killing container with id docker://maven:Need to kill Pod
Normal Killing 8m (x146 over 5h) kubelet, k8s-node-2 Killing container with id docker://docker-cmds:Need to kill Pod
Normal Killing 1m (x149 over 5h) kubelet, k8s-node-2 Killing container with id docker://pmd:Need to kill Pod
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Killing 56m (x2644 over 4d) kubelet, k8s-node-0 Killing container with id docker://openssl:Need to kill Pod
Normal Killing 40m (x2651 over 4d) kubelet, k8s-node-0 Killing container with id docker://owasp-zap:Need to kill Pod
Normal Killing 31m (x2655 over 4d) kubelet, k8s-node-0 Killing container with id docker://pmd:Need to kill Pod
Normal Killing 26m (x2657 over 4d) kubelet, k8s-node-0 Killing container with id docker://kubectl:Need to kill Pod
Normal Killing 22m (x2659 over 4d) kubelet, k8s-node-0 Killing container with id docker://dind:Need to kill Pod
Normal Killing 11m (x2664 over 4d) kubelet, k8s-node-0 Killing container with id docker://docker-cmds:Need to kill Pod
Normal Killing 6m (x2666 over 4d) kubelet, k8s-node-0 Killing container with id docker://maven:Need to kill Pod
Normal Killing 1m (x2668 over 4d) kubelet, k8s-node-0 Killing container with id docker://jnlp:Need to kill Pod
Answer 0 (score: 1)
This is usually related to metadata.finalizers on the objects (pods, deployments, etc.).
You can also read more about Foreground Cascading Deletion and how it uses metadata.finalizers.
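As a quick check, you can inspect whether any finalizers are set on a stuck pod (pod name and namespace below are placeholders), for example:
# print the finalizers (if any) on a stuck pod; empty output means none are set
kubectl get pod pod-name-123abc -n your-app-namespace -o jsonpath='{.metadata.finalizers}'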
If it is not a network issue, you can check the kubelet logs, typically with:
journalctl -xeu kubelet
You can also check the Docker daemon logs, typically with:
cat /var/log/syslog | grep dockerd
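On distributions where dockerd logs to journald rather than syslog, the equivalent is usually:
journalctl -u docker.service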
Answer 1 (score: 0)
I had to restart all the nodes. I noticed that one minion was slow and unresponsive, and it was most likely the culprit. After the restart, all the terminating pods were gone.
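If you want to do the same thing more gracefully, a rough sketch would be (the node name is a placeholder taken from the events above):
# evict workloads from the suspect node before rebooting it
$ kubectl drain k8s-node-0 --ignore-daemonsets
# ... reboot the node, e.g. over ssh ...
# allow scheduling on the node again once it is back
$ kubectl uncordon k8s-node-0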
Answer 2 (score: 0)
Removing the finalizers by running kubectl patch is a workaround. This can happen with different kinds of resources, such as persistent volumes or deployments. In my experience it is more common with PVs/PVCs.
# for pods
$ kubectl patch pod pod-name-123abc -p '{"metadata":{"finalizers":null}}' -n your-app-namespace
# for pvc
$ kubectl patch pvc pvc-name-123abc -p '{"metadata":{"finalizers":null}}' -n your-app-namespace
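If clearing the finalizers alone does not remove a pod that is stuck in Terminating, a force delete is another option (use with care; names are placeholders as above):
# force-remove the pod object without waiting for graceful termination
$ kubectl delete pod pod-name-123abc -n your-app-namespace --grace-period=0 --force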