kubelet logs flooding even after pods deleted

Time: 2019-01-15 18:17:18

Tags: docker kubernetes weave

Kubernetes version: v1.6.7
Network plugin: weave

I recently noticed that my entire cluster of 3 nodes went down. My initial troubleshooting revealed that /var on all nodes was at 100% usage.
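These are roughly the commands I used to confirm that (exact paths may differ on your distribution):

    # check disk usage on the affected filesystem
    df -h /var
    # find which directories are eating the space
    sudo du -sh /var/log/* | sort -h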

Digging further into the logs revealed that they were being flooded by kubelet with:

Jan 15 19:09:43 test-master kubelet[1220]: E0115 19:09:43.636001    1220 kuberuntime_gc.go:138] Failed to stop sandbox "fea8c54ca834a339e8fd476e1cfba44ae47188bbbbb7140e550d055a63487211" before removing: rpc error: code = 2 desc = NetworkPlugin cni failed to teardown pod "<TROUBLING_POD>-1545236220-ds0v1_kube-system" network: CNI failed to retrieve network namespace path: Error: No such container: fea8c54ca834a339e8fd476e1cfba44ae47188bbbbb7140e550d055a63487211
Jan 15 19:09:43 test-master kubelet[1220]: E0115 19:09:43.637690    1220 docker_sandbox.go:205] Failed to stop sandbox "fea94c9f46923806c177e4a158ffe3494fe17638198f30498a024c3e8237f648": Error response from daemon: {"message":"No such container: fea94c9f46923806c177e4a158ffe3494fe17638198f30498a024c3e8237f648"}

The <TROUBLING_POD>-1545236220-ds0v1 pod was created by a CronJob, and due to some misconfiguration, those pods were failing and more and more of them kept being spun up.

So I deleted all the jobs and their related pods (roughly as sketched below), leaving a cluster with no jobs or pods related to my CronJob, yet I still see the same ERROR messages flooding the logs.
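The cleanup looked something like this; <TROUBLING_CRONJOB> is a placeholder for my actual CronJob, and the Job name is inferred from the pod name above:

    # stop new Jobs from being scheduled
    kubectl delete cronjob <TROUBLING_CRONJOB> -n kube-system
    # remove the leftover Job and its pods (Jobs label their pods with job-name)
    kubectl delete job <TROUBLING_POD>-1545236220 -n kube-system
    kubectl delete pods -n kube-system -l job-name=<TROUBLING_POD>-1545236220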

I did (rough commands for step 1 are sketched after this list):

1) Restart docker and kubelet on all nodes.

2) Restart the entire control plane.

3) Reboot all nodes.
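For step 1, this is roughly what I ran on each node (assuming both services are managed by systemd):

    # restart the container runtime and the kubelet
    sudo systemctl restart docker
    sudo systemctl restart kubelet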

But the logs are still being flooded with the same error messages, even though no such pods are being spun up anymore.

So I don't know how I can stop kubelet from throwing these errors.

Is there a way for me to reset the network plugin I am using, or something else I can do?
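One cleanup I have been considering (untested, and assuming a dockershim-based kubelet that checkpoints sandboxes under /var/lib/dockershim/sandbox, which I have not verified on this version) is removing the stale sandbox checkpoint that keeps the GC retrying:

    # stop kubelet so it doesn't race with the cleanup
    sudo systemctl stop kubelet
    # remove the checkpoint for the sandbox ID from the error above
    sudo rm /var/lib/dockershim/sandbox/fea8c54ca834a339e8fd476e1cfba44ae47188bbbbb7140e550d055a63487211
    sudo systemctl start kubelet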

1 Answer:

Answer 0 (score: 1)

Check whether the pod directories are still present under /var/lib/kubelet.

You are using an old version of Kubernetes; upgrading will resolve this issue.
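A minimal sketch of that check, assuming the default kubelet root dir of /var/lib/kubelet (the pod directories are named by pod UID):

    # directories kubelet still tracks, named by pod UID
    ls /var/lib/kubelet/pods
    # compare against the UIDs of pods that actually exist
    kubectl get pods --all-namespaces -o jsonpath='{range .items[*]}{.metadata.uid}{"\n"}{end}'

Any directory left over from the deleted pods can be removed (with kubelet stopped) before upgrading.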