This has happened twice this week. In the pod description I see:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning NetworkNotReady 2m (x3 over 2m) kubelet, gke-iagree-cluster-1-main-pool-5632d628-wgzr network is not ready: [runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: Kubenet does not have netConfig. This is most likely due to lack of PodCIDR]
Normal SandboxChanged 46s kubelet, gke-iagree-cluster-1-main-pool-5632d628-wgzr Pod sandbox changed, it will be killed and re-created.
I would like to understand what is going on. Everything was working fine, and then suddenly this started. I am adding the node description:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning OOMKilling 44m kernel-monitor, gke-iagree-cluster-1-main-pool-5632d628-wgzr Memory cgroup out of memory: Kill process 1560920 (runc:[2:INIT]) score 0 or sacrifice child
Killed process 1560920 (runc:[2:INIT]) total-vm:131144kB, anon-rss:2856kB, file-rss:5564kB, shmem-rss:0kB
Warning TaskHung 31m kernel-monitor, gke-iagree-cluster-1-main-pool-5632d628-wgzr INFO: task dockerd:1883293 blocked for more than 300 seconds.
Normal NodeAllocatableEnforced 30m kubelet, gke-iagree-cluster-1-main-pool-5632d628-wgzr Updated Node Allocatable limit across pods
Normal NodeHasSufficientDisk 30m (x2 over 30m) kubelet, gke-iagree-cluster-1-main-pool-5632d628-wgzr Node gke-iagree-cluster-1-main-pool-5632d628-wgzr status is now: NodeHasSufficientDisk
Normal NodeHasSufficientMemory 30m (x2 over 30m) kubelet, gke-iagree-cluster-1-main-pool-5632d628-wgzr Node gke-iagree-cluster-1-main-pool-5632d628-wgzr status is now: NodeHasSufficientMemory
Normal NodeHasNoDiskPressure 30m (x2 over 30m) kubelet, gke-iagree-cluster-1-main-pool-5632d628-wgzr Node gke-iagree-cluster-1-main-pool-5632d628-wgzr status is now: NodeHasNoDiskPressure
Normal NodeHasSufficientPID 30m kubelet, gke-iagree-cluster-1-main-pool-5632d628-wgzr Node gke-iagree-cluster-1-main-pool-5632d628-wgzr status is now: NodeHasSufficientPID
Warning Rebooted 30m kubelet, gke-iagree-cluster-1-main-pool-5632d628-wgzr Node gke-iagree-cluster-1-main-pool-5632d628-wgzr has been rebooted, boot id: ecd3db95-4bfc-4df5-85b3-70df05f6fb48
Normal Starting 30m kubelet, gke-iagree-cluster-1-main-pool-5632d628-wgzr Starting kubelet.
Normal NodeNotReady 30m kubelet, gke-iagree-cluster-1-main-pool-5632d628-wgzr Node gke-iagree-cluster-1-main-pool-5632d628-wgzr status is now: NodeNotReady
Normal NodeReady 30m kubelet, gke-iagree-cluster-1-main-pool-5632d628-wgzr Node gke-iagree-cluster-1-main-pool-5632d628-wgzr status is now: NodeReady
Normal Starting 29m kube-proxy, gke-iagree-cluster-1-main-pool-5632d628-wgzr Starting kube-proxy.
Normal FrequentKubeletRestart 25m systemd-monitor, gke-iagree-cluster-1-main-pool-5632d628-wgzr Node condition FrequentKubeletRestart is now: False, reason: FrequentKubeletRestart
Normal CorruptDockerOverlay2 25m docker-monitor, gke-iagree-cluster-1-main-pool-5632d628-wgzr Node condition CorruptDockerOverlay2 is now: False, reason: CorruptDockerOverlay2
Normal UnregisterNetDevice 25m kernel-monitor, gke-iagree-cluster-1-main-pool-5632d628-wgzr Node condition FrequentUnregisterNetDevice is now: False, reason: UnregisterNetDevice
Normal FrequentDockerRestart 25m systemd-monitor, gke-iagree-cluster-1-main-pool-5632d628-wgzr Node condition FrequentDockerRestart is now: False, reason: FrequentDockerRestart
Normal FrequentContainerdRestart 25m systemd-monitor, gke-iagree-cluster-1-main-pool-5632d628-wgzr Node condition FrequentContainerdRestart is now: False, reason: FrequentContainerdRestart
Answer 0 (score: 1)
Looking at the error, it appears your CNI has run out of IPs. When you set up the kubenet CNI for networking, you must have passed a CIDR range, and that range determines how many pod IPs can be allocated in the cluster.
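To confirm whether the node actually has a pod CIDR assigned and how close it is to its pod limit, you can inspect the node object directly. A minimal sketch, using the node name from the events above (the jsonpath fields are standard Kubernetes node fields, not something specific to this cluster):

# Check whether the node has a PodCIDR assigned; empty output would match the
# "Kubenet does not have netConfig ... lack of PodCIDR" warning above
kubectl get node gke-iagree-cluster-1-main-pool-5632d628-wgzr -o jsonpath='{.spec.podCIDR}'

# Compare how many pods are scheduled on the node with its allocatable pod count
kubectl get pods --all-namespaces --field-selector spec.nodeName=gke-iagree-cluster-1-main-pool-5632d628-wgzr | wc -l
kubectl get node gke-iagree-cluster-1-main-pool-5632d628-wgzr -o jsonpath='{.status.allocatable.pods}'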
I am not sure exactly how kubenet maps IPs to pods. If it uses its own virtual network, you need to use a larger CIDR range; if it takes IPs from the host's network interfaces, you need to pick machine types that support more IPs per subnet interface (this is how the AWS VPC CNI works).
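On GKE the pod address range is fixed when the cluster is created, so if the range really is too small the usual fix is to create a new cluster with a wider pod CIDR. A hedged sketch with gcloud (the cluster name and CIDR below are illustrative, not taken from the question):

# Create a cluster with a larger pod address range; a /14 leaves room for far more pod IPs
# than the default, at the cost of consuming more of your VPC address space
gcloud container clusters create my-cluster --cluster-ipv4-cidr=10.0.0.0/14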
Answer 1 (score: 1)
These errors can occur on GKE 1.11.x because of the following known problem: gke-issue.
The problem can be resolved by upgrading the GKE cluster and its nodes to version 1.12.5-gke.5 or 1.12.7-gke.10.
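If you prefer to run the upgrade from the command line rather than the Cloud Console, something like the following should work; the cluster name, node pool name, and zone are placeholders, not values from the question:

# Upgrade the control plane first
gcloud container clusters upgrade my-cluster --zone us-central1-a --master --cluster-version 1.12.7-gke.10
# Then upgrade the node pool to the same version (nodes are drained and recreated)
gcloud container clusters upgrade my-cluster --zone us-central1-a --node-pool main-pool --cluster-version 1.12.7-gke.10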