My Kubernetes cluster (GKE) has only three nodes, and one of them frequently becomes CPU-overloaded. I have about 32 deployments, most with 3 pods each. When a node gets overloaded, I typically see pods on 1 of the 3 nodes go into CrashLoop. Ideally things wouldn't crash, and none of my nodes would sit above 100% utilization.
To work around this I delete pods, drain and uncordon the node, or replace the node, and things usually return to normal. But I wonder how other people solve this. I use kubectl top nodes, kubectl top pods, and kubectl get pods -o wide to see what is going on. Typical node skew:
kubectl top nodes
NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
gke-staging-cluster-default-pool-386bd62d-bj36 481m 24% 4120Mi 38%
gke-staging-cluster-default-pool-386bd62d-cl3p 716m 37% 6583Mi 62%
gke-staging-cluster-default-pool-386bd62d-gms8 1999m 103% 6679Mi 63%
Pod resources:
kubectl top pod | sort -nr -k2
hchchc-staging-deployment-669ff7477c-lcx5d 248m 47Mi
ggg-hc-demo-staging-deployment-77f68db7f8-nf9b5 248m 125Mi
ggg-hc-demo-staging-deployment-77f68db7f8-c6jxd 247m 96Mi
ggg-hc-demo-staging-deployment-77f68db7f8-l44vj 244m 196Mi
athatha-staging-deployment-6dbdf7fb5d-h92h7 244m 95Mi
athatha-staging-deployment-6dbdf7fb5d-hqpm9 243m 222Mi
engine-cron-staging-deployment-77cfbfb948-9s9rv 142m 35Mi
hchchc-twitter-staging-deployment-7846f845c6-g8wt4 59m 83Mi
hchchc-worker-staging-deployment-7cbf995ddd-msrbt 51m 114Mi
hchchc-twitter-staging-deployment-7846f845c6-brlbl 51m 94Mi
Correlating pods to nodes:
kubectl get pods -o wide | grep Crash
hchchc-twitter-staging-deployment-7846f845c6-v8mgh 1/2 CrashLoopBackOff 17 1h 10.0.199.31 gke-staging-cluster-default-pool-386bd62d-gms8
hchchc-worker-staging-deployment-66d7b5d7f4-thxn6 1/2 CrashLoopBackOff 17 1h 10.0.199.31 gke-staging-cluster-default-pool-386bd62d-gms8
ggggg-worker-staging-deployment-76b84969d-hqqhb 1/2 CrashLoopBackOff 17 1h 10.0.199.31 gke-staging-cluster-default-pool-386bd62d-gms8
ggggg-worker-staging-deployment-76b84969d-t4xmb 1/2 CrashLoopBackOff 17 1h 10.0.199.31 gke-staging-cluster-default-pool-386bd62d-gms8
ggggg-worker-staging-deployment-76b84969d-zpkkf 1/2 CrashLoopBackOff 17 1h 10.0.199.31 gke-staging-cluster-default-pool-386bd62d-gms8
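To quickly see which nodes the crashing pods pile up on, the same output can be fed through a small pipeline that counts CrashLoopBackOff pods per node. A sketch, assuming the NODE column is field 7 as in the output above (newer kubectl versions append extra columns, so the field index may need adjusting):

```shell
#!/bin/sh
# Count CrashLoopBackOff pods per node from `kubectl get pods -o wide` output.
# Assumes the node name is field 7, as in the sample output above.
crash_per_node() {
  grep CrashLoopBackOff | awk '{print $7}' | sort | uniq -c | sort -rn
}

# Demo on captured sample lines; normally you would run:
#   kubectl get pods -o wide | crash_per_node
crash_per_node <<'EOF'
hchchc-twitter-staging-deployment-7846f845c6-v8mgh 1/2 CrashLoopBackOff 17 1h 10.0.199.31 gke-staging-cluster-default-pool-386bd62d-gms8
hchchc-worker-staging-deployment-66d7b5d7f4-thxn6 1/2 CrashLoopBackOff 17 1h 10.0.199.31 gke-staging-cluster-default-pool-386bd62d-gms8
ggg-hc-demo-staging-deployment-77f68db7f8-nf9b5 1/1 Running 0 1h 10.0.199.20 gke-staging-cluster-default-pool-386bd62d-cl3p
EOF
```

Here all the crashers land on the same node, which matches the symptom: the overloaded node is the one whose pods crash.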
Answer (score: 1)
You may want to add pod anti-affinity to your deployments. That will spread the load more evenly across all the nodes. An example of anti-affinity:
spec:
  affinity:
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchLabels:
              app: my-app   # must match this deployment's pod template labels
          topologyKey: kubernetes.io/hostname
This tells the scheduler to avoid placing a pod on a node that already runs a pod from the same deployment (matched by the labelSelector). The "preferred" form is a soft rule, so it is a preference rather than a guarantee. With 3 replicas in a deployment, all 3 should then spread across your 3 nodes instead of piling onto a single node and exhausting its CPU.
It's not a perfect solution, but it helps balance the load a bit.
See more about anti-affinity here: https://kubernetes.io/docs/concepts/configuration/assign-pod-node/
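For context, this is roughly where the snippet sits inside a full Deployment manifest. A sketch only: the name, labels, and image below are placeholders, and the labelSelector under podAffinityTerm must match the pod template's labels for the rule to do anything.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hchchc-worker-staging-deployment   # placeholder name
spec:
  replicas: 3
  selector:
    matchLabels:
      app: hchchc-worker                   # placeholder label
  template:
    metadata:
      labels:
        app: hchchc-worker
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchLabels:
                  app: hchchc-worker       # spread pods carrying this label
              topologyKey: kubernetes.io/hostname
      containers:
      - name: worker
        image: example/worker:latest       # placeholder image
```

After applying, kubectl get pods -o wide should show the replicas landing on different nodes when capacity allows.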