I created a Kubernetes cluster on AWS using kops. The cluster comes up without any problem and runs fine for 10-15 hours. I have deployed SAP Vora 2.1 on this cluster. However, usually after 12-15 hours, the kops cluster runs into problems with kube-proxy and kube-dns: these pods either go down or show up in Completed status, and there are also a lot of restarts. This eventually drags my application pods into trouble and the application goes down as well. The application uses Consul for service discovery, but since the core Kubernetes services are not working properly, the application never reaches a stable state even when I try to recover the kube-proxy/kube-dns pods.
This is a 3-node cluster (1 master and 2 nodes) set up in full auto-scaling mode. The overlay network is the default kubenet. Below is a snapshot of the pod status after the system runs into the problem:
[root@ip-172-31-18-162 ~]# kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
infyvora vora-catalog-1549734119-cfnhz 0/2 CrashLoopBackOff 188 20h
infyvora vora-consul-0 0/1 CrashLoopBackOff 101 20h
infyvora vora-consul-1 1/1 Running 34 20h
infyvora vora-consul-2 0/1 CrashLoopBackOff 95 20h
infyvora vora-deployment-operator-293895365-4b3t6 0/1 Completed 104 20h
infyvora vora-disk-0 1/2 CrashLoopBackOff 187 20h
infyvora vora-dlog-0 0/2 CrashLoopBackOff 226 20h
infyvora vora-dlog-1 1/2 CrashLoopBackOff 155 20h
infyvora vora-doc-store-2451237348-dkrm6 0/2 CrashLoopBackOff 229 20h
infyvora vora-elasticsearch-logging-v1-444540252-mwfrz 0/1 CrashLoopBackOff 100 20h
infyvora vora-elasticsearch-logging-v1-444540252-vrr63 1/1 Running 14 20h
infyvora vora-elasticsearch-retention-policy-137762458-ns5pc 1/1 Running 13 20h
infyvora vora-fluentd-kubernetes-v1.21-9f4pt 1/1 Running 12 20h
infyvora vora-fluentd-kubernetes-v1.21-s2t1j 0/1 CrashLoopBackOff 99 20h
infyvora vora-grafana-2929546178-vrf5h 1/1 Running 13 20h
infyvora vora-graph-435594712-47lcg 0/2 CrashLoopBackOff 157 20h
infyvora vora-kibana-logging-3693794794-2qn86 0/1 CrashLoopBackOff 99 20h
infyvora vora-landscape-2532068267-w1f5n 0/2 CrashLoopBackOff 232 20h
infyvora vora-nats-streaming-1569990702-kcl1v 1/1 Running 13 20h
infyvora vora-prometheus-node-exporter-k4c3g 0/1 CrashLoopBackOff 102 20h
infyvora vora-prometheus-node-exporter-xp511 1/1 Running 13 20h
infyvora vora-prometheus-pushgateway-399610745-tcfk7 0/1 CrashLoopBackOff 103 20h
infyvora vora-prometheus-server-3955170982-xpct0 2/2 Running 24 20h
infyvora vora-relational-376953862-w39tc 0/2 CrashLoopBackOff 237 20h
infyvora vora-security-operator-2514524099-7ld0k 0/1 CrashLoopBackOff 103 20h
infyvora vora-thriftserver-409431919-8c1x9 2/2 Running 28 20h
infyvora vora-time-series-1188816986-f2fbq 1/2 CrashLoopBackOff 184 20h
infyvora vora-tools5tlpt-100252330-mrr9k 0/1 rpc error: code = 4 desc = context deadline exceeded 272 17h
infyvora vora-tools6zr3m-3592177467-n7sxd 0/1 Completed 1 20h
infyvora vora-tx-broker-4168728922-hf8jz 0/2 CrashLoopBackOff 151 20h
infyvora vora-tx-coordinator-3910571185-l0r4n 0/2 CrashLoopBackOff 184 20h
infyvora vora-tx-lock-manager-2734670982-bn7kk 0/2 Completed 228 20h
infyvora vsystem-1230763370-5ckr0 0/1 CrashLoopBackOff 115 20h
infyvora vsystem-auth-1068224543-0g59w 0/1 CrashLoopBackOff 102 20h
infyvora vsystem-vrep-1427606801-zprlr 0/1 CrashLoopBackOff 121 20h
kube-system dns-controller-3110272648-chwrs 1/1 Running 0 22h
kube-system etcd-server-events-ip-172-31-64-102.ap-southeast-1.compute.internal 1/1 Running 0 22h
kube-system etcd-server-ip-172-31-64-102.ap-southeast-1.compute.internal 1/1 Running 0 22h
kube-system kube-apiserver-ip-172-31-64-102.ap-southeast-1.compute.internal 1/1 Running 0 22h
kube-system kube-controller-manager-ip-172-31-64-102.ap-southeast-1.compute.internal 1/1 Running 0 22h
kube-system kube-dns-1311260920-cm1fs 0/3 Completed 309 22h
kube-system kube-dns-1311260920-hm5zd 3/3 Running 39 22h
kube-system kube-dns-autoscaler-1818915203-wmztj 1/1 Running 12 22h
kube-system kube-proxy-ip-172-31-64-102.ap-southeast-1.compute.internal 1/1 Running 0 22h
kube-system kube-proxy-ip-172-31-64-110.ap-southeast-1.compute.internal 0/1 CrashLoopBackOff 98 22h
kube-system kube-proxy-ip-172-31-64-15.ap-southeast-1.compute.internal 1/1 Running 13 22h
kube-system kube-scheduler-ip-172-31-64-102.ap-southeast-1.compute.internal 1/1 Running 0 22h
kube-system tiller-deploy-352283156-97hhb 1/1 Running 34 22h
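In case it helps narrow things down, these are the kinds of commands I can run to collect more diagnostics. This is just a sketch; the pod and node names are taken from the snapshot above, and `--previous` only works when a prior container instance actually crashed:

```shell
# Why does kube-proxy on this node keep crash-looping? Check events and the
# logs of the previously crashed container instance.
kubectl -n kube-system describe pod kube-proxy-ip-172-31-64-110.ap-southeast-1.compute.internal
kubectl -n kube-system logs kube-proxy-ip-172-31-64-110.ap-southeast-1.compute.internal --previous

# kube-dns pods have multiple containers; pull logs per container.
kubectl -n kube-system logs kube-dns-1311260920-cm1fs -c kubedns --previous

# Look for node-level pressure (MemoryPressure / DiskPressure conditions)
# that could explain pods being killed and restarted after ~12-15 hours.
kubectl describe node ip-172-31-64-110.ap-southeast-1.compute.internal
```

So far I have not captured the `--previous` logs at the moment of failure; I can attach them if that would help.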
Has anyone come across similar issues with kops Kubernetes on AWS? Any pointers towards resolving this would be much appreciated.
Regards,
Deepak