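I have a kops cluster with a maximum of 75 nodes, and I have added the cluster autoscaler. It uses kubenet networking. Right now everything has stopped working, i.e. scale-down no longer happens.
The cluster is running at its maximum capacity of 75 nodes even though there is almost no load. I am not sure where to start troubleshooting.
I see the following messages in the cluster autoscaler pod: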
I0222 01:45:14.327164 1 static_autoscaler.go:97] Starting main loop
W0222 01:45:14.770818 1 static_autoscaler.go:150] Cluster is not ready for autoscaling
I0222 01:45:15.043126 1 leaderelection.go:199] successfully renewed lease kube-system/cluster-autoscaler
I0222 01:45:17.121507 1 leaderelection.go:199] successfully renewed lease kube-system/cluster-autoscaler
I0222 01:45:19.126665 1 leaderelection.go:199] successfully renewed lease kube-system/cluster-autoscaler
I0222 01:45:21.327581 1 leaderelection.go:199] successfully renewed lease kube-system/cluster-autoscaler
I0222 01:45:23.331802 1 leaderelection.go:199] successfully renewed lease kube-system/cluster-autoscaler
I0222 01:45:24.775124 1 static_autoscaler.go:97] Starting main loop
W0222 01:45:25.085442 1 static_autoscaler.go:150] Cluster is not ready for autoscaling
Autoscaling was working fine before this.
Update: running kops validate cluster gives:
VALIDATION ERRORS
KIND NAME MESSAGE
Node ip-172-20-32-173.ec2.internal node "ip-172-20-32-173.ec2.internal" is not ready
...
I0221 22:16:02.688911 2403 node_conditions.go:60] node "ip-172-20-51-238.ec2.internal" not ready: &NodeCondition{Type:NetworkUnavailable,Status:True,LastHeartbeatTime:2019-02-21 22:15:56 -0500 EST,LastTransitionTime:2019-02-21 22:15:56 -0500 EST,Reason:NoRouteCreated,Message:RouteController failed to create a route,}
Answer 0 (score: 1)
I found out that the problem was that my cluster had gone into an unhealthy state because of this limitation in AWS VPC routing tables. The cluster had scaled to 75 nodes, become unhealthy, and was then unable to scale down.
From the link:
One important limitation when using kubenet networking is that an AWS routing table cannot have more than 50 entries, which sets a limit of 50 nodes per cluster.
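To check whether a cluster is in this state, one option is to look for nodes whose NetworkUnavailable condition is True (the condition shown in the validation output above) and to compare the node count with the route limit. The following is only a minimal sketch: the helper names are hypothetical, the limit of 50 is simply the figure from the quote (the actual quota in a given AWS account may differ), and it assumes kubectl is already configured for the cluster.

```python
import json
import subprocess

# Figure taken from the quoted documentation; treat it as an assumption,
# since the real quota for a given AWS account may be different.
ROUTE_TABLE_LIMIT = 50


def nodes_without_routes(node_list):
    """Return names of nodes whose NetworkUnavailable condition is True."""
    names = []
    for node in node_list.get("items", []):
        conditions = node.get("status", {}).get("conditions", [])
        for cond in conditions:
            if cond.get("type") == "NetworkUnavailable" and cond.get("status") == "True":
                names.append(node["metadata"]["name"])
                break
    return names


def exceeds_route_limit(node_list, limit=ROUTE_TABLE_LIMIT):
    """Return True if the cluster has more nodes than the route table can hold."""
    return len(node_list.get("items", [])) > limit


def load_nodes():
    """Fetch the node list as a dict by calling kubectl (must be on PATH)."""
    result = subprocess.run(
        ["kubectl", "get", "nodes", "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)
```

Calling nodes_without_routes(load_nodes()) against the live cluster would list the affected nodes. If exceeds_route_limit(load_nodes()) is also True, the route table limit is a likely cause of the "Cluster is not ready for autoscaling" warning, since the nodes that never received a route are reported as not ready.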