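We run Cluster Autoscaler, overprovisioning, and a descheduler in our cluster. Overprovisioning is set to a replica count of 44 (11 containers provide a buffer of one EC2 instance).

We are seeing a pattern where a node is scaled down after about 10 minutes and a new node is added within the following 20 minutes. Sometimes this repeats. When Cluster Autoscaler finds that a node has been unused for 10 minutes, it tries to scale it down; then, within the next 20 minutes, a new node is needed.

We are not sure how to tune this. Is it rebalancing caused by the descheduler CronJob that runs every 30 minutes? Is the overprovisioning replica count of 44 excessive? Or is there something else we should consider?

Filtered Cluster Autoscaler logs: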
k logs -n kube-system cluster-autoscaler-aws-cluster-autoscaler-774bbb4cf-9mq4z aws-cluster-autoscaler | ag "(scale-up plan)|(removing empty node)"
I0505 22:42:14.659857 1 scale_down.go:938] Scale-down: removing empty node ip-10-120-36-56.ec2.internal
I0505 22:42:14.660428 1 event.go:258] Event(v1.ObjectReference{Kind:"ConfigMap", Namespace:"kube-system", Name:"cluster-autoscaler-status", UID:"28c7ffa0-de2e-4a5f-856f-69684065056b", APIVersion:"v1", ResourceVersion:"46611687", FieldPath:""}): type: 'Normal' reason: 'ScaleDownEmpty' Scale-down: removing empty node ip-10-120-36-56.ec2.internal
I0505 23:00:18.881836 1 scale_up.go:533] Final scale-up plan: [{nodes.us-east-1.apicentral.axwaydev.net 26->27 (max: 30)}]
I0505 23:13:01.741943 1 scale_down.go:938] Scale-down: removing empty node ip-10-120-42-5.ec2.internal
I0505 23:13:01.742378 1 event.go:258] Event(v1.ObjectReference{Kind:"ConfigMap", Namespace:"kube-system", Name:"cluster-autoscaler-status", UID:"28c7ffa0-de2e-4a5f-856f-69684065056b", APIVersion:"v1", ResourceVersion:"46624852", FieldPath:""}): type: 'Normal' reason: 'ScaleDownEmpty' Scale-down: removing empty node ip-10-120-42-5.ec2.internal
I0505 23:30:05.422452 1 scale_up.go:533] Final scale-up plan: [{nodes.us-east-1.apicentral.axwaydev.net 26->27 (max: 30)}]
I0505 23:42:08.867123 1 scale_down.go:938] Scale-down: removing empty node ip-10-120-36-45.ec2.internal
I0505 23:42:08.868311 1 event.go:258] Event(v1.ObjectReference{Kind:"ConfigMap", Namespace:"kube-system", Name:"cluster-autoscaler-status", UID:"28c7ffa0-de2e-4a5f-856f-69684065056b", APIVersion:"v1", ResourceVersion:"46637400", FieldPath:""}): type: 'Normal' reason: 'ScaleDownEmpty' Scale-down: removing empty node ip-10-120-36-45.ec2.internal
I0506 00:00:13.286783 1 scale_up.go:533] Final scale-up plan: [{nodes.us-east-1.apicentral.axwaydev.net 26->27 (max: 30)}]
I0506 00:12:26.796684 1 scale_down.go:938] Scale-down: removing empty node ip-10-120-59-39.ec2.internal
I0506 00:12:26.796974 1 event.go:258] Event(v1.ObjectReference{Kind:"ConfigMap", Namespace:"kube-system", Name:"cluster-autoscaler-status", UID:"28c7ffa0-de2e-4a5f-856f-69684065056b", APIVersion:"v1", ResourceVersion:"46650479", FieldPath:""}): type: 'Normal' reason: 'ScaleDownEmpty' Scale-down: removing empty node ip-10-120-59-39.ec2.internal
I0506 00:30:22.431082 1 scale_up.go:533] Final scale-up plan: [{nodes.us-east-1.apicentral.axwaydev.net 26->27 (max: 30)}]
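To make the timing in the log easier to read, here is a minimal sketch that parses the scale-down and scale-up lines and prints the gap between consecutive events. The names `parse_events`, `gaps_in_minutes`, and `ASSUMED_YEAR` are hypothetical helpers made up for this illustration, and the year is an assumption because the log timestamps do not carry one.

```python
import re
from datetime import datetime

# Scale-down / scale-up lines copied from the log above
# (the duplicate event.go lines are left out).
LOG = """\
I0505 22:42:14.659857 1 scale_down.go:938] Scale-down: removing empty node ip-10-120-36-56.ec2.internal
I0505 23:00:18.881836 1 scale_up.go:533] Final scale-up plan: [{nodes.us-east-1.apicentral.axwaydev.net 26->27 (max: 30)}]
I0505 23:13:01.741943 1 scale_down.go:938] Scale-down: removing empty node ip-10-120-42-5.ec2.internal
I0505 23:30:05.422452 1 scale_up.go:533] Final scale-up plan: [{nodes.us-east-1.apicentral.axwaydev.net 26->27 (max: 30)}]
I0505 23:42:08.867123 1 scale_down.go:938] Scale-down: removing empty node ip-10-120-36-45.ec2.internal
I0506 00:00:13.286783 1 scale_up.go:533] Final scale-up plan: [{nodes.us-east-1.apicentral.axwaydev.net 26->27 (max: 30)}]
I0506 00:12:26.796684 1 scale_down.go:938] Scale-down: removing empty node ip-10-120-59-39.ec2.internal
I0506 00:30:22.431082 1 scale_up.go:533] Final scale-up plan: [{nodes.us-east-1.apicentral.axwaydev.net 26->27 (max: 30)}]
"""

# Log line header: I<MMDD> <HH:MM:SS.micros> <pid> <file>.go:<line>]
LINE_RE = re.compile(
    r"^I(\d{4})\s+(\d{2}:\d{2}:\d{2})\.\d+\s+\d+\s+(scale_down|scale_up)\.go:"
)

ASSUMED_YEAR = 2020  # assumption: the timestamps in the log carry no year


def parse_events(text, year=ASSUMED_YEAR):
    """Return [(timestamp, kind)] for scale_down.go / scale_up.go lines."""
    events = []
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            mmdd, hms, kind = match.groups()
            stamp = datetime.strptime(f"{year}{mmdd} {hms}", "%Y%m%d %H:%M:%S")
            events.append((stamp, kind))
    return events


def gaps_in_minutes(events):
    """Pair each event after the first with the minutes since the previous one."""
    return [
        (kind, round((stamp - prev_stamp).total_seconds() / 60))
        for (prev_stamp, _), (stamp, kind) in zip(events, events[1:])
    ]


if __name__ == "__main__":
    events = parse_events(LOG)
    for stamp, kind in events:
        print(f"{stamp:%H:%M:%S}  {kind}")
    for kind, minutes in gaps_in_minutes(events):
        print(f"{minutes:>3} min before {kind}")
```

On this sample, every scale-up lands within half a minute of :00 or :30, and each node removal follows 12 to 13 minutes after the preceding scale-up. That would be consistent with something on a 30-minute schedule triggering the scale-up, though the log alone does not show which component is responsible.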