Kubernetes Cluster Autoscaler does not scale down instances on EKS - it only logs that the node is unneeded

Time: 2019-09-22 17:12:12

Tags: kubernetes autoscaling amazon-eks

These are the autoscaler's logs:

I0922 17:08:33.857348       1 auto_scaling_groups.go:102] Updating ASG terraform-eks-demo20190922161659090500000007--terraform-eks-demo20190922161700651000000008
I0922 17:08:33.857380       1 aws_manager.go:152] Refreshed ASG list, next refresh after 2019-09-22 17:08:43.857375311 +0000 UTC m=+259.289807511
I0922 17:08:33.857465       1 utils.go:526] No pod using affinity / antiaffinity found in cluster, disabling affinity predicate for this loop
I0922 17:08:33.857482       1 static_autoscaler.go:261] Filtering out schedulables
I0922 17:08:33.857532       1 static_autoscaler.go:271] No schedulable pods
I0922 17:08:33.857545       1 static_autoscaler.go:279] No unschedulable pods
I0922 17:08:33.857557       1 static_autoscaler.go:333] Calculating unneeded nodes
I0922 17:08:33.857601       1 scale_down.go:376] Scale-down calculation: ignoring 2 nodes unremovable in the last 5m0s
I0922 17:08:33.857621       1 scale_down.go:407] Node ip-10-0-1-135.us-west-2.compute.internal - utilization 0.055000
I0922 17:08:33.857688       1 static_autoscaler.go:349] ip-10-0-1-135.us-west-2.compute.internal is unneeded since 2019-09-22 17:05:07.299351571 +0000 UTC m=+42.731783882 duration 3m26.405144434s
I0922 17:08:33.857703       1 static_autoscaler.go:360] Scale down status: unneededOnly=true lastScaleUpTime=2019-09-22 17:04:42.29864432 +0000 UTC m=+17.731076395 lastScaleDownDeleteTime=2019-09-22 17:04:42.298645611 +0000 UTC m=+17.731077680 lastScaleDownFailTime=2019-09-22 17:04:42.298646962 +0000 UTC m=+17.731079033 scaleDownForbidden=false isDeleteInProgress=false

If the node is unneeded, what is the next step? What is it waiting for?

I drained one node:

kubectl get nodes -o=wide
NAME                                       STATUS                     ROLES    AGE   VERSION               INTERNAL-IP   EXTERNAL-IP      OS-IMAGE         KERNEL-VERSION                  CONTAINER-RUNTIME
ip-10-0-0-118.us-west-2.compute.internal   Ready                      <none>   46m   v1.13.10-eks-d6460e   10.0.0.118    52.40.115.132    Amazon Linux 2   4.14.138-114.102.amzn2.x86_64   docker://18.6.1
ip-10-0-0-211.us-west-2.compute.internal   Ready                      <none>   44m   v1.13.10-eks-d6460e   10.0.0.211    35.166.57.203    Amazon Linux 2   4.14.138-114.102.amzn2.x86_64   docker://18.6.1
ip-10-0-1-135.us-west-2.compute.internal   Ready,SchedulingDisabled   <none>   46m   v1.13.10-eks-d6460e   10.0.1.135    18.237.253.134   Amazon Linux 2   4.14.138-114.102.amzn2.x86_64   docker://18.6.1

Why isn't the instance being terminated?

These are the parameters I am using:

        - ./cluster-autoscaler
        - --cloud-provider=aws
        - --namespace=default
        - --scan-interval=25s
        - --scale-down-unneeded-time=30s
        - --nodes=1:20:terraform-eks-demo20190922161659090500000007--terraform-eks-demo20190922161700651000000008
        - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/example-job-runner
        - --logtostderr=true
        - --stderrthreshold=info
        - --v=4

1 answer:

Answer 0: (score: 0)

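Do you have any of the following?

  • Pods running on that node without a controller object (i.e. a Deployment / ReplicaSet)
  • Any kube-system pods that do not have a PodDisruptionBudget
  • Pods with local storage or any custom affinity / anti-affinity / nodeSelectors
  • An annotation set on that node that prevents cluster-autoscaler from scaling it down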
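
Several of the items above can be read straight off the pod and node objects. The following is a minimal sketch, not the autoscaler's own logic: it assumes the input dicts come from `kubectl get pods --all-namespaces -o json` and `kubectl get node <name> -o json`, and the function names `pod_blockers` and `node_scale_down_disabled` are made up for illustration. The two annotation keys are the ones described in the Cluster Autoscaler FAQ. PodDisruptionBudget coverage is not checked here.

```python
# Annotation keys described in the Cluster Autoscaler FAQ.
SAFE_TO_EVICT = "cluster-autoscaler.kubernetes.io/safe-to-evict"
SCALE_DOWN_DISABLED = "cluster-autoscaler.kubernetes.io/scale-down-disabled"


def pod_blockers(pod):
    """Return reasons why this pod (a dict parsed from kubectl JSON) might block scale-down."""
    meta = pod.get("metadata") or {}
    spec = pod.get("spec") or {}
    owners = meta.get("ownerReferences") or []
    reasons = []
    # Assumption: DaemonSet pods are skipped, since they are not expected to block node removal.
    if any(o.get("kind") == "DaemonSet" for o in owners):
        return reasons
    if not owners:
        reasons.append("no controller (bare pod)")
    # Each volume is a dict; the volume type appears as a key such as "emptyDir" or "hostPath".
    if any("emptyDir" in v or "hostPath" in v for v in spec.get("volumes") or []):
        reasons.append("local storage (emptyDir/hostPath)")
    if spec.get("affinity") or spec.get("nodeSelector"):
        reasons.append("affinity / nodeSelector constraints")
    if (meta.get("annotations") or {}).get(SAFE_TO_EVICT) == "false":
        reasons.append("annotated safe-to-evict=false")
    return reasons


def node_scale_down_disabled(node):
    """True if the node carries the annotation that tells CA not to scale it down."""
    annotations = (node.get("metadata") or {}).get("annotations") or {}
    return annotations.get(SCALE_DOWN_DISABLED) == "true"
```

Calling `pod_blockers` on each item of the pod list whose `spec.nodeName` is `ip-10-0-1-135.us-west-2.compute.internal` gives a quick shortlist of pods to look at more closely. An empty result does not prove the node is removable; it only means none of these particular patterns were found.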

Your CA configuration / startup options look fine to me.
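
One timing detail may still be worth ruling out. The "Scale down status" log line reports `unneededOnly=true` together with `lastScaleUpTime=2019-09-22 17:04:42`. The sketch below is only arithmetic under stated assumptions: that `--scale-down-delay-after-add` is at its documented default of 10 minutes (the flag is not set in the question), and that `lastScaleUpTime` is what this cooldown is measured from. The helper name `cooldown_remaining` is hypothetical.

```python
from datetime import datetime, timedelta

# Timestamps copied from the log in the question (UTC, truncated to whole seconds).
last_scale_up = datetime(2019, 9, 22, 17, 4, 42)
log_time = datetime(2019, 9, 22, 17, 8, 33)

# Assumption: documented default of --scale-down-delay-after-add, not set in the question.
scale_down_delay_after_add = timedelta(minutes=10)


def cooldown_remaining(now, last_event, delay):
    """Return how much of the cooldown is left at `now` (zero once it has expired)."""
    return max(last_event + delay - now, timedelta(0))


remaining = cooldown_remaining(log_time, last_scale_up, scale_down_delay_after_add)
print(remaining)  # 0:06:09
```

If those assumptions hold, the node having been unneeded for longer than `--scale-down-unneeded-time=30s` would not be enough on its own at 17:08:33, and it would be worth looking at the logs again after roughly 17:14:42.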

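I can only imagine it has something to do with a specific pod running on that node. Maybe go through the kube-system pods running on the listed nodes that are not being scaled down, and check them against the list above.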
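
That walk-through can be partly automated. The following is a rough sketch under assumptions, not a definitive check: `pods` is taken to be the `items` list from `kubectl get pods -n kube-system -o json`, `pdbs` the `items` list from `kubectl get pdb -n kube-system -o json`, and the function names are hypothetical. Only `matchLabels` selectors are handled; `matchExpressions` and empty selectors are treated as matching nothing.

```python
def selector_matches(match_labels, pod_labels):
    """True if match_labels is non-empty and every key/value pair is present on the pod."""
    return bool(match_labels) and all(
        pod_labels.get(key) == value for key, value in match_labels.items()
    )


def kube_system_pods_without_pdb(pods, pdbs, node_name):
    """Names of non-DaemonSet kube-system pods on node_name that no PDB selects."""
    uncovered = []
    for pod in pods:
        meta = pod.get("metadata") or {}
        spec = pod.get("spec") or {}
        if meta.get("namespace") != "kube-system" or spec.get("nodeName") != node_name:
            continue
        # Assumption: DaemonSet pods are skipped, as they are not expected to block removal.
        owners = meta.get("ownerReferences") or []
        if any(o.get("kind") == "DaemonSet" for o in owners):
            continue
        labels = meta.get("labels") or {}
        covered = any(
            (pdb.get("metadata") or {}).get("namespace") == "kube-system"
            and selector_matches(
                ((pdb.get("spec") or {}).get("selector") or {}).get("matchLabels") or {},
                labels,
            )
            for pdb in pdbs
        )
        if not covered:
            uncovered.append(meta.get("name"))
    return uncovered
```

For the node in the question this would be called with `node_name="ip-10-0-1-135.us-west-2.compute.internal"`. Any pod name it returns is a candidate for the second item in the list above.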

These two sections of the page have some good items to check that could be causing CA not to scale down the node.

  • low utilization nodes but not scaling down, why?
  • what types of pods can prevent CA from removing a node?