Why doesn't GKE scale down my cluster's nodes even though I only have one Pod?

Date: 2020-05-23 04:38:02

Tags: kubernetes google-cloud-platform google-kubernetes-engine autoscaling

I know there are existing questions about this, and they generally point to https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/FAQ.md#i-have-a-couple-of-nodes-with-low-utilization-but-they-are-not-scaled-down-why

But I still can't debug this. Only one Pod is running on my cluster, so I don't understand why it won't scale down to one node. How can I debug this further?

Here is some information:

kubectl get nodes
NAME                                                STATUS   ROLES    AGE     VERSION
gke-qua-gke-foobar1234-default-pool-6302174e-4k84   Ready    <none>   4h14m   v1.14.10-gke.27
gke-qua-gke-foobar1234-default-pool-6302174e-6wfs   Ready    <none>   16d     v1.14.10-gke.27
gke-qua-gke-foobar1234-default-pool-6302174e-74lm   Ready    <none>   4h13m   v1.14.10-gke.27
gke-qua-gke-foobar1234-default-pool-6302174e-m223   Ready    <none>   4h13m   v1.14.10-gke.27
gke-qua-gke-foobar1234-default-pool-6302174e-srlg   Ready    <none>   66d     v1.14.10-gke.27
kubectl get pods
NAME                                  READY   STATUS    RESTARTS   AGE
qua-gke-foobar1234-5959446675-njzh4   1/1     Running   0          14m
nodePools:
- autoscaling:
    enabled: true
    maxNodeCount: 10
    minNodeCount: 1
  config:
    diskSizeGb: 100
    diskType: pd-standard
    imageType: COS
    machineType: n1-highcpu-32
    metadata:
      disable-legacy-endpoints: 'true'
    oauthScopes:
    - https://www.googleapis.com/auth/datastore
    - https://www.googleapis.com/auth/devstorage.full_control
    - https://www.googleapis.com/auth/pubsub
    - https://www.googleapis.com/auth/logging.write
    - https://www.googleapis.com/auth/monitoring
    serviceAccount: default
    shieldedInstanceConfig:
      enableIntegrityMonitoring: true
  initialNodeCount: 1
  instanceGroupUrls:
  - https://www.googleapis.com/compute/v1/projects/fooooobbbarrr-dev/zones/us-central1-a/instanceGroupManagers/gke-qua-gke-foobar1234-default-pool-6302174e-grp
  locations:
  - us-central1-a
  management:
    autoRepair: true
    autoUpgrade: true
  name: default-pool
  podIpv4CidrSize: 24
  selfLink: https://container.googleapis.com/v1/projects/ffoooobarrrr-dev/locations/us-central1/clusters/qua-gke-foobar1234/nodePools/default-pool
  status: RUNNING
  version: 1.14.10-gke.27
kubectl describe horizontalpodautoscaler
Name:               qua-gke-foobar1234
Namespace:          default
Labels:             <none>
Annotations:        autoscaling.alpha.kubernetes.io/conditions:
                      [{"type":"AbleToScale","status":"True","lastTransitionTime":"2020-03-17T19:59:19Z","reason":"ReadyForNewScale","message":"recommended size...
                    autoscaling.alpha.kubernetes.io/current-metrics:
                      [{"type":"External","external":{"metricName":"pubsub.googleapis.com|subscription|num_undelivered_messages","metricSelector":{"matchLabels"...
                    autoscaling.alpha.kubernetes.io/metrics:
                      [{"type":"External","external":{"metricName":"pubsub.googleapis.com|subscription|num_undelivered_messages","metricSelector":{"matchLabels"...
                    kubectl.kubernetes.io/last-applied-configuration:
                      {"apiVersion":"autoscaling/v2beta1","kind":"HorizontalPodAutoscaler","metadata":{"annotations":{},"name":"qua-gke-foobar1234","namespace":...
CreationTimestamp:  Tue, 17 Mar 2020 12:59:03 -0700
Reference:          Deployment/qua-gke-foobar1234
Min replicas:       1
Max replicas:       10
Deployment pods:    1 current / 1 desired
Events:             <none>
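(As a debugging starting point, one way to see what is actually keeping nodes busy is to list the pods in every namespace along with the node each one is scheduled on; system pods in kube-system count toward scale-down decisions just like workload pods. A minimal sketch using standard kubectl flags:)

```shell
# List every pod in every namespace, including the node it runs on.
# Pods in kube-system can block scale-down just like workload pods.
kubectl get pods --all-namespaces -o wide

# Check recent events across namespaces for autoscaler-related messages.
kubectl get events --all-namespaces --sort-by=.metadata.creationTimestamp
```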

2 answers:

Answer 0 (score: 2)

The HorizontalPodAutoscaler increases or decreases the number of pods, not nodes. It has nothing to do with node scaling.

Node scaling is handled by the cloud provider, in your case Google Cloud Platform.

You should check in the GCP console whether the node autoscaler is enabled.

Follow these steps:
1. Go to the Kubernetes clusters screen in the GCP console
2. Click your cluster
3. At the bottom, click the node pool you want to enable autoscaling for
4. Click "Edit"
5. Enable autoscaling, define the minimum and maximum number of nodes, then save. See the screenshot:

Enable node autoscaling

Alternatively, via the gcloud CLI, as described here:

gcloud container clusters update cluster-name --enable-autoscaling \
    --min-nodes 1 --max-nodes 10 --zone compute-zone --node-pool default-pool
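(To confirm the change took effect, you can dump the node pool's settings afterwards; this prints the same kind of nodePools output shown in the question. `cluster-name` and `compute-zone` are placeholders, as in the command above.)

```shell
# Inspect the node pool's current autoscaling configuration
# (cluster-name / compute-zone are placeholders).
gcloud container node-pools describe default-pool \
    --cluster cluster-name --zone compute-zone
```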

Answer 1 (score: 1)

So the initial problem with my debugging attempts was that I was running kubectl get pods instead of kubectl get pods --all-namespaces, so I couldn't see the system Pods running on the cluster. I then added a PodDisruptionBudget (PDB) for each of the system Pods:

kubectl create poddisruptionbudget pdb-event --namespace=kube-system --selector k8s-app=event-exporter --max-unavailable 1 &&
kubectl create poddisruptionbudget pdb-fluentd-scaler --namespace=kube-system --selector k8s-app=fluentd-gcp-scaler --max-unavailable 1 &&
kubectl create poddisruptionbudget pdb-heapster --namespace=kube-system --selector k8s-app=heapster --max-unavailable 1 &&
kubectl create poddisruptionbudget pdb-dns --namespace=kube-system --selector k8s-app=kube-dns --max-unavailable 1 &&
kubectl create poddisruptionbudget pdb-dnsauto --namespace=kube-system --selector k8s-app=kube-dns-autoscaler --max-unavailable 1 &&
kubectl create poddisruptionbudget pdb-glbc --namespace=kube-system --selector k8s-app=glbc --max-unavailable 1

Then I started seeing these errors in some of the PDB event logs: controllermanager Failed to calculate the number of expected pods: found no controllers for pod. I saw them on the PDBs when running kubectl describe pdb --all-namespaces. I don't know why that happened, but after I deleted those PDBs, everything started working!
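(For completeness, the cleanup step described above might look like the following; the answer does not show the exact command used, so this is a sketch. The PDB names match the `kubectl create` commands earlier.)

```shell
# Remove the PDBs again if they produce
# "found no controllers for pod" errors (names as created above).
kubectl delete pdb pdb-event pdb-fluentd-scaler pdb-heapster \
    pdb-dns pdb-dnsauto pdb-glbc --namespace=kube-system
```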