I know there are some existing questions about this; they usually point to https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/FAQ.md#i-have-a-couple-of-nodes-with-low-utilization-but-they-are-not-scaled-down-why, but I still can't figure it out. Only 1 Pod is running on my cluster, so I don't understand why it won't scale down to 1 node. How can I debug this further?
Here is some information:
kubectl get nodes
NAME STATUS ROLES AGE VERSION
gke-qua-gke-foobar1234-default-pool-6302174e-4k84 Ready <none> 4h14m v1.14.10-gke.27
gke-qua-gke-foobar1234-default-pool-6302174e-6wfs Ready <none> 16d v1.14.10-gke.27
gke-qua-gke-foobar1234-default-pool-6302174e-74lm Ready <none> 4h13m v1.14.10-gke.27
gke-qua-gke-foobar1234-default-pool-6302174e-m223 Ready <none> 4h13m v1.14.10-gke.27
gke-qua-gke-foobar1234-default-pool-6302174e-srlg Ready <none> 66d v1.14.10-gke.27
kubectl get pods
NAME READY STATUS RESTARTS AGE
qua-gke-foobar1234-5959446675-njzh4 1/1 Running 0 14m
nodePools:
- autoscaling:
    enabled: true
    maxNodeCount: 10
    minNodeCount: 1
  config:
    diskSizeGb: 100
    diskType: pd-standard
    imageType: COS
    machineType: n1-highcpu-32
    metadata:
      disable-legacy-endpoints: 'true'
    oauthScopes:
    - https://www.googleapis.com/auth/datastore
    - https://www.googleapis.com/auth/devstorage.full_control
    - https://www.googleapis.com/auth/pubsub
    - https://www.googleapis.com/auth/logging.write
    - https://www.googleapis.com/auth/monitoring
    serviceAccount: default
    shieldedInstanceConfig:
      enableIntegrityMonitoring: true
  initialNodeCount: 1
  instanceGroupUrls:
  - https://www.googleapis.com/compute/v1/projects/fooooobbbarrr-dev/zones/us-central1-a/instanceGroupManagers/gke-qua-gke-foobar1234-default-pool-6302174e-grp
  locations:
  - us-central1-a
  management:
    autoRepair: true
    autoUpgrade: true
  name: default-pool
  podIpv4CidrSize: 24
  selfLink: https://container.googleapis.com/v1/projects/ffoooobarrrr-dev/locations/us-central1/clusters/qua-gke-foobar1234/nodePools/default-pool
  status: RUNNING
  version: 1.14.10-gke.27
kubectl describe horizontalpodautoscaler
Name: qua-gke-foobar1234
Namespace: default
Labels: <none>
Annotations: autoscaling.alpha.kubernetes.io/conditions:
[{"type":"AbleToScale","status":"True","lastTransitionTime":"2020-03-17T19:59:19Z","reason":"ReadyForNewScale","message":"recommended size...
autoscaling.alpha.kubernetes.io/current-metrics:
[{"type":"External","external":{"metricName":"pubsub.googleapis.com|subscription|num_undelivered_messages","metricSelector":{"matchLabels"...
autoscaling.alpha.kubernetes.io/metrics:
[{"type":"External","external":{"metricName":"pubsub.googleapis.com|subscription|num_undelivered_messages","metricSelector":{"matchLabels"...
kubectl.kubernetes.io/last-applied-configuration:
{"apiVersion":"autoscaling/v2beta1","kind":"HorizontalPodAutoscaler","metadata":{"annotations":{},"name":"qua-gke-foobar1234","namespace":...
CreationTimestamp: Tue, 17 Mar 2020 12:59:03 -0700
Reference: Deployment/qua-gke-foobar1234
Min replicas: 1
Max replicas: 10
Deployment pods: 1 current / 1 desired
Events: <none>
Answer 0 (score: 2)
The HorizontalPodAutoscaler increases or decreases the number of Pods, not the number of nodes. It has nothing to do with node scaling.
Node scaling is handled by the cloud provider, which in your case is Google Cloud Platform.
You should check from the GCP console whether the node autoscaler is enabled.
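You can also check the current setting from the CLI. The cluster and node-pool names below are taken from your output and the region from the selfLink, so adjust them if yours differ; something like this should print just the autoscaling block of the pool:
# show only the autoscaling section of the node pool
gcloud container node-pools describe default-pool \
    --cluster qua-gke-foobar1234 \
    --region us-central1 \
    --format 'yaml(autoscaling)'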
If it is not enabled yet, follow these steps:
1. Go to the Kubernetes clusters screen in the GCP console.
2. Click on your cluster.
3. At the bottom, click on the node pool you want to enable autoscaling for.
4. Click "Edit".
5. Enable autoscaling, define the minimum and maximum number of nodes, and save.
Alternatively, via the gcloud CLI, as described here:
gcloud container clusters update cluster-name --enable-autoscaling \
--min-nodes 1 --max-nodes 10 --zone compute-zone --node-pool default-pool
Answer 1 (score: 1)
So the initial problem with my debugging attempt was that I had been running kubectl get pods instead of kubectl get pods --all-namespaces, so I wasn't seeing the system Pods that were running. I then added a PDB for each of the system Pods:
kubectl create poddisruptionbudget pdb-event --namespace=kube-system --selector k8s-app=event-exporter --max-unavailable 1 &&
kubectl create poddisruptionbudget pdb-fluentd-scaler --namespace=kube-system --selector k8s-app=fluentd-gcp-scaler --max-unavailable 1 &&
kubectl create poddisruptionbudget pdb-heapster --namespace=kube-system --selector k8s-app=heapster --max-unavailable 1 &&
kubectl create poddisruptionbudget pdb-dns --namespace=kube-system --selector k8s-app=kube-dns --max-unavailable 1 &&
kubectl create poddisruptionbudget pdb-dnsauto --namespace=kube-system --selector k8s-app=kube-dns-autoscaler --max-unavailable 1 &&
kubectl create poddisruptionbudget pdb-glbc --namespace=kube-system --selector k8s-app=glbc --max-unavailable 1
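As a quick sanity check afterwards (not part of the original commands), you can list the budgets to confirm they were created and see how many disruptions each one allows:
# list the PodDisruptionBudgets in kube-system
kubectl get poddisruptionbudget --namespace=kube-system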
Then I started getting these errors in the event logs of some of the PDBs: controllermanager Failed to calculate the number of expected pods: found no controllers for pod, which I saw on the PDBs when running kubectl describe pdb --all-namespaces. I don't know why that happened, but I deleted those PDBs and then everything started working!
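For reference, removing one of the budgets created above is just a delete on the PDB object (pdb-glbc is only used here as an example name; substitute whichever ones show the error):
# delete a PodDisruptionBudget that reports "found no controllers for pod"
kubectl delete poddisruptionbudget pdb-glbc --namespace=kube-system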