Question

We have created a GKE cluster in europe-west2, spread across zones A and B. The cluster is configured as follows:

Number of nodes: 1 per zone (2 in total)
Autoscaling: Yes (1-4 nodes per zone)

We are trying to test autoscaling, but the cluster cannot schedule any pods and never adds any extra nodes.

W 2019-11-11T14:03:17Z unable to get metrics for resource cpu: no metrics returned from resource metrics API 
W 2019-11-11T14:03:20Z unable to get metrics for resource cpu: no metrics returned from resource metrics API 
I 2019-11-11T14:04:42Z pod didn't trigger scale-up (it wouldn't fit if a new node is added): 2 max cluster cpu, memory limit reached 
I 2019-11-11T14:04:42Z pod didn't trigger scale-up (it wouldn't fit if a new node is added): 2 max cluster cpu, memory limit reached 
I 2019-11-11T14:04:42Z pod didn't trigger scale-up (it wouldn't fit if a new node is added): 2 max cluster cpu, memory limit reached 
I 2019-11-11T14:04:42Z pod didn't trigger scale-up (it wouldn't fit if a new node is added): 2 max cluster cpu, memory limit reached 
I 2019-11-11T14:04:42Z pod didn't trigger scale-up (it wouldn't fit if a new node is added): 2 max cluster cpu, memory limit reached 
I 2019-11-11T14:04:42Z pod didn't trigger scale-up (it wouldn't fit if a new node is added): 2 max cluster cpu, memory limit reached 
I 2019-11-11T14:04:42Z pod didn't trigger scale-up (it wouldn't fit if a new node is added): 2 max cluster cpu, memory limit reached 
I 2019-11-11T14:04:42Z pod didn't trigger scale-up (it wouldn't fit if a new node is added): 2 max cluster cpu, memory limit reached 
I 2019-11-11T14:04:42Z pod didn't trigger scale-up (it wouldn't fit if a new node is added): 2 max cluster cpu, memory limit reached 
I 2019-11-11T14:04:42Z pod didn't trigger scale-up (it wouldn't fit if a new node is added): 2 max cluster cpu, memory limit reached 
I 2019-11-11T14:04:42Z pod didn't trigger scale-up (it wouldn't fit if a new node is added): 2 max cluster cpu, memory limit reached 
I 2019-11-11T14:04:42Z pod didn't trigger scale-up (it wouldn't fit if a new node is added): 2 max cluster cpu, memory limit reached 
I 2019-11-11T14:04:42Z pod didn't trigger scale-up (it wouldn't fit if a new node is added): 2 max cluster cpu, memory limit reached 
I 2019-11-11T14:04:42Z pod didn't trigger scale-up (it wouldn't fit if a new node is added): 2 max cluster cpu, memory limit reached 
I 2019-11-11T14:04:42Z pod didn't trigger scale-up (it wouldn't fit if a new node is added): 2 max cluster cpu, memory limit reached 
I 2019-11-11T14:04:42Z pod didn't trigger scale-up (it wouldn't fit if a new node is added): 2 max cluster cpu, memory limit reached 
I 2019-11-11T14:04:42Z pod didn't trigger scale-up (it wouldn't fit if a new node is added): 2 max cluster cpu, memory limit reached 
I 2019-11-11T14:04:42Z pod didn't trigger scale-up (it wouldn't fit if a new node is added): 2 max cluster cpu, memory limit reached 
I 2019-11-11T14:04:42Z pod didn't trigger scale-up (it wouldn't fit if a new node is added): 2 max cluster cpu, memory limit reached 
I 2019-11-11T14:04:42Z pod didn't trigger scale-up (it wouldn't fit if a new node is added): 2 max cluster cpu, memory limit reached 
I 2019-11-11T14:04:42Z pod didn't trigger scale-up (it wouldn't fit if a new node is added): 2 max cluster cpu, memory limit reached 
I 2019-11-11T14:04:42Z pod didn't trigger scale-up (it wouldn't fit if a new node is added): 2 max cluster cpu, memory limit reached 
I 2019-11-11T14:04:42Z pod didn't trigger scale-up (it wouldn't fit if a new node is added): 2 max cluster cpu, memory limit reached 
I 2019-11-11T14:04:42Z pod didn't trigger scale-up (it wouldn't fit if a new node is added): 2 max cluster cpu, memory limit reached 
I 2019-11-11T14:04:42Z pod didn't trigger scale-up (it wouldn't fit if a new node is added): 2 max cluster cpu, memory limit reached 
I 2019-11-11T14:04:42Z pod didn't trigger scale-up (it wouldn't fit if a new node is added): 2 max cluster cpu, memory limit reached 
I 2019-11-11T14:04:42Z pod didn't trigger scale-up (it wouldn't fit if a new node is added): 2 max cluster cpu, memory limit reached 
I 2019-11-11T14:04:42Z pod didn't trigger scale-up (it wouldn't fit if a new node is added): 2 max cluster cpu, memory limit reached 
I 2019-11-11T14:04:42Z pod didn't trigger scale-up (it wouldn't fit if a new node is added): 2 max cluster cpu, memory limit reached 
W 2019-11-11T14:04:44Z 0/4 nodes are available: 4 Insufficient cpu. 
W 2019-11-11T14:04:44Z 0/4 nodes are available: 4 Insufficient cpu. 
W 2019-11-11T14:04:44Z 0/4 nodes are available: 4 Insufficient cpu. 
W 2019-11-11T14:04:44Z 0/4 nodes are available: 4 Insufficient cpu. 
W 2019-11-11T14:04:44Z 0/4 nodes are available: 4 Insufficient cpu. 
W 2019-11-11T14:04:44Z 0/4 nodes are available: 4 Insufficient cpu. 
W 2019-11-11T14:04:44Z 0/4 nodes are available: 4 Insufficient cpu. 
W 2019-11-11T14:04:44Z 0/4 nodes are available: 4 Insufficient cpu. 
W 2019-11-11T14:04:44Z 0/4 nodes are available: 4 Insufficient cpu. 
W 2019-11-11T14:04:45Z 0/4 nodes are available: 4 Insufficient cpu. 
W 2019-11-11T14:04:45Z 0/4 nodes are available: 4 Insufficient cpu. 
W 2019-11-11T14:04:45Z 0/4 nodes are available: 4 Insufficient cpu. 
W 2019-11-11T14:04:45Z 0/4 nodes are available: 4 Insufficient cpu. 
W 2019-11-11T14:04:51Z unable to get metrics for resource cpu: no metrics returned from resource metrics API 
I 2019-11-11T14:04:53Z pod didn't trigger scale-up (it wouldn't fit if a new node is added): 2 max cluster cpu, memory limit reached 
I 2019-11-11T14:05:03Z pod didn't trigger scale-up (it wouldn't fit if a new node is added): 2 max cluster cpu, memory limit reached 
I 2019-11-11T14:05:03Z pod didn't trigger scale-up (it wouldn't fit if a new node is added): 2 max cluster cpu, memory limit reached 
I 2019-11-11T14:05:03Z pod didn't trigger scale-up (it wouldn't fit if a new node is added): 2 max cluster cpu, memory limit reached 
I 2019-11-11T14:05:03Z pod didn't trigger scale-up (it wouldn't fit if a new node is added): 2 max cluster cpu, memory limit reached

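About 80% of the pods are stuck in an unschedulable state and are shown as erroring. However, we never see the cluster grow in size (neither physically nor horizontally).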

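We started with a 2-node setup and ran a load test to push it to its maximum. CPU reached 100% on both nodes and RAM reached 95% on both nodes. We received this error message: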

I 2019-11-11T16:01:21Z pod didn't trigger scale-up (it wouldn't fit if a new node is added): 2 max cluster cpu, memory limit reached 
I 2019-11-11T16:01:21Z pod didn't trigger scale-up (it wouldn't fit if a new node is added): 2 max cluster cpu, memory limit reached 
I 2019-11-11T16:01:21Z pod didn't trigger scale-up (it wouldn't fit if a new node is added): 2 max cluster cpu, memory limit reached 
I 2019-11-11T16:01:21Z Ensuring load balancer 
W 2019-11-11T16:01:24Z Error creating load balancer (will retry): failed to ensure load balancer for service istio-system/istio-ingressgateway: failed to ensure a static IP for load balancer (a72c616b7f5cf11e9b4694201ac10480(istio-system/istio-ingressgateway)): error getting static IP address: googleapi: Error 404: The resource 'projects/gc-lotto-stage/regions/europe-west2/addresses/a72c616b7f5cf11e9b4694201ac10480' was not found, notFound 
W 2019-11-11T16:01:25Z missing request for cpu 
W 2019-11-11T16:01:25Z missing request for cpu 
W 2019-11-11T16:01:26Z missing request for cpu 
I 2019-11-11T16:01:31Z pod didn't trigger scale-up (it wouldn't fit if a new node is added): 2 max cluster cpu, memory limit reached 
W 2019-11-11T16:01:35Z missing request for cpu 
W 2019-11-11T16:01:44Z 0/2 nodes are available: 2 Insufficient cpu. 
W 2019-11-11T16:01:44Z 0/2 nodes are available: 2 Insufficient cpu.

Answer 1

This also depends on the configured node size.

First, look at the node's allocatable resources:

kubectl describe node <node>
Allocatable:
  cpu:                4
  ephemeral-storage:  17784772Ki
  hugepages-2Mi:      0
  memory:             4034816Ki
  pods:               110

Also check the allocated resources:

kubectl describe node <node>
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests      Limits
  --------           --------      ------
  cpu                1505m (37%)   3 (75%)
  memory             2750Mi (69%)  6484Mi (164%)
  ephemeral-storage  0 (0%)        0 (0%)
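The percentages in that table are each request or limit divided by the node's allocatable amount. A minimal sketch that reproduces them from the two outputs above, assuming only the quantity suffixes that appear here (Ki, Mi, Gi, Ti, the milli suffix m, and plain numbers) and truncation to a whole percent; parse_quantity and percent_of_allocatable are hypothetical helper names, not part of kubectl or any Kubernetes client library:

```python
from fractions import Fraction

# Binary (power-of-two) suffixes used in Kubernetes quantities.
BINARY_SUFFIXES = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}


def parse_quantity(q: str) -> Fraction:
    """Parse a small subset of Kubernetes quantity strings (hypothetical helper).

    Supports binary suffixes (Ki, Mi, Gi, Ti), the milli suffix "m"
    (e.g. "1505m" = 1.505 CPU) and plain numbers. Decimal suffixes such
    as "k", "M", "G" and exponent notation are not handled here.
    """
    for suffix, factor in BINARY_SUFFIXES.items():
        if q.endswith(suffix):
            return Fraction(q[: -len(suffix)]) * factor
    if q.endswith("m"):
        return Fraction(q[:-1]) / 1000
    return Fraction(q)


def percent_of_allocatable(amount: str, allocatable: str) -> int:
    """Share of the node's allocatable resource, truncated to a whole percent."""
    return int(parse_quantity(amount) / parse_quantity(allocatable) * 100)


# Values copied from the kubectl describe node output above.
allocatable = {"cpu": "4", "memory": "4034816Ki"}
for resource, requests, limits in [("cpu", "1505m", "3"), ("memory", "2750Mi", "6484Mi")]:
    print(resource,
          percent_of_allocatable(requests, allocatable[resource]),
          percent_of_allocatable(limits, allocatable[resource]))
```

This prints "cpu 37 75" and "memory 69 164", matching the table. Exact fractions avoid floating-point drift when a ratio lands close to a whole percent.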

Then look at the pods' resource requests:

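If a pod's CPU or memory request is larger than a node's allocatable resources, the cluster may not autoscale for it, because the pod would not fit on a newly added node of the same size either. Nodes must have enough capacity to handle the pod's requests.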
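The check described above can be sketched as a comparison between one pod's requests and the allocatable resources of an empty node of the pool's machine shape. This is a deliberate simplification, not the autoscaler's actual algorithm: the Resources class and fits_on_new_node function are hypothetical, and the pod request sizes in the usage lines are made up for illustration:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resources:
    cpu_millicores: int
    memory_mib: int


def fits_on_new_node(pod_request: Resources, node_allocatable: Resources) -> bool:
    """Return True if the pod's requests fit on an empty node of this shape.

    Simplified: a real scale-up simulation also subtracts DaemonSet pods
    and checks taints, affinity, and other scheduling constraints.
    """
    return (
        pod_request.cpu_millicores <= node_allocatable.cpu_millicores
        and pod_request.memory_mib <= node_allocatable.memory_mib
    )


# Node shape from the describe output above: 4 CPU, 4034816Ki (about 3940 MiB, rounded down).
node = Resources(cpu_millicores=4000, memory_mib=3940)

print(fits_on_new_node(Resources(500, 512), node))   # True: adding a node could help
print(fits_on_new_node(Resources(4500, 512), node))  # False: no node of this size can fit it
```

A pod in the second situation stays pending however many nodes are allowed, which is the case the "it wouldn't fit if a new node is added" log message describes.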

Allocatable resources are normally smaller than the node's actual capacity, because part of the capacity is reserved for system daemons.
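Kubernetes documents node allocatable as capacity minus kube-reserved, minus system-reserved, minus the hard eviction threshold. A minimal sketch of that arithmetic for memory; allocatable_mib is a hypothetical helper and the reservation sizes in the usage line are invented for illustration, not GKE's real values:

```python
def allocatable_mib(capacity: int, kube_reserved: int,
                    system_reserved: int, eviction_threshold: int) -> int:
    """Allocatable = capacity - kube-reserved - system-reserved - eviction threshold.

    All values are in MiB. Eviction thresholds are defined for memory and
    storage, so for CPU that last term would be zero.
    """
    allocatable = capacity - kube_reserved - system_reserved - eviction_threshold
    if allocatable < 0:
        raise ValueError("reservations exceed node capacity")
    return allocatable


# Hypothetical reservation sizes, for illustration only.
print(allocatable_mib(capacity=4096, kube_reserved=256, system_reserved=0, eviction_threshold=100))
```

With these made-up numbers the node offers 3740 MiB to pods rather than the full 4096 MiB, so a pod requesting close to the machine's nominal memory may never be schedulable on that node type.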

Answer 2

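I ran into the same problem for a while. After a lot of research and tracing, I found that there are a few things to keep in mind if you want cluster autoscaling to work in GKE.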

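Set resource requests and limits on every workload you can.
The autoscaler works on requests, not limits. So you will only see it scale up once the sum of the requests of all your workloads exceeds the total resources available in the node pool.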
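The "requests, not limits" rule can be illustrated with a rough model of the answer's rule of thumb, assuming CPU only. It is not how the autoscaler is implemented, and Container, effective_cpu_request, and requests_exceed_pool are hypothetical names. It encodes two real Kubernetes defaults: a container with only a limit gets a request equal to that limit, and a container with neither (and no LimitRange default, which this sketch ignores) requests nothing, however busy it is:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Container:
    cpu_request_m: Optional[int] = None  # CPU request in millicores, if set
    cpu_limit_m: Optional[int] = None    # CPU limit in millicores, if set


def effective_cpu_request(c: Container) -> int:
    """CPU request (millicores) that scheduling and autoscaling account for.

    If only a limit is set, Kubernetes defaults the request to the limit.
    If neither is set (ignoring any LimitRange defaults), the request is 0.
    """
    if c.cpu_request_m is not None:
        return c.cpu_request_m
    if c.cpu_limit_m is not None:
        return c.cpu_limit_m
    return 0


def requests_exceed_pool(containers: List[Container], pool_allocatable_m: int) -> bool:
    """Rule of thumb: scale-up is expected only once summed requests exceed the pool."""
    total = sum(effective_cpu_request(c) for c in containers)
    return total > pool_allocatable_m


pool = 2 * 4000  # hypothetical pool: two nodes with 4 allocatable CPUs each

no_requests = [Container() for _ in range(10)]
with_requests = [Container(cpu_request_m=1000, cpu_limit_m=2000) for _ in range(10)]

print(requests_exceed_pool(no_requests, pool))    # False
print(requests_exceed_pool(with_requests, pool))  # True
```

In the first case the nodes can sit at 100% CPU during a load test while the summed requests stay at zero, so nothing in this model asks for more nodes.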

This helped me a lot.

Hope it helps.

Node pool cluster not autoscaling
