Kubernetes reports "pod didn't trigger scale-up (it wouldn't fit if a new node is added)" even though a new node would fit it?

Date: 2019-01-22 18:15:48

Tags: kubernetes google-cloud-platform google-kubernetes-engine

I don't understand why I'm getting this error. A new node should definitely be able to fit the pod: in total I'm only requesting 768Mi of memory and 450m of CPU, and the autoscaled instance group uses n1-highcpu-2 machines (2 vCPUs, 1.8 GB of memory).

How can I diagnose this further?

Output of kubectl describe pod:

Name:           initial-projectinitialabcrad-697b74b449-848bl
Namespace:      production
Node:           <none>
Labels:         app=initial-projectinitialabcrad
                appType=abcrad-api
                pod-template-hash=2536306005
Annotations:    <none>
Status:         Pending
IP:             
Controlled By:  ReplicaSet/initial-projectinitialabcrad-697b74b449
Containers:
  app:
    Image:      gcr.io/example-project-abcsub/projectinitial-abcrad-app:production_6b0b3ddabc68d031e9f7874a6ea49ee9902207bc
    Port:       <none>
    Host Port:  <none>
    Limits:
      cpu:     1
      memory:  1Gi
    Requests:
      cpu:     250m
      memory:  512Mi
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-srv8k (ro)
  nginx:
    Image:      gcr.io/example-project-abcsub/projectinitial-abcrad-nginx:production_6b0b3ddabc68d031e9f7874a6ea49ee9902207bc
    Port:       80/TCP
    Host Port:  0/TCP
    Limits:
      cpu:     1
      memory:  1Gi
    Requests:
      cpu:        100m
      memory:     128Mi
    Readiness:    http-get http://:80/api/v1/ping delay=5s timeout=10s period=10s #success=1 #failure=3
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-srv8k (ro)
  cloudsql-proxy:
    Image:      gcr.io/cloudsql-docker/gce-proxy:1.11
    Port:       3306/TCP
    Host Port:  0/TCP
    Command:
      /cloud_sql_proxy
      -instances=example-project-abcsub:us-central1:abcfn-staging=tcp:0.0.0.0:3306
      -credential_file=/secrets/cloudsql/credentials.json
    Limits:
      cpu:     1
      memory:  1Gi
    Requests:
      cpu:        100m
      memory:     128Mi
    Mounts:
      /secrets/cloudsql from cloudsql-instance-credentials (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-srv8k (ro)
Conditions:
  Type           Status
  PodScheduled   False 
Volumes:
  cloudsql-instance-credentials:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  cloudsql-instance-credentials
    Optional:    false
  default-token-srv8k:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-srv8k
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason             Age                  From                Message
  ----     ------             ----                 ----                -------
  Normal   NotTriggerScaleUp  4m (x29706 over 3d)  cluster-autoscaler  pod didn't trigger scale-up (it wouldn't fit if a new node is added)
  Warning  FailedScheduling   4m (x18965 over 3d)  default-scheduler   0/4 nodes are available: 3 Insufficient memory, 4 Insufficient cpu.
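
For reference, the requests across the three containers add up to 250m + 100m + 100m = 450m CPU and 512Mi + 128Mi + 128Mi = 768Mi memory. One thing I have not verified is how much of an n1-highcpu-2 is actually allocatable once the kubelet and system daemons take their reservations; a rough way to check on an existing node (the node name below is hypothetical) is:

# Show the Allocatable section of a node; allocatable CPU and memory are
# lower than the raw 2 vCPU / 1.8 GB machine size because GKE reserves
# resources for the kubelet and system components.
kubectl describe node gke-example-default-pool-1 | grep -A 7 "Allocatable"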

3 Answers:

Answer 0 (score: 1):

It turned out not to be the resource requests at all; the cause was the pod affinity rule I had defined:

podAffinity:
  requiredDuringSchedulingIgnoredDuringExecution:
  - labelSelector:
      matchExpressions:
      - key: appType
        operator: NotIn
        values:
        - example-api
    topologyKey: kubernetes.io/hostname
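
As written, this rule requires the pod to land on a node that already runs at least one pod whose appType is something other than example-api, which a freshly added (and therefore empty) node can never satisfy, so the autoscaler concludes the pod would not fit on a new node. If the intent was simply to keep these pods away from example-api pods, that is normally expressed as pod anti-affinity instead; a minimal sketch of that assumed intent:

# Hypothetical rewrite: podAntiAffinity with "In" means "do not schedule
# onto a node that already runs an example-api pod", which a newly added
# (empty) node can always satisfy.
podAntiAffinity:
  requiredDuringSchedulingIgnoredDuringExecution:
  - labelSelector:
      matchExpressions:
      - key: appType
        operator: In
        values:
        - example-api
    topologyKey: kubernetes.io/hostname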

Answer 1 (score: 1):

If you are running Kubernetes on a managed cloud service such as GKE or EKS, it is also worth checking your cloud resource quotas!

Kubernetes can show this same "pod didn't trigger scale-up" error even when everything in the cluster looks reasonable, simply because the project's CPU quota has been exhausted. (The quota is enforced by the cloud provider, not by Kubernetes, so the error gives no hint of it from the Kubernetes side.)
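
On GKE, for example, the Compute Engine CPU quota for the cluster's region can be inspected from the command line; a minimal sketch using the region from the question (assuming the Cloud SDK is installed and authenticated):

# Print the CPUS quota entry (limit and current usage) for the region;
# once usage reaches the limit, node scale-ups fail outside of Kubernetes.
gcloud compute regions describe us-central1 | grep -B1 -A1 "metric: CPUS"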

Answer 2 (score: 0):

When running on AWS EKS, I received the same message when the Auto Scaling group the pod was targeting (via taints and tolerations) had reached its maximum capacity. Increasing the maximum size caused a new node to be created and the pod to start running.
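
For what it's worth, the maximum size can be raised in the EC2 console or with the AWS CLI; a minimal sketch with a hypothetical Auto Scaling group name and limit:

# Allow the node group to grow by one more instance so the cluster
# autoscaler can actually add the node the pending pod needs.
aws autoscaling update-auto-scaling-group \
  --auto-scaling-group-name eks-tainted-workers-asg \
  --max-size 6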