Why does my k8s cluster report a node as "overcommitted"?

Asked: 2021-03-04 22:18:34

Tags: kubernetes amazon-eks

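I deployed my application to an AWS EKS cluster with 3 nodes. When I run kubectl describe on a node, it shows the following message: (Total limits may be over 100 percent, i.e., overcommitted.). But judging by the full output, very few resources are actually allocated. Why do I see this message in the output?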

$ kubectl describe node ip-192-168-54-184.ap-southeast-2.compute.internal
Name:               ip-192-168-54-184.ap-southeast-2.compute.internal
Roles:              <none>
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/instance-type=t3.medium
                    beta.kubernetes.io/os=linux
                    eks.amazonaws.com/capacityType=ON_DEMAND
                    eks.amazonaws.com/nodegroup=scalegroup
                    eks.amazonaws.com/nodegroup-image=ami-0ecaff41b4f38a650
                    failure-domain.beta.kubernetes.io/region=ap-southeast-2
                    failure-domain.beta.kubernetes.io/zone=ap-southeast-2b
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=ip-192-168-54-184.ap-southeast-2.compute.internal
                    kubernetes.io/os=linux
                    node.kubernetes.io/instance-type=t3.medium
                    topology.kubernetes.io/region=ap-southeast-2
                    topology.kubernetes.io/zone=ap-southeast-2b
Annotations:        node.alpha.kubernetes.io/ttl: 0
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Thu, 04 Mar 2021 22:27:50 +1100
Taints:             <none>
Unschedulable:      false
Lease:
  HolderIdentity:  ip-192-168-54-184.ap-southeast-2.compute.internal
  AcquireTime:     <unset>
  RenewTime:       Fri, 05 Mar 2021 09:13:16 +1100
Conditions:
  Type             Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----             ------  -----------------                 ------------------                ------                       -------
  MemoryPressure   False   Fri, 05 Mar 2021 09:11:33 +1100   Thu, 04 Mar 2021 22:27:50 +1100   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure     False   Fri, 05 Mar 2021 09:11:33 +1100   Thu, 04 Mar 2021 22:27:50 +1100   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure      False   Fri, 05 Mar 2021 09:11:33 +1100   Thu, 04 Mar 2021 22:27:50 +1100   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready            True    Fri, 05 Mar 2021 09:11:33 +1100   Thu, 04 Mar 2021 22:28:10 +1100   KubeletReady                 kubelet is posting ready status
Addresses:
  InternalIP:   192.168.54.184
  ExternalIP:   13.211.200.109
  Hostname:     ip-192-168-54-184.ap-southeast-2.compute.internal
  InternalDNS:  ip-192-168-54-184.ap-southeast-2.compute.internal
  ExternalDNS:  ec2-13-211-200-109.ap-southeast-2.compute.amazonaws.com
Capacity:
  attachable-volumes-aws-ebs:  25
  cpu:                         2
  ephemeral-storage:           20959212Ki
  hugepages-1Gi:               0
  hugepages-2Mi:               0
  memory:                      3970504Ki
  pods:                        17
Allocatable:
  attachable-volumes-aws-ebs:  25
  cpu:                         1930m
  ephemeral-storage:           18242267924
  hugepages-1Gi:               0
  hugepages-2Mi:               0
  memory:                      3415496Ki
  pods:                        17
System Info:
  Machine ID:                 ec246b12e91dc516024822fbcdac4408
  System UUID:                ec246b12-e91d-c516-0248-22fbcdac4408
  Boot ID:                    5c6a3d95-c82c-4051-bc90-6e732b0b5be2
  Kernel Version:             5.4.91-41.139.amzn2.x86_64
  OS Image:                   Amazon Linux 2
  Operating System:           linux
  Architecture:               amd64
  Container Runtime Version:  docker://19.3.6
  Kubelet Version:            v1.19.6-eks-49a6c0
  Kube-Proxy Version:         v1.19.6-eks-49a6c0
ProviderID:                   aws:///ap-southeast-2b/i-03c0417efb85b8e6c
Non-terminated Pods:          (4 in total)
  Namespace                   Name                                     CPU Requests  CPU Limits  Memory Requests  Memory Limits  AGE
  ---------                   ----                                     ------------  ----------  ---------------  -------------  ---
  cert-manager                cert-manager-cainjector-9747d56-qwhjw    0 (0%)        0 (0%)      0 (0%)           0 (0%)         10h
  kube-system                 aws-node-m296t                           10m (0%)      0 (0%)      0 (0%)           0 (0%)         10h
  kube-system                 coredns-67997b9dbd-cgjdj                 100m (5%)     0 (0%)      70Mi (2%)        170Mi (5%)     10h
  kube-system                 kube-proxy-dc5fh                         100m (5%)     0 (0%)      0 (0%)           0 (0%)         10h
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource                    Requests    Limits
  --------                    --------    ------
  cpu                         210m (10%)  0 (0%)
  memory                      70Mi (2%)   170Mi (5%)
  ephemeral-storage           0 (0%)      0 (0%)
  hugepages-1Gi               0 (0%)      0 (0%)
  hugepages-2Mi               0 (0%)      0 (0%)
  attachable-volumes-aws-ebs  0           0
Events:                       <none>

1 answer:

Answer 0 (score: 1)

Let's take a quick look at the source code of the kubectl describe command, specifically the describeNodeResource function.

Inside the describeNodeResource(...) function we see (this line):

w.Write(LEVEL_0, "Allocated resources:\n  (Total limits may be over 100 percent, i.e., overcommitted.)\n")

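No condition decides whether this message is printed. It is an informational note that is written every time, regardless of how much of the node is actually allocated. It only reminds you that the limits column can add up to more than 100 percent, because the scheduler places pods based on requests, not limits.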
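To make the point concrete, here is a minimal Go sketch of that behaviour. It is not the kubectl implementation: the resourceTotals type and the describeAllocated, percent, and overcommitted functions are hypothetical names introduced only for illustration, and the integer-truncating percentage is an assumption chosen because it reproduces the figures in the question's output. The sketch writes the header and note unconditionally, then separately checks whether any limit really exceeds the allocatable amount.

```go
package main

import (
	"fmt"
	"strings"
)

// resourceTotals is a hypothetical stand-in for the summed pod requests and
// limits on a node. Values are in millicores (CPU) or KiB (memory).
type resourceTotals struct {
	name        string
	requests    int64
	limits      int64
	allocatable int64
}

// percent returns used as a whole-number percentage of allocatable,
// truncated toward zero (an assumption that matches the question's output).
func percent(used, allocatable int64) int64 {
	if allocatable == 0 {
		return 0
	}
	return used * 100 / allocatable
}

// describeAllocated mimics the shape of the "Allocated resources" section.
// The header and the overcommit note are written first, with no condition.
func describeAllocated(totals []resourceTotals) string {
	var b strings.Builder
	b.WriteString("Allocated resources:\n  (Total limits may be over 100 percent, i.e., overcommitted.)\n")
	for _, t := range totals {
		fmt.Fprintf(&b, "  %-8s requests=%d%% limits=%d%%\n",
			t.name, percent(t.requests, t.allocatable), percent(t.limits, t.allocatable))
	}
	return b.String()
}

// overcommitted reports whether any summed limit exceeds what the node can
// allocate. This is the check the note might suggest, but the note itself
// does not depend on it.
func overcommitted(totals []resourceTotals) bool {
	for _, t := range totals {
		if t.limits > t.allocatable {
			return true
		}
	}
	return false
}

func main() {
	// Totals taken from the node in the question.
	totals := []resourceTotals{
		{name: "cpu", requests: 210, limits: 0, allocatable: 1930},
		{name: "memory", requests: 70 * 1024, limits: 170 * 1024, allocatable: 3415496},
	}
	fmt.Print(describeAllocated(totals))
	fmt.Println("limits exceed allocatable:", overcommitted(totals))
}
```

For the node in the question, no limit comes close to the allocatable amount, yet the note still appears in the output. That matches what kubectl shows.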