Question

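I am running several Kubernetes pods in a cluster (10 nodes). Each pod contains only one container, which hosts one worker process. I have specified the CPU "limits" and "requests" for the container. The following is a description of one pod running on a node (crypt12).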

The following is the output of running "kubectl describe node crypt12":

Name:           alexnet-worker-6-9954df99c-p7tx5
Namespace:      default
Node:           crypt12/172.16.28.136
Start Time:     Sun, 15 Jul 2018 22:26:57 -0400
Labels:         job=worker
                name=alexnet
                pod-template-hash=551089557
                task=6
Annotations:    <none>
Status:         Running
IP:             10.38.0.1
Controlled By:  ReplicaSet/alexnet-worker-6-9954df99c
Containers:
  alexnet-v1-container:
    Container ID:  docker://214e30e87ed4a7240e13e764200a260a883ea4550a1b5d09d29ed827e7b57074
    Image:         alexnet-tf150-py3:v1
    Image ID:      docker://sha256:4f18b4c45a07d639643d7aa61b06bfee1235637a50df30661466688ab2fd4e6d
    Port:          5000/TCP
    Host Port:     0/TCP
    Command:
      /usr/bin/python3
      cifar10_distributed.py
    Args:
      --data_dir=xxxx

    State:          Running
      Started:      Sun, 15 Jul 2018 22:26:59 -0400
    Ready:          True
    Restart Count:  0
    Limits:
      cpu:     800m
      memory:  6G
    Requests:
      cpu:        800m
      memory:     6G
    Environment:  <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-hfnlp (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  default-token-hfnlp:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-hfnlp
    Optional:    false
QoS Class:       Guaranteed
Node-Selectors:  kubernetes.io/hostname=crypt12
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:          <none>

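As shown, in the node description (the "Non-terminated Pods" section), the CPU limit is 10%. However, when I run "ps" or "top" on the node (crypt12), the CPU utilization of the worker process exceeds 10% (about 20%). Why does this happen? Can anyone shed some light on this?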

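Update: I found the answer in a GitHub issue discussion: the CPU percentage shown by "kubectl describe node" is "CPU limit / number of cores". Since I set the CPU limit to 0.8, the 10% is the result of 0.8 / 8.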

Answer 1

I found the answer in a GitHub issue discussion: the CPU percentage shown by "kubectl describe node" is "CPU limits / number of cores". Since I set the CPU limit to 0.8, the 10% is the result of 0.8 / 8.
Here is the link: https://github.com/kubernetes/kubernetes/issues/24925
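
To make that arithmetic concrete, here is a minimal Python sketch that reproduces the percentage. It assumes the node has 8 CPU cores (as the 0.8 / 8 calculation implies) and handles only the plain-core ("0.8") and millicore ("800m") forms of a CPU quantity. The helper names are hypothetical; this is not kubectl's own code.

```python
def cpu_to_millicores(quantity: str) -> int:
    """Convert a CPU quantity such as "800m" or "0.8" to millicores.

    Hypothetical helper: only the "m" suffix and plain decimal cores are
    handled; other Kubernetes quantity forms are out of scope here.
    """
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return round(float(quantity) * 1000)


def percent_of_node(quantity: str, node_cores: int) -> float:
    """Express a CPU quantity as a percentage of all cores on the node."""
    return 100 * cpu_to_millicores(quantity) / (node_cores * 1000)


if __name__ == "__main__":
    # The pod's 800m limit on an assumed 8-core node.
    print(f"{percent_of_node('800m', 8):.0f}%")  # prints "10%"
    # The same limit written as plain cores gives the same result.
    print(f"{percent_of_node('0.8', 8):.0f}%")   # prints "10%"
```

In other words, 10% of the whole node is still 0.8 of one core for the container.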

Answer 2

First, by default "top" shows utilization as a percentage of a single core. So with 8 cores, a process can show up to 800% utilization.
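
A small Python sketch of that conversion, assuming top's default (Irix) mode, where 100% means one fully used core, and the numbers from the question (about 20% in top, an 800m limit, 8 cores). Under those assumptions the worker is using roughly 0.2 of a core, well below its 0.8-core limit. The function names are illustrative only.

```python
def top_percent_to_cores(top_percent: float) -> float:
    """top's default (Irix) mode reports 100% for each fully used core."""
    return top_percent / 100


def top_percent_to_node_share(top_percent: float, node_cores: int) -> float:
    """Rescale a per-core top percentage to a percentage of the whole node."""
    return top_percent / node_cores


def limit_in_top_percent(limit_millicores: int) -> float:
    """Express a CPU limit in the same per-core units that top uses."""
    return limit_millicores / 10


if __name__ == "__main__":
    observed = 20.0  # approximate %CPU that top shows for the worker process
    print(top_percent_to_cores(observed))          # 0.2 (cores)
    print(top_percent_to_node_share(observed, 8))  # 2.5 (% of an 8-core node)
    print(limit_in_top_percent(800))               # 80.0 (limit in top's units)
```

In procps-ng top, pressing Shift+I toggles Solaris mode, which divides by the CPU count and corresponds to top_percent_to_node_share above.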

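If you are reading the top statistics correctly, the difference may come from your node running more processes than just your pods: think of kube-proxy, the kubelet, and any other controllers. GKE also runs a dashboard and calls the API to collect stats.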

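Also note that CPU limits are enforced per 100 ms period. A container's utilization can briefly spike above 10%, but averaged over each period it never uses more than it is allowed.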
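
To illustrate how a container can spike above its limit for a moment while staying within it on average, here is a deliberately simplified Python model of CFS bandwidth control. It assumes a fixed 100 ms period, a quota that is consumed as fast as the container's runnable threads allow, and 1 ms sampling ticks; the real scheduler has more detail, and all names here are hypothetical.

```python
def simulate_period(quota_us: int, period_us: int, threads: int,
                    tick_us: int = 1_000) -> list[float]:
    """Return CPU usage (in cores) for each tick of one CFS period.

    Simplified model: every runnable thread runs on its own core until the
    period's quota is exhausted, after which the container is throttled
    (uses nothing) until the period ends.
    """
    used_us = 0
    samples = []
    for _ in range(period_us // tick_us):
        wanted_us = threads * tick_us  # CPU time the threads could use this tick
        granted_us = min(wanted_us, quota_us - used_us)
        used_us += granted_us
        samples.append(granted_us / tick_us)  # cores in use during this tick
    return samples


if __name__ == "__main__":
    # Assumed: 800m limit -> 80 ms of CPU per 100 ms period, 8 busy threads.
    usage = simulate_period(quota_us=80_000, period_us=100_000, threads=8)
    print(max(usage))               # 8.0 cores during the burst
    print(sum(usage) / len(usage))  # 0.8 cores averaged over the period
```

A monitoring tool that sampled only the first 10 ms of this period would see 8 cores (800% in top's units), even though the per-period average is exactly the 0.8-core limit.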

The official documentation says:

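The spec.containers[].resources.limits.cpu is converted to its millicore value and multiplied by 100. The resulting value is the total amount of CPU time that a container can use every 100ms. A container cannot use more than its share of CPU time during this interval.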
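
Applying that rule to the pod in the question gives the numbers below. This Python sketch assumes the default 100 ms (100000 µs) period the quote refers to. On cgroup v1 hosts these values correspond to the cpu.cfs_quota_us and cpu.cfs_period_us files (cgroup v2 combines them in cpu.max); the cgroup path itself varies by setup and is not shown.

```python
CFS_PERIOD_US = 100_000  # 100 ms period, as stated in the documentation quote


def limit_to_quota_us(limit_millicores: int) -> int:
    """Documented rule: millicore value x 100 = CPU time (µs) per 100 ms period."""
    return limit_millicores * 100


def quota_to_cores(quota_us: int, period_us: int = CFS_PERIOD_US) -> float:
    """Average number of cores the container may use over each period."""
    return quota_us / period_us


if __name__ == "__main__":
    quota = limit_to_quota_us(800)  # the pod's "cpu: 800m" limit
    print(quota)                    # 80000 µs, i.e. 80 ms per 100 ms
    print(quota_to_cores(quota))    # 0.8 cores
```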

Why does the actual CPU utilization percentage exceed the pod CPU limit in Kubernetes?
