Question

cAdvisor exposes two metrics, container_cpu_cfs_throttled_seconds_total and container_cpu_cfs_throttled_periods_total.

I'm confused about what they mean.

I have found roughly two explanations:

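The container runs with a CPU limit. When the container's CPU usage exceeds that limit, the container is "throttled" and the throttled time is added to container_cpu_cfs_throttled_seconds_total.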

That means:
 (1) rate(container_cpu_cfs_throttled_seconds_total) > 0 only when the container's CPU usage is over its limit.
 (2) We can use this metric to alert when a container's CPU usage is over its limit.

When the host is under high CPU pressure, it "throttles" containers according to their pod QoS class (Guaranteed > Burstable > BestEffort).

That means:
 (1) container_cpu_cfs_throttled_seconds_total increases regardless of how much CPU the container uses or what its CPU limit is.
 (2) This metric cannot be used to alert when a container's CPU usage is over its limit.

Answer 1

Let's say the httpbin container is running on machine1. Suppose httpbin's deployment sets a limit of at most 1 CPU, and machine1 has 2 CPUs. That allows httpbin to use half of the available CPU.
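
To make "a limit of 1 CPU" concrete: CFS bandwidth control enforces a limit as a quota of CPU time per period. The sketch below assumes the usual Kubernetes conversion (quota in microseconds = millicores × period / 1000, with the default 100 ms period and the kernel's 1 ms minimum quota). The function and constant names are hypothetical, not a real API.

```python
# Assumed default CFS period used for Kubernetes CPU limits: 100 ms, in microseconds.
DEFAULT_CFS_PERIOD_US = 100_000
# Assumed smallest quota the kernel accepts: 1 ms, in microseconds.
MIN_QUOTA_US = 1_000


def milli_cpu_to_quota_us(milli_cpu, period_us=DEFAULT_CFS_PERIOD_US):
    """Hypothetical helper: convert a CPU limit in millicores to a CFS quota per period."""
    if milli_cpu <= 0:
        raise ValueError("a CPU limit must be positive; no limit means no quota")
    quota_us = milli_cpu * period_us // 1000
    return max(quota_us, MIN_QUOTA_US)


# httpbin's 1-CPU limit is 1000 millicores.
print(milli_cpu_to_quota_us(1000))  # → 100000 (100 ms of CPU time per 100 ms period)
# Share of machine1's 2 CPUs that this quota allows per period.
print(milli_cpu_to_quota_us(1000) / (2 * DEFAULT_CFS_PERIOD_US))  # → 0.5
```

The quota is CPU time, not a specific core: httpbin may spread its 100 ms across both CPUs, but once it has used 100 ms in a period it has to wait for the next one.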

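If the httpbin container tries to use more than 1 CPU, Kubernetes will not kill the container; it will throttle it. If this happens often, you may want to alert on it and fix the deployment. Another scenario: if machine1 runs multiple containers and CPU resources run short, it will throttle all the containers on it.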
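
A toy model of a single CFS period shows why httpbin gets throttled when it tries to use more than 1 CPU. It assumes every thread is always runnable, the host has no other load, and the quota is spent evenly across CPUs; the real scheduler hands out quota in per-CPU slices, so treat this as an illustration only. `simulate_period` is a hypothetical name.

```python
def simulate_period(busy_threads, cpus, quota_ms, period_ms=100.0):
    """Toy model of one CFS bandwidth period.

    Assumes every thread is always runnable and nothing else competes for CPU.
    Returns (cpu_ms_used, throttled, stalled_wall_ms).
    """
    # The container can run at most one thread per CPU at a time.
    parallelism = min(busy_threads, cpus)
    # CPU time it would consume in this period if it were never stopped.
    demand_ms = parallelism * period_ms
    if demand_ms <= quota_ms:
        return demand_ms, False, 0.0
    # Quota drains `parallelism` times faster than wall-clock time.
    exhausted_at_ms = quota_ms / parallelism
    # After the quota is gone, the container waits for the next period.
    return quota_ms, True, period_ms - exhausted_at_ms


# httpbin: 1-CPU limit (100 ms quota per 100 ms period) on 2-CPU machine1.
print(simulate_period(busy_threads=1, cpus=2, quota_ms=100.0))  # → (100.0, False, 0.0)
print(simulate_period(busy_threads=2, cpus=2, quota_ms=100.0))  # → (100.0, True, 50.0)
```

With two busy threads the quota runs out halfway through the period, so the container sits idle for the remaining 50 ms even though machine1 still has free CPU.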

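container_cpu_cfs_throttled_seconds_total is the total duration, in seconds, for which the container has been throttled. container_cpu_cfs_throttled_periods_total is the number of CFS periods (time intervals) in which the container was throttled.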
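
Both metrics are cumulative counters, so what matters is how much they grow over a time window. The sketch below computes that growth from two scrapes, roughly what PromQL's rate() and increase() do but without their extrapolation. The sample values are hypothetical and the function names are illustrative.

```python
def counter_increase(v0, v1):
    """Increase of a monotonic counter between two samples, tolerating one reset."""
    # A drop means the counter was reset (e.g. the container restarted),
    # so everything counted since the reset is the new value itself.
    return v1 if v1 < v0 else v1 - v0


def per_second_rate(v0, v1, t0, t1):
    """Average per-second increase of a counter between timestamps t0 and t1 (seconds)."""
    return counter_increase(v0, v1) / (t1 - t0)


# Hypothetical scrapes of the two counters taken 60 s apart.
throttled_seconds = per_second_rate(v0=120.0, v1=126.0, t0=0.0, t1=60.0)
throttled_periods = counter_increase(3_000, 3_090)
print(throttled_seconds)  # → 0.1 (seconds of throttling per second of wall-clock time)
print(throttled_periods)  # → 90 (throttled periods in the window)
```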

Answer 2

container_cpu_cfs_throttled_seconds_total is the sum of all throttle durations, i.e. the durations for which the container was throttled (stopped) by CFS cgroup bandwidth control.

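Because every stopped thread adds its own throttle duration to container_cpu_cfs_throttled_seconds_total, this number can grow very large and is not very useful on its own (unless you have a known, fixed number of threads).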
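
A toy model of the accounting described above, assuming (as this answer states) that each stalled thread contributes its own stall time to the seconds counter while the period is counted only once. `period_contribution` is a hypothetical name used for illustration.

```python
def period_contribution(stalled_threads, stall_ms):
    """Toy model: what one throttled period adds to each counter.

    Assumes every stalled thread adds its own stall time to the
    throttled-seconds counter, while the period itself counts once.
    """
    throttled_seconds = stalled_threads * stall_ms / 1000.0
    throttled_periods = 1 if stalled_threads > 0 and stall_ms > 0 else 0
    return throttled_seconds, throttled_periods


# The same 50 ms stall in a 100 ms period, with different thread counts.
print(period_contribution(1, 50))  # → (0.05, 1)
print(period_contribution(8, 50))  # → (0.4, 1)
```

Under this model the seconds counter scales with the thread count for the same wall-clock stall, while the periods counter does not.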

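That is why alerts on CPU throttling are usually based on the throttled percentage := container_cpu_cfs_throttled_periods_total / container_cpu_cfs_periods_total, i.e. the percentage of CPU periods in which the container was running but got throttled (not allowed to run for the full CPU period).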
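
In practice the ratio is taken over the increase of both counters in a recent window rather than over their lifetime totals; in PromQL it is commonly written as rate(container_cpu_cfs_throttled_periods_total[5m]) / rate(container_cpu_cfs_periods_total[5m]), where the 5m window is an arbitrary choice. A minimal Python sketch of the same calculation, with hypothetical sample values and an illustrative function name:

```python
def throttled_ratio(periods_before, periods_after, throttled_before, throttled_after):
    """Fraction of CFS periods in a window during which the container was throttled."""
    elapsed_periods = periods_after - periods_before
    if elapsed_periods <= 0:
        # No enforcement periods elapsed (or a counter reset): report no throttling.
        return 0.0
    return (throttled_after - throttled_before) / elapsed_periods


# Hypothetical samples of container_cpu_cfs_periods_total and
# container_cpu_cfs_throttled_periods_total taken a few minutes apart.
ratio = throttled_ratio(10_000, 12_000, 1_500, 1_800)
print(f"{ratio:.0%} of periods were throttled")  # → 15% of periods were throttled
```

Because each period is counted once no matter how many threads were stalled in it, this ratio stays between 0 and 1, which makes a fixed alert threshold meaningful.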

For more details, you can watch this talk on CFS and CPU scheduling, or read the corresponding article.

What is the container_cpu_cfs_throttled_seconds_total metric?
