I have a cluster (deployed with Rancher RKE) with 3 masters (HA) and 8 workers like the one below:
worker7 Ready worker 199d v1.15.5 10.116.18.42 <none> Red Hat Enterprise Linux Server 7.5 (Maipo) 3.10.0-1062.el7.x86_64 docker://19.3.4
It uses ingress-nginx (image tag 0.25) as the ingress controller and canal as the network plugin. The cluster runs well overall; see the node `top` output below:
NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
master1 219m 5% 4497Mi 78%
master2 299m 7% 4053Mi 71%
master3 266m 6% 4255Mi 72%
worker1 778m 4% 27079Mi 42%
worker2 691m 4% 43636Mi 67%
worker3 528m 3% 48660Mi 75%
worker4 677m 4% 37532Mi 58%
worker5 895m 5% 51634Mi 80%
worker6 838m 5% 47337Mi 73%
worker7 2388m 14% 47065Mi 73%
worker8 1805m 11% 40601Mi 63%
The pods on worker1 are shown below:
Non-terminated Pods: (10 in total)
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits AGE
--------- ---- ------------ ---------- --------------- ------------- ---
cattle-prometheus exporter-node-cluster-monitoring-jqqkv 100m (0%) 200m (1%) 30Mi (0%) 200Mi (0%) 197d
cattle-prometheus prometheus-cluster-monitoring-1 1350m (8%) 1800m (11%) 5200Mi (8%) 5350Mi (8%) 4d23h
cattle-system cattle-node-agent-ml7fl 0 (0%) 0 (0%) 0 (0%) 0 (0%) 173d
ingress-nginx nginx-ingress-controller-hdbjp 0 (0%) 0 (0%) 0 (0%) 0 (0%) 92d
kube-system canal-bpqjl 250m (1%) 0 (0%) 0 (0%) 0 (0%) 165d
sigma-demo apollo-configservice-dev-64f54f4b58-8tdm8 0 (0%) 0 (0%) 0 (0%) 0 (0%) 4d23h
sigma-demo ibor-8d9c9d54d-8bmh9 700m (4%) 1 (6%) 1Gi (1%) 4Gi (6%) 2d16h
sigma-sit ibor-admin-7f886488cb-k4t5p 100m (0%) 1500m (9%) 1Gi (1%) 4Gi (6%) 2d19h
sigma-sit ibor-collect-5698947546-69zz9 200m (1%) 1 (6%) 1Gi (1%) 2Gi (3%) 2d16h
utils filebeat-filebeat-59hx7 100m (0%) 1 (6%) 100Mi (0%) 200Mi (0%) 6d13h
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource Requests Limits
-------- -------- ------
cpu 2800m (17%) 6500m (40%)
memory 8402Mi (13%) 15990Mi (24%)
ephemeral-storage 0 (0%) 0 (0%)
Events: <none>
As you can see, the pods do not have especially high resource requests (ibor is a Java program for loading data, which uses a lot of CPU and memory, although it needs optimization; apollo is a configuration center). But when I log in to the worker1 node and run `htop`, it shows the system load is high and all the CPUs are busy. I cannot tell which process is driving the load so high. It grows to around 30~40 and eventually brings down the node. Nothing else looks unusual, only the high `cs` and `us`:
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
3 0 0 15227600 3176 40686872 0 0 0 26 1 2 4 1 95 0 0
0 0 0 15227772 3176 40686952 0 0 0 34 16913 14861 2 2 96 0 0
1 0 0 15226836 3176 40686976 0 0 0 33 18861 13368 2 2 96 0 0
0 0 0 15226736 3176 40686984 0 0 0 630 15778 14887 2 1 97 0 0
0 0 0 15226716 3176 40687196 0 0 0 31 17228 14023 4 2 95 0 0
0 0 0 15225188 3176 40687224 0 0 0 0 20546 17126 3 2 95 0 0
0 0 0 15224868 3176 40687240 0 0 0 32 16025 14326 2 1 97 0 0
2 0 0 15224128 3176 40687544 0 0 0 34 20494 16183 3 2 95 0 0
0 0 0 15224324 3176 40687548 0 0 0 33 15158 12917 3 1 95 0 0
0 0 0 15225152 3176 40687572 0 0 0 0 19292 15307 2 2 96 0 0
2 0 0 15224764 3176 40687576 0 0 0 33 15634 13430 3 1 95 0 0
1 0 0 15220824 3176 40687768 0 0 0 0 21238 15215 11 2 86 0 0
2 0 0 15221352 3176 40687776 0 0 0 33 14481 12017 3 1 95 0 0
2 0 0 15220140 3176 40687796 0 0 0 33 20263 16450 4 3 93 0 0
1 0 0 15220200 3176 40688108 0 0 0 0 16103 12503 2 1 97 0 0
1 0 0 15220692 3176 40688116 0 0 0 64 20478 15081 2 2 95 0 0
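For reference, the `cs` column above can be cross-checked directly from `/proc/stat` (a sketch: the `ctxt` line is a cumulative context-switch counter, so sampling it twice one second apart gives the per-second rate):

```shell
# Sample the cumulative "ctxt" counter from /proc/stat twice,
# one second apart; the difference is the context-switch rate
# that vmstat reports in its "cs" column.
cs1=$(awk '/^ctxt/ {print $2}' /proc/stat)
sleep 1
cs2=$(awk '/^ctxt/ {print $2}' /proc/stat)
echo "context switches/s: $((cs2 - cs1))"
```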
So I would like to ask for help: how can I find out which process is causing this, and how should I investigate it?
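Update: one thing I tried myself to narrow it down (a sketch that only uses `/proc`, useful when sysstat's `pidstat -w` is not installed on the node) is summing each process's voluntary and involuntary context switches and listing the worst offenders:

```shell
# For each PID, sum voluntary_ctxt_switches and
# nonvoluntary_ctxt_switches from /proc/<pid>/status and print
# the processes with the highest totals.
for pid in /proc/[0-9]*; do
  [ -r "$pid/status" ] || continue
  awk -v pid="${pid#/proc/}" '
    /^Name:/                      { name = $2 }
    /^voluntary_ctxt_switches/    { v = $2 }
    /^nonvoluntary_ctxt_switches/ { n = $2 }
    END { printf "%s %s %d\n", pid, name, v + n }
  ' "$pid/status" 2>/dev/null
done | sort -k3 -rn | head
```

Note that these counters are cumulative since each process started, so to find who is switching *right now* you would snapshot twice and diff, or just run `pidstat -w 1` if sysstat is available.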