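Please help with a strange problem: we have a KVM host with 2 Intel(R) Xeon(R) CPU E5-2630 v2 processors (24 virtual cores in total). The host runs 3 typical Ubuntu guests, each with 8 cores and 20 GB of RAM. In this configuration everything seems fine. When we try to deploy one more guest with the same configuration, something strange happens: even with no load on the other 3 guests, putting some reasonable load on the 4th one drives %sy CPU usage on the KVM host up to 25-30%. top typically looks like this: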
top - 14:29:39 up 104 days, 2:51, 6 users, load average: 6.46, 6.33, 4.81
Tasks: 227 total, 1 running, 226 sleeping, 0 stopped, 0 zombie
Cpu(s): 5.0%us, 25.2%sy, 0.0%ni, 69.8%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 98975536k total, 48515312k used, 50460224k free, 154456k buffers
Swap: 100628476k total, 2176k used, 100626300k free, 1072440k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
27523 libvirt- 20 0 21.1g 10g 6880 S 700 11.5 126:27.51 kvm
11745 libvirt- 20 0 21.1g 20g 6964 S 21 21.4 137891:19 kvm
32692 root 20 0 865m 8792 4532 S 1 0.0 28:49.51 libvirtd
23252 libvirt- 20 0 10.7g 1.0g 6840 S 1 1.1 6:54.43 kvm
117 root 25 5 0 0 0 S 0 0.0 1245:09 ksmd
1481 root 20 0 63784 12m 3880 S 0 0.0 54:34.15 gunicorn
22413 root 20 0 17464 1540 1092 S 0 0.0 4:21.68 top
22880 root 20 0 17452 1396 972 S 0 0.0 3:50.49 top
22885 root 20 0 73444 3564 2772 S 0 0.0 2:54.02 sshd
26008 root 20 0 17460 1528 1088 S 0 0.0 0:07.31 top
26530 root 20 0 17472 1412 972 S 0 0.0 0:05.43 top
1 root 20 0 24448 2324 1344 S 0 0.0 0:04.69 init
(27523 is the problematic guest; the other kvm processes are the guests with no load)
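At this point the guest becomes unusable, its load average climbs to 50-80 or even higher, and almost all CPU usage is split between %us and %sy in varying proportions: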
top - 14:38:21 up 37 min, 2 users, load average: 53.72, 59.50, 45.16
Tasks: 313 total, 9 running, 301 sleeping, 0 stopped, 3 zombie
Cpu(s): 67.5%us, 31.9%sy, 0.0%ni, 0.0%id, 0.4%wa, 0.0%hi, 0.0%si, 0.1%st
Mem: 20590644k total, 11358672k used, 9231972k free, 59020k buffers
Swap: 10483708k total, 0k used, 10483708k free, 1821100k cached
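To compare the two snapshots above, here is a minimal sketch that parses top's "Cpu(s):" summary line and reports how much of the non-idle time is system time (%sy). It assumes the procps top line format shown in this post; the function name `parse_cpu_line` is hypothetical and only for illustration.

```python
import re

# Matches fields like "25.2%sy" in top's "Cpu(s):" summary line.
CPU_FIELD_RE = re.compile(r"([\d.]+)%(\w+)")


def parse_cpu_line(line: str) -> dict[str, float]:
    """Map each field name (us, sy, ni, id, wa, hi, si, st) to its percentage."""
    return {name: float(value) for value, name in CPU_FIELD_RE.findall(line)}


# The two "Cpu(s):" lines copied from the snapshots in this post.
first = parse_cpu_line("Cpu(s): 5.0%us, 25.2%sy, 0.0%ni, 69.8%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st")
second = parse_cpu_line("Cpu(s): 67.5%us, 31.9%sy, 0.0%ni, 0.0%id, 0.4%wa, 0.0%hi, 0.0%si, 0.1%st")

for label, fields in (("first snapshot", first), ("second snapshot", second)):
    busy = 100.0 - fields["id"]  # everything that is not idle
    print(f"{label}: busy {busy:.1f}%, %sy share of busy time {fields['sy'] / busy:.0%}")
```

In the first snapshot most of the busy time is system time, while in the second the CPUs are fully busy and roughly a third of that is %sy.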
At some point exceptions start to appear:
2014 Sep 17 14:35:09 dev2 [ 2037.438362] Stack:
2014 Sep 17 14:35:09 dev2 [ 2037.438370] Call Trace:
2014 Sep 17 14:35:09 dev2 [ 2037.438429] Code: 48 89 45 c0 48 8d 45 d0 4c 89 4d f8 c7 45 b8 10 00 00 00 48 89 45 c8 e8 e8 f6 ff ff c9 c3 90 90 90 90 90 90 b9 00 02 00 00 31 c0 <f3> 48 ab c3 0f 1f 44 00 00 b9 00 10 00 00 31 c0 f3 aa c3 66 0f
2014 Sep 17 14:35:09 dev2 [ 2037.441963] Stack:
2014 Sep 17 14:35:09 dev2 [ 2037.443586] Call Trace:
2014 Sep 17 14:35:19 dev2 [ 2037.443586] Code: 48 89 45 c0 48 8d 45 d0 4c 89 4d f8 c7 45 b8 10 00 00 00 48 89 45 c8 e8 e8 f6 ff ff c9 c3 90 90 90 90 90 90 b9 00 02 00 00 31 c0 <f3> 48 ab c3 0f 1f 44 00 00 b9 00 10 00 00 31 c0 f3 aa c3 66 0f
2014 Sep 17 14:35:45 dev2 [ 2073.284329] Stack:
2014 Sep 17 14:35:45 dev2 [ 2073.285151] Stack:
2014 Sep 17 14:35:45 dev2 [ 2073.285159] Call Trace:
2014 Sep 17 14:35:45 dev2 [ 2073.285221] Code: 48 89 45 c0 48 8d 45 d0 4c 89 4d f8 c7 45 b8 10 00 00 00 48 89 45 c8 e8 e8 f6 ff ff c9 c3 90 90 90 90 90 90 b9 00 02 00 00 31 c0 <f3> 48 ab c3 0f 1f 44 00 00 b9 00 10 00 00 31 c0 f3 aa c3 66 0f
2014 Sep 17 14:35:56 dev2 [ 2073.285857] Stack:
2014 Sep 17 14:35:56 dev2 [ 2073.285864] Call Trace:
2014 Sep 17 14:35:56 dev2 [ 2073.285914] Code: 48 89 45 c0 48 8d 45 d0 4c 89 4d f8 c7 45 b8 10 00 00 00 48 89 45 c8 e8 e8 f6 ff ff c9 c3 90 90 90 90 90 90 b9 00 02 00 00 31 c0 <f3> 48 ab c3 0f 1f 44 00 00 b9 00 10 00 00 31 c0 f3 aa c3 66 0f
2014 Sep 17 14:35:56 dev2 [ 2073.290207] Call Trace:
2014 Sep 17 14:35:56 dev2 [ 2073.290207] Code: 48 89 45 c0 48 8d 45 d0 4c 89 4d f8 c7 45 b8 10 00 00 00 48 89 45 c8 e8 e8 f6 ff ff c9 c3 90 90 90 90 90 90 b9 00 02 00 00 31 c0 <f3> 48 ab c3 0f 1f 44 00 00 b9 00 10 00 00 31 c0 f3 aa c3 66 0f
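The guest configuration is typical: we have dozens of such guests working fine, and we also have KVM hosts running 4 of these guests without problems. Where should I dig to find the root cause? I'm out of ideas right now...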
The host runs Ubuntu 12.04 LTS (Linux vhost12 3.2.0-60-generic #91-Ubuntu SMP Wed Feb 19 03:54:44 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux); the guests run the same release but with kernel 3.2.0-56-generic.
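For reference, the CPU and memory arithmetic of the configuration described at the start can be sketched as follows. This assumes each Xeon E5-2630 v2 has 6 cores with Hyper-Threading (2 threads per core), which is consistent with the "24 virtual cores" mentioned above; the host memory figure is taken from the host's top output, and the constant and function names are hypothetical.

```python
# ASSUMPTION: 2 sockets x 6 cores x 2 hyperthreads (Xeon E5-2630 v2).
SOCKETS = 2
CORES_PER_SOCKET = 6
THREADS_PER_CORE = 2
HOST_MEM_KIB = 98975536  # "Mem: ... total" from the host's top output

VCPUS_PER_GUEST = 8
RAM_PER_GUEST_GB = 20

physical_cores = SOCKETS * CORES_PER_SOCKET        # 12 physical cores
logical_cpus = physical_cores * THREADS_PER_CORE    # 24 logical CPUs
host_mem_gib = HOST_MEM_KIB / 1024 ** 2             # KiB -> GiB


def commitment(guests: int) -> tuple[int, float, float, int]:
    """Return (total vCPUs, vCPUs per logical CPU, vCPUs per physical core, guest RAM in GB)."""
    vcpus = guests * VCPUS_PER_GUEST
    return vcpus, vcpus / logical_cpus, vcpus / physical_cores, guests * RAM_PER_GUEST_GB


for n in (3, 4):
    vcpus, per_thread, per_core, ram = commitment(n)
    print(f"{n} guests: {vcpus} vCPUs, {per_thread:.2f} per logical CPU, "
          f"{per_core:.2f} per physical core, {ram} GB guest RAM "
          f"of ~{host_mem_gib:.1f} GiB host RAM")
```

With 3 guests the vCPUs exactly match the 24 logical CPUs; the 4th guest brings the total to 32 vCPUs, i.e. about 1.33 vCPUs per logical CPU.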