我正在尝试使用apache bench在我们的API服务器上进行一些非常简单的基准测试,我知道它不是加载测试web服务器的理想工具,但这不是重点。
我们有几台服务器和几台虚拟机(托管在这些服务器上)。所有这些都运行相同的nginx,uwsgi和一个cherrypy webapp堆栈。我可以在新服务器(没有VM)上使用apachebench,结果如下:
Server Software: CherryPy/3.2.2
Server Hostname: 10.0.2.4
Server Port: 5000
Document Path: /api/v4/ping
Document Length: 245 bytes
Concurrency Level: 1
Time taken for tests: 10.100 seconds
Complete requests: 1000
Failed requests: 0
Write errors: 0
Total transferred: 397000 bytes
HTML transferred: 245000 bytes
Requests per second: 99.01 [#/sec] (mean)
Time per request: 10.100 [ms] (mean)
Time per request: 10.100 [ms] (mean, across all concurrent requests)
Transfer rate: 38.38 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 1 4 54.7 1 1002
Processing: 3 6 2.0 6 34
Waiting: 3 6 2.0 6 34
Total: 4 10 54.8 7 1008
Percentage of the requests served within a certain time (ms)
50% 7
66% 8
75% 8
80% 8
90% 9
95% 9
98% 11
99% 12
100% 1008 (longest request)
这里奇怪的是,正如您所见,请求大多在12毫秒内提供。但有些请求的使用时间要长十倍。显然问题出在'连接'阶段,不管是什么。
我可以从任何服务器或VM查询任何其他服务器或VM,可靠地重现此效果。但是,针对localhost的查询总是很快。
此外,慢速请求似乎是均匀分布的,即使用ab(并发设置1)只执行50个请求,我得到一些缓慢的请求(有时需要几次尝试)。
为了进一步查明问题,我知道只需对nginx的hello world页面进行基准测试,该页面显示完全相同的行为(因此它不是我们的应用程序)。
当我第一次注意到这个问题时,我也注意到了大量的'TIME_WAIT'连接,我尝试调整一些sysctl标志,TIME_WAIT现在要少得多,但是问题仍然存在。
我尝试过各种各样的sysctl设置都无济于事。
所有服务器都是带有Intel处理器的中等新机器,它们都运行Ubuntu服务器,但版本不同:10.10,12.10和13.04。
为了完整起见,我尝试在两台Google Compute Engine虚拟机上针对nginx的首页运行相同的基准测试,它们不会显示相同的行为。
任何帮助将不胜感激,如果缺少任何信息,我很乐意修改我的帖子:)。
编辑:
在使用ntpdate同步时钟后,我运行了一个Python脚本,它只是打开一个从host-3到host-4的套接字并再次关闭它。如果过程花费的时间超过0.5秒,则会记录确切的时间范围。我在两台服务器上运行tcpdump时都运行了这个,这是一些输出,请注意,在某些时候传输一些数据包大约需要一秒钟。问题是,我真的不知道如何解释这些结果。
主持人3:
13:24:59.591046 IP (tos 0x0, ttl 64, id 17736, offset 0, flags [DF], proto TCP (6), length 60)
sm-host-3.45573 > sm-host-4.5000: Flags [S], cksum 0x3b59 (incorrect -> 0x69fa), seq 2690430222, win 14480, options [mss 1460,sackOK,TS val 406581318 ecr 90980708,nop,wscale 7], length 0
13:24:59.591110 IP (tos 0x0, ttl 60, id 2866, offset 0, flags [DF], proto TCP (6), length 52)
sm-host-4.5000 > sm-host-3.45572: Flags [F.], cksum 0x4d59 (correct), seq 1, ack 2, win 114, options [nop,nop,TS val 90980709 ecr 406581318], length 0
13:24:59.591127 IP (tos 0x0, ttl 64, id 30162, offset 0, flags [DF], proto TCP (6), length 52)
sm-host-3.45572 > sm-host-4.5000: Flags [.], cksum 0x3b51 (incorrect -> 0x4d59), seq 2, ack 2, win 114, options [nop,nop,TS val 406581318 ecr 90980709], length 0
13:25:00.588453 IP (tos 0x0, ttl 64, id 17737, offset 0, flags [DF], proto TCP (6), length 60)
sm-host-3.45573 > sm-host-4.5000: Flags [S], cksum 0x3b59 (incorrect -> 0x6900), seq 2690430222, win 14480, options [mss 1460,sackOK,TS val 406581568 ecr 90980708,nop,wscale 7], length 0
13:25:00.588766 IP (tos 0x0, ttl 60, id 0, offset 0, flags [DF], proto TCP (6), length 60)
sm-host-4.5000 > sm-host-3.45573: Flags [S.], cksum 0x4168 (correct), seq 2011803549, ack 2690430223, win 14480, options [mss 1460,sackOK,TS val 90980958 ecr 406581318,nop,wscale 7], length 0
13:25:00.588814 IP (tos 0x0, ttl 64, id 17738, offset 0, flags [DF], proto TCP (6), length 52)
sm-host-3.45573 > sm-host-4.5000: Flags [.], cksum 0x3b51 (incorrect -> 0xa758), seq 1, ack 1, win 114, options [nop,nop,TS val 406581568 ecr 90980958], length 0
13:25:00.588821 IP (tos 0x0, ttl 60, id 0, offset 0, flags [DF], proto TCP (6), length 60)
sm-host-4.5000 > sm-host-3.45573: Flags [S.], cksum 0x4168 (correct), seq 2011803549, ack 2690430223, win 14480, options [mss 1460,sackOK,TS val 90980958 ecr 406581318,nop,wscale 7], length 0
主机-4(大约在同一时间范围内):
13:24:59.587624 IP (tos 0x0, ttl 60, id 11, offset 0, flags [DF], proto TCP (6), length 52)
sm-host-3.45569 > sm-host-4.5000: Flags [.], cksum 0x54c4 (correct), seq 2, ack 2, win 114, options [nop,nop,TS val 406581318 ecr 90980708], length 0
13:24:59.587630 IP (tos 0x0, ttl 60, id 30162, offset 0, flags [DF], proto TCP (6), length 52)
sm-host-3.45572 > sm-host-4.5000: Flags [.], cksum 0x4d59 (correct), seq 2, ack 2, win 114, options [nop,nop,TS val 406581318 ecr 90980709], length 0
13:25:00.585005 IP (tos 0x0, ttl 60, id 17736, offset 0, flags [DF], proto TCP (6), length 60)
sm-host-3.45573 > sm-host-4.5000: Flags [S], cksum 0x69fa (correct), seq 2690430222, win 14480, options [mss 1460,sackOK,TS val 406581318 ecr 90980708,nop,wscale 7], length 0
13:25:00.585026 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 60)
sm-host-4.5000 > sm-host-3.45573: Flags [S.], cksum 0x4168 (correct), seq 2011803549, ack 2690430223, win 14480, options [mss 1460,sackOK,TS val 90980958 ecr 406581318,nop,wscale 7], length 0
13:25:00.585032 IP (tos 0x0, ttl 60, id 17737, offset 0, flags [DF], proto TCP (6), length 60)
sm-host-3.45573 > sm-host-4.5000: Flags [S], cksum 0x6900 (correct), seq 2690430222, win 14480, options [mss 1460,sackOK,TS val 406581568 ecr 90980708,nop,wscale 7], length 0
13:25:00.585039 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 60)
sm-host-4.5000 > sm-host-3.45573: Flags [S.], cksum 0x4168 (correct), seq 2011803549, ack 2690430223, win 14480, options [mss 1460,sackOK,TS val 90980958 ecr 406581318,nop,wscale 7], length 0
13:25:00.585331 IP (tos 0x0, ttl 60, id 17738, offset 0, flags [DF], proto TCP (6), length 52)
sm-host-3.45573 > sm-host-4.5000: Flags [.], cksum 0xa758 (correct), seq 1, ack 1, win 114, options [nop,nop,TS val 406581568 ecr 90980958], length 0