Question

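I am getting a high volume of HTTP GET requests from a limited number of IPs (< 4) on a nearby server. The task is to keep the response time for every request <= 50 milliseconds.

I have enabled TCP connection reuse by setting tcp_tw_reuse to 1. ip_local_port_range is set to 1024 to 65535.

tcp_fin_timeout is set to 60 (the default).

In my web server configuration file (nginx), I have set keepalive_timeout to 5 (is this related to TCP's TIME_WAIT in any way?).

Right now I am getting 5 requests per second, and the response time is around 200 ms.

I need help making a significant improvement in my response time (local computation time is negligible).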

Answer 1

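I'm going to go out on a limb and guess that these are static files and that you are not passing them through CGI.

From my experience with profiling (and googling about profiling), it's all about finding the bottleneck and optimizing the areas that take the most time, not spending all of your effort speeding up a step that accounts for 5% of the time.

I'd like to know more about your setup. What is the response time for a single file? What is the round-trip time for a ping? How large are the files?

For example, if a ping takes 150 ms, your problem is your network, not your nginx conf. If the files are megabytes in size, it's not nginx either.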
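To tell network time apart from server time, one option is to time the TCP connect and the response separately. The following is only a minimal sketch, assuming plain HTTP on a reachable host; `time_http_get` is a hypothetical helper written for illustration and is not part of nginx or ab.

```python
import socket
import time


def time_http_get(host, port=80, path="/", timeout=5.0):
    """Return (connect_ms, first_byte_ms, total_ms) for one HTTP GET.

    All three values are measured from the moment the connect starts.
    """
    start = time.perf_counter()
    sock = socket.create_connection((host, port), timeout=timeout)
    connected = time.perf_counter()
    try:
        request = (
            f"GET {path} HTTP/1.1\r\n"
            f"Host: {host}\r\n"
            "Connection: close\r\n\r\n"
        )
        sock.sendall(request.encode("ascii"))
        sock.recv(4096)  # blocks until the first bytes of the response arrive
        first_byte = time.perf_counter()
        while sock.recv(4096):  # drain the rest until the server closes
            pass
        done = time.perf_counter()
    finally:
        sock.close()
    to_ms = lambda t: (t - start) * 1000.0
    return to_ms(connected), to_ms(first_byte), to_ms(done)
```

If the connect time alone is already close to the 50 ms budget, tuning nginx is unlikely to help much; if most of the time falls between connect and first byte, the server side deserves a closer look.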

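If the response time differs between 1 and 30 requests per second, I would suspect something more serious than fine-grained nginx tweaks.

Can you shed any more light on the situation?

-- Update -- I ran a benchmark against my out-of-the-box nginx server, fetching a typical index.php page.

When benchmarked from inside the server: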

roderick@anon-webserver:~$ ab -r -n 1000 -c 100 http://anon.com/index.php
This is ApacheBench, Version 2.3 <$Revision: 655654 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking anon.com (be patient)
Completed 100 requests
Completed 200 requests
Completed 300 requests
Completed 400 requests
Completed 500 requests
Completed 600 requests
Completed 700 requests
Completed 800 requests
Completed 900 requests
Completed 1000 requests
Finished 1000 requests


Server Software:        nginx/0.8.54
Server Hostname:        anon.com
Server Port:            80

Document Path:          /index.php
Document Length:        185 bytes

Concurrency Level:      100
Time taken for tests:   0.923 seconds
Complete requests:      1000
Failed requests:        0
Write errors:           0
Non-2xx responses:      1000
Total transferred:      380000 bytes
HTML transferred:       185000 bytes
Requests per second:    1083.19 [#/sec] (mean)
Time per request:       92.320 [ms] (mean)
Time per request:       0.923 [ms] (mean, across all concurrent requests)
Transfer rate:          401.96 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        2    4   1.6      4       9
Processing:     1   43 147.6      4     833
Waiting:        1   41 144.4      3     833
Total:          4   47 148.4      8     842

Percentage of the requests served within a certain time (ms)
  50%      8
  66%      8
  75%      9
  80%      9
  90%     13
  95%    443
  98%    653
  99%    654
 100%    842 (longest request)

When benchmarked from my home desktop:

roderick@Rod-Dev:~$ ab -r -n 1000 -c 100 http://anon.com/index.php
This is ApacheBench, Version 2.3 <$Revision: 655654 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking anon.com (be patient)
Completed 100 requests
Completed 200 requests
Completed 300 requests
Completed 400 requests
Completed 500 requests
Completed 600 requests
Completed 700 requests
Completed 800 requests
Completed 900 requests
Completed 1000 requests
Finished 1000 requests


Server Software:        nginx/0.8.54
Server Hostname:        anon.com
Server Port:            80

Document Path:          /index.php
Document Length:        185 bytes

Concurrency Level:      100
Time taken for tests:   6.391 seconds
Complete requests:      1000
Failed requests:        0
Write errors:           0
Non-2xx responses:      1000
Total transferred:      380000 bytes
HTML transferred:       185000 bytes
Requests per second:    156.48 [#/sec] (mean)
Time per request:       639.063 [ms] (mean)
Time per request:       6.391 [ms] (mean, across all concurrent requests)
Transfer rate:          58.07 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:       40  260 606.9    137    3175
Processing:    81  214 221.7    140    3028
Waiting:       81  214 221.6    140    3028
Total:        120  474 688.5    277    6171

Percentage of the requests served within a certain time (ms)
  50%    277
  66%    308
  75%    316
  80%    322
  90%    753
  95%    867
  98%   3327
  99%   3729
 100%   6171 (longest request)
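For readers comparing the two runs, the summary figures ab prints can be recomputed from the request count, the elapsed time, and the concurrency level. This is a small sketch of that arithmetic; `ab_summary` is a hypothetical helper name, and the inputs are the rounded values printed above, so the results differ slightly from ab's own output.

```python
def ab_summary(complete_requests, time_taken_s, concurrency):
    """Recompute ab's headline numbers from its raw counters."""
    requests_per_second = complete_requests / time_taken_s
    # "Time per request (mean)": how long each of the concurrent
    # clients waits on average for one request.
    per_request_ms = concurrency * time_taken_s * 1000.0 / complete_requests
    # "Time per request (mean, across all concurrent requests)".
    per_request_overall_ms = time_taken_s * 1000.0 / complete_requests
    return requests_per_second, per_request_ms, per_request_overall_ms


local = ab_summary(1000, 0.923, 100)    # run from inside the server
remote = ab_summary(1000, 6.391, 100)   # run from the home desktop
print("local:  %.1f req/s, %.1f ms, %.3f ms" % local)
print("remote: %.1f req/s, %.1f ms, %.3f ms" % remote)
```

Note that the mean per-request time scales with the concurrency level (100 here), which is why it sits far above the median in the percentile tables.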

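My OS is Linux, and my CPU is 3 years old (it was a $500 server).

I have done absolutely nothing to the config file.

What does this tell me? nginx is not the problem.

Either your server's network is the bottleneck or AWS is throttling your CPU. My guess is probably both.

If the fix is that important, I'd get a dedicated server. But that's only as far as my knowledge goes.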

Answer 2

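nginx's keepalive_timeout controls the HTTP protocol's ability to re-use existing connections, and has nothing to do with the TCP TIME_WAIT state. (It is also unrelated to TCP keepalive probes; those are sent after roughly two hours of idle time and are therefore useless for almost everything.)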

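HTTP/1.1 clients and servers (and some combinations of HTTP/1.0 clients and servers) will re-use existing connections to request new data, which saves the time required for the TCP three-way handshake. If your clients and server can make use of it, you may wish to try increasing this timeout value.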
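On the client side, connection re-use might look like the sketch below, which sends several GETs over one `http.client.HTTPConnection` from the Python standard library. `fetch_many` is a hypothetical helper; whether the connection really stays open depends on the server keeping it alive (for nginx, within keepalive_timeout).

```python
import http.client


def fetch_many(host, port, paths, timeout=5.0):
    """Issue several GET requests over one HTTP connection if possible.

    Returns the list of HTTP status codes, one per path.
    """
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    statuses = []
    try:
        for path in paths:
            conn.request("GET", path)
            response = conn.getresponse()
            # The body must be read completely before the same
            # connection can carry the next request.
            response.read()
            statuses.append(response.status)
    finally:
        conn.close()
    return statuses
```

If the server closes the connection between requests, `http.client` opens a new one transparently, so the code still works but pays for the handshake again each time.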

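The TCP TIME_WAIT state is there to ensure that both peers know a dead connection is dead: if one side re-uses the port when reconnecting and the other side missed the FIN packet, it might think that packets from the new connection actually belong to the old one. Oops. The TIME_WAIT state prevents this. There is usually no need to fiddle with this number; consider your client, connecting 70 times per second. With 63k ports to choose from, that is about 450 seconds between port reuses: 63k ports / 70 cps is roughly 900 seconds, and random selection will give maybe half that. TIME_WAIT is near two minutes, and this is seven or eight. When you get to 100 connections per second from a peer, I'd start to be more worried about TIME_WAIT.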
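The port arithmetic can be checked with a few lines. This sketch uses the ip_local_port_range from the question (1024 to 65535) and the 70 connections per second discussed here; the factor of one half for random port selection and the two-minute TIME_WAIT are the answer's rough assumptions, not measured values.

```python
def port_reuse_interval(low_port, high_port, connections_per_second):
    """Return (ports, seconds) where seconds is the time needed to cycle
    through every local port once at the given connection rate."""
    ports = high_port - low_port + 1
    return ports, ports / connections_per_second


ports, sequential_s = port_reuse_interval(1024, 65535, 70)
random_s = sequential_s / 2   # assumed: random selection halves the interval
time_wait_s = 120             # assumed: "near two minutes", as stated above

print(f"{ports} ports, ~{sequential_s:.0f} s per full cycle")
print(f"~{random_s / 60:.1f} min between reuses vs {time_wait_s / 60:.0f} min TIME_WAIT")
```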

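Instead, I think the problem you're running into is Nagle's algorithm, which is used to keep the Internet from being overrun by a bunch of silly small packets. Nagle's algorithm makes a TCP stack wait a little when sending a small amount of data, in the hope that more data will arrive in the meantime. This is great once things get going, but it can cause unacceptable delays when a connection is starting up. The usual way to counteract Nagle is to set the TCP_NODELAY socket option. (Well, fiddling with the application's pattern of send(2) and recv(2) calls is better, but that is not always an option. Hence this band-aid.) See the tcp(7) and setsockopt(2) manpages for details. Since Nagle's algorithm interacts poorly with the TCP delayed ACK algorithm, you can also ask TCP to send ACK packets immediately rather than after the usual short delay: that's the TCP_QUICKACK socket option.

The tcp(7) manpage also gives a slight hint about a tunable, tcp_low_latency, that might do what you need. Try turning it on. (I don't know the repercussions of this decision, but my guess is that it would influence Nagle and delayed ACK site-wide rather than per socket.)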
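For an application that owns its sockets, setting the two socket options mentioned above could look roughly like this. It is a sketch only: `make_low_latency_socket` is a hypothetical helper, TCP_QUICKACK is Linux-specific, and tcp(7) notes that the quick-ACK flag is not permanent, so it may need to be set again later.

```python
import socket


def make_low_latency_socket():
    """Create a TCP socket with Nagle's algorithm disabled and, where the
    platform allows it, quick ACKs requested."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Disable Nagle's algorithm: small writes are sent immediately.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    # TCP_QUICKACK only exists on Linux; skip it quietly elsewhere.
    quickack = getattr(socket, "TCP_QUICKACK", None)
    if quickack is not None:
        try:
            sock.setsockopt(socket.IPPROTO_TCP, quickack, 1)
        except OSError:
            pass  # option not supported by this kernel or socket state
    return sock
```

The caller would then `connect()` or `bind()` the returned socket as usual. For a server like nginx, the equivalent setting lives in the server's own configuration rather than in application code.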

Answer 3

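Because there are so few client hosts, many closed TCP connections sitting in the TIME_WAIT state may be slowing things down by holding on to the available port numbers.

Maybe you should try: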

echo 1 > /proc/sys/net/ipv4/tcp_tw_recycle

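Note about the option:

Enable fast recycling of TIME_WAIT sockets. Enabling this option is not recommended, since it causes problems when working with NAT (Network Address Translation).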

Server handling 70 requests per second with a response time under 50 milliseconds
