Improving performance for SQLAlchemy?

Asked: 2016-08-30 04:28:06

Tags: python performance sqlalchemy database-connection

I am currently running a Flask + SQLAlchemy + uWSGI + nginx stack for an API backend that I am developing for my application. I was trying to find the maximum number of concurrent connections my server can handle by using ApacheBench to send varying numbers of concurrent requests to an endpoint on the server.

This endpoint takes a JSON request body, extracts certain values, runs a query, and returns a JSON response based on the results of that query.
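For illustration, the handler is roughly of this shape (a minimal sketch; the model, route, and field names here are hypothetical, not the actual code):

    # Hypothetical sketch of the endpoint under test.
    from flask import Flask, jsonify, request
    from flask_sqlalchemy import SQLAlchemy

    app = Flask(__name__)
    app.config["SQLALCHEMY_DATABASE_URI"] = "postgresql+psycopg2cffi://user:pass@localhost/testdb"
    db = SQLAlchemy(app)

    class Item(db.Model):  # hypothetical model
        id = db.Column(db.Integer, primary_key=True)
        name = db.Column(db.String(80))

    @app.route("/items", methods=["POST"])
    def items():
        payload = request.get_json()                  # take a JSON request body
        name = payload.get("name")                    # extract certain values
        rows = Item.query.filter_by(name=name).all()  # run a query
        # return a JSON response based on the results
        return jsonify([{"id": r.id, "name": r.name} for r in rows])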

I ran a baseline test of 1 concurrent request for 10 total requests and got an average response time of 60 milliseconds.
Another test with 10 concurrent requests for 100 total requests averaged 150ms; 100 concurrent requests for 1000 total returned 1500ms, and 500 concurrent requests returned around 7000-9000ms.
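For reference, the 500-concurrency run below corresponds to an ApacheBench invocation along these lines (the payload file and URL are placeholders):

    ab -n 5000 -c 500 -p payload.json -T application/json http://myserver/endpoint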

Concurrency Level:      500
Time taken for tests:   38.710 seconds
Complete requests:      5000
Failed requests:        0
Total transferred:      1310000 bytes
Total body sent:        1105000
HTML transferred:       110000 bytes
Requests per second:    129.17 [#/sec] (mean)
Time per request:       3870.986 [ms] (mean)
Time per request:       7.742 [ms] (mean, across all concurrent requests)
Transfer rate:          33.05 [Kbytes/sec] received
                        27.88 kb/s sent
                        60.93 kb/s total

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:       24   63 185.1     25    3025
Processing:   161 3737 2778.7   3316   26719
Waiting:      157 3737 2778.7   3316   26719
Total:        187 3800 2789.7   3366   26744

Percentage of the requests served within a certain time (ms)
  50%   3366
  66%   4135
  75%   4862
  80%   5711
  90%   7449
  95%   9158
  98%  11794
  99%  13373
 100%  26744 (longest request)

The latency seems to increase linearly, which makes sense, but it increases TOO quickly. After a lot of tinkering and profiling, I found that the bottleneck was the queries.

At the start of the benchmark the queries would process and return quickly, in under 10-50ms, but that climbed rapidly, and in some cases latencies of 10000-15000ms appeared.

I couldn't figure out why the database was slowing down so much, especially since it is empty (barring test data).

I tried running the application WITHOUT a connection pool, and latency actually went down (from 7-9s to 5-6s). I did not think this was possible, because everything I had read suggested that a connection pool always makes things faster by avoiding the overhead of establishing a new connection on every request.

I also experimented with INCREASING the connection pool size (from the default 5 to 50), and that decreased latency even further than the pool-less setup (5-6s to 3-4s).
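Both variants are a small change on the engine setup; a sketch, assuming a plain create_engine() configuration (the URL is a placeholder):

    from sqlalchemy import create_engine
    from sqlalchemy.pool import NullPool

    # No pool at all: every checkout opens and closes its own connection.
    engine = create_engine("postgresql+psycopg2cffi://user:pass@localhost/testdb",
                           poolclass=NullPool)

    # Or a larger pool, up from SQLAlchemy's default pool_size of 5.
    engine = create_engine("postgresql+psycopg2cffi://user:pass@localhost/testdb",
                           pool_size=50, max_overflow=10)

The numbers below are from the 500-concurrency run after the pool-size change: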

Concurrency Level:      500
Time taken for tests:   4.842 seconds
Complete requests:      836
Failed requests:        0
Non-2xx responses:      679
Total transferred:      272673 bytes
Total body sent:        294593
HTML transferred:       126353 bytes
Requests per second:    172.67 [#/sec] (mean)
Time per request:       2895.662 [ms] (mean)
Time per request:       5.791 [ms] (mean, across all concurrent requests)
Transfer rate:          55.00 [Kbytes/sec] received
                        59.42 kb/s sent
                        114.42 kb/s total

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:       24  170 273.1     93    1039
Processing:    25  665 1095.2    149    4753
Waiting:       25  665 1095.2    149    4753
Total:         51  835 1123.9    279    4786

Percentage of the requests served within a certain time (ms)
  50%    279
  66%    487
  75%   1050
  80%   1059
  90%   2935
  95%   3658
  98%   4176
  99%   4337
 100%   4786 (longest request) 

The latency is still extremely high (3-4 seconds for an API seems unreasonable by any standard), and I am trying to figure out how to decrease it further. Is the answer simply more connections?

Note: I am running 4 uWSGI processes with 100 threads each on a server with 4GB of RAM and a quad-core processor. I am using the psycopg2-cffi adapter for the connection, and the application runs on PyPy.
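The uWSGI side of that note corresponds to settings along these lines (a sketch; the module name and socket path are placeholders):

    [uwsgi]
    ; placeholder module:callable
    module = app:app
    master = true
    processes = 4
    threads = 100
    socket = /tmp/api.sock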

1 Answer:

Answer 0 (score: 1)

A linear increase is perfectly normal if the database has to process your queries sequentially. Essentially, all the concurrent requests start at the same time but complete one after another, so, assuming a pool with a single connection, 60ms per request, and 10 concurrent requests, you would see requests taking 60ms, 120ms, 180ms, 240ms, 300ms, 360ms, 420ms, 480ms, 540ms, 600ms, 600ms, ..., 600ms, 540ms, 480ms, .... Given n total requests and m concurrent requests, we can calculate the time the average request takes:

f(n, m) = 60ms * (((m + 1) * m / 2) * 2 + (n - 2m) * m) / n
f(100, 10) = 546ms
f(1000, 100) = 5406ms
f(1000, 500) = 15030ms

These numbers are similar to what you are seeing.
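As a quick sanity check, the formula can be evaluated directly (a sketch; the 60ms base latency is taken from the single-request benchmark above):

    # Average latency for n requests at concurrency m, assuming the
    # database serves them strictly one at a time.
    def avg_latency_ms(n, m, base_ms=60):
        ramp = (m + 1) * m // 2      # ramp-up: 1 + 2 + ... + m units
        plateau = (n - 2 * m) * m    # middle requests all wait m units
        return base_ms * (2 * ramp + plateau) / n

    print(avg_latency_ms(100, 10))    # 546.0
    print(avg_latency_ms(1000, 100))  # 5406.0
    print(avg_latency_ms(1000, 500))  # 15030.0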

Now for the big question: why is the database processing the queries almost sequentially? I can think of a few reasons:

  • Locking: every transaction that starts needs an exclusive lock on some resource, so only one (or a few) transactions can run at a time (the sketch after this list shows one way to check for this)
  • CPU-bound queries: each transaction uses significant CPU, so other transactions have to wait for CPU time
  • Large table scans: the database cannot keep the whole table in memory, so every transaction has to read from disk
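One way to test the locking theory against a live Postgres instance is to look for ungranted locks while the benchmark runs (a sketch using the psycopg2 API mentioned in the question; the DSN is a placeholder):

    import psycopg2  # psycopg2-cffi exposes the same API

    conn = psycopg2.connect("dbname=testdb user=user host=localhost")
    cur = conn.cursor()
    # Any row returned here is a backend waiting on a lock held by another.
    cur.execute("SELECT locktype, relation::regclass, pid, mode "
                "FROM pg_locks WHERE NOT granted")
    for row in cur.fetchall():
        print(row)
    cur.close()
    conn.close()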

How do you fix this? It's a complicated question, but there are some potential solutions:

  • Optimize your queries: either so that they do not compete for the same resources, or so that they do not take as long to run
  • Batch your queries so that fewer of them need to run in total
  • Cache your responses so that they never hit the database at all (a minimal sketch follows this list)
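As an illustration of the last point, a minimal in-process cache might look like this (a sketch only; a real deployment would more likely use Redis or memcached, and this version is per-process):

    import functools
    import time

    def ttl_cache(seconds):
        # Tiny time-based cache decorator; results are kept per process.
        def decorator(fn):
            store = {}  # args -> (expires_at, value)
            @functools.wraps(fn)
            def wrapper(*args):
                now = time.time()
                hit = store.get(args)
                if hit is not None and hit[0] > now:
                    return hit[1]              # serve from cache
                value = fn(*args)              # cache miss: run the query
                store[args] = (now + seconds, value)
                return value
            return wrapper
        return decorator

    @ttl_cache(seconds=30)
    def lookup(name):
        ...  # the expensive database query goes here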