Question

Setup

OK, I'm running a Rails application on Heroku (free tier).

I have two separate versions of the application; let's call them Staging and Fake-Production.

In Staging, I'm using WEBrick as the server. My Procfile is

web: rails s -p $PORT

In Fake-Production, I'm using Puma as the server. My Procfile is

web: bundle exec puma -C config/puma.rb

I have configured Puma to run 2 workers with 1 thread per worker. config/puma.rb is defined as follows (taken from Heroku's Setting up Puma Webserver)

workers Integer(ENV['WEB_CONCURRENCY'] || 2)
threads_count = Integer(ENV['MAX_THREADS'] || 1)
threads threads_count, threads_count

preload_app!

rackup      DefaultRackup
port        ENV['PORT']     || 3000
environment ENV['RACK_ENV'] || 'development'

on_worker_boot do
  # Worker specific setup for Rails 4.1+
  # See: https://devcenter.heroku.com/articles/deploying-rails-applications-with-the-puma-web-server#on-worker-boot
  ActiveRecord::Base.establish_connection
end

My database.yml is configured with a connection pool of 20.
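
As a rough sanity check on the pool size, the sketch below compares the threads in each Puma worker with the pool. It assumes that ActiveRecord keeps one connection pool per worker process and that each thread holds at most one connection at a time. The helper name `pool_report` is hypothetical and not part of Puma or Rails.

```ruby
# Hypothetical helper for illustration; not part of Puma or Rails.
# Assumes one ActiveRecord pool per worker process and at most one
# checked-out connection per thread.
def pool_report(workers:, threads:, pool:)
  {
    per_worker_needed: threads,            # connections one worker can use at once
    total_needed:      workers * threads,  # connections across all workers
    pool_sufficient:   pool >= threads     # does each worker's pool cover its threads?
  }
end

# The default config (2 workers x 1 thread) and the largest one mentioned
# with memory figures (4 workers x 5 threads), both with pool: 20.
default_setup = pool_report(workers: 2, threads: 1, pool: 20)
largest_setup = pool_report(workers: 4, threads: 5, pool: 20)

puts default_setup
puts largest_setup
```

Under these assumptions a pool of 20 covers every worker in both configurations, so the pool size alone seems unlikely to explain the flat results. The total mainly matters against whatever connection limit the database itself enforces.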

Test

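For load testing, I'm using the ApacheBench tool on my laptop to hit an API endpoint. The API does a very simple database query that returns a fixed number of records (which never changes).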

I hit both deployments with the following command:

ab -n 1000 -c 100 https://<some heroku endpoint>?access_token=f73f50514c

Results

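The results here are the most surprising part. I expected the Puma deployment to completely crush the WEBrick deployment, but in practice they are almost the same. I tried different API endpoints as well as different combinations of Puma workers and threads (at one point 4 workers and 5 threads), yet saw no noticeable improvement.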

WEBrick results

Server Software:        WEBrick/1.3.1
Server Hostname:        webbrick-build.herokuapp.com
Server Port:            443
SSL/TLS Protocol:       TLSv1,DHE-RSA-AES128-SHA,2048,128

Document Path:          /api/v1/packages?access_token=f73f50514c6
Document Length:        488 bytes

Concurrency Level:      100
Time taken for tests:   21.484 seconds
Complete requests:      1000
Failed requests:        0
Total transferred:      995000 bytes
HTML transferred:       488000 bytes
Requests per second:    46.55 [#/sec] (mean)
Time per request:       2148.360 [ms] (mean)
Time per request:       21.484 [ms] (mean, across all concurrent requests)
Transfer rate:          45.23 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:      714 1242 278.1   1214    2012
Processing:   248  842 493.6    699    2883
Waiting:      247  809 492.3    677    2876
Total:       1072 2085 643.5   1929    4845

Percentage of the requests served within a certain time (ms)
  50%   1929
  66%   2039
  75%   2109
  80%   2168
  90%   2622
  95%   3821
  98%   4473
  99%   4646
 100%   4845 (longest request)

Memory impact

source=web.1 dyno=heroku.1234567899 sample#memory_total=198.41MB sample#memory_rss=197.60MB sample#memory_cache=0.30MB sample#memory_swap=0.51MB sample#memory_pgpgin=103879pages sample#memory_pgpgout=53216pages

Puma results (roughly the same regardless of the number of workers/threads)

Server Software:        Cowboy
Server Hostname:        puma-build.herokuapp.com
Server Port:            443
SSL/TLS Protocol:       TLSv1,DHE-RSA-AES128-SHA,2048,128

Document Path:          /api/v1/packages?access_token=fb7168c147adc2ccd83b2
Document Length:        489 bytes

Concurrency Level:      100
Time taken for tests:   23.299 seconds
Complete requests:      1000
Failed requests:        0
Total transferred:      943000 bytes
HTML transferred:       489000 bytes
Requests per second:    42.92 [#/sec] (mean)
Time per request:       2329.949 [ms] (mean)
Time per request:       23.299 [ms] (mean, across all concurrent requests)
Transfer rate:          39.52 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:      743 1304 283.9   1287    2092
Processing:   253  951 740.3    684    5353
Waiting:      253  898 729.0    627    5196
Total:       1198 2255 888.0   1995    7426

Percentage of the requests served within a certain time (ms)
  50%   1995
  66%   2085
  75%   2213
  80%   2444
  90%   3755
  95%   4238
  98%   5119
  99%   5437
 100%   7426 (longest request)

Memory impact (4 workers, 5 threads)

source=web.1 dyno=heroku.1234567890 sample#memory_total=406.75MB sample#memory_rss=406.74MB sample#memory_cache=0.00MB sample#memory_swap=0.00MB sample#memory_pgpgin=151515pages sample#memory_pgpgout=47388pages

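Judging from runs like the ones above, the Puma deployment is sometimes faster than WEBrick and sometimes slower (as in the output shown). Even when it is faster, it is not faster by much, maybe 1-5 more requests/sec.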
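
For reference, ab's headline figures can be re-derived from the raw counts, which makes the size of the gap easier to see. The sketch below is only a worked calculation on the WEBrick and Puma numbers reported above; the variable names are illustrative.

```ruby
# Raw figures copied from the WEBrick and Puma ab reports above.
runs = {
  'WEBrick' => { requests: 1000, concurrency: 100, seconds: 21.484 },
  'Puma'    => { requests: 1000, concurrency: 100, seconds: 23.299 }
}

derived = runs.transform_values do |r|
  {
    # completed requests divided by wall-clock time
    requests_per_second: (r[:requests] / r[:seconds]).round(2),
    # mean time each of the concurrent clients waited per request, in ms
    ms_per_request: (r[:concurrency] * r[:seconds] * 1000 / r[:requests]).round(1)
  }
end

derived.each do |server, d|
  puts "#{server}: #{d[:requests_per_second]} req/s, #{d[:ms_per_request]} ms per request"
end

# Relative throughput gap between the two runs, in percent.
gap = 100.0 * (derived['WEBrick'][:requests_per_second] -
               derived['Puma'][:requests_per_second]) /
      derived['WEBrick'][:requests_per_second]
puts format('gap: %.1f%%', gap)
```

The two runs differ by under 10% in throughput, which is smaller than the run-to-run spread, so a single pair of runs probably cannot rank the servers.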

My question is: what am I doing wrong? Is there a problem with my database pool? Am I benchmarking this incorrectly? Am I using Puma incorrectly?

Edit:

Peak CPU load for Puma (5 workers and 5 threads)

source=web.1 dyno=heroku.123456789 sample#load_avg_1m=2.98

Most of the time, however, it is 0.00 or below 0.1.

On top of that, the only code called in the controller is:

@package = Package.all

Right after that, it renders a JSON response declared in HAML.

By the way, Package.all returns only about 5 records.

Edit 2:

UNICORN RESULTS

Implemented Unicorn as suggested. Running 3 Unicorn workers.

Server Software:        Cowboy
Server Hostname:        unicorn-build.herokuapp.com
Server Port:            443
SSL/TLS Protocol:       TLSv1,DHE-RSA-AES128-SHA,2048,128

Document Path:          /api/v1/packages?access_token=f73f50514c6b8a3ea
Document Length:        488 bytes

Concurrency Level:      100
Time taken for tests:   22.311 seconds
Complete requests:      1000
Failed requests:        0
Total transferred:      942000 bytes
HTML transferred:       488000 bytes
Requests per second:    44.82 [#/sec] (mean)
Time per request:       2231.135 [ms] (mean)
Time per request:       22.311 [ms] (mean, across all concurrent requests)
Transfer rate:          41.23 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:      846 1326 294.5   1304    2720
Processing:   245  627 342.8    540    3061
Waiting:      244  532 313.6    470    3057
Total:       1232 1954 463.0   1874    4875

Percentage of the requests served within a certain time (ms)
  50%   1874
  66%   2016
  75%   2161
  80%   2250
  90%   2466
  95%   2799
  98%   3137
  99%   3901
 100%   4875 (longest request)

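One thing I noticed is that running the same ab load test multiple times returns different "Requests per second" values. This holds for both Unicorn and Puma. For both, the best "Requests per second" is around 48-50, while the worst is around 25-33.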

Either way, it still doesn't make sense. Why doesn't Puma or Unicorn crush WEBrick?

Answer 1

I trust you have thoroughly followed Heroku's Deploying Rails Applications with the Puma Web Server guide.

My guess is that either your test setup minimizes the advantages of multi-threading, or the server is bottlenecked by the SQL database.

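Your API call (especially if the database results are cached) might be CPU-intensive. When the CPU is already at 100% usage, there is no advantage to having 10 threads. In that case, managing the threads actually hinders performance.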

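Multi-threading is useful when the worker threads spend a long time waiting on resources (database, files, etc.) rather than using the CPU.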
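
A minimal sketch of that point, using `sleep` as a stand-in for waiting on a database or file. It assumes MRI Ruby, where a sleeping or I/O-blocked thread releases the global VM lock so other threads can run; `fake_io_request` is a made-up name for illustration.

```ruby
require 'benchmark'

# Simulated I/O wait (for example a database round trip). While one thread
# sleeps, MRI lets other threads run.
def fake_io_request
  sleep 0.1
end

requests = 4

# Handle the requests one after another on a single thread.
sequential = Benchmark.realtime do
  requests.times { fake_io_request }
end

# Handle the same requests with one thread each, then wait for all of them.
threaded = Benchmark.realtime do
  requests.times.map { Thread.new { fake_io_request } }.each(&:join)
end

puts format('sequential: %.2fs, threaded: %.2fs', sequential, threaded)
```

The threaded run finishes in roughly the time of a single wait because the waits overlap. If `fake_io_request` were replaced by pure computation, MRI would run only one thread's Ruby code at a time and the two timings would be much closer.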

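The second possibility is that your HTTP server is bound by the database. It may be that WEBrick is already moving as fast as the database allows, so switching to a better-performing HTTP server leaves no room for improvement.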
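
One way to probe where the time goes is to compare the mean Connect and Total times that ab printed in the question. The sketch below only does that arithmetic on the reported numbers. As far as I understand ab, Connect covers establishing the connection (including the TLS handshake for an https URL), which on Heroku happens against the router before the Ruby server sees the request; treat that reading as an assumption.

```ruby
# Mean "Connect" and "Total" times in ms, copied from the three ab reports
# in the question.
reports = {
  'WEBrick' => { connect: 1242, total: 2085 },
  'Puma'    => { connect: 1304, total: 2255 },
  'Unicorn' => { connect: 1326, total: 1954 }
}

# Share of the mean total time that was spent connecting, in percent.
connect_share = reports.transform_values do |t|
  (100.0 * t[:connect] / t[:total]).round(1)
end

connect_share.each do |server, pct|
  puts "#{server}: #{pct}% of mean total time is spent connecting"
end
```

In all three reports more than half of the mean request time is connection setup. If that part really is outside the app server, changing the server can only affect the remainder, which would make the three deployments look alike from a laptop running ab.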

You should read this comprehensive benchmark report.

You'll notice that Puma is not among the fastest Rails HTTP servers. If speed is all you care about, try Unicorn, or Torquebox 4 if you are using JRuby.

Here is a guide on how to set up Unicorn on Heroku.

Puma or Unicorn vs. WEBrick load-test benchmark shows no improvement
