Web service overloaded due to blocked Phusion Passenger queue

Asked: 2015-04-04 03:41:39

Tags: ruby-on-rails mongodb nginx passenger sidekiq

We are developing a web service with Ruby 2 on Rails 4, Mongoid 4, and MongoDB 2.6. It uses Sidekiq 3.3.0 with Redis 2.8, and runs on Phusion Passenger 5.0.4 + Nginx 1.7.10. It serves only mobile clients and an AngularJS web client through a JSON API.

Normally everything works fine and the API responds within 1 second, but during peak hours the service becomes heavily loaded (the API returns 503 Service Unavailable). Below are our Nginx and Mongoid configs:

Nginx config

passenger_root /home/deployer/.rvm/gems/ruby-2.1.3/gems/passenger-4.0.53;
#passenger_ruby /usr/bin/ruby;
passenger_max_pool_size 70;
passenger_min_instances 1;
passenger_max_requests 20; # A workaround if apps are mem-leaking
passenger_pool_idle_time 300;
passenger_max_instances_per_app 30;
passenger_pre_start http://production_domain/;

## Note: there're 2 apps with the same config
server {
  listen 80;
  server_name production_domain;
  passenger_enabled on;
  root /home/deployer/app_name-production/current/public;

  more_set_headers 'Access-Control-Allow-Origin: *';
  more_set_headers 'Access-Control-Allow-Methods: POST, GET, OPTIONS, PUT, DELETE, HEAD';
  more_set_headers 'Access-Control-Allow-Headers: DNT,X-Mx-ReqToken,Keep-Alive,User-Agent,X-Requested-With,If-Modified-Since,Cache-Control,Content-Type';

  if ($request_method = 'OPTIONS') {
    # more_set_headers 'Access-Control-Allow-Origin: *';
    # add_header 'Access-Control-Allow-Origin' '*';
    # add_header 'Access-Control-Allow-Methods' 'GET, POST, OPTIONS';
    # add_header 'Access-Control-Allow-Headers' 'DNT,X-Mx-ReqToken,Keep-Alive,User-Agent,X-Requested-With,If-Modified-Since,Cache-Control,Content-Type,X-FooA$
    # add_header 'Access-Control-Max-Age' 1728000;
    # add_header 'Content-Type' 'text/plain charset=UTF-8';
    # add_header 'Content-Length' 0;
    return 200;
  }

  access_log /var/log/nginx/app_name-production.access.log;
  error_log /var/log/nginx/app_name-production.error.log;

  error_page 404 /404.html;
  error_page 500 502 503 504 /50x.html;
  location = /50x.html {
    root /etc/nginx/html/;
  }
  rails_env production;
}

Mongoid config

development:
  sessions:
    default:
      another:
      uri: mongodb://127.0.0.1:27017/database_name
test:
  sessions:
    default:
      another:
      uri: mongodb://127.0.0.1:27017/database_name
      options:
        pool_size: 10
        pool_timeout: 15
        retry_interval: 1
        max_retries: 30
        refresh_interval: 10
        timeout: 15
staging:
  sessions:
    default:
      another:
      uri: mongodb://staging_domain/staging_database
      options:
        pool_size: 10
        pool_timeout: 15
        retry_interval: 1
        max_retries: 30
        refresh_interval: 10
        timeout: 15
production:
  sessions:
    default:
      another:
      uri: mongodb://production_domain/production_database
      options:
        pool_size: 30
        pool_timeout: 15
        retry_interval: 1
        max_retries: 30
        refresh_interval: 10
        timeout: 15

When the service is overloaded, the Passenger status output looks like this:

Version : 5.0.4
Date    : 2015-04-04 09:31:14 +0700
Instance: MxPcaaBy (nginx/1.7.10 Phusion_Passenger/5.0.4)

----------- General information -----------
Max pool size : 120
Processes     : 62
Requests in top-level queue : 0

----------- Application groups -----------
/home/deployer/memo_rails-staging/current/public (staging)#default:
  App root: /home/deployer/memo_rails-staging/current
  Requests in queue: 0
  * PID: 20453   Sessions: 0       Processed: 639     Uptime: 14h 34m 26s
    CPU: 0%      Memory  : 184M    Last used: 14s ago
  * PID: 402     Sessions: 0       Processed: 5       Uptime: 13h 0m 42s
    CPU: 0%      Memory  : 171M    Last used: 23m 35s ago
  * PID: 16081   Sessions: 0       Processed: 3       Uptime: 10h 26m 9s
    CPU: 0%      Memory  : 163M    Last used: 24m 9s ago
  * PID: 30300   Sessions: 0       Processed: 1       Uptime: 4h 19m 43s
    CPU: 0%      Memory  : 164M    Last used: 24m 15s ago

/home/deployer/memo_rails-production/current/public (production)#default:
  App root: /home/deployer/memo_rails-production/current
  Requests in queue: 150
  * PID: 25924   Sessions: 1       Processed: 841     Uptime: 20m 49s
    CPU: 3%      Memory  : 182M    Last used: 7m 58s ago
  * PID: 25935   Sessions: 1       Processed: 498     Uptime: 20m 49s
    CPU: 2%      Memory  : 199M    Last used: 5m 40s ago
  * PID: 25948   Sessions: 1       Processed: 322     Uptime: 20m 49s
    CPU: 1%      Memory  : 200M    Last used: 7m 57s ago
  * PID: 25960   Sessions: 1       Processed: 177     Uptime: 20m 49s
    CPU: 0%      Memory  : 158M    Last used: 19s ago
  * PID: 25972   Sessions: 1       Processed: 115     Uptime: 20m 48s
    CPU: 0%      Memory  : 151M    Last used: 7m 56s ago
  * PID: 25987   Sessions: 1       Processed: 98      Uptime: 20m 48s
    CPU: 0%      Memory  : 179M    Last used: 7m 56s ago
  * PID: 25998   Sessions: 1       Processed: 77      Uptime: 20m 48s
    CPU: 0%      Memory  : 145M    Last used: 7m 2s ago
  * PID: 26012   Sessions: 1       Processed: 97      Uptime: 20m 48s
    CPU: 0%      Memory  : 167M    Last used: 19s ago
  * PID: 26024   Sessions: 1       Processed: 42      Uptime: 20m 47s
    CPU: 0%      Memory  : 148M    Last used: 7m 55s ago
  * PID: 26038   Sessions: 1       Processed: 44      Uptime: 20m 47s
    CPU: 0%      Memory  : 164M    Last used: 1m 0s ago
  * PID: 26050   Sessions: 1       Processed: 29      Uptime: 20m 47s
    CPU: 0%      Memory  : 142M    Last used: 7m 54s ago
  * PID: 26063   Sessions: 1       Processed: 41      Uptime: 20m 47s
    CPU: 0%      Memory  : 168M    Last used: 1m 1s ago
  * PID: 26075   Sessions: 1       Processed: 23      Uptime: 20m 47s
    CPU: 0%      Memory  : 126M    Last used: 7m 51s ago
  * PID: 26087   Sessions: 1       Processed: 19      Uptime: 20m 46s
    CPU: 0%      Memory  : 120M    Last used: 7m 50s ago
  * PID: 26099   Sessions: 1       Processed: 37      Uptime: 20m 46s
    CPU: 0%      Memory  : 131M    Last used: 7m 3s ago
  * PID: 26111   Sessions: 1       Processed: 20      Uptime: 20m 46s
    CPU: 0%      Memory  : 110M    Last used: 7m 49s ago
  * PID: 26126   Sessions: 1       Processed: 28      Uptime: 20m 46s
    CPU: 0%      Memory  : 172M    Last used: 1m 56s ago
  * PID: 26141   Sessions: 1       Processed: 20      Uptime: 20m 45s
    CPU: 0%      Memory  : 107M    Last used: 7m 19s ago
  * PID: 26229   Sessions: 1       Processed: 20      Uptime: 20m 21s
    CPU: 0%      Memory  : 110M    Last used: 11s ago
  * PID: 26241   Sessions: 1       Processed: 9       Uptime: 20m 21s
    CPU: 0%      Memory  : 105M    Last used: 7m 47s ago
  * PID: 26548   Sessions: 1       Processed: 23      Uptime: 19m 14s
    CPU: 0%      Memory  : 125M    Last used: 7m 44s ago
  * PID: 27465   Sessions: 1       Processed: 30      Uptime: 15m 23s
    CPU: 0%      Memory  : 109M    Last used: 2m 22s ago
  * PID: 27501   Sessions: 1       Processed: 28      Uptime: 15m 18s
    CPU: 0%      Memory  : 117M    Last used: 7m 15s ago
  * PID: 27511   Sessions: 1       Processed: 34      Uptime: 15m 17s
    CPU: 0%      Memory  : 144M    Last used: 5m 40s ago
  * PID: 27522   Sessions: 1       Processed: 30      Uptime: 15m 17s
    CPU: 0%      Memory  : 110M    Last used: 26s ago
  * PID: 27533   Sessions: 1       Processed: 38      Uptime: 15m 17s
    CPU: 0%      Memory  : 110M    Last used: 4m 44s ago
  * PID: 27555   Sessions: 1       Processed: 27      Uptime: 15m 15s
    CPU: 0%      Memory  : 120M    Last used: 1m 29s ago
  * PID: 27570   Sessions: 1       Processed: 21      Uptime: 15m 14s
    CPU: 0%      Memory  : 107M    Last used: 7m 1s ago
  * PID: 27590   Sessions: 1       Processed: 8       Uptime: 15m 13s
    CPU: 0%      Memory  : 105M    Last used: 7m 34s ago
  * PID: 27599   Sessions: 1       Processed: 13      Uptime: 15m 13s
    CPU: 0%      Memory  : 107M    Last used: 7m 0s ago
  * PID: 27617   Sessions: 1       Processed: 26      Uptime: 15m 12s
    CPU: 0%      Memory  : 114M    Last used: 4m 49s ago
  * PID: 27633   Sessions: 1       Processed: 19      Uptime: 15m 11s
    CPU: 0%      Memory  : 137M    Last used: 1m 14s ago
  * PID: 27643   Sessions: 1       Processed: 15      Uptime: 15m 11s
    CPU: 0%      Memory  : 132M    Last used: 6m 19s ago
  * PID: 27661   Sessions: 1       Processed: 23      Uptime: 15m 10s
    CPU: 0%      Memory  : 112M    Last used: 9s ago
  * PID: 27678   Sessions: 1       Processed: 24      Uptime: 15m 9s
    CPU: 0%      Memory  : 108M    Last used: 6m 53s ago
  * PID: 27692   Sessions: 1       Processed: 9       Uptime: 15m 9s
    CPU: 0%      Memory  : 105M    Last used: 7m 22s ago
  * PID: 28400   Sessions: 1       Processed: 19      Uptime: 12m 45s
    CPU: 0%      Memory  : 111M    Last used: 1m 25s ago
  * PID: 28415   Sessions: 1       Processed: 26      Uptime: 12m 45s
    CPU: 0%      Memory  : 149M    Last used: 3m 45s ago
  * PID: 28439   Sessions: 1       Processed: 14      Uptime: 12m 44s
    CPU: 0%      Memory  : 106M    Last used: 59s ago
  * PID: 28477   Sessions: 1       Processed: 12      Uptime: 12m 42s
    CPU: 0%      Memory  : 108M    Last used: 1m 34s ago
  * PID: 28495   Sessions: 1       Processed: 14      Uptime: 12m 41s
    CPU: 0%      Memory  : 108M    Last used: 18s ago
  * PID: 29315   Sessions: 1       Processed: 7       Uptime: 10m 1s
    CPU: 0%      Memory  : 107M    Last used: 7m 0s ago
  * PID: 29332   Sessions: 1       Processed: 13      Uptime: 10m 0s
    CPU: 0%      Memory  : 108M    Last used: 5m 39s ago
  * PID: 29341   Sessions: 1       Processed: 7       Uptime: 10m 0s
    CPU: 0%      Memory  : 105M    Last used: 6m 53s ago
  * PID: 29353   Sessions: 1       Processed: 11      Uptime: 10m 0s
    CPU: 0%      Memory  : 119M    Last used: 5m 4s ago
  * PID: 29366   Sessions: 1       Processed: 16      Uptime: 9m 59s
    CPU: 0%      Memory  : 119M    Last used: 3m 13s ago
  * PID: 29377   Sessions: 1       Processed: 10      Uptime: 9m 59s
    CPU: 0%      Memory  : 113M    Last used: 1m 34s ago
  * PID: 29388   Sessions: 1       Processed: 2       Uptime: 9m 59s
    CPU: 0%      Memory  : 97M     Last used: 7m 28s ago
  * PID: 29400   Sessions: 1       Processed: 6       Uptime: 9m 59s
    CPU: 0%      Memory  : 103M    Last used: 6m 53s ago
  * PID: 29422   Sessions: 1       Processed: 17      Uptime: 9m 58s
    CPU: 0%      Memory  : 132M    Last used: 1m 24s ago
  * PID: 29438   Sessions: 1       Processed: 1       Uptime: 9m 57s
    CPU: 0%      Memory  : 96M     Last used: 6m 52s ago
  * PID: 29451   Sessions: 1       Processed: 21      Uptime: 9m 56s
    CPU: 0%      Memory  : 133M    Last used: 2m 10s ago
  * PID: 29463   Sessions: 1       Processed: 19      Uptime: 9m 56s
    CPU: 0%      Memory  : 111M    Last used: 27s ago
  * PID: 29477   Sessions: 1       Processed: 23      Uptime: 9m 56s
    CPU: 0%      Memory  : 117M    Last used: 14s ago
  * PID: 30625   Sessions: 1       Processed: 7       Uptime: 6m 49s
    CPU: 0%      Memory  : 106M    Last used: 1m 21s ago
  * PID: 30668   Sessions: 1       Processed: 2       Uptime: 6m 44s
    CPU: 0%      Memory  : 105M    Last used: 1m 13s ago
  * PID: 30706   Sessions: 1       Processed: 16      Uptime: 6m 43s
    CPU: 0%      Memory  : 148M    Last used: 1m 11s ago
  * PID: 30718   Sessions: 1       Processed: 12      Uptime: 6m 43s
    CPU: 0%      Memory  : 112M    Last used: 1m 16s ago

I have a few questions:

  1. It seems that clients with slow network connections are making requests to our service, causing Passenger processes to be blocked. We have to restart Nginx to get the web service running again. Has anyone had experience with this?
  2. We also use Sidekiq as the job queue. Most of our workers are implemented without touching MongoDB, and they work fine.
  3. But we use two workers to update users' data: they query, update, and insert data into the database. We have tried to optimize all of these tasks with MongoDB bulk commands (update & insert).

    Normally, when a small number of users are hitting the web service, the workers run fine and the busy queue is processed in about 1 minute. But when more requests come in, the busy queue blocks the whole system and we have to restart Nginx to get it working again. Here is the Sidekiq config:

    development:
      :concurrency: 5
      :logfile: ./log/sidekiq_development.log
      :pidfile: ./log/sidekiq.pid
    staging:
      :concurrency: 5
      :logfile: ./log/sidekiq_staging.log
      :pidfile: ./log/sidekiq.pid
    production:
      :concurrency: 15
      :logfile: ./log/sidekiq_production.log
      :pidfile: ./log/sidekiq.pid
    :queues:
      - ...
    

    We have no experience with these problems. Does anyone have any ideas?

    Update 1:

    After some monitoring while the server was under heavy load, we found that the MongoDB process has a lot of page faults and a backed-up read queue. Below is the mongostat output recorded during the downtime:

    insert  query update delete getmore command flushes mapped  vsize    res faults      locked db idx miss %     qr|qw   ar|aw  netIn netOut  conn       time
        *0      2     *0     *0       0     4|0       0    79g   160g  3.36g    137   memo_v2:2.6%          0      17|0     8|0    36k     8k    61   15:05:22
        *0      6     *0     *0       0     1|0       0    79g   160g  3.38g    144   memo_v2:2.1%          0      30|0     3|0   722b    11k    61   15:05:23
      1595     15      1     *0       0     5|0       0    79g   160g  3.41g    139  memo_v2:19.7%          0      20|0     8|0   164k   179k    61   15:05:25
         1     18      2     *0       1     6|0       0    79g   160g  3.38g    198  memo_v2:14.4%          0      31|0     1|0     3k   122k    61   15:05:26
         2     20      4     *0       0     7|0       0    79g   160g  3.38g    169   memo_v2:8.6%          0      29|0     1|0     3k   157k    61   15:05:27
         1      6     23     *0       0     4|0       0    79g   160g  3.39g    190  memo_v2:18.7%          0      32|0     1|0     1k    63k    61   15:05:28
         1      4     42     *0       0     4|0       0    79g   160g   3.1g    115  memo_v2:35.9%          0      30|0     0|1     1k    20k    61   15:05:29
         1      5     51     *0       0     4|0       0    79g   160g  3.11g    177  memo_v2:30.0%          0      28|0     1|0     1k    23k    61   15:05:30
        *0      6     20     *0       0     2|0       0    79g   160g  3.12g    174  memo_v2:40.9%          0      28|0     1|0    15k     7k    61   15:05:31
         2      9     *0     *0       1     7|0       0    79g   160g   3.1g    236   memo_v2:4.4%          0      26|0     2|0     2k    31k    61   15:05:32
    

    Has anyone run into this problem before?
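For context, the batched-update pattern described in question 3 can be sketched in plain Ruby like this. This is a minimal, hypothetical illustration: `build_bulk_update` is an assumed helper name, and the hash shape follows the MongoDB 2.6 `update` write-command protocol (one command document carrying an array of update operations); it does not execute against a database.

```ruby
# Hypothetical sketch: build one MongoDB "update" write command that
# batches many per-user updates, instead of one round trip per document.
# The command shape follows the MongoDB 2.6 write-command protocol.
def build_bulk_update(collection, users)
  {
    update: collection,
    updates: users.map do |u|
      {
        q: { _id: u[:id] },                    # match this user
        u: { '$set' => { score: u[:score] } }, # fields to update
        upsert: false
      }
    end,
    ordered: false # let the server apply operations independently
  }
end

cmd = build_bulk_update('users', [{ id: 1, score: 10 }, { id: 2, score: 20 }])
```

The resulting `cmd` hash would then be sent as a single database command (e.g. via the driver's command interface), so two updates cost one round trip instead of two.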

1 Answer:

Answer 0 (score: 1)

I don't have enough reputation to comment, so I have to add a rather thin answer.

I have no experience with your stack, but if you say that misbehaving clients are the cause of the Passenger problem, then I suggest you make sure there is enough buffering in front of Passenger.

For Nginx, the important settings look to be the proxy_buffers directives. The section titled "Freeing Up Backend Servers with Buffers" in the following article discusses the relevant Nginx module: https://www.digitalocean.com/community/tutorials/understanding-nginx-http-proxying-load-balancing-buffering-and-caching
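As a rough illustration only (values are not tuned recommendations, and with the Passenger Nginx module the proxy_* directives may not apply directly, since Passenger uses its own integration rather than proxy_pass), buffering directives of the kind that article describes look like this:

```nginx
# Illustrative only: buffer slow clients in Nginx so they tie up
# Nginx buffers instead of an application process.
client_body_buffer_size 128k;   # buffer request bodies from slow uploaders
proxy_buffering on;             # buffer upstream responses in Nginx
proxy_buffers 8 16k;            # 8 buffers of 16k each per connection
proxy_busy_buffers_size 32k;    # max buffer space busy sending to the client
```

The idea is that the backend can finish writing its response into Nginx's buffers and move on to the next request, while Nginx drip-feeds the buffered response to the slow client.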

For the MongoDB problem, it sounds like you just need to dig a little deeper. If you can find where in your code the problem occurs, a solution may present itself. The article linked by Hongli looks very promising.