Unicorn exit timeout on Heroku after trapping TERM and sending QUIT

Asked: 2013-07-03 14:27:48

Tags: heroku unicorn

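I am getting R12 (Exit timeout) errors on a Heroku app running Unicorn and Sidekiq. The errors occur 1-2 times a day and on every deploy. As I understand it, I need to convert the shutdown signal Heroku sends so that Unicorn responds to it correctly, but I thought I had already done that in the Unicorn config below: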

worker_processes 3
timeout 30
preload_app true

before_fork do |server, worker|
  Signal.trap 'TERM' do
    puts "Unicorn master intercepting TERM and sending myself QUIT instead. My PID is #{Process.pid}"
    Process.kill 'QUIT', Process.pid
  end

  if defined?(ActiveRecord::Base)
    ActiveRecord::Base.connection.disconnect!
    Rails.logger.info('Disconnected from ActiveRecord')
  end
end

after_fork do |server, worker|
  Signal.trap 'TERM' do
    puts "Unicorn worker intercepting TERM and doing nothing. Wait for master to sent QUIT. My PID is #{Process.pid}"
  end

  if defined?(ActiveRecord::Base)
    ActiveRecord::Base.establish_connection
    Rails.logger.info('Connected to ActiveRecord')
  end

  Sidekiq.configure_client do |config|
    config.redis = { :size => 1 }
  end
end

My error log looks like this:

Stopping all processes with SIGTERM
Unicorn worker intercepting TERM and doing nothing. Wait for master to sent QUIT. My PID is 7
Unicorn worker intercepting TERM and doing nothing. Wait for master to sent QUIT. My PID is 11
Unicorn worker intercepting TERM and doing nothing. Wait for master to sent QUIT. My PID is 15
Unicorn master intercepting TERM and sending myself QUIT instead. My PID is 2
Started GET "/manage"
reaped #<Process::Status: pid 11 exit 0> worker=1
reaped #<Process::Status: pid 7 exit 0> worker=0
reaped #<Process::Status: pid 15 exit 0> worker=2
master complete
Error R12 (Exit timeout) -> At least one process failed to exit within 10 seconds of SIGTERM
Stopping remaining processes with SIGKILL
Process exited with status 137

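It looks like all of the child processes were successfully reaped before the timeout. Is the master still alive? Also, should the router still be sending web requests to the dyno during shutdown, as the log shows?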

FWIW, I am using Heroku's zero-downtime deployment plugin (https://devcenter.heroku.com/articles/labs-preboot/).

1 Answer:

Answer 0 (score: 4)

I think your custom signal handling is what is causing the timeouts.

EDIT: I am getting downvoted for disagreeing with Heroku's documentation, and I would like to address that.

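Configuring your Unicorn application to catch and swallow the TERM signal is the most likely reason the application hangs and does not shut down correctly.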
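To illustrate the point outside of Unicorn, here is a minimal sketch in plain Ruby. It forks a child, sends it TERM, and checks whether the child exits within a grace period. The helper name `exits_after_term?` and the one-second grace period are assumptions made for this example; they are not part of Unicorn or Heroku. The sketch assumes a Unix-like system where `fork` is available.

```ruby
require 'timeout'

# Hypothetical helper: fork a child, let it optionally install a TERM handler,
# send it TERM, and report whether it exits within `grace` seconds.
def exits_after_term?(grace: 1.0)
  reader, writer = IO.pipe
  pid = fork do
    reader.close
    yield if block_given?   # install the handler under test inside the child
    writer.write('r')       # tell the parent the handler is in place
    writer.close
    loop { sleep 0.1 }      # stand-in for a worker waiting for requests
  end
  writer.close
  reader.read(1)            # wait until the child is ready
  reader.close

  Process.kill('TERM', pid)
  Timeout.timeout(grace) { Process.wait(pid) }
  true
rescue Timeout::Error
  # The child ignored TERM, so fall back to KILL, as the platform would.
  Process.kill('KILL', pid)
  Process.wait(pid)
  false
end

default_exits   = exits_after_term?
swallowed_exits = exits_after_term? { Signal.trap('TERM') { } }

puts "default TERM handling exits in time:  #{default_exits}"
puts "no-op TERM handler exits in time:     #{swallowed_exits}"
```

A process with a no-op TERM handler only goes away if something else tells it to exit, which is the dependency the worker handler in the question creates on the master.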

Heroku seems to argue that catching a TERM signal and converting it into a QUIT signal is the right way to turn a hard shutdown into a graceful one.
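The TERM-to-QUIT conversion can be sketched in isolation like this. The child below is a stand-in for a process that shuts down gracefully on QUIT; the `shutting_down` flag, the polling interval, and the five-second timeout are assumptions for the example and say nothing about how Unicorn implements its own shutdown.

```ruby
require 'timeout'

reader, writer = IO.pipe
pid = fork do
  reader.close
  shutting_down = false

  # Graceful path: QUIT only sets a flag, and the main loop finishes up.
  Signal.trap('QUIT') { shutting_down = true }
  # Conversion: TERM is re-sent to the same process as QUIT.
  Signal.trap('TERM') { Process.kill('QUIT', Process.pid) }

  writer.write('r')   # tell the parent both handlers are installed
  writer.close
  sleep 0.05 until shutting_down
  exit 0              # clean exit after the "graceful" shutdown
end
writer.close
reader.read(1)
reader.close

Process.kill('TERM', pid)
status = Timeout.timeout(5) { Process.wait2(pid).last }
graceful = status.exited? && status.exitstatus.zero?

puts "child exited cleanly after TERM was converted to QUIT: #{graceful}"
```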

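However, doing this appears to introduce a risk that, in some cases, the process does not shut down at all, which is the root of this bug. Users who see hanging dynos running Unicorn should weigh the evidence and make their own decision from first principles rather than relying on the documentation alone.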
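One reading of this answer is to drop the custom `Signal.trap` blocks and leave signal handling to Unicorn. The fragment below is a sketch of the question's config with only the traps and log lines removed; the file name `config/unicorn.rb` is an assumption, and this has not been verified against the asker's app.

```ruby
# config/unicorn.rb (assumed path): the question's settings without custom TERM traps
worker_processes 3
timeout 30
preload_app true

before_fork do |server, worker|
  # Close the master's database connection before workers are forked.
  ActiveRecord::Base.connection.disconnect! if defined?(ActiveRecord::Base)
end

after_fork do |server, worker|
  # Each worker opens its own database connection.
  ActiveRecord::Base.establish_connection if defined?(ActiveRecord::Base)

  Sidekiq.configure_client do |config|
    config.redis = { :size => 1 }
  end
end
```

The trade-off is that Unicorn treats TERM as a quick shutdown and QUIT as a graceful one, so without the conversion, requests that are in flight when the dyno is stopped may be cut off.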