How to make Bluepill restart a Resque worker only after it reaches a safe state

Date: 2013-08-24 23:45:23

Tags: resque bluepill

Let's say this is my worker:

class FooWorker
  @queue = :foo

  def self.perform
    User.all.each do |u|
      # ...
      # Do long operations (unsafe to kill)
      # ...

      # Here it's safe to break the worker and restart
    end
  end
end

I'm enqueuing this with Resque Scheduler, and this is my Bluepill conf:

...
app.process(process_name) do |process|
  process.group         = "resque"
  process.start_command = "rake environment resque:work QUEUE=foo RAILS_ENV=production"
  ...
  process.stop_signals  = [:quit, 5.seconds, :term, 1.minute, :kill]
  process.daemonize     = true

  process.start_grace_time = 30.seconds
  process.stop_grace_time  = 80.seconds

  process.monitor_children do |child_process|
    child_process.stop_command = "kill -QUIT {{PID}}"

    child_process.checks :mem_usage, :every => 30.seconds, :below => 500.megabytes, :times => [3,4], :fires => :stop
  end
end
....

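I want Bluepill or Resque to wait until the worker reaches the "safe" point before restarting or shutting it down. How can I achieve this?

1 Answer:

Answer 0: (score: 0)

Try it this way: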

1) Set up Resque to kill the child gracefully on TERM/INT using the new_kill_child method, by setting the TERM_CHILD and RESQUE_TERM_TIMEOUT env variables at start:

process.start_command = "rake environment resque:work QUEUE=foo RAILS_ENV=production TERM_CHILD=1 RESQUE_TERM_TIMEOUT=20.0"

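The default value of RESQUE_TERM_TIMEOUT is 4 seconds.

This makes Resque send a TERM signal to the child, wait for RESQUE_TERM_TIMEOUT, and kill the child if it is still running. Be sure to

a) set this timeout large enough for your critical section to finish,

b) configure the Bluepill TERM timeout in process.stop_signals to be slightly larger than RESQUE_TERM_TIMEOUT, so that Bluepill does not kill the worker while it is still waiting for the child process to finish its critical section.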
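One way to keep constraints a) and b) in sync is to derive both settings from a single number. This is only a sketch: the 20-second timeout and 5-second margin are illustrative values, the variable names are hypothetical, and the delay is a plain integer (seconds) so the snippet runs without Bluepill loaded. In the real config the delay would be written in the same style as the existing `1.minute` entry.

```ruby
# Illustrative values, not recommendations: choose a timeout longer than
# the slowest critical section of the job.
resque_term_timeout = 20                      # seconds Resque waits after sending TERM to the child
bluepill_term_wait  = resque_term_timeout + 5 # seconds Bluepill waits before escalating to KILL

# Start command from the answer, with the timeout interpolated.
start_command = "rake environment resque:work QUEUE=foo RAILS_ENV=production " \
                "TERM_CHILD=1 RESQUE_TERM_TIMEOUT=#{resque_term_timeout}"

# Same alternating signal/delay layout as process.stop_signals in the question.
stop_signals = [:term, bluepill_term_wait, :kill]

puts start_command
puts stop_signals.inspect
```

Deriving the Bluepill delay from the Resque timeout means that raising one cannot silently leave the other too short.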

2) Handle the TERM signal in the child process to stop gracefully:

class FooWorker
  class << self
    attr_accessor :stop
  end

  @queue = :foo
  def self.perform
    User.all.each do |u|
      # ...
      # Do long operations (unsafe to kill)
      # ...

      # Here it's safe to break the worker and restart
      return if FooWorker.stop
    end
  end
end

trap('TERM') do
  FooWorker.stop = true
end
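The stop-flag pattern can be tried outside Resque with a small standalone script. This is a sketch under assumptions: `SafePointWorker`, its `items` argument, and the block standing in for the critical section are all hypothetical, and the process sends TERM to itself only to simulate Bluepill or Resque doing so. The handler is installed inside `perform` rather than at load time because, depending on the Resque version, the forked child may install its own TERM handler when TERM_CHILD is set, which would replace one registered earlier.

```ruby
# Standalone illustration of "finish the current item, then stop".
class SafePointWorker
  class << self
    attr_accessor :stop
  end

  # Runs the block (the critical section) for each item and returns the
  # items that were fully processed before a TERM request took effect.
  def self.perform(items)
    self.stop = false
    # The handler only sets a flag; the loop decides when it is safe to stop.
    previous = Signal.trap('TERM') { self.stop = true }
    completed = []
    items.each do |item|
      yield item            # critical section: not interrupted by TERM
      completed << item
      break if stop         # safe point: leave between items
    end
    completed
  ensure
    # Restore whatever handler was installed before this job ran.
    Signal.trap('TERM', previous) if previous
  end
end

# Simulate a TERM arriving in the middle of the second item.
processed = SafePointWorker.perform([1, 2, 3]) do |item|
  if item == 2
    Process.kill('TERM', Process.pid)
    sleep 0.1 # give the handler a moment to run before the safe point
  end
end
puts processed.inspect
```

The second item should still complete, because the flag is checked only after the critical section, and the third item should never start.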