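We are monitoring the Sidekiq process with monit. Once the Sidekiq process reaches about 2 GB of memory, we restart it. We have start and stop programs defined with a 90-second timeout. However, the stop program fails (after waiting out the 90-second timeout).
Here is a sample monit configuration.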
check process sidekiq
with pidfile /pathtopidfile
start program = "/bin/sh -c start sidekiq command" with timeout 90 seconds
stop program = "stop sidekiq command" with timeout 90 seconds
if totalmem is greater than 2GB for 3 cycles then restart
# I need a condition like this -> if "stop_program failed" then "do some action"
end
P.S. I don't know the correct syntax for catching a stop program failure in monit. I checked the monit blog but could not find it.
Answer 0 (score: 0)
As far as I can tell, monit has no option to capture a failure of the stop or start program, so those failure cases have to be handled inside the respective program itself. If my stop program fails, I have to find out why it fails and take the corresponding action within the stop program.
My original problem was that the Sidekiq process was not terminating within the timeout, so the stop program failed. To resolve this, I changed the stop program so that if the Sidekiq process has not exited within the timeout, it hard kills the process.
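
The stop program described above could look roughly like the following sketch. It is not my exact script: the function name `stop_with_fallback`, its arguments, and the one-second polling interval are made up for illustration. It sends TERM, polls for up to a grace period, and falls back to KILL if the process is still alive.

```shell
#!/bin/sh
# Hypothetical stop wrapper (illustrative only): try a graceful TERM first,
# then hard kill if the process outlives the grace period.
# Usage: stop_with_fallback PIDFILE GRACE_SECONDS

stop_with_fallback() {
    pidfile=$1
    grace=$2

    # Nothing to stop if there is no pidfile or it is empty.
    [ -f "$pidfile" ] || return 0
    pid=$(cat "$pidfile")
    [ -n "$pid" ] || return 0

    # Ask the process to shut down; if the signal cannot be delivered,
    # the process is already gone, so just clean up the pidfile.
    kill -TERM "$pid" 2>/dev/null || { rm -f "$pidfile"; return 0; }

    # Poll once per second until the process exits or the grace period ends.
    waited=0
    while [ "$waited" -lt "$grace" ]; do
        kill -0 "$pid" 2>/dev/null || { rm -f "$pidfile"; return 0; }
        sleep 1
        waited=$((waited + 1))
    done

    # Still alive after the grace period: hard kill.
    kill -KILL "$pid" 2>/dev/null
    rm -f "$pidfile"
    return 0
}

# When run as a script with arguments, act as the stop program.
if [ "$#" -ge 2 ]; then
    stop_with_fallback "$1" "$2"
fi
```

The grace period should be shorter than the 90-second timeout in the monit configuration (for example 60), so that the script finishes the hard kill before monit gives up on the stop program. Note that KILL gives Sidekiq no chance to finish or requeue in-flight jobs, so it is only a last resort.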