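I am using monit to monitor my program. The monitored program can crash in 2 situations
To handle the latter case, I have a script that stops the program, resets it to a good state by cleaning its data files, and restarts it. I tried the configuration below: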
check process program with pidfile program.pid
    start program = "programStart" as uid username and gid groupname
    stop program = "programStop" as uid username and gid groupname
    if 3 restarts within 20 cycles then exec "cleanProgramAndRestart" as uid username and gid groupname
    if 6 restarts within 20 cycles then timeout
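Suppose monit restarts the program 3 times within 3 cycles. After the third restart, the cleanProgramAndRestart script runs. However, when cleanProgramAndRestart restarts the program again, the "3 restarts" condition is met again on the next cycle, and this turns into an infinite loop.
Can anyone suggest a workaround?
If any of the following actions were possible, there might be a way to solve this.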
Answer 0 (score: 2)
Monit evaluates your tests once per cycle. The cycle length is usually defined in /etc/monitrc, with set daemon cycle_length.
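For reference, a minimal sketch of that setting. The 60-second value is only an assumed example and does not come from the question:

```
# /etc/monitrc (the location may differ per installation)
set daemon 60   # assumed value: monit runs its checks every 60 seconds
```

With this value, "within 20 cycles" would cover roughly 20 minutes.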
So if your cleanProgramAndRestart took less than one cycle to execute, this should not happen.
Since it does happen, I guess your cleanProgramAndRestart takes more than one cycle to complete.
You can:
If you cannot modify these variables, there may be a workaround using a temporary file:
check process program
    with pidfile program.pid
    start program = "programStart"
        as uid username and gid groupname
    stop program = "programStop"
        as uid username and gid groupname
    if 3 restarts within 20 cycles
        then exec "touch /tmp/program_crash"
    if 6 restarts within 20 cycles then timeout

check file program_crash with path /tmp/program_crash
    every x cycles  # choose x so that cycle_length * x > run time of cleanProgramAndRestart
    if changed timestamp then exec "cleanProgramAndRestart"
        as uid username and gid groupname
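
To illustrate how x might be chosen, here is the file check with numbers filled in. Both values are assumptions and not from the question: a cycle length of 60 seconds (set daemon 60) and a cleanProgramAndRestart run that takes at most about 150 seconds. Under those assumptions x = 3 satisfies the condition, since 60 * 3 = 180 > 150:

```
# Assumed: set daemon 60, and cleanProgramAndRestart finishes within ~150 s.
check file program_crash with path /tmp/program_crash
    every 3 cycles   # 3 cycles * 60 s = 180 s, longer than the assumed 150 s run
    if changed timestamp then exec "cleanProgramAndRestart"
        as uid username and gid groupname
```

If the script can take longer than assumed, x should be raised accordingly, so it is worth measuring the real run time first.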