Nagios event handler ignores the check interval

Time: 2019-04-18 15:12:36

Tags: event-handling nagios

I recently created an event handler for a service check that restarts Tomcat on 3 different boxes.

The check is set up with:

- 5 check attempts
- a check every 2 minutes while the service is OK
- otherwise, a check every 5 minutes

In the event handler script I have:

#!/bin/sh
# Event handler for the iOS PN queue check.
# Arguments (from the service's event_handler line):
#   $1 = $SERVICESTATE$, $2 = $SERVICESTATETYPE$, $3 = $SERVICEATTEMPT$
# NOTE: the lines that actually restart Tomcat have been removed for testing;
# only the log lines remain.

# What state is the iOS PN in?
case "$1" in
OK)
        # The service is OK, so don't do anything...
        ;;
WARNING)
        # Is this a "soft" or a "hard" state?
        case "$2" in
                SOFT)
                        # Which check attempt is this?
                        case "$3" in
                                2)
                                        echo "$(date) Restarting Tomcat on Node 1 for iOS PN (2nd soft warning state)..." >> /tmp/iOSPN.log
                                        ;;
                                3)
                                        echo "$(date) Restarting Tomcat on Node 2 for iOS PN (3rd soft warning state)..." >> /tmp/iOSPN.log
                                        ;;
                                4)
                                        echo "$(date) Restarting Tomcat on Node 3 for iOS PN (4th soft warning state)..." >> /tmp/iOSPN.log
                                        ;;
                        esac
                        ;;
                HARD)
                        # Do nothing; let Nagios send the alert
                        ;;
        esac
        ;;
CRITICAL)
        # In theory nothing should reach this point...
        ;;
esac
exit 0
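Whatever turns out to be causing the rapid re-checks, one defensive option is to have the handler refuse to restart anything if the previous restart was only a few seconds ago. The sketch below assumes a hypothetical state file (`STATE_FILE`) and a minimum gap in seconds (`MIN_GAP`, 240 here); neither is part of the original script or of Nagios itself.

```shell
#!/bin/sh
# Hypothetical guard: skip a restart if the previous one happened less than
# MIN_GAP seconds ago. STATE_FILE and MIN_GAP are assumptions, not part of
# the original handler.
STATE_FILE=${STATE_FILE:-/tmp/iOSPN.last_restart}
MIN_GAP=${MIN_GAP:-240}

# Returns 0 (and records the current time) if enough time has passed since
# the last recorded restart; returns 1 and leaves the state file alone otherwise.
restart_allowed() {
    now=$(date +%s)
    if [ -f "$STATE_FILE" ]; then
        last=$(cat "$STATE_FILE")
        if [ $((now - last)) -lt "$MIN_GAP" ]; then
            return 1
        fi
    fi
    echo "$now" > "$STATE_FILE"
    return 0
}
```

In the handler, each restart branch would then be wrapped as `if restart_allowed; then ...; fi`, so a burst of attempts within a few seconds restarts at most one node.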

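So the event handler should restart Tomcat on node 1 after the second warning check, wait 5 minutes before re-checking, and if the problem persists restart node 2; then wait another 5 minutes, check again, and restart node 3 if it is still a problem.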
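For comparison, the expected spacing can be worked out from the template values. A minimal sketch, assuming Nagios's default `interval_length` of 60 seconds and that attempt 1 is the first WARNING result, with each soft retry following `retry_check_interval` units after the previous one; `expected_offsets` is a hypothetical helper:

```shell
#!/bin/sh
# Print the expected offset (seconds after the first non-OK result) of each
# soft retry on which the handler restarts a node (attempts 2 .. max-1).
# Args: retry interval in units, interval_length in seconds, max_check_attempts.
expected_offsets() {
    retry_units=$1
    interval_length=$2
    max_attempts=$3
    attempt=2
    while [ "$attempt" -lt "$max_attempts" ]; do
        n=$((attempt - 1))
        offset=$((n * retry_units * interval_length))
        echo "attempt $attempt: +${offset}s"
        attempt=$((attempt + 1))
    done
}
```

With the template's values, `expected_offsets 5 60 5` prints `+300s`, `+600s` and `+900s` for attempts 2, 3 and 4, i.e. one handler run every 5 minutes.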

However, when I check the log file I can see the following:

Thu Apr 18 15:09:13 2019 Restarting Tomcat on Node 1 for iOS PN (2nd soft warning state)...
Thu Apr 18 15:09:23 2019 Restarting Tomcat on Node 2 for iOS PN (3rd soft warning state)...
Thu Apr 18 15:09:33 2019 Restarting Tomcat on Node 3 for iOS PN (4th soft warning state)...

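As you can see, it restarts each box 10 seconds apart instead of 5 minutes apart. I have removed the lines that actually restart Tomcat, since restarting it that quickly would not work.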
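To confirm the spacing without eyeballing timestamps, the handler log can be annotated with the gap between consecutive lines. This is a sketch that assumes each line begins with default `date` output (so the time is the 4th whitespace-separated field) and that all entries fall on the same day; `log_gaps` is a hypothetical helper name.

```shell
#!/bin/sh
# Print each line of a handler log, prefixing every line after the first with
# the number of seconds since the previous one. Assumes HH:MM:SS is field 4
# and no midnight rollover between entries.
log_gaps() {
    awk '{
        split($4, t, ":")
        secs = t[1] * 3600 + t[2] * 60 + t[3]
        if (NR > 1) printf "+%ds  %s\n", secs - prev, $0
        else print $0
        prev = secs
    }' "$1"
}
```

Running `log_gaps /tmp/iOSPN.log` on the three lines above would prefix the second and third with `+10s`.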

I can't see anything in the Nagios log that explains why it runs the next check so quickly, so any help would be appreciated.
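Nagios normally records each soft retry and each handler run in its main log as `SERVICE ALERT:` and `SERVICE EVENT HANDLER:` lines, provided `log_service_retries` and `log_event_handlers` are enabled in nagios.cfg. A sketch that extracts those lines for one service and shows the seconds between them; `service_timeline` is a hypothetical helper, and the log path is an assumption that depends on the installation (see `log_file` in nagios.cfg):

```shell
#!/bin/sh
# Show SERVICE ALERT / SERVICE EVENT HANDLER lines for one service from
# nagios.log, with the seconds since the previous matching line.
# Each Nagios log line starts with "[<unix epoch>]".
# Args: path to nagios.log, exact service description.
service_timeline() {
    awk -v svc="$2" '
        index($0, "SERVICE ALERT: ") || index($0, "SERVICE EVENT HANDLER: ") {
            if (!index($0, ";" svc ";")) next
            ts = substr($1, 2, length($1) - 2) + 0
            if (seen) printf "+%ds  %s\n", ts - prev, $0
            else print $0
            prev = ts
            seen = 1
        }' "$1"
}
```

For example: `service_timeline /usr/local/nagios/var/nagios.log "ActiveMQ - iOS PushNotification Queue Pending Items"`. If the `SERVICE ALERT` lines are also about 10 seconds apart, the retries themselves are being scheduled that quickly, which points at the scheduling settings rather than at the handler script.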

Additional information:

Here are the service definition and the template it uses:

define service{
        use                     5check-service
        host_name               ACTIVEMQ1
        contact_groups          tyrell-admins-non-critical
        service_description     ActiveMQ - iOS PushNotification Queue Pending Items
        event_handler           restartRemote_Tomcat!$SERVICESTATE$ $SERVICESTATETYPE$ $SERVICEATTEMPT$
        check_command           check_activemq_queue_item2!http://activemq1:8161/admin/xml/queues.jsp!IosPushNotificationQueue!100!300
        }
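The command definition for `restartRemote_Tomcat` is not shown, but since the handler's `case` statements match, `$ARG1$` (`$SERVICESTATE$ $SERVICESTATETYPE$ $SERVICEATTEMPT$`) must be reaching the script as three separate arguments. A sketch of that splitting, assuming the command's `command_line` passes `$ARG1$` unquoted; `show_handler_args` is a hypothetical helper, and logging its output at the top of the real handler would show exactly which attempt number Nagios passes on each run:

```shell
#!/bin/sh
# Report how many arguments the handler received and what they were.
show_handler_args() {
    printf 'argc=%d state=%s type=%s attempt=%s\n' "$#" "$1" "$2" "$3"
}

arg1="WARNING SOFT 2"    # example expansion of $ARG1$
show_handler_args $arg1  # unquoted on purpose: mimics the shell's word splitting
```

This prints `argc=3 state=WARNING type=SOFT attempt=2`; had `$ARG1$` been quoted, the handler would see a single argument and none of its `case` branches would match.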

define service{
        name                            5check-service      ; The 'name' of this service template
        active_checks_enabled           1                       ; Active service checks are enabled
        passive_checks_enabled          1                       ; Passive service checks are enabled/accepted
        parallelize_check               1                       ; Active service checks should be parallelized (disabling this can lead to major performance problems)
        obsess_over_service             1                       ; We should obsess over this service (if necessary)
        check_freshness                 0                       ; Default is to NOT check service 'freshness'
        notifications_enabled           1                       ; Service notifications are enabled
        event_handler_enabled           1                       ; Service event handler is enabled
        flap_detection_enabled          1                       ; Flap detection is enabled
        failure_prediction_enabled      1                       ; Failure prediction is enabled
        process_perf_data               1                       ; Process performance data
        retain_status_information       1                       ; Retain status information across program restarts
        retain_nonstatus_information    1                       ; Retain non-status information across program restarts
        is_volatile                     0                       ; The service is not volatile
        check_period                    24x7                    ; The service can be checked at any time of the day
        max_check_attempts              5                       ; Re-check the service up to 5 times in order to determine its final (hard) state
        normal_check_interval           2                       ; Check the service every 2 minutes under normal conditions
        retry_check_interval            5                       ; Re-check the service every 5 minutes until a hard state can be determined
        contact_groups                  support                 ; Notifications get sent out to everyone in the 'support' group
        notification_options            w,u,c,r                 ; Send notifications about warning, unknown, critical, and recovery events
        notification_interval           5                       ; Re-notify about service problems every 5 mins
        notification_period             24x7                    ; Notifications can be sent out at any time
        register                        0                       ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL SERVICE, JUST A TEMPLATE!
        }
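The interval values in the template are not minutes as such: Nagios multiplies them by `interval_length` from nagios.cfg, which defaults to 60 seconds. A sketch of that conversion; `interval_seconds` is a hypothetical helper, and the nagios.cfg path you pass in is an assumption (it varies by install):

```shell
#!/bin/sh
# Convert an interval in Nagios "units" to seconds using interval_length from
# the given nagios.cfg, falling back to 60 seconds if the directive is absent.
# Args: path to nagios.cfg, interval in units.
interval_seconds() {
    cfg=$1
    units=$2
    # Last uncommented interval_length=<n> line wins; empty if none found.
    len=$(sed -n 's/^[[:space:]]*interval_length[[:space:]]*=[[:space:]]*\([0-9][0-9]*\).*/\1/p' "$cfg" | tail -n 1)
    echo $((units * ${len:-60}))
}
```

For example, `interval_seconds /usr/local/nagios/etc/nagios.cfg 5` should print 300 if `interval_length` is unset or 60; a much smaller result would mean the retries really are only seconds apart.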
