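I have a setup where Nagios receives SNMP traps from a device. It then notifies the contacts defined in config.cfg. This works great. What I want to accomplish is this: if the problem is not acknowledged within a given time, Nagios should send another notification. I cannot get Nagios to send that second notification. I am using an external command to actually place a call as the notification, and that works fine. I just don't see Nagios attempting to send a second notification.
I have cut all my config files down to a single config file for ease of reading.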
#TIMEPERIODS
define timeperiod{
timeperiod_name 24x7
alias 24 Hours A Day, 7 Days A Week
sunday 00:00-24:00
monday 00:00-24:00
tuesday 00:00-24:00
wednesday 00:00-24:00
thursday 00:00-24:00
friday 00:00-24:00
saturday 00:00-24:00
}
#SERVICES
##handle the trap
define service{
host_name serverName
service_description TRAP
is_volatile 1
check_command check-host-alive
max_check_attempts 3
normal_check_interval 1
retry_check_interval 1
active_checks_enabled 0
passive_checks_enabled 1
check_period 24x7
notification_interval 1
notification_period 24x7
notification_options w,u,c
notifications_enabled 1
contact_groups admins
}
#COMMANDS
define command{
command_name check-host-alive
command_line $USER1$/check_ping -H $HOSTADDRESS$ -w 3000.0,80% -c 5000.0,100% -p 5
}
define command{
command_name notify-host-by-sip
command_line /usr/lib64/nagios/plugins/calls/makeCall "$NOTIFICATIONTYPE$"
}
define command{
command_name notify-service-by-sip
command_line /usr/lib64/nagios/plugins/calls/makeCall "$NOTIFICATIONTYPE$"
}
#CONTACT_GROUPS
define contactgroup{
contactgroup_name admins
alias Nagios Administrators
members user_sip
}
#CONTACTS
define contact{
contact_name user_sip
alias useralias
service_notification_period 24x7
host_notification_period 24x7
service_notification_options w
host_notification_options d
service_notification_commands notify-service-by-sip
host_notification_commands notify-host-by-sip
email someNumber@someServer
}
#HOSTS
define host{
host_name localhost
alias Development
address serverIP
max_check_attempts 5
check_period 24x7
contact_groups admins
notification_period 24x7
}
define host{
host_name serverName
alias Development
address someIP
max_check_attempts 5
check_period 24x7
contact_groups admins
notification_period 24x7
}
Result of the passive check:
[1386274600] PASSIVE SERVICE CHECK: localhost;TRAP;1;TRAP trap received
[1386274600] SERVICE ALERT: localhost;TRAP;WARNING;HARD;1;TRAP trap received
[1386274600] SERVICE NOTIFICATION: user_sip;localhost;TRAP;WARNING;notify-service-by-sip;TRAP trap received
And then nothing happens...
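Answer 0 (score: 3)
After looking through the Nagios source, I can say the following:
Notifications only happen after a check result (active or passive) has been processed. If you try to set 'notification_interval' to a value lower than 'check_interval', Nagios will warn you at startup.
If 'is_volatile' is set to '1', all 'notification_interval' options in the tree are ignored. That basically means a notification is sent every time the check fails. But the check still has to fail before a notification is sent.
So if the passive check is not actively throwing non-OK results, you will not get a continuous stream of alerts.
The way around this is to create an "event handler" script:
This should keep alerting until somebody acknowledges the problem.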
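The steps of that event handler are not preserved in the answer as shown, so here is only a minimal sketch of the re-submission part, assuming the handler writes to the Nagios external command file. The `COMMAND_FILE` path and the function names `format_passive_result` and `resubmit` are made up for illustration; the real path is whatever `command_file` points to in your nagios.cfg. The line format is the standard `PROCESS_SERVICE_CHECK_RESULT` external command.

```shell
#!/bin/sh
# Assumed location; check the command_file setting in nagios.cfg.
COMMAND_FILE="${COMMAND_FILE:-/usr/local/nagios/var/rw/nagios.cmd}"

# Build one external-command line for a passive service check result.
# $1=timestamp $2=host $3=service $4=return code $5=plugin output
format_passive_result() {
    printf '[%s] PROCESS_SERVICE_CHECK_RESULT;%s;%s;%s;%s\n' \
        "$1" "$2" "$3" "$4" "$5"
}

# Re-submit a result with the current time so Nagios processes a fresh
# check result and re-evaluates whether a notification is due.
# $1=host $2=service $3=return code $4=plugin output
resubmit() {
    format_passive_result "$(date +%s)" "$1" "$2" "$3" "$4" >> "$COMMAND_FILE"
}
```

An event handler could call something like `resubmit "$HOSTNAME" "$SERVICEDESC" 1 "$SERVICEOUTPUT"` after a delay while the service is still non-OK; how it decides to stop (for example on acknowledgement) is left out here.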
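Answer 1 (score: 1)
I created a script as you suggested, but when a clear trap comes in there is still a sleeping event timer, and it re-submits the previous failed trap. Also, the Nagios event handler hangs on the sleep timer event, both with and without &.
#!/bin/sh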
SERVICESTATE=$1
HOSTADDRESS=$2
SERVICEDESC=$3
SERVICEOUTPUT=$4
NOTIFICATIONTYPE=$5
if [ "$NOTIFICATIONTYPE" = 'ACKNOWLEDGEMENT' ]
then exit
fi
case "$SERVICESTATE" in
OK)
rm -f "/usr/local/libexec/nagios/pass/$HOSTADDRESS.$SERVICEDESC.bad"
exit
;;
WARNING)
touch /usr/local/libexec/nagios/pass/$HOSTADDRESS.$SERVICEDESC.bad
sleep 45
if [ -f /usr/local/libexec/nagios/pass/$HOSTADDRESS.$SERVICEDESC.bad ]
then
sh /usr/local/libexec/nagios/submit_check_result "$HOSTADDRESS" "$SERVICEDESC" 1 "$SERVICEOUTPUT"
fi
exit
;;
CRITICAL)
touch /usr/local/libexec/nagios/pass/$HOSTADDRESS.$SERVICEDESC.bad
sleep 45
if [ -f /usr/local/libexec/nagios/pass/$HOSTADDRESS.$SERVICEDESC.bad ]
then
sh /usr/local/libexec/nagios/submit_check_result $HOSTADDRESS $SERVICEDESC 2 "$SERVICEOUTPUT"
fi
exit
;;
UNKNOWN)
exit
;;
esac
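A rough sketch of one way to deal with both symptoms described above, not something confirmed in this thread. One plausible cause of the hang is that the sleeping child still holds the output pipe Nagios is reading from, so the sketch redirects all three standard streams before backgrounding. To stop a stale timer from firing after a clear trap, each timer writes its own token into the marker file and only re-submits if that token is still there. The names `schedule_resubmit`, `clear_marker`, `MARKER_DIR`, `SUBMIT` and `DELAY` are hypothetical.

```shell
#!/bin/sh
MARKER_DIR="${MARKER_DIR:-/usr/local/libexec/nagios/pass}"
SUBMIT="${SUBMIT:-/usr/local/libexec/nagios/submit_check_result}"
DELAY="${DELAY:-45}"

# Start a delayed re-submit and return immediately.
# $1=host $2=service $3=return code $4=plugin output
schedule_resubmit() {
    host=$1 svc=$2 code=$3 output=$4
    marker="$MARKER_DIR/$host.$svc.bad"
    # Token identifies this particular timer (PID plus start time).
    token="$$.$(date +%s)"
    echo "$token" > "$marker"
    (
        sleep "$DELAY"
        # Fire only if the marker still exists and still carries our token,
        # i.e. no clear trap and no newer timer has replaced it.
        [ -f "$marker" ] && [ "$(cat "$marker")" = "$token" ] &&
            "$SUBMIT" "$host" "$svc" "$code" "$output"
    ) </dev/null >/dev/null 2>&1 &
}

# Called when the clear (OK) trap arrives: pending timers become no-ops.
clear_marker() {
    rm -f "$MARKER_DIR/$1.$2.bad"
}
```

In the script above, the WARNING and CRITICAL branches would call `schedule_resubmit` in place of the touch/sleep/submit sequence, and the OK branch would call `clear_marker`.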
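Answer 2 (score: 0)
Changed it around a bit: my SNMP trap handler now executes a script that touches a file, and as long as that file exists the check result is re-submitted to Nagios.
I use these to delay the first notification, so that quick problems and recoveries don't wake me up at night, and to repeat notifications. Since Nagios decides whether or not it should notify, once you acknowledge the problem you will no longer receive emails.
Forgive my poor coding, I learned just enough to solve this problem... :P
pass2.sh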
#!/bin/sh
# VER 2
HOSTADDRESS=$1
SERVICEDESC=$2
SERVICESTATE=$3
SERVICEOUTPUT=$4
case "$SERVICESTATE" in
0)
if [ -f /usr/local/libexec/nagios/pass/$HOSTADDRESS.$SERVICEDESC.bad ]
then
rm /usr/local/libexec/nagios/pass/$HOSTADDRESS.$SERVICEDESC.bad
/usr/local/libexec/nagios/submit_check_result $HOSTADDRESS $SERVICEDESC 0 "$4"
fi
exit
;;
1)
touch /usr/local/libexec/nagios/pass/$HOSTADDRESS.$SERVICEDESC.bad
if [ -f /usr/local/libexec/nagios/pass/$HOSTADDRESS.$SERVICEDESC.bad ]
then
/usr/local/libexec/nagios/submit_check_result $HOSTADDRESS $SERVICEDESC 1 "$4" &
/usr/local/libexec/nagios/passtron.sh $HOSTADDRESS $SERVICEDESC 1 "$4" &
fi
exit
;;
2)
touch /usr/local/libexec/nagios/pass/$HOSTADDRESS.$SERVICEDESC.bad
if [ -f /usr/local/libexec/nagios/pass/$HOSTADDRESS.$SERVICEDESC.bad ]
then
/usr/local/libexec/nagios/submit_check_result $HOSTADDRESS $SERVICEDESC 2 "$4" &
/usr/local/libexec/nagios/passtron.sh $HOSTADDRESS $SERVICEDESC 2 "$4" &
fi
exit
;;
3|UNKNOWN)
exit
;;
esac
passtron.sh
#!/bin/sh
# PASSTRON
HOSTADDRESS=$1
SERVICEDESC=$2
SERVICESTATE=$3
SERVICEOUTPUT=$4
case "$SERVICESTATE" in
1)
if [ -f /usr/local/libexec/nagios/pass/$HOSTADDRESS.$SERVICEDESC.bad ]
then
sleep 60
fi
if [ -f /usr/local/libexec/nagios/pass/$HOSTADDRESS.$SERVICEDESC.bad ]
then
/usr/local/libexec/nagios/submit_check_result $HOSTADDRESS $SERVICEDESC 1 "$4" &
fi
if [ -f /usr/local/libexec/nagios/pass/$HOSTADDRESS.$SERVICEDESC.bad ]
then
/usr/local/libexec/nagios/passtron.sh $HOSTADDRESS $SERVICEDESC 1 "$4" &
fi
exit
;;
2)
if [ -f /usr/local/libexec/nagios/pass/$HOSTADDRESS.$SERVICEDESC.bad ]
then
sleep 60
fi
if [ -f /usr/local/libexec/nagios/pass/$HOSTADDRESS.$SERVICEDESC.bad ]
then
/usr/local/libexec/nagios/submit_check_result $HOSTADDRESS $SERVICEDESC 2 "$4" &
fi
if [ -f /usr/local/libexec/nagios/pass/$HOSTADDRESS.$SERVICEDESC.bad ]
then
/usr/local/libexec/nagios/passtron.sh $HOSTADDRESS $SERVICEDESC 2 "$4" &
fi
exit
;;
esac
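For comparison, the self-re-invoking pattern in passtron.sh could also be written as a single loop, which avoids spawning a new process every minute. This is only a sketch under the same assumptions as the scripts above (same marker directory and submit_check_result wrapper); `resubmit_while_bad`, `MARKER_DIR`, `SUBMIT` and `INTERVAL` are names I made up.

```shell
#!/bin/sh
MARKER_DIR="${MARKER_DIR:-/usr/local/libexec/nagios/pass}"
SUBMIT="${SUBMIT:-/usr/local/libexec/nagios/submit_check_result}"
INTERVAL="${INTERVAL:-60}"

# Keep re-submitting the same state while the .bad marker file exists.
# $1=host $2=service $3=numeric state (1 or 2) $4=plugin output
resubmit_while_bad() {
    host=$1 svc=$2 state=$3 output=$4
    marker="$MARKER_DIR/$host.$svc.bad"
    while [ -f "$marker" ]; do
        sleep "$INTERVAL"
        # The recovery may have removed the marker while we slept.
        [ -f "$marker" ] || break
        "$SUBMIT" "$host" "$svc" "$state" "$output"
    done
}
```

pass2.sh would then start it once in the background, e.g. `resubmit_while_bad "$HOSTADDRESS" "$SERVICEDESC" 1 "$4" &`, and the recovery branch stops it simply by removing the marker file, as it already does.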
And an example of one of my services:
define service{
hostgroup_name trapdevices
use trap-template
service_description TEMP-ALARM
contact_groups oncall-tech
is_volatile 0
normal_check_interval 1
retry_check_interval 1
max_check_attempts 1
active_checks_enabled 0
notification_interval 300
first_notification_delay 5