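I'm developing an automation tool for managing Nagios (NagiosQL isn't enough for my needs).
Something strange happens when I run integration tests against Nagios.
Here is the setup:
Nagios configuration: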
# grep log_notification /usr/local/nagios/etc/nagios.cfg
log_notifications=1
# grep enable_notifications /usr/local/nagios/etc/nagios.cfg
enable_notifications=1
I created a sample contact, host, and service, none of which use templates:
define contact {
contact_name john.doe
host_notifications_enabled 1
service_notifications_enabled 1
host_notification_period 7x24
service_notification_period 7x24
host_notification_options d,u,r,f,s,n
service_notification_options w,u,c,r,f,s,n
email john.doe@xx.com
host_notification_commands notify_service_by_state
service_notification_commands notify_service_by_state
}
define host{
host_name localhost
address localhost
max_check_attempts 1
check_period 7x24
active_checks_enabled 1
passive_checks_enabled 0
check_command check_host_alive
}
define service{
host_name localhost
service_description check_the_process_count_of_the_local_machine
check_command check_local_procs!10!20!RSZDT
max_check_attempts 3
check_interval 3
retry_interval 1
notification_interval 3
check_period 7x24
notification_period 7x24
notifications_enabled 1
contacts john.doe
}
Here are some additional configuration objects:
define timeperiod{
timeperiod_name 7x24
alias 7x24
monday 00:00-24:00
tuesday 00:00-24:00
wednesday 00:00-24:00
thursday 00:00-24:00
friday 00:00-24:00
saturday 00:00-24:00
sunday 00:00-24:00
}
define command {
command_name check_local_procs
command_line $USER1$/check_procs -w $ARG1$ -c $ARG2$ -s $ARG3$
}
define command {
command_name notify_service_by_state
command_line /usr/local/nagios/etc/scripts/notify.py "$SERVICESTATE$" "$CONTACTEMAIL$" "$CONTACTADDRESS1$" "$NOTIFICATIONTYPE$" "$HOSTALIAS$" "$SERVICEDESC$" "$HOSTADDRESS$" "$SERVICEOUTPUT$" "$LONGDATETIME$"
}
define command {
command_name check_host_alive
command_line $USER1$/check_ping -H $HOSTADDRESS$ -w 3000.0,80% -c 5000.0,100% -p 5
}
Here is the source code of notify.py; all required packages are installed.
#!/usr/bin/python
#coding:utf-8
import codecs
import sys
import urllib
import commands
import smtplib
from email.mime.text import MIMEText
from email.header import Header
import time
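# Append a short trace entry so we can tell whether Nagios invoked this script.
with codecs.open('/data/appdatas/logs/message.log', 'a', encoding='utf-8') as fi:
    fi.write("=======Begin Log\n")
    fi.write("Notify\n")
    fi.write(time.strftime('%Y-%m-%d %H:%M:%S') + "\n")
    # sys.argv[1] is $SERVICESTATE$, the first macro passed by notify_service_by_state
    fi.write(sys.argv[1] + "\n")
    fi.write("End Log=========\n")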
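To simplify testing, I made the log file world-writable (chmod 777) and made the script executable (chmod +x notify.py). I have run the Python script from a terminal, and the entries do appear in the log file.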
As you can see from the check_local_procs command, the check returns exit code 2 (CRITICAL) when it runs.
In nagios.log I see these entries:
[1449588806] SERVICE ALERT: localhost;check_the_process_count_of_the_local_machine;CRITICAL;SOFT;1;PROCS CRITICAL: 423 processes with STATE = RSZDT
[1449588866] SERVICE ALERT: localhost;check_the_process_count_of_the_local_machine;CRITICAL;SOFT;2;PROCS CRITICAL: 423 processes with STATE = RSZDT
[1449588926] SERVICE ALERT: localhost;check_the_process_count_of_the_local_machine;CRITICAL;HARD;3;PROCS CRITICAL: 423 processes with STATE = RSZDT
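As the log shows, the service is checked every minute and reaches a HARD CRITICAL state on the third attempt.
But nothing is written to message.log, which means notify.py is never executed.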
Answer 0 (score: 0)
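I found out why no notifications were being sent: I had misconfigured the contact.
According to the official documentation (Nagios Object Definitions), host_notification_options accepts [d,u,r,f,s,n] and service_notification_options accepts [w,u,c,r,f,s,n]. The option n (none) means the contact will not receive any notifications, so listing it alongside the other options, as I did, suppresses them all. It has to be removed from both lists in the contact definition.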
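Since my tool generates these definitions, a check like the following could catch the mistake before the configuration is deployed. This is only a minimal sketch: find_silenced_contacts and OPTION_KEYS are hypothetical names of my own, not part of Nagios, and the parser assumes simple one-directive-per-line contact blocks without inline comments or templates.

```python
import re

# Hypothetical helper, not part of Nagios: the two contact directives
# whose option list may contain "n" (none).
OPTION_KEYS = ("host_notification_options", "service_notification_options")


def find_silenced_contacts(config_text):
    """Return (contact_name, directive) pairs whose option list contains 'n'."""
    problems = []
    # Grab the body of every "define contact { ... }" block.
    for block in re.findall(r"define\s+contact\s*\{(.*?)\}", config_text, re.S):
        fields = {}
        for line in block.splitlines():
            parts = line.split(None, 1)  # directive name, then its value
            if len(parts) == 2:
                fields[parts[0]] = parts[1].strip()
        name = fields.get("contact_name", "<unnamed>")
        for key in OPTION_KEYS:
            options = [o.strip() for o in fields.get(key, "").split(",")]
            if "n" in options:
                problems.append((name, key))
    return problems


if __name__ == "__main__":
    sample = """
define contact {
contact_name john.doe
host_notification_options d,u,r,f,s,n
service_notification_options w,u,c,r,f,s
}
"""
    for name, key in find_silenced_contacts(sample):
        print("%s: %s contains 'n' (no notifications)" % (name, key))
```

Running it against the contact from the question would flag both directives; after removing n from each list it should report nothing.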