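I have a set of alerts for a given metric (device SMART). They cover a wide range of cases (the device does not respond, smartctl exits with a non-zero status, errors in the metrics, etc.).
They are built from a set of expressions, each handling its own case.
In some situations several nodes fire their alerts at once. I want to reduce the verbosity and fire only one of the many alerts.
Right now my code looks like this: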
var raw_data = stream
    |from()
        .measurement('smart_device')
var rw_errors = raw_data
    |from()
        .where(lambda: "read_error_rate" != 0 OR "write_error_rate" != 0)
var smart_enabled = raw_data
    |from()
        .where(lambda: "enabled" != 'Enabled')
var health_not_ok = raw_data
    |from()
        .where(lambda: "health_ok" == FALSE)
var exit_status = raw_data
    |from()
        .where(lambda: "exit_status" != 0)
rw_errors
    |alert()
        .crit(lambda: "read_error_rate" > 0)
        .id('read_error_rate')
        .message('Read error rate for /dev/{{ index .Tags "device" }} at {{ index .Tags "host" }} is non-zero ({{ index .Fields "read_error_rate" }})')
        .topic('<< kapacitor_device_smart_topic >>')
rw_errors
    |alert()
        .crit(lambda: "write_error_rate" > 0)
        .id('write_error_rate')
        .topic('<< kapacitor_device_smart_topic >>')
        .message('Write error rate for /dev/{{ index .Tags "device" }} at {{ index .Tags "host" }} is non-zero ({{ index .Fields "write_error_rate" }})')
health_not_ok
    |alert()
        .crit(lambda: "health_ok" == FALSE)
        .id('drive_health_status')
        .message('/dev/{{ index .Tags "device" }} at {{ index .Tags "host" }} is failing!')
        .topic('<< kapacitor_device_smart_topic >>')
exit_status
    |alert()
        .crit(lambda: "exit_status" != 0)
        .id('smartctl_exit_status')
        .message('smartctl returned a non-zero exit code ({{ index .Fields "exit_status" }}) for /dev/{{ index .Tags "device" }} at {{ index .Tags "host" }}')
        .topic('<< kapacitor_device_smart_topic >>')
smart_enabled
    |alert()
        .crit(lambda: "enabled" != 'Enabled')
        .id('no_smart_for_device')
        .message('Unable to gather SMART data for /dev/{{ index .Tags "device" }} at {{ index .Tags "host" }}')
        .topic('<< kapacitor_device_smart_topic >>')
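(The code is written this way because of how the collector plugin behaves: in severe cases it does not populate some fields.)
My script above produces overly verbose output in some cases, because several alerts fire at once.
Is there any way in TICKscript (Kapacitor) to skip the remaining nodes once an earlier node in the script has raised an alert?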
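One direction that might work, as an untested sketch rather than a confirmed solution: instead of skipping nodes, collapse the checks into a single alert node that reports only the highest-priority problem for each point. The sketch assumes that `|default()` can fill the fields the collector leaves out, and that the lambda `if()` function can be nested to pick one reason. The default values, the field types, the priority order, the `reason` field, and the `smart_device_status` id are all assumptions introduced for illustration.

```
// Sketch only: defaults, field types, and priority order are assumptions.
var data = stream
    |from()
        .measurement('smart_device')
    // Fill fields the collector may omit so the expression below can evaluate.
    |default()
        .field('read_error_rate', 0)
        .field('write_error_rate', 0)
        .field('exit_status', 0)
        .field('health_ok', TRUE)
        .field('enabled', 'Enabled')
    // Pick exactly one reason per point; the first matching check wins.
    |eval(lambda:
        if("enabled" != 'Enabled', 'no_smart_for_device',
        if("exit_status" != 0, 'smartctl_exit_status',
        if("health_ok" == FALSE, 'drive_health_status',
        if("read_error_rate" > 0, 'read_error_rate',
        if("write_error_rate" > 0, 'write_error_rate', 'ok'))))))
        .as('reason')
        // Keep the original fields alongside the new 'reason' field.
        .keep()

data
    |alert()
        .crit(lambda: "reason" != 'ok')
        .id('smart_device_status')
        .message('/dev/{{ index .Tags "device" }} at {{ index .Tags "host" }}: {{ index .Fields "reason" }}')
        .topic('<< kapacitor_device_smart_topic >>')
```

With this layout each point produces at most one alert event, and the message names the first check that failed. The trade-off is that the per-check alert ids from the original script are merged into one.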