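Given the results I'm getting, I'm not sure these rules are written the right way. Do they look OK to anyone, or do you have better suggestions? I'm seeing the load go above 6 in a pattern at 7h intervals, which is a bit odd.
Load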
alert: instance_load
expr: node_load5{instance=~".*:9100"} > 6
for: 1m
labels:
  severity: critical
annotations:
  description: 'On {{ $labels.job }} system Load average is too High: {{ $value }}.'
  monitor: ""
  runbook: ""
  summary: '{{ $labels.job }} System Load is too High.'
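For comparison, a fixed threshold of 6 means different things on a 2-core and a 32-core machine, so a per-core variant of the load rule could look like the sketch below. This is only a sketch under assumptions: the alert name `instance_load_per_core` and the `1.5` threshold are hypothetical, and it assumes the exporter also exposes `node_cpu_seconds_total` (the same naming generation as the `node_disk_*_seconds_total` metrics used further down).

```yaml
# Sketch only: alert name and threshold are hypothetical, not taken from the rules above.
alert: instance_load_per_core
# count without(cpu, mode) collapses the per-CPU idle series into one series
# per instance whose value is the number of CPUs, so the division yields
# load average per core.
expr: |
  node_load5{instance=~".*:9100"}
    / count without(cpu, mode) (node_cpu_seconds_total{mode="idle", instance=~".*:9100"})
  > 1.5
for: 1m
labels:
  severity: critical
annotations:
  description: 'Load per core on {{ $labels.instance }} is {{ $value }}.'
  summary: '{{ $labels.instance }} load per core is too high.'
```

Because both sides keep the same target labels after the aggregation, the division matches one-to-one per instance without needing `on()` or `ignoring()`.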
Disk read/write latency
alert: instance_disk_read_latency
expr: (rate(node_disk_read_time_seconds_total[5m]) / rate(node_disk_reads_completed_total[5m])) > 0.015
for: 5m
labels:
  severity: warning
annotations:
  description: 'High read latency observed for device: {{ $labels.device }} with a
    value of: {{ humanizeDuration $value }} on node: {{ $labels.alias }}. The average
    value of the Avg. Disk sec/Read performance counter should be under 10 milliseconds.
    The maximum value of the Avg. Disk sec/Read performance counter should not exceed
    50 milliseconds.'
  monitor: ""
  runbook: ""
  severity: warning
  summary: 'High read latency observed for device: {{ $labels.device }} on: {{ $labels.alias }}.'

alert: instance_disk_write_latency
expr: (rate(node_disk_write_time_seconds_total[5m]) / rate(node_disk_writes_completed_total[5m])) > 0.015
for: 5m
labels:
  severity: warning
annotations:
  description: 'High write latency observed for device: {{ $labels.device }} with
    a value of: {{ humanizeDuration $value }} on node: {{ $labels.alias }}.'
  monitor: ""
  runbook: ""
  severity: warning
  summary: 'High write latency observed for device: {{ $labels.device }} on: {{ $labels.alias }}.'
Disk IOPS
alert: instance_disk_iops
expr: sum by(alias, env) (rate(node_disk_reads_completed_total[5m]) + rate(node_disk_writes_completed_total[5m])) > 500
for: 5m
labels:
  severity: warning
annotations:
  description: 'Server Disk IOPs {{ $labels.alias }} has a value of: {{ humanize
    $value }}, over 500 I/O ops/sec (IOPs).'
  monitor: ""
  runbook: ""
  # sum by(alias, env) drops the job label, so the summary uses alias instead.
  summary: '{{ $labels.alias }} Server Disk IOPs over 500 I/O ops/sec (IOPs).'
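
One way to sanity-check expressions like the ones above without waiting for real data is a `promtool test rules` unit test. The sketch below only exercises the load expression against synthetic samples; the target `host1:9100`, the job name `node`, and the file name in the comment are hypothetical placeholders, not values from my setup.

```yaml
# load_test.yml (hypothetical file name); run with: promtool test rules load_test.yml
# A rule_files list can be added here to also test the alert rules themselves.
evaluation_interval: 1m

tests:
  - interval: 1m
    # Synthetic input: a constant 5-minute load of 7 for ten minutes.
    input_series:
      - series: 'node_load5{instance="host1:9100", job="node"}'
        values: '7+0x10'
    promql_expr_test:
      # The same expression as the instance_load rule; 7 > 6, so one sample
      # should come back with the original labels and value.
      - expr: node_load5{instance=~".*:9100"} > 6
        eval_time: 5m
        exp_samples:
          - labels: 'node_load5{instance="host1:9100", job="node"}'
            value: 7
```

Replaying a recorded stretch of the real load values in `input_series` would show whether the expression itself fires as expected around the 7h pattern or whether the spikes come from the hosts.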
Regards,