Are these Prometheus alert rules for disk (IOPS, read/write latency) and load appropriate?

Asked: 2018-12-05 05:35:23

Tags: prometheus-alertmanager prometheus-node-exporter

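Given the results I am getting, I am not sure whether these rules are written the right way. Could someone check whether they look reasonable, or suggest something better? I am also seeing the load go above 6 in a pattern that repeats about every 7 hours, which is a bit odd.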

Load

alert: instance_load
expr: node_load5{instance=~".*:9100"} > 6
for: 1m
labels:
  severity: critical
annotations:
  description: 'On {{ $labels.job }} system Load average is too High: {{ $value }}.'
  monitor: ""
  runbook: ""
  summary: '{{ $labels.job }} System Load is too High.'
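
A fixed threshold of 6 means something different on a 2-core host than on a 32-core host, so one variation is to divide the load by the number of CPU cores. The rule below is only a sketch of that idea. It assumes the node_exporter in use exposes `node_cpu_seconds_total` (the same naming generation as the disk metrics in the rules further down); the rule name `instance_load_per_core`, the threshold of 1.5, and the `for` duration are hypothetical placeholders, not recommendations.

```yaml
# Hypothetical variant of instance_load: 5-minute load average per CPU core.
alert: instance_load_per_core
expr: |
  node_load5{instance=~".*:9100"}
    / on(instance) group_left
  # One "idle" series exists per core, so counting them gives the core count.
  count by(instance) (node_cpu_seconds_total{mode="idle"})
    > 1.5
for: 5m
labels:
  severity: warning
annotations:
  description: 'On {{ $labels.job }} load per core is {{ $value }}.'
  summary: '{{ $labels.job }} load per core is too high.'
```

`group_left` is used so that the result keeps the labels of `node_load5` (including `job`, which the annotations reference); without it, only `instance` would survive the division.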

Disk read/write latency

alert: instance_disk_read_latency
expr: (rate(node_disk_read_time_seconds_total[5m]) / rate(node_disk_reads_completed_total[5m])) > 0.015
for: 5m
labels:
  severity: warning
annotations:
  description: 'High read latency observed for device: {{ $labels.device }} with a
    value of: {{ humanizeDuration $value }} on node: {{ $labels.alias }}. The average
    value of the Avg. Disk sec/Read performance counter should be under 10 milliseconds.
    The maximum value of the Avg. Disk sec/Read performance counter should not exceed
    50 milliseconds.'
  monitor: ""
  runbook: ""
  severity: warning
  summary: 'High read latency observed for device: {{ $labels.device }} on: {{ $labels.alias }}.'

alert: instance_disk_write_latency
expr: (rate(node_disk_write_time_seconds_total[5m]) / rate(node_disk_writes_completed_total[5m])) > 0.015
for: 5m
labels:
  severity: warning
annotations:
  description: 'High write latency observed for device: {{ $labels.device }} with
    a value of: {{ humanizeDuration $value }} on node: {{ $labels.alias }}.'
  monitor: ""
  runbook: ""
  severity: warning
  summary: 'High write latency observed for device: {{ $labels.device }} on: {{ $labels.alias }}.'
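
Both latency expressions divide the time spent on I/O by the number of completed operations, which gives the average seconds per operation over the 5-minute window. On a nearly idle disk that average rests on very few operations, so a single slow request can push it over the threshold. A possible guard, shown here only as a sketch, is to require a minimum operation rate as well. The rule name `instance_disk_read_latency_busy` and the floor of 1 read per second are hypothetical values.

```yaml
# Hypothetical variant: only alert on read latency when the device is
# actually serving reads (more than 1 read/s over the same window).
alert: instance_disk_read_latency_busy
expr: |
  (
    rate(node_disk_read_time_seconds_total[5m])
      / rate(node_disk_reads_completed_total[5m])
  ) > 0.015
  and
  rate(node_disk_reads_completed_total[5m]) > 1
for: 5m
labels:
  severity: warning
annotations:
  summary: 'High read latency observed for device: {{ $labels.device }} on: {{ $labels.alias }}.'
```

The `and` operator keeps the left-hand samples (the latency values) for every label set that also appears on the right, so `{{ $value }}` is still the latency in seconds.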

Disk IOPS

alert: instance_disk_iops
expr: sum by(alias, env) (rate(node_disk_reads_completed_total[5m]) + rate(node_disk_writes_completed_total[5m])) > 500
for: 5m
labels:
  severity: warning
annotations:
  description: 'Server Disk IOPs {{ $labels.alias }} has a value of: {{ humanize $value }} over > 500 I/O ops/sec (IOPs).'
  monitor: ""
  runbook: ""
  summary: '{{ $labels.job }} Server Disk IOPs over 500 I/O ops/sec (IOPs).'
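
One detail in the IOPS rule: `sum by(alias, env)` drops every label except `alias` and `env`, so `{{ $labels.job }}` in the summary renders as an empty string. A minimal sketch of one way around this is to add `job` to the grouping. It assumes `alias` and `env` are custom target labels from this setup, and that each host is scraped by a single job.

```yaml
# Sketch only: same expression as instance_disk_iops, with `job` added to the
# grouping so that {{ $labels.job }} in the summary is not empty.
alert: instance_disk_iops
expr: |
  sum by(alias, env, job) (
    rate(node_disk_reads_completed_total[5m])
      + rate(node_disk_writes_completed_total[5m])
  ) > 500
for: 5m
labels:
  severity: warning
annotations:
  description: 'Server Disk IOPs {{ $labels.alias }} has a value of: {{ humanize $value }} (threshold: 500 I/O ops/sec).'
  summary: '{{ $labels.job }} Server Disk IOPs over 500 I/O ops/sec (IOPs).'
```

The alternative is to leave the grouping as it is and reference `{{ $labels.alias }}` in the summary instead.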

Regards,

0 answers