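I have kube-state-metrics, Prometheus and Alertmanager deployed on a VM. I configured Prometheus and Alertmanager to send an alert when a pod's restart count increases by more than a given amount within a given time window. Everything works, but each alert carries a lot of unnecessary data. Basically, every label I see in Prometheus becomes part of the alert, and I don't want that.
What I'm currently receiving: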
alertname = RestartsAlerts
container = kube-state-metrics
endpoint = http
exported_container = kube-scheduler
exported_namespace = kube-system
....
alertname = RestartsAlerts
container = kube-state-metrics
endpoint = http
exported_container = kube-scheduler
exported_namespace = kube-system
Alert configuration:
- name: Pod-Restarts
  rules:
  - alert: RestartsAlerts
    expr: max_over_time(kube_pod_container_status_restarts_total[3m]) - min_over_time(kube_pod_container_status_restarts_total[3m]) > 1
    labels:
      severity: critical
    annotations:
      summary: "More than 1 restart in pod {{ $labels.exported_pod }}"
      description: "{{ $labels.exported_container }} container has restarted {{ $value }} times.\n Instance: {{ $labels.instance }}"
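A minimal sketch of one common way to trim the label set, assuming only the namespace, pod and container labels are worth keeping. In PromQL, an aggregation with a `by (...)` clause keeps only the listed labels, so wrapping the existing expression in `max by (...)` removes `endpoint`, `container`, `instance` and the rest from the resulting alert. The rule's own labels (`severity`) and `alertname` are still attached. The exact label list in the `by` clause is an assumption; adjust it to what you actually need.

```yaml
- name: Pod-Restarts
  rules:
  - alert: RestartsAlerts
    # Aggregating with "by" keeps only the listed labels; all others
    # (endpoint, container, instance, job, ...) are dropped from the alert.
    # The label list is an assumption: adjust it to what you need.
    expr: |
      max by (exported_namespace, exported_pod, exported_container) (
        max_over_time(kube_pod_container_status_restarts_total[3m])
        - min_over_time(kube_pod_container_status_restarts_total[3m])
      ) > 1
    labels:
      severity: critical
    annotations:
      summary: "More than 1 restart in pod {{ $labels.exported_pod }}"
      # $labels.instance no longer exists after the aggregation,
      # so it is removed from the description here.
      description: "{{ $labels.exported_container }} container has restarted {{ $value }} times."
```

If you would rather leave the rule expression unchanged, Prometheus can also drop labels from outgoing alerts through `alert_relabel_configs` (for example with a `labeldrop` action) in the `alerting` section of `prometheus.yml`. Be aware that dropping labels this way can leave several alerts with an identical label set, which Alertmanager will then treat as a single alert.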