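I have several hundred hosts reporting to a Prometheus server. Each host runs many exporters. I would like to be able to list hosts that I do not want alerts from. I still need Prometheus monitoring on those hosts.
I tried matching a route that has no receiver. It did not work. What exactly am I doing wrong? Or, what should I be doing instead?
Here are my route rules. I want the first match to catch the instances to be ignored and stop processing. I still get alerts. :-(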
route:
  receiver: 'team-ops-mails'
  group_by: ['alertname', 'cluster']
  group_wait: 30s
  group_interval: 2m
  repeat_interval: 2h
  routes:
    - match_re:
        instance: "int-pg-01:.*"
      continue: false
    - match:
        nopage: true
      receiver: team-mattermost
      repeat_interval: 24h
    - match:
        severity: hwerror
      receiver: hwerror-receiver
      repeat_interval: 24h
    - match:
        role: worker
      receiver: team-mattermost
    - match:
        role: ven-entrance
      receiver: team-mattermost
Answer 0 (score: 1)
Alerting rules allow you to define alert conditions based on Prometheus expression language expressions.
Example alerting rule:
groups:
  - name: example
    rules:
      - alert: HighRequestLatency
        expr: job:request_latency_seconds:mean5m{job="myjob"} > 0.5
        for: 10m
        labels:
          severity: page
        annotations:
          summary: High request latency
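One possible way to solve the problem is to add an extra label to your metrics, for example enableAlert. When defining an alerting rule, you can then ignore alerts for certain hosts by writing expr like this: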
- name: example
  rules:
    - alert: DemoAlert
      expr: <metric-name> {... ..., enableAlert = "true"} > ref_value
Set enableAlert = "false" for the instances you do not want to trigger alerts.
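One way the enableAlert label might be attached is at scrape time, in the Prometheus scrape configuration. The following is only a sketch under assumptions: the job name, the second host, and port 9100 are hypothetical, and only int-pg-01 comes from the question. It uses the `labels` field of `static_configs`, which adds the given labels to every series scraped from the listed targets.

```yaml
# prometheus.yml (sketch, not a complete configuration)
scrape_configs:
  - job_name: node                      # hypothetical job name
    static_configs:
      # Host that should stay monitored but never alert.
      - targets: ['int-pg-01:9100']     # port 9100 is an assumption
        labels:
          enableAlert: "false"
      # Hosts that should alert as usual.
      - targets: ['int-pg-02:9100']     # hypothetical host
        labels:
          enableAlert: "true"
```

With a matcher of enableAlert = "true", any host that lacks the label entirely will not alert either. If only a few hosts need to be excluded, it may be simpler to label just those hosts and match on enableAlert != "false" instead, since a negative matcher also selects series that do not carry the label.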