Disable alerts for specific hosts while still alerting for all other hosts

Time: 2019-08-22 21:07:07

Tags: prometheus prometheus-alertmanager

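I have hundreds of hosts reporting to a Prometheus server, and each host runs many exporters. I want to be able to list the hosts that I do not want alerts from. I still need Prometheus to monitor those hosts.

I tried matching them with a route that has no receiver. It did not work. What am I doing wrong? Or how should I do this instead?

Here are my routing rules. I expected the first route to match the instances to ignore and stop further processing, but I still receive alerts. :-(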

route:
  receiver: 'team-ops-mails'
  group_by: ['alertname', 'cluster']
  group_wait: 30s
  group_interval: 2m
  repeat_interval: 2h 
  routes:
  - match_re:
      instance: "int-pg-01:.*"
    continue: false
  - match:
      nopage: true
    receiver: team-mattermost
    repeat_interval: 24h
  - match:
      severity: hwerror
    receiver: hwerror-receiver
    repeat_interval: 24h
  - match:
      role: worker
    receiver: team-mattermost 
  - match:
      role: ven-entrance
    receiver: team-mattermost 

1 answer:

Answer 0: (score: 1)

Alerting rules allow you to define alert conditions based on Prometheus expression language expressions.

An example alerting rule:

groups:
- name: example
  rules:
  - alert: HighRequestLatency
    expr: job:request_latency_seconds:mean5m{job="myjob"} > 0.5
    for: 10m
    labels:
      severity: page
    annotations:
      summary: High request latency

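One possible way to solve the problem is to add an extra label to the metrics, for example enableAlert. When defining an alerting rule, you can then skip alerts for certain hosts by writing expr like this: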

- name: example
  rules:
  - alert: DemoAlert
    expr: <metric-name> {... ..., enableAlert = "true"} > ref_value
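
As a concrete illustration of the template above, here is a minimal sketch. It assumes the built-in `up` metric and that every target already carries an `enableAlert` label; the alert name `InstanceDown`, the `for` duration, and the severity value are illustrative choices and not part of the original answer.

```yaml
groups:
- name: example
  rules:
  # Fires only for targets whose enableAlert label is "true".
  # The label name comes from the answer; the metric, alert name,
  # duration, and severity are assumptions for illustration.
  - alert: InstanceDown
    expr: up{enableAlert="true"} == 0
    for: 5m
    labels:
      severity: page
    annotations:
      summary: "Instance {{ $labels.instance }} is down"
```

Note that a target without the label at all would not match `enableAlert="true"`. Writing the matcher as `enableAlert!="false"` instead would also cover unlabeled targets, which may be the safer default when only a few hosts are excluded.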

Set enableAlert = "false" for the instances that you do not want to trigger alerts.
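
One way the label could be attached is in the Prometheus scrape configuration, so that it ends up on every series scraped from a target. The sketch below assumes statically configured targets; the job name `node`, port `9100`, and the `web-01`/`web-02` host names are hypothetical, while `int-pg-01` is the host from the question.

```yaml
scrape_configs:
  - job_name: node            # hypothetical job name
    static_configs:
      # Hosts that stay monitored but must not raise alerts.
      - targets: ['int-pg-01:9100']
        labels:
          enableAlert: "false"
      # All other hosts alert as usual (hypothetical host names).
      - targets: ['web-01:9100', 'web-02:9100']
        labels:
          enableAlert: "true"
```

With this layout the list of silenced hosts lives in one place, and the metrics from those hosts are still scraped and stored as before.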