Question

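I am trying to set up Alertmanager with my Prometheus server. The following alert fires correctly, and the notifications arrive in my Slack channel. It uses a simple expression and fires whenever any exporter is down,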

i.e. up == 0

alerts:
    groups:
      - name: Exporter
        rules:
        - alert: exporter-down
          expr: up == 0
          for: 2m
          labels:
            severity: page
          annotations:
            Summary: "Exporter {{ $labels.job }} is down."
            Description: "{{ $labels.job }} has been down for more than 2 minutes."
            GrafanaDashboard: example.com
            Prometheus: example.com
            AlertManager: example.com
            Impact: Unavailability of {{ $labels.job }} will impact our monitoring. We will not be able to get insight into {{ $labels.job }}

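However, when I try to set up a similar alert rule for response times within a specific range, I never see the alert in the active state in Prometheus,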

i.e. expr: aws_applicationelb_target_response_time_average > 0.1 AND aws_applicationelb_target_response_time_average < 0.35

      - name: LoadBalancerWarning
        rules:
        - alert: slowResponseWarning
          expr: aws_applicationelb_target_response_time_average > 0.1 AND aws_applicationelb_target_response_time_average < 0.35
          labels:
            severity: warning
          annotations:
            Summary: "Load Balancer {{ $labels.load_balancer }} response is more than 0.1 seconds but less than 0.35 for {{ $labels.job }} "
            Description: "It is Warning Sign. "
            GrafanaDashboard: example.com
            Prometheus: example.com
            AlertManager: example.com
            Impact: Slow Response Impact User Experience

I am not sure whether I am missing something. It would be great if someone could give me some pointers.

Answer 1

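I can see that the CloudWatch metrics arrive with a delay of 5 to 10 minutes. Once I added an offset of 5 to 10 minutes to the alert query, the alert fired.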
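
As a rough sketch, the rule with the offset applied might look like the following. The value 10m is an assumption picked from the 5 to 10 minute range above, so adjust it to the delay you actually observe. The snippet is written as a standalone rule file starting at groups:, so nest it as needed if your setup wraps rules under another key.

```yaml
groups:
  - name: LoadBalancerWarning
    rules:
      - alert: slowResponseWarning
        # "offset 10m" evaluates the metric as it was 10 minutes ago, so the rule
        # looks at samples that CloudWatch has already delivered.
        # 10m is an assumed value, not a recommendation.
        expr: >-
          aws_applicationelb_target_response_time_average offset 10m > 0.1
          and
          aws_applicationelb_target_response_time_average offset 10m < 0.35
        labels:
          severity: warning
        annotations:
          Summary: "Load Balancer {{ $labels.load_balancer }} response is more than 0.1 seconds but less than 0.35 for {{ $labels.job }}"
          Description: "It is Warning Sign."
```

The trade-off is that the alert now reflects the state from 10 minutes earlier, so notifications are delayed by roughly the same amount.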
