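I am trying to set up Alertmanager with my Prometheus server. The following alert fires correctly, and the notification arrives in my Slack channel. It uses a simple expression and fires whenever any exporter is down,
i.e. up == 0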
alerts:
groups:
- name: Exporter
  rules:
  - alert: exporter-down
    expr: up == 0
    for: 2m
    labels:
      severity: page
    annotations:
      Summary: "Exporter {{ $labels.job }} is down."
      Description: "{{ $labels.job }} has been down for more than 2 minutes."
      GrafanaDashboard: example.com
      Prometheus: example.com
      AlertManager: example.com
      Impact: Unavailability of {{ $labels.job }} will impact our monitoring. We will not able to get Insight of {{ $labels.job }}
However, when I set up a similar alert rule for response times within a specific range, the alert does not fire,
i.e. expr aws_applicationelb_target_response_time_average > 0.1 AND aws_applicationelb_target_response_time_average < 0.35
- name: LoadBalancerWarning
  rules:
  - alert: slowResponseWarning
    expr: aws_applicationelb_target_response_time_average > 0.1 AND aws_applicationelb_target_response_time_average < 0.35
    labels:
      severity: warning
    annotations:
      Summary: "Load Balancer {{ $labels.load_balancer }} response is more than 0.1 seconds but less than 0.35 for {{ $labels.job }} "
      Description: "It is Warning Sign. "
      GrafanaDashboard: example.com
      Prometheus: example.com
      AlertManager: example.com
      Impact: Slow Response Impact User Experience
I am not sure whether I am missing something. It would be great if someone could give me some pointers on this.
Answer 0 (score: 0)
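I could see that the CloudWatch metrics are delayed by 5 to 10 minutes. When I added an offset of 5 to 10 minutes to this alert query, it was able to trigger the alert.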
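For illustration, the rule with an offset applied might look like the sketch below. The value 10m is an assumption taken from the upper end of the delay described above, not a confirmed setting; it should be tuned to the delay actually observed. The PromQL offset modifier has to follow each metric selector directly.

```yaml
groups:
- name: LoadBalancerWarning
  rules:
  - alert: slowResponseWarning
    # "offset 10m" is an assumed value: it evaluates the metric as it was
    # 10 minutes ago, so the delayed CloudWatch samples are already present.
    expr: aws_applicationelb_target_response_time_average offset 10m > 0.1 and aws_applicationelb_target_response_time_average offset 10m < 0.35
    labels:
      severity: warning
```

The trade-off is that the alert reacts to data that is already several minutes old, so it fires later than the underlying slowdown by roughly the offset.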