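I set up kube-prometheus in my cluster (https://github.com/coreos/prometheus-operator/tree/master/contrib/kube-prometheus). It ships with some default alerts such as "CoreDNSDown". How do I create my own alert?
Can anyone provide a sample that creates an alert which sends an email to my Gmail account?
I followed "Alert when docker container pod is in Error or CrashLoopBackOff kubernetes", but I could not get it to work.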
Answer 0 (score: 2)
To send alerts to your Gmail account, you need to put the Alertmanager configuration in a file named alertmanager.yaml:
cat <<EOF > alertmanager.yaml
route:
  # Label names are case-sensitive; the built-in label is "alertname".
  group_by: [alertname]
  # Send all notifications to me.
  receiver: email-me

receivers:
- name: email-me
  email_configs:
  - to: $GMAIL_ACCOUNT
    from: $GMAIL_ACCOUNT
    smarthost: smtp.gmail.com:587
    auth_username: "$GMAIL_ACCOUNT"
    auth_identity: "$GMAIL_ACCOUNT"
    auth_password: "$GMAIL_AUTH_TOKEN"
EOF
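Because the heredoc delimiter EOF is unquoted, the shell substitutes $GMAIL_ACCOUNT and $GMAIL_AUTH_TOKEN at the moment the file is written, so an unset variable silently leaves an empty field. A minimal sketch of a sanity check, assuming the field names used above; the function name check_rendered_config is hypothetical and not part of any tool:

```shell
# Hypothetical helper: returns non-zero if the rendered file looks unfilled.
check_rendered_config() {
  file="${1:-alertmanager.yaml}"
  if [ ! -r "$file" ]; then
    echo "cannot read $file" >&2
    return 1
  fi
  # An e-mail field with nothing (or only "") after the colon means the
  # variable was unset when the heredoc ran.
  if grep -Eq '^[[:space:]]*(- )?(to|from|auth_username|auth_identity|auth_password):[[:space:]]*("")?[[:space:]]*$' "$file"; then
    echo "empty e-mail field in $file: export GMAIL_ACCOUNT and GMAIL_AUTH_TOKEN first" >&2
    return 1
  fi
  echo "$file looks filled in"
}
```

Running check_rendered_config alertmanager.yaml before creating the secret catches the empty-variable case early.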
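Since you are using kube-prometheus, you already have a secret named alertmanager-main, which holds the default Alertmanager configuration. You need to create the alertmanager-main secret again with the new configuration, using the following command: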
kubectl create secret generic alertmanager-main --from-file=alertmanager.yaml -n monitoring
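Since the alertmanager-main secret already exists, a plain kubectl create can be rejected with an AlreadyExists error. One common alternative, sketched here, is to render the Secret manifest locally and pipe it to kubectl apply. This assumes a kubectl version that supports --dry-run=client; the function name replace_alertmanager_secret is hypothetical:

```shell
# Hypothetical helper: create the secret, or update it if it already exists.
replace_alertmanager_secret() {
  config_file="${1:-alertmanager.yaml}"
  namespace="${2:-monitoring}"
  # --dry-run=client only renders the Secret manifest; nothing is sent yet.
  # The key is forced to "alertmanager.yaml" regardless of the local file name.
  kubectl create secret generic alertmanager-main \
    --from-file=alertmanager.yaml="$config_file" \
    --namespace "$namespace" \
    --dry-run=client -o yaml |
    kubectl apply --namespace "$namespace" -f -
}
```

Calling replace_alertmanager_secret alertmanager.yaml monitoring then works whether or not the secret is already present.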
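Your Alertmanager is now set up to send an email whenever it receives an alert from Prometheus.
Next you need an alert for which the mail will be sent. You can set up a DeadMansSwitch alert, which fires unconditionally and is used to check the alerting pipeline: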
groups:
- name: meta
  rules:
  - alert: DeadMansSwitch
    expr: vector(1)
    labels:
      severity: critical
    annotations:
      description: This is a DeadMansSwitch meant to ensure that the entire Alerting
        pipeline is functional.
      summary: Alerting DeadMansSwitch
After this, the DeadMansSwitch alert will fire and an email should arrive in your inbox.
Edit:
The DeadMansSwitch alert should go into a ConfigMap that your Prometheus is reading. Here is the relevant snippet from my Prometheus resource:
"spec": {
  "alerting": {
    "alertmanagers": [
      {
        "name": "alertmanager-main",
        "namespace": "monitoring",
        "port": "web"
      }
    ]
  },
  "baseImage": "quay.io/prometheus/prometheus",
  "replicas": 2,
  "resources": {
    "requests": {
      "memory": "400Mi"
    }
  },
  "ruleSelector": {
    "matchLabels": {
      "prometheus": "prafull",
      "role": "alert-rules"
    }
  },
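The configuration above is from my prometheus.json file. It names the Alertmanager to use and defines the ruleSelector, which selects rules based on the prometheus and role labels. So my rules ConfigMap looks like this: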
kind: ConfigMap
apiVersion: v1
metadata:
  name: prometheus-rules
  namespace: monitoring
  labels:
    role: alert-rules
    prometheus: prafull
data:
  alert-rules.yaml: |+
    groups:
    - name: alerting_rules
      rules:
      - alert: LoadAverage15m
        expr: node_load15 >= 0.50
        labels:
          severity: major
        annotations:
          summary: "Instance {{ $labels.instance }} - high load average"
          description: "{{ $labels.instance }} (measured by {{ $labels.job }}) has high load average ({{ $value }}) over 15 minutes."
Substitute the DeadMansSwitch rule into the ConfigMap above.
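One caveat if you write the rules file with a heredoc: the annotations use Prometheus template variables ($labels, $value), and an unquoted heredoc delimiter would let the shell expand them to empty strings. A small sketch with a quoted delimiter, reusing the LoadAverage15m rule from above; writing to alert-rules.yaml in the current directory is an assumption for illustration:

```shell
# Quoted delimiter ('EOF'): the shell copies the text verbatim, so the
# Prometheus template variables $labels and $value are left untouched.
cat <<'EOF' > alert-rules.yaml
groups:
- name: alerting_rules
  rules:
  - alert: LoadAverage15m
    expr: node_load15 >= 0.50
    labels:
      severity: major
    annotations:
      summary: "Instance {{ $labels.instance }} - high load average"
EOF

# Quick check that the template text survived (prints the matching line).
grep -F '{{ $labels.instance }}' alert-rules.yaml
```

This is the opposite of the Alertmanager file, where expansion of the Gmail variables is wanted and the delimiter is therefore left unquoted.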
Answer 1 (score: 0)
If you are using kube-prometheus, it comes with the alertmanager-main secret and a resource of kind Prometheus set up by default.
Step 1: You have to delete the alertmanager-main secret
kubectl delete secret alertmanager-main -n monitoring
Step 2: As Praful explained, create the secret again with the new changes
cat <<EOF > alertmanager.yaml
route:
  # Label names are case-sensitive; the built-in label is "alertname".
  group_by: [alertname]
  # Send all notifications to me.
  receiver: email-me

receivers:
- name: email-me
  email_configs:
  - to: $GMAIL_ACCOUNT
    from: $GMAIL_ACCOUNT
    smarthost: smtp.gmail.com:587
    auth_username: "$GMAIL_ACCOUNT"
    auth_identity: "$GMAIL_ACCOUNT"
    auth_password: "$GMAIL_AUTH_TOKEN"
EOF
kubectl create secret generic alertmanager-main --from-file=alertmanager.yaml -n monitoring
Step 3: You have to add a new Prometheus rule
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  creationTimestamp: null
  labels:
    prometheus: k8s
    role: alert-rules
  name: prometheus-podfail-rules
spec:
  groups:
  - name: ./podfail.rules
    rules:
    - alert: PodFailAlert
      expr: sum(kube_pod_container_status_restarts_total{container="ffmpeggpu"}) BY (container) > 10
NB: The role label should be role: alert-rules, matching what is specified in the ruleSelector of the Prometheus resource. To check it, use
kubectl get prometheus k8s -n monitoring -o yaml
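The full YAML output is long, so it can help to print only the selector and then list the rule objects that carry matching labels. A rough sketch, assuming the Prometheus object is named k8s in the monitoring namespace and the labels are prometheus=k8s and role=alert-rules as in the rule above; both function names are hypothetical:

```shell
# Print only the ruleSelector of the Prometheus object (default name: k8s).
show_rule_selector() {
  kubectl get prometheus "${1:-k8s}" -n monitoring \
    -o jsonpath='{.spec.ruleSelector}{"\n"}'
}

# List PrometheusRule objects whose labels match the given label selector.
list_matching_rules() {
  kubectl get prometheusrule -n monitoring \
    -l "${1:-prometheus=k8s,role=alert-rules}"
}
```

If list_matching_rules does not show prometheus-podfail-rules, the labels on the rule and the selector most likely disagree.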