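I am trying to create an alerting rule in Prometheus that fires when the average CPU usage across all nodes with the label agentpool="worker" has been below 30% over the last 3 minutes.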
Right now, I can get an alert for any single node whose CPU usage has been below 30% over the last 3 minutes by using:

- alert: NodeCPUUtilizationLow
  expr: instance:node_cpu:rate:sum * 100 < 30
  labels:
    severity: none
  annotations:
    description: CPU utilization has been lower than 30% for last 3 minutes (current value is {{$value}})

What should expr be so that it only alerts on nodes that carry the agentpool="worker" label? I can list the nodes with that label using kube_node_labels{label_agentpool="worker"}.
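How would I do the same for pods? I want a similar alert: fire if the average CPU usage across all pods labeled app=web drops below a given threshold.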
I am running Prometheus on Kubernetes. It was installed using the Helm charts here: https://github.com/coreos/prometheus-operator/tree/master/helm/
Output:
instance:node_cpu:rate:sum * 100 < 30
{instance="10.240.0.187:9100"} 26.466666666668715
kube_node_labels{label_agentpool="worker"}
kube_node_labels{endpoint="kube-state-metrics",instance="10.240.0.187:8080",job="kube-prometheus-exporter-kube-state",label_agentpool="worker",label_beta_kubernetes_io_arch="amd64",label_beta_kubernetes_io_instance_type="Standard_B2s",label_beta_kubernetes_io_os="linux",label_failure_domain_beta_kubernetes_io_region="southcentralus",label_failure_domain_beta_kubernetes_io_zone="0",label_kubernetes_azure_com_cluster="dev-kube-cluster",label_kubernetes_io_hostname="k8s-worker-22588695-1",label_kubernetes_io_role="agent",namespace="monitoring",node="k8s-worker-22588695-1",pod="kube-prometheus-exporter-kube-state-854f846569-8lnk2",service="kube-prometheus-exporter-kube-state"} 1
kube_node_labels{endpoint="kube-state-metrics",instance="10.240.0.187:8080",job="kube-prometheus-exporter-kube-state",label_agentpool="worker",label_beta_kubernetes_io_arch="amd64",label_beta_kubernetes_io_instance_type="Standard_B2s",label_beta_kubernetes_io_os="linux",label_failure_domain_beta_kubernetes_io_region="southcentralus",label_failure_domain_beta_kubernetes_io_zone="1",label_kubernetes_azure_com_cluster="dev-kube-cluster",label_kubernetes_io_hostname="k8s-worker-22588695-0",label_kubernetes_io_role="agent",namespace="monitoring",node="k8s-worker-22588695-0",pod="kube-prometheus-exporter-kube-state-854f846569-8lnk2",service="kube-prometheus-exporter-kube-state"} 1
Answer 0 (score: 0)
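instance:node_cpu:rate:sum is probably a recording rule (see https://prometheus.io/docs/prometheus/latest/configuration/recording_rules/).
So you can either add a new recording rule to Prometheus that sums up the CPU of the worker nodes, or use the corresponding query directly.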
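
As a rough illustration of the "use the corresponding query directly" option, here is a minimal sketch of an alerting rule that restricts the CPU series to worker nodes by joining against kube_node_labels. It rests on one big assumption: that the series behind instance:node_cpu:rate:sum carry a `node` label whose value matches the `node` label on kube_node_labels (for example, added through relabeling). The sample output in the question shows only an `instance` label on that series, so this will likely not work out of the box and the join label would need to be adapted. The group name, alert name, and the `for: 3m` clause are hypothetical choices made for the example.

```yaml
groups:
  - name: worker-cpu.rules            # hypothetical group name
    rules:
      - alert: WorkerPoolCPUUtilizationLow   # hypothetical alert name
        # ASSUMPTION: both sides share a `node` label. kube_node_labels has
        # the value 1, so multiplying keeps the CPU value unchanged and only
        # drops series for nodes that are not in the worker pool.
        expr: |
          avg(
              instance:node_cpu:rate:sum
            * on(node) group_left()
              kube_node_labels{label_agentpool="worker"}
          ) * 100 < 30
        # Require the condition to hold for 3 minutes before firing.
        for: 3m
        labels:
          severity: none
        annotations:
          description: Average CPU utilization of agentpool=worker nodes has been lower than 30% for the last 3 minutes (current value is {{ $value }})
```

The same inner expression (everything inside avg(...)) could instead be saved as a new recording rule, with the alert then referencing the recorded series; which of the two is preferable depends mainly on how often the query is reused.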