Question

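I have set up a Kubernetes cluster on AWS, and I am trying to monitor multiple pods using cAdvisor + Prometheus + Alertmanager. What I want is to send an email alert (with the service/container name) if a container/pod goes down, ends up in the Error or CrashLoopBackOff state, or gets stuck in any state other than Running.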

Answer 1

Prometheus collects a wide range of metrics. For example, you can use the metric kube_pod_container_status_restarts_total to monitor restarts, which will reflect your problem.

It contains labels that you can use in the alert:

container=container-name
namespace=pod-namespace
pod=pod-name

So all you need to do is configure your alertmanager.yaml config by adding the correct SMTP settings, a receiver, and rules:

alertmanager.yaml
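
A minimal sketch of what such a file might contain, assuming email delivery over SMTP. The host, addresses, password, and the receiver name `email-alerts` are hypothetical placeholders, not values from the original answer; adjust the grouping and timing to your setup.

```yaml
# alertmanager.yaml (illustrative sketch; all hosts and addresses are placeholders)
global:
  smtp_smarthost: 'smtp.example.com:587'        # assumed SMTP relay
  smtp_from: 'alertmanager@example.com'
  smtp_auth_username: 'alertmanager@example.com'
  smtp_auth_password: 'changeme'

route:
  receiver: email-alerts                        # default receiver for all alerts
  group_by: ['alertname', 'namespace', 'pod']   # one email per alert/pod combination
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h

receivers:
  - name: email-alerts
    email_configs:
      - to: 'oncall@example.com'
        send_resolved: true                     # also notify when the alert clears
```

Grouping by `namespace` and `pod` keeps the pod identity in each notification, and the `container` label from the metric is available to the alert's annotations. Note that alerting rules themselves are loaded by Prometheus from its rule files, while Alertmanager only handles routing and delivery.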

Answer 2

I'm using this one:

    - alert: PodCrashLooping
      annotations:
        description: Pod {{ $labels.namespace }}/{{ $labels.pod }} ({{ $labels.container }}) is restarting {{ printf "%.2f" $value }} times / 5 minutes.
        summary: Pod is crash looping.
      expr: rate(kube_pod_container_status_restarts_total{job="kube-state-metrics",namespace=~".*"}[5m]) * 60 * 5 > 0
      for: 5m
      labels:
        severity: critical

Alert when a Docker container pod is in the Error or CrashLoopBackOff state in Kubernetes
