Prometheus alertmanager installed with Helm fails on AWS

Date: 2018-01-25 10:22:55

Tags: amazon-web-services kubernetes storage prometheus kubernetes-helm

I have a k8s cluster on AWS, set up with kops.

Prometheus was deployed into the cluster with Helm:

$ helm install stable/prometheus

The chart ships an alertmanager configuration along with several manifest files:

https://github.com/kubernetes/charts/tree/master/stable/prometheus/templates
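
The chart creates PersistentVolumeClaims for both the alertmanager and the server out of the box; the requested sizes come from values.yaml and can be overridden at install time. A minimal sketch (key names as they appear in the stable/prometheus chart's values.yaml):

$ helm install stable/prometheus \
    --set alertmanager.persistentVolume.size=2Gi \
    --set server.persistentVolume.size=8Gi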

Once the installation finished, check the pods:

$ kubectl get po
soft-flee-monitoring-alertmanager-5f56f7879d-sg5lx        1/2       CrashLoopBackOff   5          5m
soft-flee-monitoring-kube-state-metrics-ff9b86484-lwdvm   1/1       Running            0          5m
soft-flee-monitoring-node-exporter-ckd2r                  1/1       Running            0          5m
soft-flee-monitoring-node-exporter-rwclt                  0/1       Pending            0          1s
soft-flee-monitoring-pushgateway-99986f-4thpx             1/1       Running            0          5m
soft-flee-monitoring-server-558b4895c8-f56hg              0/2       Pending            0          5m
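
The two unhealthy pods (alertmanager and server) are exactly the ones that mount a PersistentVolumeClaim, so the binding state of the claims is worth checking alongside the pods:

$ kubectl get pvc
$ kubectl get pv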

Looking into why it failed:

$ kubectl describe po soft-flee-monitoring-alertmanager-5f56f7879d-sg5lx
Name:           soft-flee-monitoring-alertmanager-5f56f7879d-sg5lx
Namespace:      default
Node:           ip-100.200.0.1.ap-northeast-1.compute.internal/100.200.0.1
Start Time:     Thu, 25 Jan 2018 09:39:34 +0000
Labels:         app=monitoring
                component=alertmanager
                pod-template-hash=1912934358
                release=soft-flee
Annotations:    kubernetes.io/created-by={"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"ReplicaSet","namespace":"default","name":"soft-flee-monitoring-alertmanager-5f56f7879d","uid":"a4e136ae-01...
                kubernetes.io/limit-ranger=LimitRanger plugin set: cpu request for container monitoring-alertmanager; cpu request for container monitoring-alertmanager-configmap-reload
Status:         Running
IP:             100.96.6.83
Created By:     ReplicaSet/soft-flee-monitoring-alertmanager-5f56f7879d
Controlled By:  ReplicaSet/soft-flee-monitoring-alertmanager-5f56f7879d
Containers:
  monitoring-alertmanager:
    Container ID:  docker://700dc92be231da0a5059e4645ba03a5cac762e8e41d3dc04b9be17a10ebfdcbb
    Image:         prom/alertmanager:v0.9.1
    Image ID:      docker-pullable://prom/alertmanager@sha256:ed926b227327eecfa61a9703702c9b16fc7fe95b69e22baa656d93cfbe098320
    Port:          9093/TCP
    Args:
      --config.file=/etc/config/alertmanager.yml
      --storage.path=/data
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Thu, 25 Jan 2018 09:40:19 +0000
      Finished:     Thu, 25 Jan 2018 09:40:19 +0000
    Ready:          False
    Restart Count:  2
    Requests:
      cpu:        100m
    Readiness:    http-get http://:9093/%23/status delay=30s timeout=30s period=10s #success=1 #failure=3
    Environment:  <none>
    Mounts:
      /data from storage-volume (rw)
      /etc/config from config-volume (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-wppzm (ro)
  monitoring-alertmanager-configmap-reload:
    Container ID:  docker://0231fbc4dbe21d423d6bed858d70387cdfac60c2adb2d87a6a7087bf260ace74
    Image:         jimmidyson/configmap-reload:v0.1
    Image ID:      docker-pullable://jimmidyson/configmap-reload@sha256:2d40c2eaa6f435b2511d0cfc5f6c0a681eeb2eaa455a5d5ac25f88ce5139986e
    Port:          <none>
    Args:
      --volume-dir=/etc/config
      --webhook-url=http://localhost:9093/-/reload
    State:          Running
      Started:      Thu, 25 Jan 2018 09:40:03 +0000
    Ready:          True
    Restart Count:  0
    Requests:
      cpu:        100m
    Environment:  <none>
    Mounts:
      /etc/config from config-volume (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-wppzm (ro)
Conditions:
  Type           Status
  Initialized    True
  Ready          False
  PodScheduled   True
Volumes:
  config-volume:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      soft-flee-monitoring-alertmanager
    Optional:  false
  storage-volume:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  soft-flee-monitoring-alertmanager
    ReadOnly:   false
  default-token-wppzm:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-wppzm
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     node.alpha.kubernetes.io/notReady:NoExecute for 300s
                 node.alpha.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason                 Age                From                                                      Message
  ----     ------                 ----               ----                                                      -------
  Warning  FailedScheduling       1m (x3 over 1m)    default-scheduler                                         PersistentVolumeClaim is not bound: "soft-flee-monitoring-alertmanager" (repeated 5 times)
  Normal   Scheduled              1m                 default-scheduler                                         Successfully assigned soft-flee-monitoring-alertmanager-5f56f7879d-sg5lx to ip-100.200.0.1.ap-northeast-1.compute.internal
  Normal   SuccessfulMountVolume  1m                 kubelet, ip-100.200.0.1.ap-northeast-1.compute.internal  MountVolume.SetUp succeeded for volume "default-token-wppzm"
  Warning  FailedMount            1m                 attachdetach                                              AttachVolume.Attach failed for volume "pvc-a4c420a5-01b3-11e8-a981-06b56e90ab12" : Error attaching EBS volume "vol-0c8c9d3794bdbec90" to instance "i-0cf5ecba708a2ffe7": "IncorrectState: vol-0c8c9d3794bdbec90 is not 'available'.\n\tstatus code: 400, request id: ccda67b9-076f-4b95-93b8-86c4ca5f4229"
  Normal   SuccessfulMountVolume  1m                 kubelet, ip-100.200.0.1.ap-northeast-1.compute.internal  MountVolume.SetUp succeeded for volume "config-volume"
  Normal   SuccessfulMountVolume  56s                kubelet, ip-100.200.0.1.ap-northeast-1.compute.internal  MountVolume.SetUp succeeded for volume "pvc-a4c420a5-01b3-11e8-a981-06b56e90ab12"
  Normal   Pulling                55s                kubelet, ip-100.200.0.1.ap-northeast-1.compute.internal  pulling image "prom/alertmanager:v0.9.1"
  Normal   Pulled                 50s                kubelet, ip-100.200.0.1.ap-northeast-1.compute.internal  Successfully pulled image "prom/alertmanager:v0.9.1"
  Normal   Pulling                50s                kubelet, ip-100.200.0.1.ap-northeast-1.compute.internal  pulling image "jimmidyson/configmap-reload:v0.1"
  Normal   Created                44s                kubelet, ip-100.200.0.1.ap-northeast-1.compute.internal  Created container
  Normal   Started                44s                kubelet, ip-100.200.0.1.ap-northeast-1.compute.internal  Started container
  Normal   Pulled                 44s                kubelet, ip-100.200.0.1.ap-northeast-1.compute.internal  Successfully pulled image "jimmidyson/configmap-reload:v0.1"
  Normal   Created                28s (x3 over 50s)  kubelet, ip-100.200.0.1.ap-northeast-1.compute.internal  Created container
  Normal   Started                28s (x3 over 50s)  kubelet, ip-100.200.0.1.ap-northeast-1.compute.internal  Started container
  Normal   Pulled                 28s (x2 over 44s)  kubelet, ip-100.200.0.1.ap-northeast-1.compute.internal  Container image "prom/alertmanager:v0.9.1" already present on machine
  Warning  BackOff                12s (x4 over 43s)  kubelet, ip-100.200.0.1.ap-northeast-1.compute.internal  Back-off restarting failed container
  Warning  FailedSync             12s (x4 over 43s)  kubelet, ip-100.200.0.1.ap-northeast-1.compute.internal  Error syncing pod

The FailedMount error stands out:

AttachVolume.Attach failed for volume "pvc-a4c420a5-01b3-11e8-a981-06b56e90ab12" : Error attaching EBS volume "vol-0c8c9d3794bdbec90" to instance "i-0cf5ecba708a2ffe7": "IncorrectState: vol-0c8c9d3794bdbec90 is not 'available'."

But when I check the volume vol-0c8c9d3794bdbec90 itself, it is up and running. Why does this error occur?
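
The state AWS reports for the volume and its attachment at that moment can be queried directly (volume ID taken from the event above; the --query expression is just one way to slice the output):

$ aws ec2 describe-volumes --volume-ids vol-0c8c9d3794bdbec90 \
    --query 'Volumes[0].[State,Attachments[0].State,Attachments[0].InstanceId]'

An "IncorrectState ... is not 'available'" response to an attach call generally means the volume was still being created or was mid-attach when the attach was retried.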

1 Answer:

Answer 0 (score: 0):

If you set the cluster up with kops, persistent volumes are created for you automatically. You will see the error above, but it goes away after a few minutes.
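
To confirm the pod does come back once the attach retry succeeds, it can be watched via its labels (taken from the pod description above):

$ kubectl get po -l release=soft-flee,component=alertmanager -w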

I had created PVs matching the volume claim sizes in the values file, expecting Helm's claims to bind to them, but two brand-new PVs were created and claimed instead (likely because kops installs a default StorageClass, so the claims were provisioned dynamically rather than binding to the pre-created PVs).

This is how I created the volumes:

$ aws ec2 create-volume --availability-zone=us-east-1c --size=2 \
    --volume-type=gp2 --no-encrypted \
    --tag-specifications "ResourceType=volume,Tags=[{Key=myproject,Value=prometheus-server}]"
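
For a hand-made volume like this to be usable by a claim, it also needs a PersistentVolume object wrapping the EBS volume. A minimal sketch, assuming the claim asks for 2Gi; the PV name here is made up, and the volume ID is whatever create-volume returned:

$ cat <<EOF | kubectl create -f -
apiVersion: v1
kind: PersistentVolume
metadata:
  name: prometheus-server-pv          # hypothetical name
spec:
  capacity:
    storage: 2Gi                      # must match the claim size in values.yaml
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  awsElasticBlockStore:
    volumeID: vol-xxxxxxxxxxxxxxxxx   # the ID returned by create-volume
    fsType: ext4
EOF

Using persistentVolumeReclaimPolicy: Retain keeps the EBS volume around even if the claim is later deleted.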