Installed a k8s cluster on AWS with kops.
Deployed prometheus to the k8s cluster with Helm:
$ helm install stable/prometheus
It comes with an alertmanager configuration and a set of manifest templates:
https://github.com/kubernetes/charts/tree/master/stable/prometheus/templates
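The release name seen below (soft-flee) was auto-generated by Helm; with Helm 2 a name can also be set explicitly (the release name monitoring here is just an illustrative example):
$ helm install --name monitoring stable/prometheus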
After the installation finishes, check the pods:
$ kubectl get po
NAME READY STATUS RESTARTS AGE
soft-flee-monitoring-alertmanager-5f56f7879d-sg5lx 1/2 CrashLoopBackOff 5 5m
soft-flee-monitoring-kube-state-metrics-ff9b86484-lwdvm 1/1 Running 0 5m
soft-flee-monitoring-node-exporter-ckd2r 1/1 Running 0 5m
soft-flee-monitoring-node-exporter-rwclt 0/1 Pending 0 1s
soft-flee-monitoring-pushgateway-99986f-4thpx 1/1 Running 0 5m
soft-flee-monitoring-server-558b4895c8-f56hg 0/2 Pending 0 5m
Look at why it failed:
$ kubectl describe po soft-flee-monitoring-alertmanager-5f56f7879d-sg5lx
Name: soft-flee-monitoring-alertmanager-5f56f7879d-sg5lx
Namespace: default
Node: ip-100.200.0.1.ap-northeast-1.compute.internal/100.200.0.1
Start Time: Thu, 25 Jan 2018 09:39:34 +0000
Labels: app=monitoring
component=alertmanager
pod-template-hash=1912934358
release=soft-flee
Annotations: kubernetes.io/created-by={"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"ReplicaSet","namespace":"default","name":"soft-flee-monitoring-alertmanager-5f56f7879d","uid":"a4e136ae-01...
kubernetes.io/limit-ranger=LimitRanger plugin set: cpu request for container monitoring-alertmanager; cpu request for container monitoring-alertmanager-configmap-reload
Status: Running
IP: 100.96.6.83
Created By: ReplicaSet/soft-flee-monitoring-alertmanager-5f56f7879d
Controlled By: ReplicaSet/soft-flee-monitoring-alertmanager-5f56f7879d
Containers:
monitoring-alertmanager:
Container ID: docker://700dc92be231da0a5059e4645ba03a5cac762e8e41d3dc04b9be17a10ebfdcbb
Image: prom/alertmanager:v0.9.1
Image ID: docker-pullable://prom/alertmanager@sha256:ed926b227327eecfa61a9703702c9b16fc7fe95b69e22baa656d93cfbe098320
Port: 9093/TCP
Args:
--config.file=/etc/config/alertmanager.yml
--storage.path=/data
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 1
Started: Thu, 25 Jan 2018 09:40:19 +0000
Finished: Thu, 25 Jan 2018 09:40:19 +0000
Ready: False
Restart Count: 2
Requests:
cpu: 100m
Readiness: http-get http://:9093/%23/status delay=30s timeout=30s period=10s #success=1 #failure=3
Environment: <none>
Mounts:
/data from storage-volume (rw)
/etc/config from config-volume (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-wppzm (ro)
monitoring-alertmanager-configmap-reload:
Container ID: docker://0231fbc4dbe21d423d6bed858d70387cdfac60c2adb2d87a6a7087bf260ace74
Image: jimmidyson/configmap-reload:v0.1
Image ID: docker-pullable://jimmidyson/configmap-reload@sha256:2d40c2eaa6f435b2511d0cfc5f6c0a681eeb2eaa455a5d5ac25f88ce5139986e
Port: <none>
Args:
--volume-dir=/etc/config
--webhook-url=http://localhost:9093/-/reload
State: Running
Started: Thu, 25 Jan 2018 09:40:03 +0000
Ready: True
Restart Count: 0
Requests:
cpu: 100m
Environment: <none>
Mounts:
/etc/config from config-volume (ro)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-wppzm (ro)
Conditions:
Type Status
Initialized True
Ready False
PodScheduled True
Volumes:
config-volume:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: soft-flee-monitoring-alertmanager
Optional: false
storage-volume:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: soft-flee-monitoring-alertmanager
ReadOnly: false
default-token-wppzm:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-wppzm
Optional: false
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.alpha.kubernetes.io/notReady:NoExecute for 300s
node.alpha.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 1m (x3 over 1m) default-scheduler PersistentVolumeClaim is not bound: "soft-flee-monitoring-alertmanager" (repeated 5 times)
Normal Scheduled 1m default-scheduler Successfully assigned soft-flee-monitoring-alertmanager-5f56f7879d-sg5lx to ip-100.200.0.1.ap-northeast-1.compute.internal
Normal SuccessfulMountVolume 1m kubelet, ip-100.200.0.1.ap-northeast-1.compute.internal MountVolume.SetUp succeeded for volume "default-token-wppzm"
Warning FailedMount 1m attachdetach AttachVolume.Attach failed for volume "pvc-a4c420a5-01b3-11e8-a981-06b56e90ab12" : Error attaching EBS volume "vol-0c8c9d3794bdbec90" to instance "i-0cf5ecba708a2ffe7": "IncorrectState: vol-0c8c9d3794bdbec90 is not 'available'.\n\tstatus code: 400, request id: ccda67b9-076f-4b95-93b8-86c4ca5f4229"
Normal SuccessfulMountVolume 1m kubelet, ip-100.200.0.1.ap-northeast-1.compute.internal MountVolume.SetUp succeeded for volume "config-volume"
Normal SuccessfulMountVolume 56s kubelet, ip-100.200.0.1.ap-northeast-1.compute.internal MountVolume.SetUp succeeded for volume "pvc-a4c420a5-01b3-11e8-a981-06b56e90ab12"
Normal Pulling 55s kubelet, ip-100.200.0.1.ap-northeast-1.compute.internal pulling image "prom/alertmanager:v0.9.1"
Normal Pulled 50s kubelet, ip-100.200.0.1.ap-northeast-1.compute.internal Successfully pulled image "prom/alertmanager:v0.9.1"
Normal Pulling 50s kubelet, ip-100.200.0.1.ap-northeast-1.compute.internal pulling image "jimmidyson/configmap-reload:v0.1"
Normal Created 44s kubelet, ip-100.200.0.1.ap-northeast-1.compute.internal Created container
Normal Started 44s kubelet, ip-100.200.0.1.ap-northeast-1.compute.internal Started container
Normal Pulled 44s kubelet, ip-100.200.0.1.ap-northeast-1.compute.internal Successfully pulled image "jimmidyson/configmap-reload:v0.1"
Normal Created 28s (x3 over 50s) kubelet, ip-100.200.0.1.ap-northeast-1.compute.internal Created container
Normal Started 28s (x3 over 50s) kubelet, ip-100.200.0.1.ap-northeast-1.compute.internal Started container
Normal Pulled 28s (x2 over 44s) kubelet, ip-100.200.0.1.ap-northeast-1.compute.internal Container image "prom/alertmanager:v0.9.1" already present on machine
Warning BackOff 12s (x4 over 43s) kubelet, ip-100.200.0.1.ap-northeast-1.compute.internal Back-off restarting failed container
Warning FailedSync 12s (x4 over 43s) kubelet, ip-100.200.0.1.ap-northeast-1.compute.internal Error syncing pod
There is a FailedMount error:
AttachVolume.Attach failed for volume "pvc-a4c420a5-01b3-11e8-a981-06b56e90ab12" : Error attaching EBS volume "vol-0c8c9d3794bdbec90" to instance
But when I check the volume vol-0c8c9d3794bdbec90, it is up and running. Why does this error occur?
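For reference, the claim status on the Kubernetes side and the volume state on the AWS side can be checked like this (the claim name is taken from the describe output above; the --query expression is only an example):
$ kubectl describe pvc soft-flee-monitoring-alertmanager
$ aws ec2 describe-volumes --volume-ids vol-0c8c9d3794bdbec90 --query 'Volumes[0].[State,Attachments]'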
Answer 0 (score: 0)
If you have set up the cluster with kops, the persistent volumes are created for you automatically. You will get the error above, but it goes away after a few minutes.
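A rough way to watch this settle (just a sketch, nothing the cluster requires) is to wait until the dynamically provisioned claims show up as Bound and the pods go Running:
$ kubectl get pvc -w
$ kubectl get po -w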
I had created PVs matching the volume claim sizes in the values file, expecting them to be used when helm made its claims. But in fact two new PVs were created and bound instead.
This is how I created the volumes:
aws ec2 create-volume --availability-zone=us-east-1c --size=2 --volume-type=gp2 --no-encrypted --tag-specifications "ResourceType=volume,Tags=[{Key=myproject,Value=prometheus-server}]"
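The PV objects themselves are not shown here; a minimal sketch of how such a manually created EBS volume could be registered as a PersistentVolume (the name, size, storage class and volume ID below are illustrative assumptions) looks roughly like this:
$ cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: PersistentVolume
metadata:
  name: prometheus-server-pv        # illustrative name
spec:
  capacity:
    storage: 2Gi                    # should be at least the size the chart's PVC requests
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: gp2             # assumption: the default kops storage class
  awsElasticBlockStore:
    volumeID: vol-0123456789abcdef0 # illustrative: the ID returned by create-volume above
    fsType: ext4
EOF
Whether a pre-created PV like this actually gets bound depends on it matching the storage class, access mode and size requested by the chart's PVCs; if it does not, the dynamic provisioner simply creates new volumes instead, which matches what happened above.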