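I have a Fargate-only EKS cluster, and I really don't want to manage instances myself. I'd like to deploy Prometheus to it, which requires a persistent volume. As of two months ago this should be possible with EFS (a managed NFS share). I feel I'm almost there, but I can't figure out what the current problem is.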
What I did:
So far, everything works.
I set up the persistent volumes (which, as far as I understand, must be provisioned statically) with:
kubectl apply -f pvc/
where
tree pvc/
pvc/
├── two_pvc.yml
└── ten_pvc.yml
and
cat pvc/*
apiVersion: v1
kind: PersistentVolume
metadata:
  name: efs-pv-two
spec:
  capacity:
    storage: 2Gi
  volumeMode: Filesystem
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: efs-sc
  csi:
    driver: efs.csi.aws.com
    volumeHandle: fs-ec0e1234
apiVersion: v1
kind: PersistentVolume
metadata:
  name: efs-pv-ten
spec:
  capacity:
    storage: 8Gi
  volumeMode: Filesystem
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: efs-sc
  csi:
    driver: efs.csi.aws.com
    volumeHandle: fs-ec0e1234
then
helm upgrade --install myrelease-helm-02 prometheus-community/prometheus \
  --namespace prometheus \
  --set alertmanager.persistentVolume.storageClass="efs-sc",server.persistentVolume.storageClass="efs-sc"
What happens?
The Prometheus Alertmanager comes up fine with its PVC. So do the other pods of this deployment, but the Prometheus server goes into CrashLoopBackOff with:
invalid capacity 0 on filesystem
Diagnostics
kubectl get pv -A
NAME         CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                                                  STORAGECLASS   REASON   AGE
efs-pv-ten   8Gi        RWO            Retain           Bound    prometheus/myrelease-helm-02-prometheus-server         efs-sc                  11m
efs-pv-two   2Gi        RWO            Retain           Bound    prometheus/myrelease-helm-02-prometheus-alertmanager   efs-sc                  11m
The server pod itself just shows "Error".
Finally, this (from a colleague):
kubectl get pvc -A
NAMESPACE    NAME                                       STATUS   VOLUME       CAPACITY   ACCESS MODES   STORAGECLASS   AGE
prometheus   myrelease-helm-02-prometheus-alertmanager  Bound    efs-pv-two   2Gi        RWO            efs-sc         12m
prometheus   myrelease-helm-02-prometheus-server        Bound    efs-pv-ten   8Gi        RWO            efs-sc         12m
Other than it looking like a permissions issue, I'm baffled: I know the storage works and is accessible, and another pod in the deployment seems happy with it, but not this one.
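Answer 0 (score: 2)
Working now, so I'm writing it up here for the common good. Thanks to /u/EmiiKhaos on reddit for the suggestions on where to look.
Problem:
The EFS share is root:root only, and Prometheus forbids running pods as root.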
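To make the permission problem concrete, here is a minimal sketch of the classic owner/group/other write check. It assumes the share's root directory has mode 0755 (the post only says root:root) and uses 65534 as an illustrative non-root uid; the helper name `can_write` is made up for this example, and secondary groups are ignored.

```shell
# Decide whether a process (uid, gid) may write to a directory, using only
# the owner/group/other write bits. Simplified: secondary groups are ignored.
# Arguments: uid gid owner_uid owner_gid mode   (mode is octal, e.g. 0755)
can_write() {
  uid=$1; gid=$2; owner_uid=$3; owner_gid=$4; mode=$5
  # root bypasses the permission bits
  if [ "$uid" -eq 0 ]; then
    return 0
  fi
  # a leading 0 makes the shell read the mode as octal
  bits=$((0$mode))
  if [ "$uid" -eq "$owner_uid" ]; then
    mask=$((0200))   # owner write bit
  elif [ "$gid" -eq "$owner_gid" ]; then
    mask=$((0020))   # group write bit
  else
    mask=$((0002))   # other write bit
  fi
  [ $((bits & mask)) -ne 0 ]
}

# A non-root process cannot write to a root:root 0755 directory (assumed mode):
if can_write 65534 65534 0 0 0755; then echo "writable"; else echo "not writable"; fi
# The same check passes once the directory is owned by the process's own uid:gid:
if can_write 500 500 500 500 0755; then echo "writable"; else echo "not writable"; fi
```

Under these assumptions, the fix has to either change who owns the directory or change which user the pod runs as; the access points below do the former and the security context does the latter.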
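Solution: give each pod its own EFS access point with a non-root POSIX user, point the persistent volumes at those access points, and run the pods with a matching security context.
Method:
Create 2 EFS access points, something like: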
{
    "Name": "prometheuserver",
    "AccessPointId": "fsap-<hex01>",
    "FileSystemId": "fs-ec0e1234",
    "PosixUser": {
        "Uid": 500,
        "Gid": 500,
        "SecondaryGids": [
            2000
        ]
    },
    "RootDirectory": {
        "Path": "/prometheuserver",
        "CreationInfo": {
            "OwnerUid": 500,
            "OwnerGid": 500,
            "Permissions": "0755"
        }
    }
},
{
    "Name": "prometheusalertmanager",
    "AccessPointId": "fsap-<hex02>",
    "FileSystemId": "fs-ec0e1234",
    "PosixUser": {
        "Uid": 501,
        "Gid": 501,
        "SecondaryGids": [
            2000
        ]
    },
    "RootDirectory": {
        "Path": "/prometheusalertmanager",
        "CreationInfo": {
            "OwnerUid": 501,
            "OwnerGid": 501,
            "Permissions": "0755"
        }
    }
}
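One way such access points might be created from the command line is sketched below. The helper only prints an `aws efs create-access-point` command so it can be reviewed before running; the function name `access_point_cmd` is hypothetical, and the shorthand argument syntax should be checked against the AWS CLI version in use.

```shell
# Print (do not run) an `aws efs create-access-point` command for one pod.
# Arguments: name, file system id, uid, gid.
# The secondary gid 2000 and the 0755 mode mirror the JSON above.
access_point_cmd() {
  name=$1; fs_id=$2; uid=$3; gid=$4
  printf 'aws efs create-access-point --file-system-id %s' "$fs_id"
  printf ' --tags Key=Name,Value=%s' "$name"
  printf ' --posix-user Uid=%s,Gid=%s,SecondaryGids=2000' "$uid" "$gid"
  printf ' --root-directory "Path=/%s,CreationInfo={OwnerUid=%s,OwnerGid=%s,Permissions=0755}"\n' \
    "$name" "$uid" "$gid"
}

# One access point per pod that needs a persistent volume
access_point_cmd prometheuserver fs-ec0e1234 500 500
access_point_cmd prometheusalertmanager fs-ec0e1234 501 501
```

Each created access point gets an ID of the form `fsap-...`, which is what the persistent volumes reference in the next step.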
Update my persistent volumes:
kubectl apply -f pvc/
to something like:
apiVersion: v1
kind: PersistentVolume
metadata:
  name: prometheusalertmanager
spec:
  capacity:
    storage: 2Gi
  volumeMode: Filesystem
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: efs-sc
  csi:
    driver: efs.csi.aws.com
    volumeHandle: fs-ec0e1234::fsap-<hex02>
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: prometheusserver
spec:
  capacity:
    storage: 8Gi
  volumeMode: Filesystem
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: efs-sc
  csi:
    driver: efs.csi.aws.com
    volumeHandle: fs-ec0e1234::fsap-<hex01>
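Since the two manifests differ only in name, size and access point, they could also be generated from a small template. This is a sketch, not part of the original setup: `render_pv` is a hypothetical helper, and the access point ID passed to it is a placeholder for the real `fsap-...` value.

```shell
# Print a PersistentVolume manifest whose volumeHandle targets one access point.
# The handle has the form "<file system id>::<access point id>", as in the
# manifests above.
# Arguments: name, size, file system id, access point id.
render_pv() {
  name=$1; size=$2; fs_id=$3; ap_id=$4
  cat <<EOF
apiVersion: v1
kind: PersistentVolume
metadata:
  name: ${name}
spec:
  capacity:
    storage: ${size}
  volumeMode: Filesystem
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: efs-sc
  csi:
    driver: efs.csi.aws.com
    volumeHandle: ${fs_id}::${ap_id}
EOF
}
```

With the real access point ID in `$AP_ID`, the output could be piped straight to the cluster, for example `render_pv prometheusserver 8Gi fs-ec0e1234 "$AP_ID" | kubectl apply -f -`.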
Reinstall Prometheus as before:
helm upgrade --install myrelease-helm-02 prometheus-community/prometheus \
--namespace prometheus \
--set alertmanager.persistentVolume.storageClass="efs-sc",server.persistentVolume.storageClass="efs-sc"
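Take an educated guess from
kubectl describe pod myrelease-helm-02-prometheus-server -n prometheus
and
kubectl describe pod myrelease-helm-02-prometheus-alertmanager -n prometheus
as to which container needs to be specified when setting the security context. Then apply a security context so the pods run with the appropriate uid:gid, e.g. with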
kubectl apply -f setpermissions/
where
cat setpermissions/*
gives
apiVersion: v1
kind: Pod
metadata:
  name: myrelease-helm-02-prometheus-alertmanager
spec:
  securityContext:
    runAsUser: 501
    runAsGroup: 501
    fsGroup: 501
  volumes:
    - name: prometheusalertmanager
  containers:
    - name: prometheusalertmanager
      image: jimmidyson/configmap-reload:v0.4.0
      securityContext:
        runAsUser: 501
        allowPrivilegeEscalation: false
apiVersion: v1
kind: Pod
metadata:
  name: myrelease-helm-02-prometheus-server
spec:
  securityContext:
    runAsUser: 500
    runAsGroup: 500
    fsGroup: 500
  volumes:
    - name: prometheusserver
  containers:
    - name: prometheusserver
      image: jimmidyson/configmap-reload:v0.4.0
      securityContext:
        runAsUser: 500
        allowPrivilegeEscalation: false
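A possible alternative, which I have not verified for this setup, is to pass the same uid:gid through Helm values instead of separate Pod manifests. The sketch below assumes the chart version in use exposes `<component>.securityContext` with `runAsUser`, `runAsGroup` and `fsGroup` keys; `helm show values prometheus-community/prometheus` shows whether it does. The helper name `security_context_set` is made up for illustration.

```shell
# Build a --set argument that pins a chart component to one uid:gid.
# Assumption: the chart exposes <component>.securityContext.{runAsUser,runAsGroup,fsGroup}.
# Arguments: component (e.g. server), uid, gid.
security_context_set() {
  component=$1; uid=$2; gid=$3
  printf '%s.securityContext.runAsUser=%s,' "$component" "$uid"
  printf '%s.securityContext.runAsGroup=%s,' "$component" "$gid"
  printf '%s.securityContext.fsGroup=%s\n' "$component" "$gid"
}

# uid:gid 500:500 matches the prometheuserver access point
security_context_set server 500 500
```

The result could then be added to the install command, for example `helm upgrade --install myrelease-helm-02 prometheus-community/prometheus --namespace prometheus --set "$(security_context_set server 500 500)"`. The uid:gid must match the `PosixUser` of the access point behind that component's volume.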