I am running a GKE cluster with multiple pods that need access to a shared volume. Since GCE persistent disks do not allow ReadWriteMany access, I set up an NFS server in the cluster (following many examples like this) to allow it. I run both production and development environments in different namespaces on the same cluster, and since both environments run the same application, each needs its own file system.
Currently, the solution is to set up two NFS servers in the same way (one for prod, one for dev). It seems that when pods mounting a volume from an NFS server end up on the same node as the NFS server itself, they fail to mount (the error is "Unable to attach or mount volumes [...]: timed out waiting for the condition"). However, this only seems to happen in the dev environment; the prod environment has no issues. Both NFS servers are currently scheduled on the same node, which may also be contributing to the problem, but I am not sure.
I have been trying to find out whether running two NFS servers this way is a problem, or whether pods connecting to an NFS server running on the same node is a problem, but without success so far. Any ideas what could be causing this?
Logs of the NFS server container (identical for dev and prod):
nfs-dev-server Oct 30, 2020, 3:57:23 PM NFS started
nfs-dev-server Oct 30, 2020, 3:57:22 PM exportfs: / does not support NFS export
nfs-dev-server Oct 30, 2020, 3:57:22 PM Starting rpcbind
nfs-dev-server Oct 30, 2020, 3:57:22 PM rpcinfo: can't contact rpcbind: : RPC: Unable to receive; errno = Connection refused
nfs-dev-server Oct 30, 2020, 3:57:21 PM Serving /
nfs-dev-server Oct 30, 2020, 3:57:21 PM Serving /exports
Answer 0 (score: 0):
I reproduced your issue by following the tutorial you have linked and ran into the same problem; the cause lay in how the second PersistentVolume was created.
Here is how I deployed two NFS servers in two separate namespaces on GKE (version: 1.17.12-gke.2502).
Create 2 disks:
gcloud compute disks create --size=10GB --zone=us-east1-b gce-nfs-disk
gcloud compute disks create --size=10GB --zone=us-east1-b gce-nfs-disk2
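Before wiring the disks into the manifests, it can help to confirm both exist; a minimal sketch (assuming the same zone as above):

```shell
# List the two NFS backing disks created in us-east1-b
gcloud compute disks list --filter="name~'gce-nfs-disk'" --zones=us-east1-b
```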
Create the NFS Deployment and Service in the dev namespace:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nfs-server
  namespace: dev
spec:
  replicas: 1
  selector:
    matchLabels:
      role: nfs-server
  template:
    metadata:
      labels:
        role: nfs-server
    spec:
      containers:
        - name: nfs-server
          image: gcr.io/google_containers/volume-nfs:0.8
          ports:
            - name: nfs
              containerPort: 2049
            - name: mountd
              containerPort: 20048
            - name: rpcbind
              containerPort: 111
          securityContext:
            privileged: true
          volumeMounts:
            - mountPath: /exports
              name: mypvc
      volumes:
        - name: mypvc
          gcePersistentDisk:
            pdName: gce-nfs-disk
            fsType: ext4
---
apiVersion: v1
kind: Service
metadata:
  name: nfs-server
  namespace: dev
spec:
  # clusterIP: 10.3.240.20
  ports:
    - name: nfs
      port: 2049
    - name: mountd
      port: 20048
    - name: rpcbind
      port: 111
  selector:
    role: nfs-server
Once the NFS Deployment is created successfully, check the ClusterIP of the Service with kubectl get svc -n dev and put it into the PersistentVolume manifest under nfs: server: <CLUSTER_IP>:
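Instead of copying the IP by hand, the ClusterIP can also be read directly with a jsonpath query; a minimal sketch, assuming the Service name nfs-server and namespace dev from the manifests above:

```shell
# Print only the ClusterIP of the nfs-server Service in the dev namespace
kubectl get svc nfs-server -n dev -o jsonpath='{.spec.clusterIP}'
```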
apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs
spec:
  storageClassName: standard
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteMany
  nfs:
    server: <CLUSTER_IP_OF_SVC>
    path: "/"
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: nfs
  namespace: dev
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 10Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nfs-nginx
  namespace: dev
spec:
  replicas: 6
  selector:
    matchLabels:
      name: nfs-nginx
  template:
    metadata:
      labels:
        name: nfs-nginx
    spec:
      containers:
        - image: nginx
          imagePullPolicy: Always
          name: busybox
          volumeMounts:
            # name must match the volume name below
            - name: nfs
              mountPath: "/mnt"
      volumes:
        - name: nfs
          persistentVolumeClaim:
            claimName: nfs
Check and confirm that everything is up and running:
kubectl get pods -n dev -w
NAME READY STATUS RESTARTS AGE
nfs-nginx-587f8bd757-4gm8z 1/1 Running 0 6s
nfs-nginx-587f8bd757-6lh4l 1/1 Running 0 6s
nfs-nginx-587f8bd757-czr4r 0/1 Running 0 6s
nfs-nginx-587f8bd757-m5vph 1/1 Running 0 6s
nfs-nginx-587f8bd757-wqcff 1/1 Running 0 6s
nfs-nginx-587f8bd757-xqnf9 1/1 Running 0 6s
nfs-server-5f58f8d764-gjjnf 1/1 Running 0 3m14s
Repeat the same steps for the prod namespace, but remember to use the ClusterIP of the nfs-server Service in the prod namespace and change it accordingly in the PV manifest. Also note that PersistentVolume objects are cluster-scoped, so the prod PV needs a different name than the dev one. After deploying, this is the result:
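As an illustration of the renaming needed for prod, the PV/PVC pair could look like this (a sketch only: the PV name nfs2 is a hypothetical choice, and the ClusterIP placeholder must be replaced with the prod Service's actual IP):

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs2               # PV names are cluster-scoped, so this must differ from the dev PV
spec:
  storageClassName: standard
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteMany
  nfs:
    server: <CLUSTER_IP_OF_PROD_SVC>
    path: "/"
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: nfs                # PVCs are namespaced, so the dev name can be reused here
  namespace: prod
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 10Gi
```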
$ kubectl get pods -n prod
NAME READY STATUS RESTARTS AGE
nfs-nginx2-5d75567b95-7n7gk 1/1 Running 0 6m25s
nfs-nginx2-5d75567b95-gkqww 1/1 Running 0 6m25s
nfs-nginx2-5d75567b95-gt96p 1/1 Running 0 6m25s
nfs-nginx2-5d75567b95-hf9j7 1/1 Running 0 6m25s
nfs-nginx2-5d75567b95-k2jdv 1/1 Running 0 6m25s
nfs-nginx2-5d75567b95-q457q 1/1 Running 0 6m25s
nfs-server2-8654b89f48-bp9lv 1/1 Running 0 7m19s