GKE cannot mount volumes to a Deployment/Pod: timed out waiting for the condition

Time: 2021-04-20 09:12:33

Tags: go kubernetes google-cloud-platform google-kubernetes-engine

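We have run into a problem using volumes on GKE.

Since last night, our deployment can no longer access our main document storage disk. The logs look like this: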

...
    /go/src/github.com/def/abc/backend/formulare/formulare_generate_http.go:62 +0x55
    github.com/def/abc/backend/formulare.CreateDirsIfNeeded(0xc000b9b1d0, 0x2e, 0x0, 0x0)
    /usr/local/go/src/os/path.go:20 +0x39
    os.MkdirAll(0xc000b9b1d0, 0x25, 0xc0000001ff, 0x25, 0xc000e75b18)
    /usr/local/go/src/os/stat.go:13 +0x4d
    os.Stat(0xc000b9b1d0, 0x25, 0xc000b9b1d0, 0x0, 0xc000b9b1d0, 0x25)
    /usr/local/go/src/os/stat_unix.go:31 +0x77
    os.statNolog(0xc000b9b1d0, 0x25, 0xc000171ac8, 0x2, 0x2, 0xc000b9b1d0)
    /usr/local/go/src/os/file_posix.go:245
    os.ignoringEINTR(...)
    /usr/local/go/src/os/stat_unix.go:32
    os.statNolog.func1(...)
    /usr/local/go/src/syscall/syscall_linux_amd64.go:66
    syscall.Stat(...)
    /usr/local/go/src/syscall/zsyscall_linux_amd64.go:1440 +0xd2
    syscall.fstatat(0xffffffffffffff9c, 0xc000b9b1d0, 0x25, 0xc001a90378, 0x0, 0xc000171ac0, 0x4f064b)
    /usr/local/go/src/syscall/asm_linux_amd64.s:43 +0x5
    syscall.Syscall6(0x106, 0xffffffffffffff9c, 0xc000b9b200, 0xc001a90378, 0x0, 0x0, 0x0, 0xc000ba4400, 0x0, 0xc000171a08)
    goroutine 808214 [syscall, 534 minutes]:

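After recreating the PV/PVC and the NFS server on GKE, the PV and PVC bind successfully, but the NFS server pod no longer even starts, because it cannot mount the disk: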

  Warning  FailedMount  95s (x7 over 15m)  kubelet  Unable to attach or mount volumes: unmounted volumes=[document-storage-claim default-token-sbxxl], unattached volumes=[document-storage-claim default-token-sbxxl]: timed out waiting for the condition

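Strangely, the default service account token volume (default-token-sbxxl) cannot be mounted either.

Could this be a problem on Google's side? Do I need to change my NFS server configuration?

Here are my Kubernetes definitions: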

kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: document-storage-claim
  namespace: default
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: standard
  volumeName: document-storage
  resources:
    requests:
      storage: 250Gi

--- 

apiVersion: v1
kind: PersistentVolume
metadata:
  name: document-storage
  namespace: default
spec:
  storageClassName: standard
  capacity:
    storage: 250Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  gcePersistentDisk:
    pdName: document-storage-clone
    fsType: ext4

--- 

apiVersion: v1
kind: ReplicationController
metadata:
  name: document-storage-nfs-server
spec:
  replicas: 1
  selector:
    role: nfs-server
  template:
    metadata:
      labels:
        role: nfs-server
    spec:
      securityContext:
        fsGroup: 1000
      containers:
        - name: nfs-server
          image: k8s.gcr.io/volume-nfs:0.8
          ports:
            - name: nfs
              containerPort: 2049
            - name: mountd
              containerPort: 20048
            - name: rpcbind
              containerPort: 111
          securityContext:
            privileged: true
          volumeMounts:
            - mountPath: /exports
              name: document-storage-claim
      volumes:
        - name: document-storage-claim
          persistentVolumeClaim:
            claimName: document-storage-claim

1 answer:

Answer 0 (score: 3)

Google appears to have rolled out a GKE update on the night of 2021-04-20. Somehow this update also affected earlier versions (1.18.16-gke.502 in our case).

We resolved the problem by upgrading to 1.19.8-gke.1600.