I have a manually built Kubernetes 1.11.4 cluster running CentOS on AWS EC2 instances, with one master and one worker node. The cluster is very stable. I want to deploy JupyterHub to it. The documentation here and here calls out some details about configuring EFS; I chose to go with EBS instead.
The PVC fails with:
Failed to get AWS Cloud Provider. GetCloudProvider returned <nil> instead
Mounted By: hub-76ffd7d94b-dmj8l
Here is the StorageClass definition:
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: gp2
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: kubernetes.io/aws-ebs
parameters:
  type: gp2
  fsType: ext4
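For reference, one way to confirm the class exists and is actually flagged as the default (a sketch; kubectl prints "(default)" next to the name of the default class):

$ kubectl get storageclass gp2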
The PV YAML:
kind: PersistentVolume
apiVersion: v1
metadata:
  name: jupyterhub-pv
  labels:
    type: amazonEBS
spec:
  capacity:
    storage: 30Gi
  accessModes:
    - ReadWriteMany
  awsElasticBlockStore:
    volumeID: vol-0ddb700735db435c7
    fsType: ext4
The PVC YAML:
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: jupyterhub-pvc
  labels:
    type: amazonEBS
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
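A quick sanity check here (a sketch, assuming jupyterhub-pvc was created in the jhub namespace) is to confirm whether this claim actually bound to the pre-created volume:

$ kubectl get pv jupyterhub-pv
$ kubectl -n jhub get pvc jupyterhub-pvc

Both should report STATUS Bound if the static PV/PVC pair matched up.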
$ kubectl -n jhub describe pvc hub-db-dir
returns:
Name: hub-db-dir
Namespace: jhub
StorageClass: standard <========from an earlier try
Status: Pending
Volume:
Labels: app=jupyterhub
chart=jupyterhub-0.8.2
component=hub
heritage=Tiller
release=jhub
Annotations: volume.beta.kubernetes.io/storage-provisioner: kubernetes.io/aws-ebs
Finalizers: [kubernetes.io/pvc-protection]
Capacity:
Access Modes:
VolumeMode: Filesystem
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning ProvisioningFailed 110s (x106 over 3h43m) persistentvolume-controller Failed to provision volume with StorageClass "standard": Failed to get AWS Cloud Provider. GetCloudProvider returned <nil> instead
Mounted By: hub-76ffd7d94b-dmj8l
To me this looks like the pod attempting to mount the storage and failing. Isolating this error has been a challenge. I tried patching the PVC to update its storage class to gp2, which is now marked as the default class but was not yet at the time the PVC was deployed. The patch fails:
$ kubectl -n jhub patch pvc hub-db-dir -p '{"spec":{"StorageClass":"gp2"}}'
persistentvolumeclaim/hub-db-dir patched (no change)
$ kubectl -n jhub describe pvc hub-db-dir
Name: hub-db-dir
Namespace: jhub
StorageClass: standard <====== Not changed
Status: Pending
Volume:
Labels: app=jupyterhub
chart=jupyterhub-0.8.2
component=hub
heritage=Tiller
release=jhub
Annotations: volume.beta.kubernetes.io/storage-provisioner: kubernetes.io/aws-ebs
Finalizers: [kubernetes.io/pvc-protection]
Capacity:
Access Modes:
VolumeMode: Filesystem
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning ProvisioningFailed 2m26s (x108 over 3h48m) persistentvolume-controller Failed to provision volume with StorageClass "standard": Failed to get AWS Cloud Provider. GetCloudProvider returned <nil> instead
Mounted By: hub-76ffd7d94b-dmj8l
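One thing worth noting about the patch above: the field in a PVC spec is storageClassName, so a patch against spec.StorageClass is silently dropped, which would explain the "(no change)" response. A sketch of the same patch with the correct field name would be:

$ kubectl -n jhub patch pvc hub-db-dir -p '{"spec":{"storageClassName":"gp2"}}'

Even with the right field name, storageClassName is generally immutable once a claim has been created, so the API server will likely reject the change; deleting the claim and letting the Helm release recreate it against the new default class is the more usual route.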
The JupyterHub deployment is managed by Helm/Tiller, so whenever I make a change I update the pods with:
$ helm upgrade jhub jupyterhub/jupyterhub --version=0.8.2 -f config.yaml
The relevant section of config.yaml for allocating user storage is:
proxy:
  secretToken: "<random value>"
singleuser:
  cloudMetadata:
    enabled: true
singleuser:
  storage:
    dynamic:
      storageClass: gp2
singleuser:
  storage:
    extraVolumes:
      - name: jupyterhub-pv
        persistentVolumeClaim:
          claimName: jupyterhub-pvc
    extraVolumeMounts:
      - name: jupyterhub-pv
        mountPath: /home/shared
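As an aside, if the three singleuser: blocks sit at the top level of config.yaml exactly as shown, most YAML parsers will either reject the file or keep only the last block, silently dropping the cloudMetadata and dynamic storage settings. A sketch of the same values merged under a single singleuser: key (assuming that is the intent) would be:

proxy:
  secretToken: "<random value>"
singleuser:
  cloudMetadata:
    enabled: true
  storage:
    dynamic:
      storageClass: gp2
    extraVolumes:
      - name: jupyterhub-pv
        persistentVolumeClaim:
          claimName: jupyterhub-pvc
    extraVolumeMounts:
      - name: jupyterhub-pv
        mountPath: /home/shared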
Part of the troubleshooting has also focused on making the cluster aware that its resources are provided by AWS. To that end, in the kubelet configuration file
/usr/lib/systemd/system/kubelet.service.d/10-kubeadm.conf
I added the line:
Environment="KUBELET_EXTRA_ARGS=--cloud-provider=aws --cloud-config=/etc/kubernetes/cloud-config.conf
where /etc/kubernetes/cloud-config.conf contains:
[Global]
KubernetesClusterTag=kubernetes
KubernetesClusterID=kubernetes
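A change to 10-kubeadm.conf only takes effect after the kubelet is reloaded and restarted; a sketch of applying it and confirming the flag is actually present on the running process (run on each node):

$ sudo systemctl daemon-reload
$ sudo systemctl restart kubelet
$ ps aux | grep kubelet | grep -o -- '--cloud-provider=aws'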
In the files kube-controller-manager.yaml and kube-apiserver.yaml I added the line:
- --cloud-provider=aws
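To confirm the control-plane components actually picked up the flag (a sketch, run on the master and assuming they run as static pods, which the kubelet restarts when their manifests change):

$ ps aux | grep kube-apiserver | grep -o -- '--cloud-provider=aws'
$ ps aux | grep kube-controller-manager | grep -o -- '--cloud-provider=aws'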
I have not yet tagged any AWS resources, but will start doing so based on this.
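For reference, a sketch of tagging the EC2 instances and the EBS volume with the legacy cluster tag (the instance ID below is a placeholder; the tag name and the value "kubernetes" are assumptions based on the cloud-config.conf above, and newer setups use kubernetes.io/cluster/<name> instead):

$ aws ec2 create-tags --resources i-0123456789abcdef0 vol-0ddb700735db435c7 --tags Key=KubernetesCluster,Value=kubernetes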
What are the next steps to troubleshoot this?
Thank you!
Answer 0 (score: 0)
Perhaps this link can help?
You have to add the --cloud-provider=aws flag to the kubelet before the node is added to the cluster. The key to the AWS integration is a particular field on the Node object, the .spec.providerID field, and that field only gets populated if the flag is present when the node is added to the cluster. If the node is added to the cluster first and the command-line flag is added afterwards, this field/value does not get populated and the integration does not work as expected. No error is surfaced in that situation (at least, none that I was able to find). If you do find yourself missing the .spec.providerID field on the Node object, you can add it with the kubectl edit node command. The format of the value for that field is aws:///<az-of-instance>/<instance-id>.
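A quick way to check whether that field is present on the nodes (a sketch):

$ kubectl get nodes -o custom-columns=NAME:.metadata.name,PROVIDER_ID:.spec.providerID

An empty PROVIDER_ID column would suggest the node joined before the kubelet had the --cloud-provider=aws flag.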