Scheduling GPUs in Kubernetes v1.13.1

Time: 2018-12-22 09:00:53

Tags: kubernetes

I am trying to schedule GPUs in Kubernetes v1.13.1, and I followed the guide at https://kubernetes.io/docs/tasks/manage-gpus/scheduling-gpus/#deploying-nvidia-gpu-device-plugin

But the GPU resources do not show up when I run kubectl get nodes -o yaml. Following this post, I checked the NVIDIA GPU device plugin.

I ran:

kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v1.11/nvidia-device-plugin.yml

several times, and the result is

Error from server (AlreadyExists): error when creating "https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v1.11/nvidia-device-plugin.yml": daemonsets.extensions "nvidia-device-plugin-daemonset" already exists

So it seems I have already installed the NVIDIA device plugin? But the result of kubectl get pods --all-namespaces is

NAMESPACE     NAME                               READY   STATUS    RESTARTS   AGE
kube-system   calico-node-qdhvd                  2/2     Running   0          65m
kube-system   coredns-78d4cf999f-fk4wl           1/1     Running   0          68m
kube-system   coredns-78d4cf999f-zgfvl           1/1     Running   0          68m
kube-system   etcd-liuqin01                      1/1     Running   0          67m
kube-system   kube-apiserver-liuqin01            1/1     Running   0          67m
kube-system   kube-controller-manager-liuqin01   1/1     Running   0          67m
kube-system   kube-proxy-l8p9p                   1/1     Running   0          68m
kube-system   kube-scheduler-liuqin01            1/1     Running   0          67m

When I run kubectl describe node, the GPU is not among the allocatable resources:

Non-terminated Pods:         (9 in total)
Namespace                  Name                                    CPU Requests  CPU Limits  Memory Requests  Memory Limits  AGE
---------                  ----                                    ------------  ----------  ---------------  -------------  ---
kube-system                calico-node-qdhvd                       250m (2%)     0 (0%)      0 (0%)           0 (0%)         18h
kube-system                coredns-78d4cf999f-fk4wl                100m (0%)     0 (0%)      70Mi (0%)        170Mi (1%)     19h
kube-system                coredns-78d4cf999f-zgfvl                100m (0%)     0 (0%)      70Mi (0%)        170Mi (1%)     19h
kube-system                etcd-liuqin01                           0 (0%)        0 (0%)      0 (0%)           0 (0%)         19h
kube-system                kube-apiserver-liuqin01                 250m (2%)     0 (0%)      0 (0%)           0 (0%)         19h
kube-system                kube-controller-manager-liuqin01        200m (1%)     0 (0%)      0 (0%)           0 (0%)         19h
kube-system                kube-proxy-l8p9p                        0 (0%)        0 (0%)      0 (0%)           0 (0%)         19h
kube-system                kube-scheduler-liuqin01                 100m (0%)     0 (0%)      0 (0%)           0 (0%)         19h
kube-system                nvidia-device-plugin-daemonset-p78wz    0 (0%)        0 (0%)      0 (0%)           0 (0%)         26m
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource           Requests    Limits
--------           --------    ------
cpu                1 (8%)      0 (0%)
memory             140Mi (0%)  340Mi (2%)
ephemeral-storage  0 (0%)      0 (0%)

1 answer:

Answer 0: (score: 1)

As lianyouCat mentioned in the comments:

  

After installing nvidia-docker2, Docker's default runtime should be changed to nvidia, as described at github.com/NVIDIA/k8s-device-plugin#preparing-your-gpu-nodes.
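
For illustration, here is a minimal sketch of what that runtime setting can look like. It assumes nvidia-docker2 installed the runtime binary at /usr/bin/nvidia-container-runtime, which is the path the linked README uses. TARGET is a hypothetical variable that defaults to a scratch file, so running the snippet does not touch the real Docker configuration.

```shell
#!/bin/sh
# Sketch only: write a daemon.json that makes "nvidia" Docker's default runtime.
# TARGET is a hypothetical variable; it defaults to a scratch file so that the
# real /etc/docker/daemon.json is not overwritten by accident.
TARGET="${TARGET:-/tmp/daemon.json.example}"

# Assumption: nvidia-docker2 placed the runtime at /usr/bin/nvidia-container-runtime.
cat > "$TARGET" <<'EOF'
{
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {
            "path": "/usr/bin/nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}
EOF

echo "wrote $TARGET"
```

If /etc/docker/daemon.json already holds other settings, merge these keys into it by hand rather than replacing the file.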

     

After modifying /etc/docker/daemon.json, you need to restart Docker for the configuration to take effect.
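
A rough sketch of how the restart and a follow-up check might look is below. The helper names has_nvidia_default_runtime and node_lists_gpu are hypothetical. The commented commands assume a systemd-managed Docker, and the node name liuqin01 is inferred from the pod names in the question. The helpers only read text from stdin, so loading the snippet changes nothing on the host.

```shell
#!/bin/sh
# Succeeds if `docker info` output on stdin reports nvidia as the default runtime.
# (hypothetical helper name)
has_nvidia_default_runtime() {
    grep -qi 'Default Runtime: *nvidia'
}

# Succeeds if `kubectl describe node` output on stdin mentions the nvidia.com/gpu resource.
# (hypothetical helper name)
node_lists_gpu() {
    grep -q 'nvidia.com/gpu'
}

# Typical use on the GPU node (left commented out; adjust to your setup):
#   sudo systemctl restart docker      # assumes Docker is managed by systemd
#   docker info | has_nvidia_default_runtime && echo "default runtime is nvidia"
#   kubectl describe node liuqin01 | node_lists_gpu && echo "GPU is advertised"
```

If the node still does not list nvidia.com/gpu after the restart, the device plugin pod's logs (kubectl logs -n kube-system on the nvidia-device-plugin-daemonset pod) are a reasonable next place to look.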