我正在Kubernetes v1.13.1中尝试调度GPU,我遵循了https://kubernetes.io/docs/tasks/manage-gpus/scheduling-gpus/#deploying-nvidia-gpu-device-plugin中的指南
但是当我运行时,gpu资源没有显示
kubectl get nodes -o yaml
,据this post说,我检查了Nvidia gpu设备插件。
我跑步:
kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v1.11/nvidia-device-plugin.yml
几次,结果是
Error from server (AlreadyExists): error when creating "https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v1.11/nvidia-device-plugin.yml": daemonsets.extensions "nvidia-device-plugin-daemonset" already exists
似乎我已经安装了NVIDIA设备插件?但是kubectl get pods --all-namespaces
的结果是
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system calico-node-qdhvd 2/2 Running 0 65m
kube-system coredns-78d4cf999f-fk4wl 1/1 Running 0 68m
kube-system coredns-78d4cf999f-zgfvl 1/1 Running 0 68m
kube-system etcd-liuqin01 1/1 Running 0 67m
kube-system kube-apiserver-liuqin01 1/1 Running 0 67m
kube-system kube-controller-manager-liuqin01 1/1 Running 0 67m
kube-system kube-proxy-l8p9p 1/1 Running 0 68m
kube-system kube-scheduler-liuqin01 1/1 Running 0 67m
运行kubectl describe node
时,gpu不在可分配的资源中
Non-terminated Pods: (9 in total)
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits AGE
--------- ---- ----------- - ---------- --------------- ------------- ---
kube-system calico-node-qdhvd 250m (2%) 0 (0%) 0 (0%) 0 (0%) 18h
kube-system coredns-78d4cf999f-fk4wl 100m (0%) 0 (0%) 70Mi (0%) 170Mi (1%) 19h
kube-system coredns-78d4cf999f-zgfvl 100m (0%) 0 (0%) 70Mi (0%) 170Mi (1%) 19h
kube-system etcd-liuqin01 0 (0%) 0 (0%) 0 (0%) 0 (0%) 19h
kube-system kube-apiserver-liuqin01 250m (2%) 0 (0%) 0 (0%) 0 (0%) 19h
kube-system kube-controller-manager-liuqin01 200m (1%) 0 (0%) 0 (0%) 0 (0%) 19h
kube-system kube-proxy-l8p9p 0 (0%) 0 (0%) 0 (0%) 0 (0%) 19h
kube-system kube-scheduler-liuqin01 100m (0%) 0 (0%) 0 (0%) 0 (0%) 19h
kube-system nvidia-device-plugin-daemonset-p78wz 0 (0%) 0 (0%) 0 (0%) 0 (0%) 26m
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource Requests Limits
-------- -------- ------
cpu 1 (8%) 0 (0%)
memory 140Mi (0%) 340Mi (2%)
ephemeral-storage 0 (0%) 0 (0%)
答案 0 :(得分:1)
如评论中的lianyouCat所述:
在安装nvidia-docker2之后,应将docker的默认运行时修改为github.com/NVIDIA/k8s-device-plugin#preparing-your-gpu-nodes,以设置为nvidia docker。
在修改
/etc/docker/daemon.json
之后,您需要重新启动docker才能使配置生效。