Unable to use GPU on Minikube with the Docker driver

Time: 2020-06-05 09:29:56

Tags: docker gpu minikube nvidia-docker

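Goal:

I am trying to use Nvidia GPU capabilities on a Minikube cluster that uses the default Docker driver.

Problem:

I can run GPU containers with docker in the default nvidia-docker context, but after switching to the minikube docker-env I get the following error: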

$ docker run --gpus all nvidia/cuda:10.0-base nvidia-smi
docker: Error response from daemon: could not select device driver "" with capabilities: [[gpu]].
ERRO[0000] error waiting for container: context canceled
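
Because the same docker command behaves differently depending on which daemon the shell points at, it can help to confirm the active daemon before running it. The following is a minimal sketch, not an official minikube tool: active_daemon is a hypothetical helper name, and it assumes that `minikube docker-env` exports MINIKUBE_ACTIVE_DOCKERD alongside DOCKER_HOST, which may not hold for every minikube release.

```shell
#!/bin/sh
# Report which Docker daemon the current shell is pointed at.
# Assumption: `minikube docker-env` exports MINIKUBE_ACTIVE_DOCKERD;
# DOCKER_HOST is checked as a fallback for any other remote daemon.
active_daemon() {
    if [ -n "${MINIKUBE_ACTIVE_DOCKERD:-}" ]; then
        echo "minikube:${MINIKUBE_ACTIVE_DOCKERD}"
    elif [ -n "${DOCKER_HOST:-}" ]; then
        echo "remote:${DOCKER_HOST}"
    else
        echo "host"
    fi
}

active_daemon
```

To switch the shell to the minikube daemon, run `eval "$(minikube docker-env)"`; to return to the host daemon, run `eval "$(minikube docker-env --unset)"`.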

Environment:

  • Ubuntu 18.04
  • Minikube v1.10.0
  • Docker version:
$ docker version
Client: Docker Engine - Community
 Version:           19.03.10
 API version:       1.40
 Go version:        go1.13.10
 Git commit:        9424aeaee9
 Built:             Thu May 28 22:16:49 2020
 OS/Arch:           linux/amd64
 Experimental:      false

Server:
 Engine:
  Version:          19.03.2
  API version:      1.40 (minimum version 1.12)
  Go version:       go1.12.9
  Git commit:       6a30dfca03
  Built:            Wed Sep 11 22:45:55 2019
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          v1.3.3-14-g449e9269
  GitCommit:        449e926990f8539fd00844b26c07e2f1e306c760
 runc:
  Version:          1.0.0-rc10
  GitCommit:        
 docker-init:
  Version:          0.18.0
  GitCommit:
  • Nvidia container runtime version:
$ nvidia-container-runtime --version
runc version 1.0.0-rc10
commit: dc9208a3303feef5b3839f4323d9beb36df0a9dd
spec: 1.0.1-dev

Additional information:

The cluster was created with:

minikube start --cpus 3 --memory 8G

The following minikube addons are currently enabled:

$ minikube addons list
|-----------------------------|----------|--------------|
|         ADDON NAME          | PROFILE  |    STATUS    |
|-----------------------------|----------|--------------|
| dashboard                   | minikube | disabled     |
| default-storageclass        | minikube | enabled ✅    |
| efk                         | minikube | disabled     |
| freshpod                    | minikube | disabled     |
| gvisor                      | minikube | disabled     |
| helm-tiller                 | minikube | disabled     |
| ingress                     | minikube | disabled     |
| ingress-dns                 | minikube | disabled     |
| istio                       | minikube | disabled     |
| istio-provisioner           | minikube | disabled     |
| logviewer                   | minikube | disabled     |
| metallb                     | minikube | disabled     |
| metrics-server              | minikube | disabled     |
| nvidia-driver-installer     | minikube | enabled ✅    |
| nvidia-gpu-device-plugin    | minikube | enabled ✅    |
| registry                    | minikube | disabled     |
| registry-aliases            | minikube | disabled     |
| registry-creds              | minikube | disabled     |
| storage-provisioner         | minikube | enabled ✅    |
| storage-provisioner-gluster | minikube | disabled     |
|-----------------------------|----------|--------------|

Here is a working example outside the minikube context:

$ docker run --gpus all nvidia/cuda:10.0-base nvidia-smi
Fri Jun  5 09:23:49 2020       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.59       Driver Version: 440.59       CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 106...  Off  | 00000000:01:00.0  On |                  N/A |
|  0%   51C    P8     6W / 120W |   1293MiB /  6077MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+

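1 answer:

Answer 0: (score: 1)

This is a community wiki answer. Feel free to edit and expand it as needed.

Minikube's docker driver does not officially support Nvidia GPUs. That leaves you with two possible options: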

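  1. Try the NVIDIA Container Toolkit together with the NVIDIA device plugin. This is a workaround and may not be the best solution for your use case.

  2. Use the KVM2 driver or the None driver. Both are officially supported and documented.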
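
As a rough illustration of option 2, the sketch below prints the commands one might run for each supported driver instead of executing them. gpu_cluster_commands is a hypothetical helper name introduced here; the flags and addon names are the ones minikube documents for GPU support, but host prerequisites (PCI passthrough for KVM2, the NVIDIA driver and nvidia-docker for None) are assumed to be in place already.

```shell
#!/bin/sh
# Sketch only: print the minikube commands for a GPU-capable cluster.
# gpu_cluster_commands is a hypothetical helper, not part of minikube.
gpu_cluster_commands() {
    case "$1" in
        kvm2)
            # Assumes PCI passthrough (IOMMU) is already configured on the host.
            echo "minikube start --driver=kvm2 --kvm-gpu"
            echo "minikube addons enable nvidia-gpu-device-plugin"
            echo "minikube addons enable nvidia-driver-installer"
            ;;
        none)
            # Runs Kubernetes directly on the host and reuses the host's
            # NVIDIA driver and nvidia-docker setup; usually needs root.
            # The NVIDIA device plugin still has to be deployed separately.
            echo "minikube start --driver=none"
            ;;
        *)
            echo "unsupported driver: $1" >&2
            return 1
            ;;
    esac
}

gpu_cluster_commands kvm2
```

Printing the commands rather than running them keeps the sketch safe to try on a machine without a GPU; once the output looks right for your setup, the lines can be run by hand.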

I hope this helps.