nvidia-docker: GPU in a Docker container

Asked: 2019-04-23 15:35:55

Tags: docker tensorflow nvidia-docker

I am trying to replicate some work (an experiment) that requires me to follow this particular tutorial on setting up Jupyter + Tensorflow + Nvidia GPU + Docker + Google Compute Engine.

I was able to install nvidia-docker successfully. However, under the tutorial section "Verify the GPU is Visible from a Docker Container", when I try to run

sudo nvidia-docker-plugin

I get the following error (see the last line):

nvidia-docker-plugin | 2019/04/23 15:17:47 Loading NVIDIA unified memory
nvidia-docker-plugin | 2019/04/23 15:17:47 Loading NVIDIA management library
nvidia-docker-plugin | 2019/04/23 15:17:47 Discovering GPU devices
nvidia-docker-plugin | 2019/04/23 15:17:47 Provisioning volumes at /var/lib/nvidia-docker/volumes
nvidia-docker-plugin | 2019/04/23 15:17:47 Serving plugin API at /run/docker/plugins
nvidia-docker-plugin | 2019/04/23 15:17:47 Serving remote API at localhost:3476
nvidia-docker-plugin | 2019/04/23 15:17:47 Error: listen tcp 127.0.0.1:3476: bind: address already in use

When I run

sudo nvidia-docker run --rm nvidia/cuda nvidia-smi

I run into the following "executable file not found in $PATH" error:

docker: Error response from daemon: OCI runtime create failed: container_linux.go:345: starting container process caused "exec: \"nvidia-smi\": executable file not found in $PATH": unknown.
ERRO[0000] error waiting for container: context canceled 

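I am new to Docker, so it would be great if someone could walk me through a solution. I have tried searching for answers, but the actual process of fixing the problem is beyond me. Any help would be greatly appreciated.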

Edit: I set up the GCE instance as described in the tutorial (i.e. Ubuntu 16.04 LTS, 50 GB boot disk, 1 GPU, plus Jupyter and TensorBoard).

1 Answer:

Answer 0 (score: 2):

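For the first problem, it looks like nvidia-docker-plugin is already running, which is why port 3476 is reported as already in use. To find the service, use: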

sudo netstat -tlpn | grep 3476
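
In the netstat -tlpn output, the fourth column is the local address and the seventh is PID/Program name. As a minimal sketch of reading that output, the helper below (listener_on_port is a hypothetical name, and the sample line is made up for illustration) prints the owner of a listening port:

```shell
# Sketch: read `netstat -tlpn` output on stdin and print the
# PID/Program column for sockets listening on the given port.
# `listener_on_port` is a hypothetical helper, not part of any tool.
listener_on_port() {
  # $4 is the local address (e.g. 127.0.0.1:3476), $7 is PID/Program name.
  awk -v port="$1" '$4 ~ (":" port "$") { print $7 }'
}

# The sample line below is invented for illustration; real output will differ.
printf 'tcp 0 0 127.0.0.1:3476 0.0.0.0:* LISTEN 1234/nvidia-docker-\n' |
  listener_on_port 3476
```

On a real machine the same helper could be fed live data with: sudo netstat -tlpn | listener_on_port 3476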

Then kill it with:

sudo pkill nvidia-docker

Second, install nvidia-docker2 and reload the Docker daemon configuration with the following commands:

curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | \
  sudo apt-key add -
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | \
  sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update

# Install nvidia-docker2 and reload the Docker daemon configuration
sudo apt-get install -y nvidia-docker2
sudo pkill -SIGHUP dockerd
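
After the install, one way to sanity-check the result is to see whether the Docker daemon configuration mentions an nvidia runtime. The sketch below assumes the package registers the runtime in /etc/docker/daemon.json, which may not hold on every setup; has_nvidia_runtime is a hypothetical helper name.

```shell
# Sketch: report whether a Docker daemon config file mentions an
# "nvidia" runtime entry. The default path is an assumption about
# where nvidia-docker2 registers its runtime.
has_nvidia_runtime() {
  config="${1:-/etc/docker/daemon.json}"
  # Succeeds only if the file is readable and contains the string "nvidia".
  [ -r "$config" ] && grep -q '"nvidia"' "$config"
}

if has_nvidia_runtime; then
  echo "nvidia runtime is registered"
else
  echo "nvidia runtime not found in daemon config"
fi
```

If the runtime is registered, rerunning the command from the question (sudo nvidia-docker run --rm nvidia/cuda nvidia-smi) is the actual test of whether the GPU is visible inside a container.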

Link for more details: