I am trying to run an NVIDIA Docker image as follows:
$ nvidia-docker run --rm --name=kitty1 -ti nvcr.io/nvidia/tensorflow:19.01-py3
but it fails with this error:
/usr/bin/docker-current: error during connect: Post http://%2Fvar%2Frun%2Fdocker.sock/v1.26/containers/create?name=kitty1: EOF. See '/usr/bin/docker-current run --help'.
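For reference, the host part of the URL in this error is percent-encoded. A minimal bash sketch to decode it (assumes bash; it uses only the string copied from the error above, and the variable names are illustrative):

```shell
#!/usr/bin/env bash
# Percent-encoded host part copied from the error message.
encoded='%2Fvar%2Frun%2Fdocker.sock'
# Turn each '%' into '\x' so that printf '%b' expands the \xHH hex escapes.
decoded=$(printf '%b' "${encoded//%/\\x}")
echo "$decoded"
```

This prints /var/run/docker.sock, i.e. the Docker client was talking to the daemon's local Unix socket on the host, and EOF means that connection was closed before a response came back.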
By the way, I'm working on a remote host that I connect to over SSH. The image is from https://ngc.nvidia.com/catalog/containers/nvidia:tensorflow, and I obtained it with docker pull.
The Docker version is:
Docker version 1.13.1, build 07f3374/1.13.1
The NVIDIA driver is also correctly installed on the host system. To check, I ran
$ nvidia-smi
which lists all the GPUs and their details.
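I also tried a plain docker run (without the "nvidia-" prefix). That does start the container, but it cannot load the GPU driver support I badly need:
$ docker run -it --rm nvcr.io/nvidia/tensorflow:19.01-py3
The output is: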
================
== TensorFlow ==
================
NVIDIA Release 19.01 (build 5238117)
TensorFlow Version 1.12.0+
Container image Copyright (c) 2018, NVIDIA CORPORATION. All rights reserved.
Copyright 2017-2018 The TensorFlow Authors. All rights reserved.
Various files include modifications (c) NVIDIA CORPORATION. All rights reserved.
NVIDIA modifications are covered by the license terms that apply to the underlying project or file.
WARNING: The NVIDIA Driver was not detected. GPU functionality will not be available.
Use 'nvidia-docker run' to start this container; see
https://github.com/NVIDIA/nvidia-docker/wiki/nvidia-docker .
NOTE: MOFED driver for multi-node communication was not detected.
Multi-node communication performance may be reduced.
NOTE: The SHMEM allocation limit is set to the default of 64MB. This may be
insufficient for TensorFlow. NVIDIA recommends the use of the following flags:
nvidia-docker run --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 ...
root@bed8972b5e93:/workspace#
The # above is the bash prompt, running as root. As the message says, the GPU driver could not be loaded.
Adding the 'nvidia-' prefix still produces the error.
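What I expect:
The container runs, loads the GPU driver, possibly shows a welcome message, and then gives me a prompt ready for commands.
What happens:
I get this error: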
/usr/bin/docker-current: error during connect: Post http://%2Fvar%2Frun%2Fdocker.sock/v1.26/containers/create?name=kitty1: EOF. See '/usr/bin/docker-current run --help'.
I've been at this for hours, and I still don't have the slightest clue what is causing it.