Hi everybody! I've built an SSH-enabled TensorFlow Docker image, and I'm running it with the following command:
nvidia-docker run -d -p 37001:22 --name tflow -v /media:/media tflow:0.1
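As a sanity check that the nvidia-docker wrapper actually exposed the driver inside the container, something like the following can be run (a sketch; the container name matches the run command above, and the `find` is just a brute-force way to locate the driver library regardless of where the wrapper mounted it):

```shell
# Confirm the GPU and the driver userspace library are visible inside
# the container when entered via `docker exec`.
docker exec tflow nvidia-smi
docker exec tflow sh -c 'find / -name "libcuda.so.1" 2>/dev/null'
```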
Then I run the TensorFlow installation test as follows:
docker exec -it tflow /bin/bash
# python
import tensorflow as tf
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
hello = tf.constant('Hello, TensorFlow!')
sess = tf.Session(config=config)
print(sess.run(hello))
The test passes with no errors:
> 2018-07-24 11:16:10.335883: I
> tensorflow/core/common_runtime/gpu/gpu_device.cc:1356] Found device 0
> with properties: name: GeForce GTX TITAN X major: 5 minor: 2
> memoryClockRate(GHz): 1.076 pciBusID: 0000:01:00.0 totalMemory:
> 11.92GiB freeMemory: 11.81GiB 2018-07-24 11:16:10.611365: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1356] Found device 1
> with properties: name: GeForce GTX TITAN X major: 5 minor: 2
> memoryClockRate(GHz): 1.076 pciBusID: 0000:02:00.0 totalMemory:
> 11.92GiB freeMemory: 11.81GiB 2018-07-24 11:16:10.886379: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1356] Found device 2
> with properties: name: GeForce GTX TITAN X major: 5 minor: 2
> memoryClockRate(GHz): 1.076 pciBusID: 0000:03:00.0 totalMemory:
> 11.92GiB freeMemory: 11.81GiB 2018-07-24 11:16:11.185562: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1356] Found device 3
> with properties: name: GeForce GTX TITAN X major: 5 minor: 2
> memoryClockRate(GHz): 1.076 pciBusID: 0000:04:00.0 totalMemory:
> 11.92GiB freeMemory: 11.80GiB 2018-07-24 11:16:11.186822: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible
> gpu devices: 0, 1, 2, 3 2018-07-24 11:16:12.067601: I
> tensorflow/core/common_runtime/gpu/gpu_device.cc:923] Device
> interconnect StreamExecutor with strength 1 edge matrix: 2018-07-24
> 11:16:12.067624: I
> tensorflow/core/common_runtime/gpu/gpu_device.cc:929] 0 1 2 3
> 2018-07-24 11:16:12.067629: I
> tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 0: N Y Y Y
> 2018-07-24 11:16:12.067632: I
> tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 1: Y N Y Y
> 2018-07-24 11:16:12.067635: I
> tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 2: Y Y N Y
> 2018-07-24 11:16:12.067638: I
> tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 3: Y Y Y N
> 2018-07-24 11:16:12.068539: I
> tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created
> TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with
> 11431 MB memory) -> physical GPU (device: 0, name: GeForce GTX TITAN
> X, pci bus id: 0000:01:00.0, compute capability: 5.2) 2018-07-24
> 11:16:12.069290: I
> tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created
> TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with
> 11431 MB memory) -> physical GPU (device: 1, name: GeForce GTX TITAN
> X, pci bus id: 0000:02:00.0, compute capability: 5.2) 2018-07-24
> 11:16:12.069910: I
> tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created
> TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:2 with
> 11431 MB memory) -> physical GPU (device: 2, name: GeForce GTX TITAN
> X, pci bus id: 0000:03:00.0, compute capability: 5.2) 2018-07-24
> 11:16:12.070523: I
> tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created
> TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:3 with
> 11430 MB memory) -> physical GPU (device: 3, name: GeForce GTX TITAN
> X, pci bus id: 0000:04:00.0, compute capability: 5.2) 2018-07-24
> 11:16:12.072046: I tensorflow/core/common_runtime/process_util.cc:63]
> Creating new thread pool with default inter op setting: 2. Tune using
> inter_op_parallelism_threads for best performance. Hello, TensorFlow!
So far so good... But when I connect to the running container over SSH as the root user, the same test fails with the following message:
ImportError: libcuda.so.1: cannot open shared object file: No such file or directory... Failed to load the native TensorFlow runtime.
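To see what the SSH login shell is missing, a minimal diagnostic like this can be run once under `docker exec` and once over the SSH connection (a sketch; it only inspects the environment and asks the dynamic loader to resolve the library, using standard-library calls):

```python
# Minimal diagnostic: print the library search path this shell exposes and
# ask the dynamic loader whether it can locate the CUDA driver library.
# Run once via `docker exec` and once via the SSH session to compare.
import ctypes.util
import os

print("LD_LIBRARY_PATH =", os.environ.get("LD_LIBRARY_PATH"))
# find_library returns None when the loader cannot resolve the name,
# which matches the ImportError above.
print("libcuda ->", ctypes.util.find_library("cuda"))
```

If the `docker exec` shell prints a path for libcuda and the SSH shell prints `None`, that confirms the problem is the scrubbed environment rather than a missing driver.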
I've searched the internet for a solution and found the following explanation here:
Using the sshd daemon to spawn shells makes it complicated to pass environment variables to the user’s shell via the normal Docker mechanisms, as sshd scrubs the environment before it starts the shell. If you’re setting values in the Dockerfile using ENV, you need to push them to a shell initialization file like the /etc/profile...
I do use ENV variables in my Dockerfile, so I tried to push them to a shell initialization file, /etc/profile:
RUN echo "" >> /etc/profile && \
echo "export VISIBLE=now" >> /etc/profile && \
echo "export LD_LIBRARY_PATH=/usr/local/cuda/extras/CUPTI/lib64:/usr/local/lib" >> /etc/profile && \
echo "export LIBRARY_PATH=/usr/local/lib" >> /etc/profile
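An alternative I'm considering (a sketch, not verified: the grep pattern and the profile.d filename are my own choices) is to snapshot the build-time ENV values into a profile.d script instead of hand-copying each one, so nothing gets missed:

```shell
# Dockerfile fragment (sketch): dump the relevant build-time environment
# variables into /etc/profile.d so every login shell re-exports them.
RUN env | grep -E '^(PATH|LD_LIBRARY_PATH|LIBRARY_PATH|CUDA)' \
    | sed 's/^/export /' > /etc/profile.d/container-env.sh
```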
But the test over the SSH connection fails again, even though the appropriate variables are present in /etc/profile.
I also tried pushing those variables into the /root/.bashrc file during the docker build, but the test still failed.
So my question is: is it possible to get this test to work over a root SSH connection?