(Posting here before filing an issue against tensorflow, as suggested by the issue template)
I'm trying to build a TensorFlow Docker image with Python 3.6. I have the following Dockerfile:
FROM nvidia/cuda:8.0-cudnn5-devel-ubuntu16.04
RUN apt-get update \
&& apt-get install -y --no-install-recommends \
build-essential \
curl \
libfreetype6-dev \
libpng12-dev \
libzmq3-dev \
pkg-config \
rsync \
software-properties-common \
unzip \
libcupti-dev \
&& add-apt-repository -y ppa:jonathonf/python-3.6 \
&& apt-get update \
&& apt-get install -y python3.6 python3.6-dev \
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/*
RUN curl -O https://bootstrap.pypa.io/get-pip.py \
&& python3.6 get-pip.py \
&& rm get-pip.py
RUN python3.6 -m pip install --no-cache-dir -U ipython pip setuptools
RUN python3.6 -m pip install --no-cache-dir tensorflow
RUN ln -s /usr/bin/python3.6 /usr/bin/python
ENV LD_LIBRARY_PATH /usr/local/cuda-8.0/lib64:/usr/local/cuda/extras/CUPTI/lib64:$LD_LIBRARY_PATH
ENV CUDA_HOME /usr/local/cuda-8.0
CMD ["ipython"]
I build the image and run a script that forces gpu:0:
nvidia-docker build -t tensorflow .
... (builds successfully)
nvidia-docker run --rm -v $PWD/test.py:/test.py tensorflow python /test.py
...
InvalidArgumentError (see above for traceback): Cannot assign a device for operation 'b': Operation was explicitly assigned to /device:GPU:0 but available devices are [ /job:localhost/replica:0/task:0/cpu:0 ]. Make sure the device specification refers to a valid device.
[[Node: b = Const[dtype=DT_FLOAT, value=Tensor<type: float shape: [3,2] values: [1 2][3]...>, _device="/device:GPU:0"]()]]
I tried the same script with the official GPU image tensorflow/tensorflow:latest-gpu and it runs fine, so nvidia-docker and the GPU itself definitely work with TensorFlow.
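For reference, that comparison run was along these lines (same script and mount as above, only the image name changed):
nvidia-docker run --rm -v $PWD/test.py:/test.py tensorflow/tensorflow:latest-gpu python /test.py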
With the image I built, CUDA and cuDNN appear to be installed correctly:
nvidia-docker run --rm tensorflow bash -c "nvidia-smi; nvcc --version; cat /usr/include/cudnn.h | grep CUDNN_MAJOR -A 2"
Sun Jul 23 22:50:11 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.66                 Driver Version: 375.66                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 750     Off  | 0000:01:00.0      On |                  N/A |
| 21%   35C    P8     1W /  38W |    795MiB /   976MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2016 NVIDIA Corporation
Built on Tue_Jan_10_13:22:03_CST_2017
Cuda compilation tools, release 8.0, V8.0.61
#define CUDNN_MAJOR 5
#define CUDNN_MINOR 1
#define CUDNN_PATCHLEVEL 10
--
#define CUDNN_VERSION (CUDNN_MAJOR * 1000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL)
#include "driver_types.h"
What am I doing wrong?
(test.py is just:)
import tensorflow as tf
with tf.device('/gpu:0'):
    a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
    b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
    c = tf.matmul(a, b)
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
print(sess.run(c))
(I also tried using nvidia/cuda:8.0-cudnn6-devel-ubuntu16.04, the base image used by tensorflow/tensorflow:latest-gpu, to no avail.)
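A quick sanity check for this kind of problem is to list the devices TensorFlow actually sees inside the image. A minimal diagnostic sketch (device_lib is an internal TF 1.x module, so treat it as unofficial):
nvidia-docker run --rm tensorflow python -c "from tensorflow.python.client import device_lib; print(device_lib.list_local_devices())"
With a CPU-only TensorFlow build this should list only /cpu:0, which is consistent with the placement error above.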
Answer 0: (score: 0)
It turns out it was as simple as installing tensorflow-gpu instead of tensorflow. Strangely, the TensorFlow docs don't call this out, but basically this was me being careless.
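In Dockerfile terms, the fix is just swapping the package name in the pip install step, i.e. something like:
RUN python3.6 -m pip install --no-cache-dir tensorflow-gpu
The tensorflow package on PyPI is CPU-only; for TensorFlow 1.x the CUDA-enabled build is published separately as tensorflow-gpu.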