VirtualEnv TensorFlow with an Nvidia GPU: cuda-9.0 vs cuda-8.0, cuDNN-7.0 vs cuDNN-6.0

Date: 2017-10-05 21:47:09

Tags: tensorflow ubuntu-16.04 nvidia

Today I installed TensorFlow by following the manual (RTFM) at tensorflow dot org install linux. I installed VirtualEnv + Python3 + CPU and tested the TensorFlow Hello World; it worked fine.

Then I moved on to the Nvidia path (GPU: GTX 970) to install VirtualEnv + Python + GPU. Following the manual (docs dot nvidia dot com cuda cuda-installation-guide-linux index dot html), I installed cuda-9.0 and cuDNN 7. All PATHs are set, .bashrc is up to date, and printenv LD_LIBRARY_PATH looks OK.

My GPU already works with the CUDA sample programs deviceQuery and bandwidthTest, and every post-installation action in the Nvidia checklist passes.

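When I run Hello World in VirtualEnv + Python3 + GPU, the output below is what I get (cliff notes: TensorFlow wants to load a CUDA 8.0 library from /usr/local/cuda-9.0/lib64, which is a 9.0 directory. I tried adding a symlink so that the 8.0 library name points to the 9.0 library, but then I got the same message for another CUDA library... doing this trick for every CUDA library is not what I call a fix ;-))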

alexandre@Martin-2:~/Documents/Ordinateur/VirtualEnv$ source tensorflow_py3_gpu/bin/activate
(tensorflow_py3_gpu) alexandre@Martin-2:~/Documents/Ordinateur/VirtualEnv$ python
Python 3.5.2 (default, Sep 14 2017, 22:51:06) [GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.

>>> # Python
... import tensorflow as tf
Traceback (most recent call last):
  File "/home/alexandre/Documents/Ordinateur/VirtualEnv/tensorflow_py3_gpu/lib/python3.5/site-packages/tensorflow/python/pywrap_tensorflow.py", line 41, in <module>
    from tensorflow.python.pywrap_tensorflow_internal import *
  File "/home/alexandre/Documents/Ordinateur/VirtualEnv/tensorflow_py3_gpu/lib/python3.5/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 28, in <module>
    _pywrap_tensorflow_internal = swig_import_helper()
  File "/home/alexandre/Documents/Ordinateur/VirtualEnv/tensorflow_py3_gpu/lib/python3.5/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 24, in swig_import_helper
    _mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
  File "/home/alexandre/Documents/Ordinateur/VirtualEnv/tensorflow_py3_gpu/lib/python3.5/imp.py", line 242, in load_module
    return load_dynamic(name, filename, file)
  File "/home/alexandre/Documents/Ordinateur/VirtualEnv/tensorflow_py3_gpu/lib/python3.5/imp.py", line 342, in load_dynamic
    return _load(spec)
ImportError: libcublas.so.8.0: cannot open shared object file: No such file or directory

The last line above refers to a CUDA 8.0 library, which is obviously not among the CUDA 9.0 libraries. Here is the rest.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "/home/alexandre/Documents/Ordinateur/VirtualEnv/tensorflow_py3_gpu/lib/python3.5/site-packages/tensorflow/__init__.py", line 24, in <module>
    from tensorflow.python import *
  File "/home/alexandre/Documents/Ordinateur/VirtualEnv/tensorflow_py3_gpu/lib/python3.5/site-packages/tensorflow/python/__init__.py", line 49, in <module>
    from tensorflow.python import pywrap_tensorflow
  File "/home/alexandre/Documents/Ordinateur/VirtualEnv/tensorflow_py3_gpu/lib/python3.5/site-packages/tensorflow/python/pywrap_tensorflow.py", line 52, in <module>
    raise ImportError(msg)
ImportError: Traceback (most recent call last):
  File "/home/alexandre/Documents/Ordinateur/VirtualEnv/tensorflow_py3_gpu/lib/python3.5/site-packages/tensorflow/python/pywrap_tensorflow.py", line 41, in <module>
    from tensorflow.python.pywrap_tensorflow_internal import *
  File "/home/alexandre/Documents/Ordinateur/VirtualEnv/tensorflow_py3_gpu/lib/python3.5/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 28, in <module>
    _pywrap_tensorflow_internal = swig_import_helper()
  File "/home/alexandre/Documents/Ordinateur/VirtualEnv/tensorflow_py3_gpu/lib/python3.5/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 24, in swig_import_helper
    _mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
  File "/home/alexandre/Documents/Ordinateur/VirtualEnv/tensorflow_py3_gpu/lib/python3.5/imp.py", line 242, in load_module
    return load_dynamic(name, filename, file)
  File "/home/alexandre/Documents/Ordinateur/VirtualEnv/tensorflow_py3_gpu/lib/python3.5/imp.py", line 342, in load_dynamic
    return _load(spec)
ImportError: libcublas.so.8.0: cannot open shared object file: No such file or directory

Failed to load the native TensorFlow runtime.

See https tensorflow dot org slash install slash install_sources hashtag common_installation_problems for some common reasons and solutions.  Include the entire stack trace above this error message when asking for help.
>>> hello = tf.constant('Hello, TensorFlow!')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'tf' is not defined
>>> sess = tf.Session()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'tf' is not defined
>>> print(sess.run(hello))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'sess' is not defined
>>> quit()
(tensorflow_py3_gpu) alexandre@Martin-2:~/Documents/Ordinateur/VirtualEnv$ deactivate
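To see which side of the version mismatch the dynamic loader can actually resolve, a quick check like the following may help. It is a minimal sketch, assuming Linux and Python's standard ctypes module: it asks the loader to open each soname by bare name, so the search (LD_LIBRARY_PATH, the ldconfig cache, default directories) is roughly, not exactly, the one used when TensorFlow's native module is imported (that module may carry its own RPATH). The soname list is illustrative, and str.format is used instead of f-strings so it also runs on the Python 3.5 shown above.

```python
import ctypes

# Illustrative list: the soname from the ImportError above (8.0) and the one
# shipped with the CUDA 9.0 toolkit (9.0). Add others as needed.
CANDIDATES = ("libcublas.so.8.0", "libcublas.so.9.0")


def probe(sonames):
    """Map each soname to True if the dynamic loader can open it, else False.

    ctypes.CDLL with a bare name calls dlopen(), which searches
    LD_LIBRARY_PATH, the ldconfig cache and the default directories.
    """
    found = {}
    for name in sonames:
        try:
            ctypes.CDLL(name)
        except OSError:
            found[name] = False
        else:
            found[name] = True
    return found


if __name__ == "__main__":
    # str.format instead of f-strings so this also runs on Python 3.5.
    for name, ok in sorted(probe(CANDIDATES).items()):
        print("{}: {}".format(name, "found" if ok else "MISSING"))
```

With only CUDA 9.0 installed, one would expect the 8.0 entry to be reported as MISSING, which is consistent with the ImportError.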

- Update, the next day

A not-so-clean fix: in /usr/local/cuda/lib64/, create a link for each library with the wrong version number, pointing to the correctly numbered version.

alexandre@Martin-2:/usr/local/cuda/lib64$ sudo ln -s libcurand.so.9.0 libcurand.so.8.0

I did this for five CUDA libraries (libcusolver, libcublas, libcudart, libcurand, libcufft) and for the cuDNN library libcudnn (version 6 -> version 7).
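The workaround above comes down to one ln -s per library. The following hypothetical dry-run helper (its name and defaults are illustrative, not from any tool) only prints those commands, to be run from /usr/local/cuda/lib64; the library list and version numbers are the ones from this post. Keep in mind that a soname bump usually signals a possible ABI change, so a 9.0 library answering to the 8.0 name may still fail in code paths a Hello World does not exercise.

```python
# Hypothetical dry-run helper for the symlink workaround described above.
# It only builds the command strings; nothing is executed.
CUDA_LIBS = ("libcusolver", "libcublas", "libcudart", "libcurand", "libcufft")


def symlink_commands(libs=CUDA_LIBS, have="9.0", want="8.0",
                     cudnn_have="7", cudnn_want="6"):
    """Return 'ln -s' commands that create the soname TensorFlow asks for
    (want) as a link to the library that is actually installed (have).

    Meant to be run from /usr/local/cuda/lib64.
    """
    cmds = ["sudo ln -s {0}.so.{1} {0}.so.{2}".format(lib, have, want)
            for lib in libs]
    # cuDNN uses a single-number soname: creates libcudnn.so.6 pointing at
    # libcudnn.so.7.
    cmds.append("sudo ln -s libcudnn.so.{} libcudnn.so.{}".format(
        cudnn_have, cudnn_want))
    return cmds


if __name__ == "__main__":
    for cmd in symlink_commands():
        print(cmd)
```

Printing instead of executing keeps the hack visible and easy to undo: the same list tells you exactly which links to remove once matching 8.0/6 libraries are installed.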

Hello World! TensorFlow works... but if anyone can tell me why TensorFlow calls the cuda-8.0 and cuDNN-6.0 libraries when I only installed cuda-9.0 and cuDNN-7.0, you are very welcome.

[SOLVED... or close to it] Update: I found https://github.com/tensorflow/tensorflow/issues/12052, which explains almost all of it.

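Cliff notes: tensorflow-1.3 is built against cuda-8.0 and cuDNN-6.0 (which is why those library versions are explicitly requested when I run TensorFlow). I was misled by the Nvidia website, which had me download cuda-9.0 and cuDNN-7.0, versions that tensorflow-1.3 does not support.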
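Since the root cause is that each TensorFlow binary expects specific CUDA/cuDNN sonames, comparing the required names against what is on disk can catch the mismatch before import. A minimal sketch, assuming the libraries live in /usr/local/cuda/lib64 (a cuDNN installed from .deb packages may instead sit under /usr/lib/x86_64-linux-gnu, so lib_dir is a parameter). The version table contains only the tensorflow-1.3 entry stated in this post, and the helper names are hypothetical.

```python
import os

# Build requirements per TensorFlow release. Only the 1.3 entry comes from
# the post (CUDA 8.0, cuDNN 6); any other entry would need checking upstream.
REQUIRED = {
    "1.3": {"cuda": "8.0", "cudnn": "6"},
}
CUDA_LIBS = ("libcublas", "libcudart", "libcufft", "libcurand", "libcusolver")


def required_sonames(tf_version):
    """Sonames a TensorFlow version expects, e.g. '1.3.0' -> '...so.8.0'."""
    major_minor = ".".join(tf_version.split(".")[:2])
    req = REQUIRED[major_minor]  # KeyError for versions not in the table
    names = ["{}.so.{}".format(lib, req["cuda"]) for lib in CUDA_LIBS]
    names.append("libcudnn.so.{}".format(req["cudnn"]))
    return names


def missing_sonames(tf_version, lib_dir="/usr/local/cuda/lib64"):
    """Required sonames that have no file (or symlink) in lib_dir."""
    present = set(os.listdir(lib_dir)) if os.path.isdir(lib_dir) else set()
    return [n for n in required_sonames(tf_version) if n not in present]


if __name__ == "__main__":
    print(missing_sonames("1.3.0"))
```

Because import tensorflow itself fails in this situation, the version string would have to come from elsewhere, for example pip show tensorflow-gpu run inside the virtualenv.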

tensorflow-1.4 should work with cuda-9.0 and cuDNN-7.0. tensorflow-1.4 may be available sometime in October 2017 (or soon; check the link above).

1 answer:

Answer 0 (score: 1)

Have you tried sudo apt install cuda-8-0? It should download the packages from http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604

That, plus installing cudnn6 (the same way I had installed cudnn7), worked for me.