Setting up tensorflow-gpu on Google Cloud

Time: 2019-02-26 20:48:13

Tags: python tensorflow

I am trying to set up tensorflow-gpu on Google Cloud.

Here is the available GPU:

arnoldwright1@gpu:~$ ubuntu-drivers devices
== /sys/devices/pci0000:00/0000:00:04.0 ==
modalias : pci:v000010DEd000015F8sv000010DEsd0000118Fbc03sc02i00
vendor   : NVIDIA Corporation
model    : GP100GL [Tesla P100 PCIe 16GB]
driver   : nvidia-driver-390 - distro non-free recommended
driver   : xserver-xorg-video-nouveau - distro free builtin

I have installed the NVIDIA driver, as confirmed here with nvidia-smi:

arnoldwright1@gpu:~$ nvidia-smi
Tue Feb 26 12:44:54 2019       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.79       Driver Version: 410.79       CUDA Version: 10.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla P100-PCIE...  Off  | 00000000:00:04.0 Off |                    0 |
| N/A   38C    P0    26W / 250W |      0MiB / 16280MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
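The "CUDA Version: 10.0" in the nvidia-smi header is the newest CUDA version that driver 410.79 supports. It is not necessarily the toolkit that is installed.

The minimal Python sketch below pulls both numbers out of that header line. The regex assumes the header layout shown above, and the function name is illustrative only.

```python
import re

def parse_nvidia_smi_header(text):
    """Return the driver version and the highest CUDA version that driver
    supports, as printed in the nvidia-smi header. None if not found."""
    m = re.search(r"Driver Version:\s*([\d.]+)\s+CUDA Version:\s*([\d.]+)", text)
    if m is None:
        return None
    return {"driver": m.group(1), "max_cuda": m.group(2)}

if __name__ == "__main__":
    # Header line copied from the nvidia-smi output above
    header = ("| NVIDIA-SMI 410.79       Driver Version: 410.79"
              "       CUDA Version: 10.0     |")
    print(parse_nvidia_smi_header(header))
```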

Checking the CUDA toolkit version with nvcc shows 9.1:

arnoldwright1@gpu:~$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Nov__3_21:07:56_CDT_2017
Cuda compilation tools, release 9.1, V9.1.85
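nvcc reports the toolkit that is actually found on PATH.

The sketch below extracts the release number from that output and checks it against the driver limit shown by nvidia-smi. The helper names are hypothetical and not part of any CUDA tool. Versions are compared as integer tuples because a plain string comparison would wrongly rank "10.0" below "9.1".

```python
import re

def nvcc_release(nvcc_output):
    """Extract 'X.Y' from a line like 'Cuda compilation tools, release 9.1, V9.1.85'."""
    m = re.search(r"release\s+(\d+\.\d+)", nvcc_output)
    return m.group(1) if m else None

def as_tuple(version):
    # "9.1" -> (9, 1) so versions compare numerically, not as strings
    return tuple(int(x) for x in version.split("."))

def toolkit_supported_by_driver(toolkit, driver_max_cuda):
    """True if the installed toolkit is not newer than what the driver supports."""
    return as_tuple(toolkit) <= as_tuple(driver_max_cuda)

if __name__ == "__main__":
    release = nvcc_release("Cuda compilation tools, release 9.1, V9.1.85")
    print(release)
    print(toolkit_supported_by_driver(release, "10.0"))
```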

Next, I followed this tutorial to install the cuDNN library (cudnn-9.1-linux-x64-v7.1.tgz), but with a newer driver version than the one the tutorial uses.
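To confirm which cuDNN actually ended up in the CUDA tree, one option is to read the version macros from the header. This sketch makes two assumptions:

- It targets cuDNN 7.x, which defines CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL in cudnn.h.
- The header was copied to /usr/local/cuda/include, as tarball installs usually do. Adjust CUDNN_HEADER if it went elsewhere.

```python
import os
import re

# Assumed install location; adjust to wherever cudnn.h was copied.
CUDNN_HEADER = "/usr/local/cuda/include/cudnn.h"

def cudnn_version_from_header(header_text):
    """Build 'MAJOR.MINOR.PATCH' from the #define lines in cudnn.h (cuDNN 7.x)."""
    parts = []
    for key in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        m = re.search(r"#define\s+%s\s+(\d+)" % key, header_text)
        if m is None:
            return None
        parts.append(m.group(1))
    return ".".join(parts)

if __name__ == "__main__":
    if os.path.exists(CUDNN_HEADER):
        with open(CUDNN_HEADER) as f:
            print(cudnn_version_from_header(f.read()))
    else:
        print("cudnn.h not found at %s" % CUDNN_HEADER)
```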

At the final step, when checking whether tensorflow-gpu is set up correctly, I get this error:

In [1]: from tensorflow.python.client import device_lib
---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
<ipython-input-1-48501209a23d> in <module>()
----> 1 from tensorflow.python.client import device_lib
/home/arnoldwright1/.local/lib/python2.7/site-packages/tensorflow/__init__.py in <module>()
    22 
    23 # pylint: disable=g-bad-import-order
---> 24 from tensorflow.python import pywrap_tensorflow  # pylint: disable=unused-import
    25 
    26 try:
/home/arnoldwright1/.local/lib/python2.7/site-packages/tensorflow/python/__init__.py in <module>()
    47 import numpy as np
    48 
---> 49 from tensorflow.python import pywrap_tensorflow
    50 
    51 from tensorflow.python.tools import component_api_helper
/home/arnoldwright1/.local/lib/python2.7/site-packages/tensorflow/python/pywrap_tensorflow.py in <module>()
    72 for some common reasons and solutions.  Include the entire stack trace
    73 above this error message when asking for help.""" % traceback.format_exc()
---> 74   raise ImportError(msg)
    75 
    76 # pylint: enable=wildcard-import,g-import-not-at-top,unused-import,line-too-long
ImportError: Traceback (most recent call last):
File "/home/arnoldwright1/.local/lib/python2.7/site-packages/tensorflow/python/pywrap_tensorflow.py", line 58, in <module>
    from tensorflow.python.pywrap_tensorflow_internal import *
File "/home/arnoldwright1/.local/lib/python2.7/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 28, in <module>
    _pywrap_tensorflow_internal = swig_import_helper()
File "/home/arnoldwright1/.local/lib/python2.7/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 24, in swig_import_helper
    _mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
ImportError: libcublas.so.9.0: cannot open shared object file: No such file or directory
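The last line names the exact file the loader looked for: libcublas.so.9.0. The version in that soname is the CUDA version the installed TensorFlow binary was linked against. It can therefore be compared with the toolkit release that nvcc reports.

A sketch of that comparison follows. The helper names are hypothetical, and the regex only handles sonames with a major.minor suffix, such as the CUDA 9.x libraries.

```python
import re

def required_cuda_from_error(message):
    """Pull the CUDA version out of 'libcublas.so.9.0: cannot open shared object file'."""
    m = re.search(r"lib\w+\.so\.(\d+\.\d+)", message)
    return m.group(1) if m else None

def describe_mismatch(message, toolkit_release):
    """Compare the version TensorFlow asked for with the installed toolkit release."""
    needed = required_cuda_from_error(message)
    if needed is None:
        return "no versioned CUDA library named in the message"
    if needed == toolkit_release:
        return "versions match (%s); check LD_LIBRARY_PATH instead" % needed
    return "TensorFlow wants CUDA %s, toolkit is %s" % (needed, toolkit_release)

if __name__ == "__main__":
    err = ("ImportError: libcublas.so.9.0: cannot open shared object file: "
           "No such file or directory")
    print(describe_mismatch(err, "9.1"))
```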

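I have looked into this, and it seems to be a misconfiguration of where the CUDA or cuDNN libraries are located, but I can't figure out where the problem comes from.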
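One way to narrow down where the lookup fails is to ask the dynamic loader directly. Separately, it helps to list which libcublas files exist in the directories being searched.

The sketch below does both using only the standard library. The extra /usr/local/cuda*/lib64 glob is an assumption about where toolkits are installed. The loader only sees LD_LIBRARY_PATH as it was when the Python process started.

```python
import ctypes
import glob
import os

def can_dlopen(soname):
    """True if the dynamic loader can open `soname` (honors LD_LIBRARY_PATH
    as it was when this Python process started)."""
    try:
        ctypes.CDLL(soname)
        return True
    except OSError:
        return False

def search_dirs():
    # LD_LIBRARY_PATH entries plus assumed CUDA install locations
    env = os.environ.get("LD_LIBRARY_PATH", "")
    dirs = [d for d in env.split(os.pathsep) if d]
    dirs += sorted(glob.glob("/usr/local/cuda*/lib64"))
    return dirs

def find_library_files(pattern, dirs):
    """List files matching `pattern` (e.g. 'libcublas.so*') in each existing dir."""
    hits = []
    for d in dirs:
        if os.path.isdir(d):
            hits.extend(sorted(glob.glob(os.path.join(d, pattern))))
    return hits

if __name__ == "__main__":
    for name in ("libcublas.so.9.0", "libcudart.so.9.0", "libcudnn.so.7"):
        print("%s: %s" % (name, "loads" if can_dlopen(name) else "MISSING"))
    for path in find_library_files("libcublas.so*", search_dirs()):
        print(path)
```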

I also edited my bash startup file so that it begins with the following lines:

export PATH=/usr/local/cuda-10.0/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
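For reference, here is how the shell expands the LD_LIBRARY_PATH line above. ${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}} already supplies its own leading colon, so the extra colon after /usr/local/cuda/lib64 leaves an empty entry in the list. An empty entry is commonly treated as the current directory. The PATH line above does not have that extra colon.

Also note that PATH points at /usr/local/cuda-10.0 while LD_LIBRARY_PATH points at /usr/local/cuda, so the two may refer to different toolkits.

A small Python model of the two expansions follows; the function names are illustrative only.

```python
def expand_as_written(prefix, current):
    """Model of  PREFIX:${VAR:+:${VAR}}  (the LD_LIBRARY_PATH line above).

    ${VAR:+word} expands to `word` only when VAR is set and non-empty."""
    return prefix + ":" + (":" + current if current else "")

def expand_without_extra_colon(prefix, current):
    """Model of  PREFIX${VAR:+:${VAR}}  (same form as the PATH line)."""
    return prefix + (":" + current if current else "")

def empty_entries(value):
    """Positions of empty components in a colon-separated search path."""
    return [i for i, part in enumerate(value.split(":")) if part == ""]

if __name__ == "__main__":
    for old in ("", "/opt/lib"):
        a = expand_as_written("/usr/local/cuda/lib64", old)
        b = expand_without_extra_colon("/usr/local/cuda/lib64", old)
        print("%r -> %r (empty at %s)" % (old, a, empty_entries(a)))
        print("%r -> %r (empty at %s)" % (old, b, empty_entries(b)))
```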

0 answers