How do I set up CUDA for TensorFlow with a GTX 1080?

Time: 2016-07-06 06:45:10

Tags: cuda tensorflow

After installing the driver for the GTX 1080, TensorFlow reports that it can find the cuDNN library.

However, modprobe fails to load the GPU driver module. Details follow:

$ python
[14:22:14]
Python 2.7.6 (default, Jun 22 2015, 17:58:13) 
[GCC 4.8.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcuda.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcurand.so locally
>>> sess = tf.InteractiveSession()
modprobe: ERROR: could not insert 'nvidia_352_uvm': Invalid argument
E tensorflow/stream_executor/cuda/cuda_driver.cc:491] failed call to cuInit: CUDA_ERROR_UNKNOWN
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:153] retrieving CUDA diagnostic information for host: work-data
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:160] hostname: work-data
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:185] libcuda reported version is: Not found: was unable to find libcuda.so DSO loaded into this program
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:347] driver version file contents: """NVRM version: NVIDIA UNIX x86_64 Kernel Module  367.27  Thu Jun  9 18:53:27 PDT 2016 GCC version:  gcc version 4.8.4 (Ubuntu 4.8.4-2ubuntu1~14.04.3) """
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:189] kernel reported version is: 367.27.0
I tensorflow/core/common_runtime/gpu/gpu_init.cc:81] No GPU devices available on machine.

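The GTX 1080 driver is version 367.27, provided by NVIDIA.

I don't know why 'nvidia_352_uvm' shows up.

The nvidia-smi output is shown here. Maybe I need to reinstall CUDA, but I have already reinstalled it several times. Should I remove all CUDA libraries and NVIDIA drivers and then reinstall them? Is there a required installation order for the two?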

[Image: nvidia-smi output]

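1 answer:

Answer 0 (score: 1)

This is too long for a comment, but here are some tricks I have used to get the NVidia driver to play nicely with Ubuntu.

Installing a new driver on top of an existing one leaves you with a partially upgraded installation. You need to remove the previous one first.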

sudo apt-get remove --purge 'nvidia-*'   # quoted so the shell does not expand the glob
sudo rm /etc/X11/xorg.conf   # if you ran nvidia-xconfig
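
To confirm that a partial upgrade is what you are seeing, you can compare the driver branch named in the modprobe error with the version that the loaded kernel module reports. The following is only a minimal sketch, assuming Python 3 and the log strings from the question; the helper names are hypothetical, and the version text is what TensorFlow prints as "driver version file contents" (on Linux it normally comes from /proc/driver/nvidia/version).

```python
import re


def module_branch(modprobe_error):
    """Return the driver branch embedded in a module name such as nvidia_352_uvm."""
    match = re.search(r"nvidia_(\d+)(?:_uvm)?", modprobe_error)
    return match.group(1) if match else None


def kernel_driver_version(version_file_text):
    """Return the full driver version from the NVRM version line."""
    match = re.search(r"Kernel Module\s+(\d+(?:\.\d+)+)", version_file_text)
    return match.group(1) if match else None


def is_mismatch(modprobe_error, version_file_text):
    """True if modprobe asks for a different branch than the loaded module.

    Returns None when either string cannot be parsed.
    """
    branch = module_branch(modprobe_error)
    version = kernel_driver_version(version_file_text)
    if branch is None or version is None:
        return None
    return version.split(".")[0] != branch


# Strings copied from the log in the question.
error_line = "modprobe: ERROR: could not insert 'nvidia_352_uvm': Invalid argument"
version_text = ("NVRM version: NVIDIA UNIX x86_64 Kernel Module  367.27  "
                "Thu Jun  9 18:53:27 PDT 2016")

print(module_branch(error_line), kernel_driver_version(version_text),
      is_mismatch(error_line, version_text))
```

For the log in the question this reports branch 352 against version 367.27, which is consistent with leftovers from an older 352 driver still being installed next to the new one.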

Reload the NVidia driver as follows (from a virtual terminal, CTRL + ALT + F7)

sudo service lightdm stop  # stop the display manager (this also stops X)
killall python  # kill all running TensorFlow instances to free GPU
sudo modprobe -r nvidia
sudo modprobe nvidia
dmesg | tail -100 # check for error messages

Check the logs for any error messages from NVidia

dmesg | grep -i nvidia
lspci | grep -i nvidia
nvidia-smi     # make sure this reports version 367.27
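
If you want to script the last check instead of reading the table by eye, something along these lines may work. It is a sketch under a few assumptions: Python 3.5 or later, and an nvidia-smi that accepts the --query-gpu=driver_version and --format=csv,noheader options (check nvidia-smi --help-query-gpu on your driver). The function names and the EXPECTED_VERSION constant are hypothetical.

```python
import shutil
import subprocess

EXPECTED_VERSION = "367.27"  # the driver version from the question


def parse_driver_versions(smi_output):
    """Split nvidia-smi CSV output into one version string per GPU."""
    return [line.strip() for line in smi_output.splitlines() if line.strip()]


def installed_driver_versions():
    """Ask nvidia-smi for the driver version; return [] if it is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    if result.returncode != 0:
        # nvidia-smi is installed but cannot talk to the driver.
        return []
    return parse_driver_versions(result.stdout)


versions = installed_driver_versions()
if not versions:
    print("nvidia-smi is missing or could not reach the driver")
elif all(v == EXPECTED_VERSION for v in versions):
    print("driver version matches", EXPECTED_VERSION)
else:
    print("unexpected driver version(s):", versions)
```

An empty result means the same thing as nvidia-smi failing on the command line: the kernel module is not loaded correctly, so go back to the dmesg output.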

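Also, there are two ways to install the driver: through Ubuntu's package manager with sudo apt-get install nvidia-current, or by getting the tarball from NVidia's website. I could not get the apt-get route to work with TensorFlow, so I recommend downloading the driver from NVidia's website.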