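After installing the driver for the GTX 1080, TensorFlow reports that it can find the cuDNN library.
However, modprobe fails to load the GPU driver module. Details are as follows: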
$ python
Python 2.7.6 (default, Jun 22 2015, 17:58:13)
[GCC 4.8.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcuda.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcurand.so locally
>>> sess = tf.InteractiveSession()
modprobe: ERROR: could not insert 'nvidia_352_uvm': Invalid argument
E tensorflow/stream_executor/cuda/cuda_driver.cc:491] failed call to cuInit: CUDA_ERROR_UNKNOWN
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:153] retrieving CUDA diagnostic information for host: work-data
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:160] hostname: work-data
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:185] libcuda reported version is: Not found: was unable to find libcuda.so DSO loaded into this program
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:347] driver version file contents: """NVRM version: NVIDIA UNIX x86_64 Kernel Module 367.27 Thu Jun 9 18:53:27 PDT 2016 GCC version: gcc version 4.8.4 (Ubuntu 4.8.4-2ubuntu1~14.04.3) """
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:189] kernel reported version is: 367.27.0
I tensorflow/core/common_runtime/gpu/gpu_init.cc:81] No GPU devices available on machine.
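The GTX 1080 driver is version 367.27, provided by NVIDIA.
I don't know why 'nvidia_352_uvm' shows up.
The output of nvidia-smi is here.
Maybe I need to reinstall CUDA, but I have already reinstalled it several times.
Should I remove all CUDA libraries and NVIDIA drivers and then reinstall them? Is there a required installation order for the two?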
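Answer 0 (score: 1)
This is too long for a comment, but here are some tricks I have used to get the NVidia drivers to play nicely with Ubuntu.
Installing a new driver on top of an existing one leaves you with a partially upgraded installation. You need to remove the old one first.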
sudo apt-get remove --purge 'nvidia-*' # quoted so the shell does not expand the pattern against local files
sudo rm /etc/X11/xorg.conf # if you ran nvidia-xconfig
Reload the NVidia driver as follows (from a virtual terminal, CTRL + ALT + F7):
sudo service lightdm stop # stop your window manager
killall python # kill all running TensorFlow instances to free GPU
sudo modprobe -r nvidia
sudo modprobe nvidia
dmesg | tail -100 # check for error messages
Check the logs for any error messages from NVidia:
dmesg | grep -i nvidia
lspci | grep -i nvidia
nvidia-smi # make sure this reports version 367.27
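The log in the question shows modprobe trying to insert 'nvidia_352_uvm' while the kernel reports driver 367.27, which would be consistent with leftovers from an older driver branch. The check can be sketched in Python as below. This is only an illustration, not part of TensorFlow or the NVidia tools: the helper names are hypothetical, and the sample strings are copied from the question. On a live machine the version text would presumably come from /proc/driver/nvidia/version, which I am assuming is the "driver version file" the log refers to.

```python
import re


def kernel_driver_version(version_text):
    """Return the kernel module version (e.g. '367.27') found in the text, or None."""
    match = re.search(r"Kernel Module\s+(\d+(?:\.\d+)+)", version_text)
    return match.group(1) if match else None


def module_branch(module_name):
    """Return the driver branch embedded in a module name such as 'nvidia_352_uvm'."""
    match = re.search(r"nvidia[_-](\d+)", module_name)
    return match.group(1) if match else None


def branches_match(version_text, module_name):
    """True if the module name's branch equals the major driver version.

    Returns None when either value cannot be parsed.
    """
    version = kernel_driver_version(version_text)
    branch = module_branch(module_name)
    if version is None or branch is None:
        return None
    return version.split(".")[0] == branch


if __name__ == "__main__":
    # Sample strings taken from the log in the question.
    version_text = ("NVRM version: NVIDIA UNIX x86_64 Kernel Module 367.27 "
                    "Thu Jun 9 18:53:27 PDT 2016")
    print(branches_match(version_text, "nvidia_352_uvm"))  # prints False
```

A result of False means the module that modprobe tried to load belongs to a different driver branch than the one the kernel reports, which is the situation the purge step above is meant to clean up.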
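Also, there are two ways to install the driver: through Ubuntu's built-in packages with sudo apt-get install nvidia-current, or by downloading the tarball from the NVidia website. I could not get the sudo apt-get route to work for TensorFlow, so I recommend downloading the driver from the NVidia website.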