ImportError: libcudnn.so.7: cannot open shared object file: No such file or directory

Date: 2018-04-04 17:11:03

Tags: python-3.x tensorflow ubuntu-16.04 cudnn

I installed TensorFlow 1.6.0 (GPU build) with Anaconda in a Python 3.6.4 environment.

When I run import tensorflow as tf, I get the following error:

ImportError: libcudnn.so.7: cannot open shared object file: No such file or directory

Versions:

  • cudnn : 7.1.1
  • cuda : 9.0.176
  • tensorflow : 1.6.0
  • Ubuntu : 16.04
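
One way to narrow the error down without importing TensorFlow is to ask the dynamic loader for the library directly. The following is a minimal sketch using the standard ctypes module; the helper name can_load is made up for illustration, and it only tells you whether the loader can find and open the file, not whether the cuDNN version matches what TensorFlow expects.

```python
import ctypes


def can_load(library_name):
    """Return (True, None) if the dynamic loader can open the library,
    otherwise (False, the loader's error message)."""
    try:
        ctypes.CDLL(library_name)
    except OSError as exc:
        return False, str(exc)
    return True, None


if __name__ == "__main__":
    # libcudnn.so.7 is the name taken from the error message above.
    ok, error = can_load("libcudnn.so.7")
    print("libcudnn.so.7 loadable:", ok)
    if error:
        print("loader said:", error)
```

If this reports the same "cannot open shared object file" message, the problem lies in the library search path or the files themselves rather than in TensorFlow.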

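I am aware of this, but it did not solve my problem.

4 answers:

Answer 0 (score: 4)

The accepted answer (installing nvidia-cuda-toolkit) is wrong. By installing the toolkit you are basically installing a second CUDA on top of the one you already installed by following the NVIDIA guide.

The problem turned out to be with the symbolic links. I was inspired by this thread, http://queirozf.com/entries/installing-cuda-tk-and-tensorflow-on-a-clean-ubuntu-16-04-install, although the actual resolution is different.

At some point, the NVIDIA cuDNN installation tutorial asks you to do this: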

sudo cp cuda/include/cudnn.h /usr/local/cuda/include
sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64
sudo chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib64/libcudnn*

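The problem with this approach is that copying the files with the pattern libcudnn* breaks the symbolic links: cp follows symbolic links by default, so each link arrives as a full regular file. It was suggested to me to run the following command instead, but it still breaks the links (--preserve=links only preserves hard links):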

sudo cp --preserve=links cuda/lib64/libcudnn* /usr/local/cuda/lib64
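
To illustrate the difference between following a symbolic link and copying the link itself, here is a small self-contained sketch in Python. It works on a throwaway temporary directory with a placeholder file, not on a real cuDNN install; the function names copy_library_files and demo are hypothetical. It relies on shutil.copy2, whose follow_symlinks=False option recreates a source symlink as a symlink.

```python
import os
import shutil
import tempfile


def copy_library_files(src_dir, dst_dir, keep_links):
    """Copy every libcudnn* entry; keep_links=True copies symlinks as symlinks."""
    for name in sorted(os.listdir(src_dir)):
        if name.startswith("libcudnn"):
            shutil.copy2(
                os.path.join(src_dir, name),
                os.path.join(dst_dir, name),
                follow_symlinks=not keep_links,
            )


def demo():
    """Copy a toy tree both ways and report whether libcudnn.so.7 stayed a link."""
    with tempfile.TemporaryDirectory() as root:
        src = os.path.join(root, "src")
        os.mkdir(src)
        # Placeholder standing in for the real shared library.
        with open(os.path.join(src, "libcudnn.so.7.6.5"), "w") as fh:
            fh.write("placeholder, not a real library")
        os.symlink("libcudnn.so.7.6.5", os.path.join(src, "libcudnn.so.7"))

        result = {}
        for keep_links in (False, True):
            dst = os.path.join(root, "dst_keep" if keep_links else "dst_follow")
            os.mkdir(dst)
            copy_library_files(src, dst, keep_links)
            result[keep_links] = os.path.islink(os.path.join(dst, "libcudnn.so.7"))
        return result


if __name__ == "__main__":
    # Maps keep_links -> "is the copied libcudnn.so.7 still a symlink?"
    print(demo())
```

The copy that follows links ends up with a regular file named libcudnn.so.7, which is the situation this answer describes.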

You can verify the links by running ls -lha libcudnn* in the /usr/local/cuda/lib64 folder. If you do not see output like this:

lrwxrwxrwx 1 root root   13 May  2 20:02 libcudnn.so -> libcudnn.so.7
lrwxrwxrwx 1 root root   17 May  2 20:02 libcudnn.so.7 -> libcudnn.so.7.6.5
-rwxr-xr-x 1 root root 409M May  2 20:02 libcudnn.so.7.6.5
-rw-r--r-- 1 root root 386M May  2 20:02 libcudnn_static.a

then you have just found the problem. The actual fix is to run the following:

sudo rm /usr/local/cuda/lib64/libcudnn.so
sudo rm /usr/local/cuda/lib64/libcudnn.so.7
cd /usr/local/cuda/lib64/
sudo ln -s libcudnn.so.7.6.5 libcudnn.so.7
sudo ln -s libcudnn.so.7 libcudnn.so

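These commands delete the old "links" and create new ones. Verify the links again with ls -lha libcudnn*. After that, run the following command in verbose mode: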

sudo ldconfig -v

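Check the log. I don't know everything this command does (it rebuilds the dynamic linker's cache of shared libraries), but it turned out to be important. Also, if the log says that a symbolic link is broken, or something along those lines, TensorFlow will keep showing the error from this question.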
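
The link check done with ls -lha can also be scripted. This is a rough sketch under the assumption that the libraries live in /usr/local/cuda/lib64 as in this answer; describe_links and chain_is_healthy are illustrative names, and "healthy" here only means that every symlink points at a file present in the same listing.

```python
import os


def describe_links(directory, prefix="libcudnn.so"):
    """Map each matching file name to its symlink target, or None for a regular file."""
    report = {}
    for name in sorted(os.listdir(directory)):
        if name.startswith(prefix):
            path = os.path.join(directory, name)
            report[name] = os.readlink(path) if os.path.islink(path) else None
    return report


def chain_is_healthy(report):
    """True if at least one symlink exists and each one points at a listed file."""
    links = {name: target for name, target in report.items() if target is not None}
    return bool(links) and all(
        os.path.basename(target) in report for target in links.values()
    )


if __name__ == "__main__":
    lib_dir = "/usr/local/cuda/lib64"  # path used in this answer; adjust if needed
    if os.path.isdir(lib_dir):
        report = describe_links(lib_dir)
        for name, target in report.items():
            print(name, "->", target if target else "(regular file)")
        print("links look healthy:", chain_is_healthy(report))
    else:
        print(lib_dir, "does not exist on this machine")
```

If every libcudnn.so* entry shows up as a regular file, the copy step has flattened the links and they need to be recreated as shown above.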

Bonus: make sure the following lines are appended to ~/.bashrc (open it with nano ~/.bashrc):

export PATH=/usr/local/cuda/bin:/opt/nvidia/nsight-compute/2019.4.0${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
export CUDADIR=/usr/local/cuda${CUDADIR:+:${CUDADIR}}
export CUDA_HOME=/usr/local/cuda

Then run source ~/.bashrc
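
To confirm that the new shell actually picked up the exports, something like the following sketch can be run from Python. The helper name missing_path_entries is made up for this example, and the two required directories are simply the ones from the LD_LIBRARY_PATH line above.

```python
import os


def missing_path_entries(variable_value, required):
    """Return the required directories absent from a colon-separated value."""
    present = {entry.rstrip("/") for entry in variable_value.split(":") if entry}
    return [d for d in required if d.rstrip("/") not in present]


if __name__ == "__main__":
    # Directories taken from the LD_LIBRARY_PATH export in this answer.
    required_lib_dirs = [
        "/usr/local/cuda/lib64",
        "/usr/local/cuda/extras/CUPTI/lib64",
    ]
    missing = missing_path_entries(
        os.environ.get("LD_LIBRARY_PATH", ""), required_lib_dirs
    )
    print("missing from LD_LIBRARY_PATH:", missing)
```

An empty list means both directories are on the search path of the current process.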

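All of the steps above assume that you are not using nvidia-cuda-toolkit, but the NVIDIA CUDA repo instead.

Also, when installing CUDA, make sure you are not targeting 10.2. At the time of writing, TF supports CUDA versions up to 10.1, so this is the proper way to install the required version: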

sudo apt-cache policy cuda
sudo apt-get install cuda=10.1.243-1

Verify with:

nvcc --version
nvidia-smi
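
The version check can be automated by parsing the "release X.Y" part of the nvcc --version output. The sketch below assumes that output format and uses the 10.1 ceiling stated in this answer as the default; parse_nvcc_release and is_supported are hypothetical helper names.

```python
import re
import shutil
import subprocess


def parse_nvcc_release(nvcc_output):
    """Extract (major, minor) from the 'release X.Y' part of `nvcc --version` output."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    return (int(match.group(1)), int(match.group(2))) if match else None


def is_supported(release, maximum=(10, 1)):
    """True if the release is known and not newer than the stated maximum."""
    return release is not None and release <= maximum


if __name__ == "__main__":
    if shutil.which("nvcc"):
        output = subprocess.run(
            ["nvcc", "--version"], capture_output=True, text=True
        ).stdout
        release = parse_nvcc_release(output)
        print("CUDA release:", release, "supported:", is_supported(release))
    else:
        print("nvcc is not on PATH")
```

The maximum is a parameter because the supported CUDA version depends on the TensorFlow release you install.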

Edit: I found the error that you should not see after running the ldconfig command:

/usr/local/cuda-10.1/targets/x86_64-linux/lib:
...
libnppist.so.10 -> libnppist.so.10.2.0.243
libcuinj64.so.10.1 -> libcuinj64.so.10.1.243
/sbin/ldconfig.real: /usr/local/cuda-10.1/targets/x86_64-linux/lib/libcudnn.so.7 is not a symbolic link
libcudnn.so.7 -> libcudnn.so.7.6.5
libnppc.so.10 -> libnppc.so.10.2.0.243
libnppicom.so.10 -> libnppicom.so.10.2.0.243
libnvgraph.so.10 -> libnvgraph.so.10.1.243
/usr/lib/x86_64-linux-gnu/libfakeroot:
...

If you see the "is not a symbolic link" line, something is still misconfigured.
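
Since the ldconfig -v log is long, a small filter makes the offending line easier to spot. This sketch only scans text you have already captured (for example by redirecting both the output and the error stream of the command to a file); the function name find_link_errors is an assumption for illustration, and the matched phrase is the one from the log above.

```python
def find_link_errors(ldconfig_output):
    """Return the lines of captured `ldconfig -v` output that report a non-symlink."""
    return [
        line.strip()
        for line in ldconfig_output.splitlines()
        if "is not a symbolic link" in line
    ]


if __name__ == "__main__":
    # Sample text modeled on the log in this answer.
    sample = (
        "libcuinj64.so.10.1 -> libcuinj64.so.10.1.243\n"
        "/sbin/ldconfig.real: /usr/local/cuda-10.1/targets/x86_64-linux/lib/"
        "libcudnn.so.7 is not a symbolic link\n"
        "libcudnn.so.7 -> libcudnn.so.7.6.5\n"
    )
    for line in find_link_errors(sample):
        print(line)
```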

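Answer 1 (score: 1)

I installed the nvidia-cuda-toolkit package:

$ sudo apt install nvidia-cuda-toolkit

and it worked.

I did not find the solution on the TensorFlow website or on the NVIDIA installation page. I was looking for a way to get the CUDA version from the command line and found the fix by luck: How to get the cuda version?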

Answer 2 (score: 1)

I don't have enough reputation to comment on Alex's answer. But on Ubuntu 20.04 the paths have changed! Also, --preserve=links is no longer needed with cp! So I am posting a new answer:

To install the cuDNN library 7.6 for TensorFlow 2.3.1 with CUDA 10.1, in an environment created with conda create --name tfgpu10.1 python=3.8:

  1. Go to https://developer.nvidia.com/cuDNN
  2. Under "Download cuDNN v7.6.5 (November 5th, 2019), for CUDA 10.1", download "cuDNN Library for Linux"
  3. Extract it with tar -xvzf cudnn-10.1-linux-x64-v7.6.5.32.tgz
  4. "Install" the files:
    sudo cp cuda/include/cudnn.h /usr/lib/cuda/include/
    sudo cp cuda/lib64/libcudnn* /usr/lib/cuda/lib64/

  5. Set the permissions:
    sudo chmod a+r /usr/lib/cuda/include/cudnn.h /usr/lib/cuda/lib64/libcudnn*
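
After copying the files, you can check which cuDNN version the installed header declares. The sketch below assumes the cuDNN 7.x layout, where cudnn.h defines CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL, and the /usr/lib/cuda/include path from the steps above; parse_cudnn_version is an illustrative name.

```python
import os
import re


def parse_cudnn_version(header_text):
    """Read CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL from header text."""
    values = []
    for key in ("MAJOR", "MINOR", "PATCHLEVEL"):
        match = re.search(r"#define\s+CUDNN_" + key + r"\s+(\d+)", header_text)
        if match is None:
            return None  # the header does not define this macro as a number
        values.append(int(match.group(1)))
    return tuple(values)


if __name__ == "__main__":
    header = "/usr/lib/cuda/include/cudnn.h"  # path from the steps above
    if os.path.isfile(header):
        with open(header, errors="replace") as fh:
            print("cuDNN version in header:", parse_cudnn_version(fh.read()))
    else:
        print(header, "not found")
```

For the archive used here, the expected result would be version 7.6.5; a different tuple suggests an older header is still in place.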
    

Test result:

Python 3.8.5 (default, Sep  4 2020, 07:30:14) 
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
2020-12-02 03:58:41.089993: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
>>> tf.config.list_physical_devices("GPU")
2020-12-02 03:58:48.538295: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcuda.so.1
2020-12-02 03:58:48.587523: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-12-02 03:58:48.587838: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties: 
pciBusID: 0000:01:00.0 name: GeForce GTX 1650 Ti computeCapability: 7.5
coreClock: 1.485GHz coreCount: 16 deviceMemorySize: 3.82GiB deviceMemoryBandwidth: 178.84GiB/s
2020-12-02 03:58:48.587860: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
2020-12-02 03:58:48.589111: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.10
2020-12-02 03:58:48.590284: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcufft.so.10
2020-12-02 03:58:48.590488: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcurand.so.10
2020-12-02 03:58:48.591785: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusolver.so.10
2020-12-02 03:58:48.592520: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusparse.so.10
2020-12-02 03:58:48.595129: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.7
2020-12-02 03:58:48.595213: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-12-02 03:58:48.595555: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-12-02 03:58:48.595815: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0
[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
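
The line to look for in this log is the one reporting that libcudnn.so.7 was opened. As a rough aid, the following sketch collects the library names from "Successfully opened dynamic library" lines in captured log text; opened_libraries is a made-up helper, and the message format is assumed to match the TensorFlow 2.3 log shown above.

```python
import re


def opened_libraries(log_text):
    """Collect library names from 'Successfully opened dynamic library' lines."""
    return set(re.findall(r"Successfully opened dynamic library (\S+)", log_text))


if __name__ == "__main__":
    # Two lines copied from the log above, shortened to the relevant part.
    sample = (
        "dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1\n"
        "dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.7\n"
    )
    libs = opened_libraries(sample)
    print("cuDNN loaded:", "libcudnn.so.7" in libs)
```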

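Answer 3 (score: -4)

This did not work for me. In my case, the cause was that I had multiple versions of CUDA installed, and my cuDNN was an older version rather than the one I was trying to use. So I installed the newer version of cuDNN following NVIDIA's instructions, and that fixed it for me.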
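
To see which cuDNN files sit in each CUDA installation, a short scan like the one below may help. It assumes the toolkits live under /usr/local/cuda* with a lib64 subdirectory, as in the first answer; on other layouts the roots need to be changed. find_cudnn_libraries is a hypothetical helper name.

```python
import glob
import os


def find_cudnn_libraries(roots):
    """Return sorted paths of libcudnn.so.* files found under each root's lib64."""
    found = []
    for root in roots:
        found.extend(glob.glob(os.path.join(root, "lib64", "libcudnn.so.*")))
    return sorted(found)


if __name__ == "__main__":
    # Assumption: CUDA toolkits are installed as /usr/local/cuda, /usr/local/cuda-10.1, ...
    for path in find_cudnn_libraries(glob.glob("/usr/local/cuda*")):
        print(path, "->", os.path.realpath(path))
```

Comparing the resolved file names across directories shows whether an older cuDNN is the one being picked up.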