Question

I'm using Ubuntu 20.04 and upgraded from TensorFlow 2.2.0 to TensorFlow 2.3.0. With 2.2.0, TensorFlow used the GPU without problems, but since the upgrade to 2.3.0 it no longer detects the GPU.

I have already seen this link on Stack Overflow, which says it is a cuDNN version problem. But I do have the required cuDNN version:

me_sajied@Kunai:~$ apt list | grep cudnn

WARNING: apt does not have a stable CLI interface. Use with caution in scripts.

libcudnn7-dev/now 7.6.5.32-1+cuda10.1 amd64 [installed,local]
libcudnn7/now 7.6.5.32-1+cuda10.1 amd64 [installed,local]

I also have all the other required software at the required versions.

Cuda

me_sajied@Kunai:~$ apt list | grep cuda-toolkit

WARNING: apt does not have a stable CLI interface. Use with caution in scripts.

cuda-toolkit-10-0/unknown 10.0.130-1 amd64
cuda-toolkit-10-1/unknown,now 10.1.243-1 amd64 [installed,automatic]
cuda-toolkit-10-2/unknown 10.2.89-1 amd64
cuda-toolkit-11-0/unknown,unknown 11.0.3-1 amd64
nvidia-cuda-toolkit-gcc/focal 10.1.243-3 amd64
nvidia-cuda-toolkit/focal 10.1.243-3 amd64
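The claim that cuDNN and CUDA match can be double-checked from the `apt list` output itself: the cuDNN packages carry a `+cuda10.1` suffix, and `cuda-toolkit-10-1` is the toolkit flagged as installed. A minimal Python sketch for pulling the installed packages out of that output, assuming the `name/suite version arch [flags]` line shape seen above (apt itself warns that this format is not stable, so treat it as a quick check only); `parse_apt_list_line` and `installed_packages` are hypothetical helper names:

```python
import re
from typing import Dict, List, NamedTuple, Optional


class AptEntry(NamedTuple):
    name: str
    suites: List[str]
    version: str
    arch: str
    flags: List[str]


# Assumed line shape, taken from the output above:
#   name/suite[,suite...] version arch [flag,flag,...]
_APT_LINE = re.compile(
    r"^(?P<name>[^/\s]+)/(?P<suites>\S+)\s+(?P<version>\S+)\s+(?P<arch>\S+)"
    r"(?:\s+\[(?P<flags>[^\]]*)\])?\s*$"
)


def parse_apt_list_line(line: str) -> Optional[AptEntry]:
    """Parse one `apt list` line; return None for warnings, blank lines, etc."""
    match = _APT_LINE.match(line.strip())
    if match is None:
        return None
    flags = match.group("flags")
    return AptEntry(
        name=match.group("name"),
        suites=match.group("suites").split(","),
        version=match.group("version"),
        arch=match.group("arch"),
        flags=flags.split(",") if flags else [],
    )


def installed_packages(apt_list_output: str) -> Dict[str, str]:
    """Map package name -> version for every entry flagged as installed."""
    result = {}
    for line in apt_list_output.splitlines():
        entry = parse_apt_list_line(line)
        if entry is not None and "installed" in entry.flags:
            result[entry.name] = entry.version
    return result
```

Feeding it the two listings above should report only `libcudnn7`, `libcudnn7-dev` and `cuda-toolkit-10-1`.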

Python

me_sajied@Kunai:~$ python3 --version
Python 3.8.2

Environment

LD_LIBRARY_PATH="/usr/local/cuda-10.1/lib64"

Log

me_sajied@Kunai:~$ python3
Python 3.8.2 (default, Jul 16 2020, 14:00:26) 
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
2020-09-13 21:28:37.387327: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
>>> 
>>> tf.test.is_gpu_available()
WARNING:tensorflow:From <stdin>:1: is_gpu_available (from tensorflow.python.framework.test_util) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.config.list_physical_devices('GPU')` instead.
2020-09-13 21:28:48.806385: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2020-09-13 21:28:48.836251: I tensorflow/core/platform/profile_utils/cpu_utils.cc:104] CPU Frequency: 2699905000 Hz
2020-09-13 21:28:48.836637: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x3fde5f0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-09-13 21:28:48.836685: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2020-09-13 21:28:48.840030: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcuda.so.1
2020-09-13 21:28:48.882190: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-09-13 21:28:48.882582: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x408bd90 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-09-13 21:28:48.882606: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): GeForce 930MX, Compute Capability 5.0
2020-09-13 21:28:48.882796: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-09-13 21:28:48.883151: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties: 
pciBusID: 0000:01:00.0 name: GeForce 930MX computeCapability: 5.0
coreClock: 1.0195GHz coreCount: 3 deviceMemorySize: 1.96GiB deviceMemoryBandwidth: 14.92GiB/s
2020-09-13 21:28:48.883196: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
2020-09-13 21:28:48.883415: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcublas.so.10'; dlerror: libcublas.so.10: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda/extras/CUPTI/lib64
2020-09-13 21:28:48.885196: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcufft.so.10
2020-09-13 21:28:48.885544: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcurand.so.10
2020-09-13 21:28:48.887160: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusolver.so.10
2020-09-13 21:28:48.888134: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusparse.so.10
2020-09-13 21:28:48.891565: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.7
2020-09-13 21:28:48.891603: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1753] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
2020-09-13 21:28:48.891625: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1257] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-09-13 21:28:48.891632: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1263]      0 
2020-09-13 21:28:48.891639: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1276] 0:   N 
False
>>>
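The log shows TensorFlow 2.3.0 opening every GPU library except `libcublas.so.10`, and the `LD_LIBRARY_PATH` it reports is `/usr/local/cuda/extras/CUPTI/lib64`, not the `/usr/local/cuda-10.1/lib64` listed under Environment. The same checks can be reproduced without importing TensorFlow. A sketch using `ctypes.CDLL`, which goes through the system dynamic loader; the library list is copied from the log, and `check_loadable` is a hypothetical helper. Because the loader reads `LD_LIBRARY_PATH` when the process starts, the variable has to be set (and exported) before `python3` is launched; changing `os.environ` inside the script does not affect these lookups.

```python
import ctypes

# Sonames copied from the TensorFlow 2.3.0 log above (CUDA 10.1 / cuDNN 7 build).
TF23_GPU_LIBS = [
    "libcudart.so.10.1",
    "libcublas.so.10",
    "libcufft.so.10",
    "libcurand.so.10",
    "libcusolver.so.10",
    "libcusparse.so.10",
    "libcudnn.so.7",
]


def check_loadable(sonames):
    """Try to dlopen each library; map soname -> None on success, else the error text."""
    results = {}
    for name in sonames:
        try:
            ctypes.CDLL(name)  # dlopen() via the system loader, honoring LD_LIBRARY_PATH
            results[name] = None
        except OSError as exc:  # raised when the loader cannot find or open the file
            results[name] = str(exc)
    return results


if __name__ == "__main__":
    for name, error in check_loadable(TF23_GPU_LIBS).items():
        status = "OK" if error is None else f"MISSING ({error})"
        print(f"{name}: {status}")
```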

Answer 1

Add this to your ~/.bashrc:

export LD_LIBRARY_PATH=/usr/local/cuda-10.1/lib64

If your lib64 folder is in a different location, adjust the path accordingly.
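To confirm that the directory you put on `LD_LIBRARY_PATH` really contains the library the log complains about (`libcublas.so.10`), a small check like the following can help. It is a simplified sketch: the hypothetical `find_in_ld_library_path` only scans the `LD_LIBRARY_PATH` entries, whereas the real loader also consults RPATH/RUNPATH, `/etc/ld.so.cache` and the default system directories.

```python
import os
import sys
from typing import Optional


def find_in_ld_library_path(soname: str, ld_library_path: Optional[str] = None) -> Optional[str]:
    """Return the first LD_LIBRARY_PATH directory containing `soname`, or None."""
    if ld_library_path is None:
        ld_library_path = os.environ.get("LD_LIBRARY_PATH", "")
    for directory in ld_library_path.split(":"):
        # Empty entries are skipped here for simplicity.
        if directory and os.path.isfile(os.path.join(directory, soname)):
            return directory  # isfile() follows symlinks such as libcublas.so.10 -> ...
    return None


if __name__ == "__main__":
    # Usage: python3 check_path.py libcublas.so.10 libcudnn.so.7
    for name in sys.argv[1:] or ["libcublas.so.10"]:
        print(f"{name} -> {find_in_ld_library_path(name)}")
```

If this prints `None` for `libcublas.so.10`, setting the variable alone will not fix the error; the library has to be in (or be added to) one of the listed directories.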

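As a side note, if you switch between several CUDA versions frequently, you can also set the environment variable in the terminal for a single command, for example: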

LD_LIBRARY_PATH=/usr/local/cuda-10.1/lib64 python myprogram_which_needs_10_1.py

Then, to switch to another version, just change the path in front of the command.
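The same per-command override can also be done from a Python launcher script. A sketch with the hypothetical `run_python_with_cuda_libs` helper; unlike the shell one-liner, it prepends the directory instead of replacing the whole variable so other existing entries are kept, which is a design choice rather than something the answer requires.

```python
import os
import subprocess
import sys


def run_python_with_cuda_libs(args, cuda_lib_dir):
    """Run `python <args>` in a child process whose LD_LIBRARY_PATH starts with cuda_lib_dir.

    The parent's own environment is not modified, mirroring the shell form
    `LD_LIBRARY_PATH=<dir> python myprogram.py`.
    """
    env = dict(os.environ)  # copy, so os.environ itself stays untouched
    existing = env.get("LD_LIBRARY_PATH")
    # Prepend instead of replacing so any other entries stay visible to the child.
    env["LD_LIBRARY_PATH"] = cuda_lib_dir + (":" + existing if existing else "")
    return subprocess.run(
        [sys.executable, *args],
        env=env,
        check=True,           # raise CalledProcessError if the child exits non-zero
        capture_output=True,  # collect stdout/stderr (Python 3.7+)
        text=True,
    )
```

For example, `run_python_with_cuda_libs(["myprogram_which_needs_10_1.py"], "/usr/local/cuda-10.1/lib64")` corresponds to the command shown above.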

Answer 2

2020-09-13 21:28:48.883415: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcublas.so.10'; dlerror: libcublas.so.10: cannot open shared object file: No such file or directory;

In my case, this was caused by apt upgrade installing the CUDA 10.2 versions of libcublas10 and libcublas-dev.

My solution to this problem is as follows.

My environment is based on NVIDIA's CUDA repository. First, reinstall the earlier versions:

$ sudo apt install --reinstall libcublas10=10.2.1.243-1 libcublas-dev=10.2.1.243-1

Then hold the packages so they are no longer offered as upgrade candidates:

$ sudo apt-mark hold libcublas10
$ sudo apt-mark hold libcublas-dev
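`apt-mark showhold` lists the held packages. To also confirm that the pinned versions are the ones actually installed, dpkg can be queried. A minimal sketch, assuming a Debian/Ubuntu system with `dpkg-query`; `installed_version` and `version_mismatches` are hypothetical helpers, and the lookup function is injectable so the comparison can be exercised without dpkg:

```python
import subprocess
from typing import Callable, Dict, Optional

# Versions pinned in the answer above.
PINNED = {
    "libcublas10": "10.2.1.243-1",
    "libcublas-dev": "10.2.1.243-1",
}


def installed_version(package: str) -> Optional[str]:
    """Return the version dpkg reports for `package`, or None if unavailable."""
    try:
        out = subprocess.run(
            ["dpkg-query", "-W", "--showformat=${Version}", package],
            check=True, capture_output=True, text=True,
        )
    except (FileNotFoundError, subprocess.CalledProcessError):
        return None  # no dpkg-query on this system, or the package is unknown
    return out.stdout.strip() or None


def version_mismatches(
    expected: Dict[str, str],
    lookup: Callable[[str], Optional[str]] = installed_version,
) -> Dict[str, Optional[str]]:
    """Map each package whose installed version differs from `expected` to what is installed."""
    mismatches = {}
    for pkg, want in expected.items():
        have = lookup(pkg)
        if have != want:
            mismatches[pkg] = have
    return mismatches
```

An empty result from `version_mismatches(PINNED)` means both cuBLAS packages are at the pinned version.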

Tensorflow-2.3.0 cannot detect GPU
