CUDNN_STATUS_NOT_INITIALIZED error when training StyleGAN

Asked: 2020-01-26 21:07:57

Tags: python tensorflow machine-learning cudnn

I downloaded the StyleGAN code from https://github.com/NVlabs/stylegan and want to train it on my own dataset. I am on an Ubuntu machine (Ubuntu 18.04.3 LTS), and when I run

 python train.py

I get the following error:

 2020-01-26 23:30:27.115726: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_NOT_INITIALIZED
 2020-01-26 23:30:27.115811: E tensorflow/stream_executor/cuda/cuda_dnn.cc:337] Possibly insufficient driver version: 430.50.0

Here is the output for my CUDA, cuDNN, and pip list:

$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130

$ nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 430.50       Driver Version: 430.50       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce RTX 2060    Off  | 00000000:01:00.0  On |                  N/A |
| 42%   37C    P8    14W / 170W |    529MiB /  5931MiB |      2%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1100      G   /usr/lib/xorg/Xorg                           245MiB |
|    0      1578      G   /usr/bin/gnome-shell                         149MiB |
|    0      2179      G   ...quest-channel-token=1359353350696709871   132MiB |
+-----------------------------------------------------------------------------+

$ dpkg -l | grep -i cudnn
ii  libcudnn7 7.6.5.32-1+cuda10.2       amd64        cuDNN runtime libraries
ii  libcudnn7-dev 7.6.5.32-1+cuda10.2   amd64        cuDNN development libraries and headers

$ pip list

absl-py (0.9.0)
astor (0.8.1)
bleach (1.5.0)
certifi (2019.11.28)
chardet (3.0.4)
gast (0.3.3)
google-pasta (0.1.8)
grpcio (1.26.0)
h5py (2.10.0)
html5lib (0.9999999)
idna (2.8)
Keras-Applications (1.0.8)
Keras-Preprocessing (1.1.0)
Markdown (3.1.1)
mock (3.0.5)
numpy (1.18.1)
opencv-python (4.1.0.25)
Pillow (6.1.0)
pip (9.0.1)
pkg-resources (0.0.0)
protobuf (3.11.2)
requests (2.22.0)
scipy (1.2.0)
setuptools (45.1.0)
six (1.14.0)
tensorboard (1.14.0)
tensorflow-estimator (1.14.0)
tensorflow-gpu (1.14.0)
termcolor (1.1.0)
tqdm (4.32.2)
urllib3 (1.25.7)
Werkzeug (0.16.0)
wheel (0.33.6)
wrapt (1.11.2)

Does anyone know which specific versions of these tools can be used to run StyleGAN?

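1 Answer:

Answer 0 (score: 1)

Posting the solution here in the answer section, even though it already appears in the comments, for the benefit of the community.

First, remove all existing cuDNN files: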

rm -f /usr/include/cudnn.h 
rm -f /usr/lib/x86_64-linux-gnu/*libcudnn* 
rm -f /usr/local/cuda-/lib64/*libcudnn 
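
Before deleting anything, it can help to see which files the patterns would actually match. The following is a minimal sketch, not part of the original answer: the helper name `find_cudnn_files` is hypothetical, and the default patterns are only the two fully specified paths from the commands above, so adjust them to your own install locations.

```python
import glob

# Patterns copied from the two fully specified rm commands above.
# These are assumptions about where cuDNN lives; adjust for your system.
CUDNN_PATTERNS = [
    "/usr/include/cudnn.h",
    "/usr/lib/x86_64-linux-gnu/*libcudnn*",
]


def find_cudnn_files(patterns=CUDNN_PATTERNS):
    """Return a sorted list of existing paths that match any glob pattern."""
    matches = set()
    for pattern in patterns:
        matches.update(glob.glob(pattern))
    return sorted(matches)


if __name__ == "__main__":
    # Dry run: only prints the candidates, deletes nothing.
    for path in find_cudnn_files():
        print(path)
```

Running it only lists candidate files, so the list can be reviewed before the rm commands are executed.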

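Now download a fresh copy of cuDNN from here.

To download cuDNN, make sure you are registered for the NVIDIA Developer Program.

  1. Go to: NVIDIA cuDNN home page
  2. Click Download.
  3. Complete the short survey and click Submit.
  4. Accept the Terms and Conditions. A list of available cuDNN download versions is displayed.
  5. Select the cuDNN version you want to install. A list of available resources is displayed.

Check the tested build configurations for TensorFlow GPU here to pick a matching version.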
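
As an illustration of that check, here is a small sketch that compares the cuDNN package from the question's dpkg output against the tested configuration for tensorflow-gpu 1.14.0 (CUDA 10.0 with cuDNN 7.4, per TensorFlow's tested build configurations table). The function names and the `TESTED` table are hypothetical helpers written for this example, and the table holds only the one entry relevant here.

```python
import re

# Tested configuration for tensorflow-gpu 1.14.0 (CUDA 10.0, cuDNN 7.4).
# Only this one entry is included; extend it from the official table as needed.
TESTED = {"1.14.0": {"cuda": "10.0", "cudnn": "7.4"}}


def parse_cudnn_package(version):
    """Split a Debian version like '7.6.5.32-1+cuda10.2' into (cudnn, cuda)."""
    m = re.match(r"^(\d+(?:\.\d+)*)-\d+\+cuda(\d+\.\d+)$", version)
    if m is None:
        raise ValueError("unrecognized cuDNN package version: %r" % version)
    return m.group(1), m.group(2)


def cuda_mismatch(tf_version, package_version):
    """True if the cuDNN package was built for a CUDA other than the tested one."""
    _, cuda = parse_cudnn_package(package_version)
    return cuda != TESTED[tf_version]["cuda"]


print(parse_cudnn_package("7.6.5.32-1+cuda10.2"))    # ('7.6.5.32', '10.2')
print(cuda_mismatch("1.14.0", "7.6.5.32-1+cuda10.2"))  # True
```

For the setup in the question, the installed libcudnn7 package targets CUDA 10.2 while the toolkit reported by nvcc is 10.0, which is the kind of mismatch this check flags.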


Copy the following files into the CUDA Toolkit directory and change the file permissions:

sudo cp cuda/include/cudnn.h /usr/local/cuda/include
sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64
sudo chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib64/libcudnn*
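
After copying, one way to confirm which cuDNN version is now in place is to read the version macros from the header. This is a sketch under the assumption that the header is a cuDNN 7.x cudnn.h, which defines CUDNN_MAJOR, CUDNN_MINOR, and CUDNN_PATCHLEVEL directly (cuDNN 8 moved these to cudnn_version.h). The function names are hypothetical, and the default path is the one used in the cp commands above.

```python
import re


def cudnn_version_from_header(header_text):
    """Extract (major, minor, patch) from the CUDNN_* macros in header text."""
    parts = []
    for name in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        m = re.search(r"^\s*#define\s+%s\s+(\d+)" % name, header_text, re.MULTILINE)
        if m is None:
            raise ValueError("macro %s not found" % name)
        parts.append(int(m.group(1)))
    return tuple(parts)


def read_cudnn_version(path="/usr/local/cuda/include/cudnn.h"):
    """Read the header at `path` (assumed location) and return its version tuple."""
    with open(path) as f:
        return cudnn_version_from_header(f.read())
```

Calling `read_cudnn_version()` on the machine should return a tuple that can be compared with the version you intended to install.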

For more details, see the cuDNN installation guide here.

Note: after updating cuDNN, if TensorFlow complains, update TensorFlow accordingly.