TensorFlow:为什么cuDNN无法启动? CUDA_ERROR_LAUNCH_FAILED

时间:2019-03-16 10:24:00

标签: python-3.x tensorflow keras

背景

自从3-4个月以来,我一直在使用TensorFlow + Keras进行深度学习。我通常使用python脚本并从shell调用它们。今天,我决定尝试使用jupyter笔记本从我的脚本中尝试一个深度神经网络。我使用Keras开始一个时代,但随后遇到以下错误

错误

Using TensorFlow backend.
WARNING:tensorflow:From D:\Anaconda3\envs\tensorflow-gpu-resonator\lib\site-packages\tensorflow\python\framework\op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
WARNING:tensorflow:From D:\Anaconda3\envs\tensorflow-gpu-resonator\lib\site-packages\tensorflow\python\ops\math_ops.py:3066: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
Epoch 1/5
2019-03-16 15:19:11.414481: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX AVX2
2019-03-16 15:19:11.971669: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties:
name: GeForce 940MX major: 5 minor: 0 memoryClockRate(GHz): 1.189
pciBusID: 0000:01:00.0
totalMemory: 4.00GiB freeMemory: 3.35GiB
2019-03-16 15:19:11.981427: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-03-16 15:19:26.748014: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-03-16 15:19:26.752914: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990]      0
2019-03-16 15:19:26.757983: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0:   N
2019-03-16 15:19:26.792177: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 3058 MB memory) -> physical GPU (device: 0, name: GeForce 940MX, pci bus id: 0000:01:00.0, compute capability: 5.0)
2019-03-16 15:19:29.466453: I tensorflow/stream_executor/dso_loader.cc:152] successfully opened CUDA library cublas64_100.dll locally
2019-03-16 15:19:54.230708: E tensorflow/stream_executor/cuda/cuda_driver.cc:981] failed to synchronize the stop event: CUDA_ERROR_LAUNCH_FAILED: unspecified launch failure
2019-03-16 15:19:54.236093: E tensorflow/stream_executor/cuda/cuda_timer.cc:55] Internal: error destroying CUDA event in context 0000016C5B168AD0: CUDA_ERROR_LAUNCH_FAILED: unspecified launch failure
2019-03-16 15:19:54.243638: E tensorflow/stream_executor/cuda/cuda_timer.cc:60] Internal: error destroying CUDA event in context 0000016C5B168AD0: CUDA_ERROR_LAUNCH_FAILED: unspecified launch failure
2019-03-16 15:19:54.250401: F tensorflow/stream_executor/cuda/cuda_dnn.cc:194] Check failed: status == CUDNN_STATUS_SUCCESS (7 vs. 0)Failed to set cuDNN stream.

尝试过的解决方案

正如GitHub上的多个类似问题所建议的那样,我尝试更新CUDA版本和TensorFlow版本以及图形驱动程序 目前我正在使用:

CUDA Version : 10.0.130
cuDNN Version : 7.3.1
TensorFlow GPU : 1.13.1
Keras GPU : 2.2.4
OS : Windows 10 64 bit Version 1809
GPU : NVIDIA GeForce 940MX
GPU Driver Version : 419.35 released 03-05-2019

我认为这一定是因为TensorFlow根本无法正确连接到GPU。因此,我从here开始尝试TensorFlow GPU测试,它似乎可以正常工作。(请查看图片here,因为我的信誉不足)

#1#2#3来看,这似乎是由于未经授权的内存访问而导致的错误。

我应该如何解决这个问题?我以为这与内存管理有关,因此重启可以解决此问题。但是我仍然遇到这个问题。

[P.S。 :诸如cuDNN和CUDA之类的所有软件库都是通过使用Python 3.6.6的Anaconda3安装的。]

0 个答案:

没有答案