在Tensorflow 2.0中运行CNN时,出现CUDNN_STATUS_INTERNAL_ERROR。 看来libcublas.so.10.0和libcudnn.so.7加载正常。
版本应该没问题:
我尝试了以下方法,但他们没有解决问题:
#include "driver_types.h"
修改为#include <driver_types.h>
,并通过mnistCUDNN测试(ref)问题:
毕竟,这是错误消息:
Using TensorFlow backend.
2019-10-16 14:48:16.226892: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2019-10-16 14:48:16.255123: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
...
2019-10-16 14:48:16.370703: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 3253 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1650, pci bus id: 0000:01:00.0, compute capability: 7.5)
Train on 48000 samples, validate on 12000 samples
Epoch 1/12
2019-10-16 14:48:17.357747: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2019-10-16 14:48:17.525865: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
--error here--
2019-10-16 14:48:17.873127: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2019-10-16 14:48:17.879412: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
--error here--
2019-10-16 14:48:17.879516: W tensorflow/core/common_runtime/base_collective_executor.cc:216] BaseCollectiveExecutor::StartAbort Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
[[{{node conv2d_1/convolution}}]]
Traceback (most recent call last):
File "lenet.py", line 96, in <module> x_train, y_train, batch_size=128, epochs=12, validation_split=0.2
File "lenet.py", line 83, in train verbose=self.verbose
File "/home/yuyu/venv/lib/python3.6/site-packages/keras/engine/training.py", line 1239, in fit validation_freq=validation_freq)
File "/home/yuyu/venv/lib/python3.6/site-packages/keras/engine/training_arrays.py", line 196, in fit_loop outs = fit_function(ins_batch)
File "/home/yuyu/venv/lib/python3.6/site-packages/tensorflow_core/python/keras/backend.py", line 3740, in __call__
outputs = self._graph_fn(*converted_inputs)
File "/home/yuyu/venv/lib/python3.6/site-packages/tensorflow_core/python/eager/function.py", line 1081, in __call__
return self._call_impl(args, kwargs)
File "/home/yuyu/venv/lib/python3.6/site-packages/tensorflow_core/python/eager/function.py", line 1121, in _call_impl
return self._call_flat(args, self.captured_inputs, cancellation_manager)
File "/home/yuyu/venv/lib/python3.6/site-packages/tensorflow_core/python/eager/function.py", line 1224, in _call_flat
ctx, args, cancellation_manager=cancellation_manager)
File "/home/yuyu/venv/lib/python3.6/site-packages/tensorflow_core/python/eager/function.py", line 511, in call
ctx=ctx)
File "/home/yuyu/venv/lib/python3.6/site-packages/tensorflow_core/python/eager/execute.py", line 67, in quick_execute
six.raise_from(core._status_to_exception(e.code, message), None)
File "<string>", line 3, in raise_from tensorflow.python.framework.errors_impl.UnknownError: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
[[node conv2d_1/convolution (defined at /home/yuyu/venv/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py:1751) ]] [Op:__inference_keras_scratch_graph_1220]
Function call stack:
keras_scratch_graph
答案 0 :(得分:0)
我在Ubuntu 20.04 / RTX 2070系统上遇到此错误。我发现了:
https://gist.github.com/mikaelhg/cae5b7938aa3dfdf3d06a40739f2f3f4#file-cuda-install-md
建议导出如下环境变量:
export TF_FORCE_GPU_ALLOW_GROWTH=true
那为我解决了。快乐的日子。