
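Everything was fine a week ago. Although I am running on a server, I don't think much has changed since then, and I don't know what caused this. The TensorFlow version is 2.1.0-dev20191015.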

In any case, here is the GPU status:

NVIDIA-SMI 430.50
Driver Version: 430.50
CUDA Version: 10.1

Epoch 1/5
2019-11-29 22:08:00.334979: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2019-11-29 22:08:00.644569: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2019-11-29 22:08:00.647191: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_NOT_INITIALIZED
2019-11-29 22:08:00.647309: E tensorflow/stream_executor/cuda/cuda_dnn.cc:337] Possibly insufficient driver version: 430.50.0
2019-11-29 22:08:00.647347: W tensorflow/core/framework/op_kernel.cc:1655] OP_REQUIRES failed at cudnn_rnn_ops.cc:1510 : Unknown: Fail to find the dnn implementation.
2019-11-29 22:08:00.647393: W tensorflow/core/common_runtime/base_collective_executor.cc:217] BaseCollectiveExecutor::StartAbort Unknown: Fail to find the dnn implementation.
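A log like the one above is easier to scan when it is split into entries and filtered by severity. Here is a minimal sketch, assuming each entry starts with the prefix seen above (timestamp, a severity letter such as I, W or E, then the source file and line). The name `parse_tf_log` is hypothetical and not part of TensorFlow.

```python
import re

# Matches the prefix of a log entry as it appears above, e.g.
# "2019-11-29 22:08:00.647191: E tensorflow/.../cuda_dnn.cc:329] message"
ENTRY_RE = re.compile(
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+): ([IWEF]) (\S+?):(\d+)\] "
)


def parse_tf_log(text):
    """Split a (possibly flattened) log into (severity, source, line, message) tuples."""
    matches = list(ENTRY_RE.finditer(text))
    entries = []
    for i, m in enumerate(matches):
        # The message runs until the next entry prefix, or to the end of the text.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        message = text[m.end():end].strip()
        entries.append((m.group(2), m.group(3), int(m.group(4)), message))
    return entries


# Two entries copied from the log above, joined on one line.
sample = (
    "2019-11-29 22:08:00.644569: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] "
    "Successfully opened dynamic library libcudnn.so.7 "
    "2019-11-29 22:08:00.647191: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] "
    "Could not create cudnn handle: CUDNN_STATUS_NOT_INITIALIZED"
)

# Keep only the error-level entries.
errors = [entry for entry in parse_tf_log(sample) if entry[0] == "E"]
for severity, source, line, message in errors:
    print(f"{source}:{line}: {message}")
```

Filtering on "E" here leaves only the cuDNN handle failure, which is the first thing that goes wrong in this run.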

In the end, I get:

UnknownError: [_Derived_] Fail to find the dnn implementation.
  [[{{node CudnnRNN}}]]
  [[sequential/bidirectional/forward_lstm/StatefulPartitionedCall]] [Op:__inference_distributed_function_18158]

Function call stack:
distributed_function -> distributed_function -> distributed_function

The traceback points to this call:

history = model.fit(training_input, training_output, epochs=EPOCHES,
                    batch_size=BATCH_SIZE,
                    validation_split=0.1)

Thanks.

Getting "Could not create cudnn handle: CUDNN_STATUS_NOT_INITIALIZED" on a previously working system

1 Answer: