TensorFlow crashes with error CUDNN_STATUS_BAD_PARAM

Asked: 2018-02-18 09:25:34

Tags: tensorflow keras cudnn

I am fine-tuning Inception ResNet v2 using Keras 2.1.4 with the TensorFlow 1.5 backend.

My training crashes just before the end of the second epoch with the following error message:

Epoch 1/50
8103/8103 [==============================] - 3197s 395ms/step - loss: 0.0519 - f1: 0.4272 - precision: 0.6371 - recall: 0.3239 - val_loss: 0.0363 - val_f1: 0.5000 - val_precision: 0.7314 - val_recall: 0.3807
Epoch 2/50
8102/8103 [============================>.] - ETA: 0s - loss: 0.0425 - f1: 0.4800 - precision: 0.6890 - recall: 0.3692
2018-02-18 00:21:16.677165: F tensorflow/stream_executor/cuda/cuda_dnn.cc:444] could not convert BatchDescriptor {count: 0 feature_map_count: 32 spatial: 149 149  value_min: 0.000000 value_max: 0.000000 layout: BatchDepthYX} to cudnn tensor descriptor: CUDNN_STATUS_BAD_PARAM
22018-02-18 00:21:16.677165: F tensorflow/stream_executor/cuda/cuda_dnn.cc:444] could not convert BatchDescriptor {count: 0 feature_map_count: 32 spatial: 149 149  value_min: 0.000000 value_max: 0.000000 layout: BatchDepthYX} to cudnn tensor descriptor: CUDNN_STATUS_BAD_PARAM
2018-02-18 00:21:16.677219: F tensorflow/stream_executor/cuda/cuda_dnn.cc:444] could not convert BatchDescriptor {count: 0 feature_map_count: 32 spatial: 149 149  value_min: 0.000000 value_max: 0.000000 layout: BatchDepthYX} to cudnn tensor descriptor: CUDNN_STATUS_BAD_PARAM
2018-02-18 00:21:16.677347: F tensorflow/stream_executor/cuda/cuda_dnn.cc:444] could not convert BatchDescriptor {count: 0 feature_map_count: 32 spatial: 147 147  value_min: 0.000000 value_max: 0.000000 layout: BatchDepthYX} to cudnn tensor descriptor: CUDNN_STATUS_BAD_PARAM
Aborted (core dumped)

My problem may be related to an earlier question.

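However, if it is the same problem, I don't understand why the first epoch completed successfully and the crash only happens at the end of the second epoch.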
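One thing the log does seem to indicate: every failing BatchDescriptor reports `count: 0`, which appears to mean that a tensor with zero samples reached cuDNN on the very last step (8103 of 8103). A minimal sketch for checking this, assuming the training data comes from a Python generator that yields `(inputs, targets)` pairs. The data pipeline is not shown in this post, and the helper names `find_empty_steps` and `skip_empty_batches` are hypothetical, not part of Keras or TensorFlow:

```python
def find_empty_steps(batches, steps):
    """Return the 1-based indices of steps whose batch holds no samples.

    `batches` is any iterator of (inputs, targets) pairs (assumed shape of
    the data pipeline); `steps` is the number of steps to inspect.
    """
    empty = []
    # range() comes first so zip() stops without pulling an extra batch.
    for step, (inputs, _targets) in zip(range(1, steps + 1), batches):
        if len(inputs) == 0:
            empty.append(step)
    return empty


def skip_empty_batches(batches):
    """Yield only the batches that contain at least one sample."""
    for inputs, targets in batches:
        if len(inputs) == 0:
            # A zero-sized batch would correspond to `count: 0` in the log.
            continue
        yield inputs, targets


# Toy data standing in for real batches: the second batch is empty.
demo = [([1, 2], [0, 1]), ([], []), ([3], [1])]
print(find_empty_steps(iter(demo), steps=3))
print(list(skip_empty_batches(demo)))
```

If a check like this reports an empty batch at the final step, the usual suspects would be the generator's end-of-data handling or the `steps_per_epoch` value, but that is a guess and not something the log confirms.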

0 answers:

No answers yet.