I am fine-tuning Inception ResNet v2 using Keras 2.1.4 with the TensorFlow 1.5 backend.
My training crashes just before the end of the second epoch with the following error message:
Epoch 1/50
8103/8103 [==============================] - 3197s 395ms/step - loss: 0.0519 - f1: 0.4272 - precision: 0.6371 - recall: 0.3239 - val_loss: 0.0363 - val_f1: 0.5000 - val_precision: 0.7314 - val_recall: 0.3807
Epoch 2/50
8102/8103 [============================>.] - ETA: 0s - loss: 0.0425 - f1: 0.4800 - precision: 0.6890 - recall: 0.3692
2018-02-18 00:21:16.677165: F tensorflow/stream_executor/cuda/cuda_dnn.cc:444] could not convert BatchDescriptor {count: 0 feature_map_count: 32 spatial: 149 149 value_min: 0.000000 value_max: 0.000000 layout: BatchDepthYX} to cudnn tensor descriptor: CUDNN_STATUS_BAD_PARAM
2018-02-18 00:21:16.677165: F tensorflow/stream_executor/cuda/cuda_dnn.cc:444] could not convert BatchDescriptor {count: 0 feature_map_count: 32 spatial: 149 149 value_min: 0.000000 value_max: 0.000000 layout: BatchDepthYX} to cudnn tensor descriptor: CUDNN_STATUS_BAD_PARAM
2018-02-18 00:21:16.677219: F tensorflow/stream_executor/cuda/cuda_dnn.cc:444] could not convert BatchDescriptor {count: 0 feature_map_count: 32 spatial: 149 149 value_min: 0.000000 value_max: 0.000000 layout: BatchDepthYX} to cudnn tensor descriptor: CUDNN_STATUS_BAD_PARAM
2018-02-18 00:21:16.677347: F tensorflow/stream_executor/cuda/cuda_dnn.cc:444] could not convert BatchDescriptor {count: 0 feature_map_count: 32 spatial: 147 147 value_min: 0.000000 value_max: 0.000000 layout: BatchDepthYX} to cudnn tensor descriptor: CUDNN_STATUS_BAD_PARAM
Aborted (core dumped)
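This may be related to a preceding section. However, if it is the same problem, I don't understand why the first epoch completed successfully and the crash only happens at the very end of the second epoch.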
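The only clue I can see in the log is `count: 0` in the BatchDescriptor, which looks as if a batch with zero samples reached cuDNN at the last step. I have not confirmed this. Below is a minimal sketch of how I could sanity-check the batching arithmetic, assuming batches are sliced sequentially from the dataset without wrapping around. The helper names and the numbers are hypothetical and are not taken from my actual training script.

```python
import math


def batch_sizes(num_samples, batch_size, steps):
    """Return the sample count of each of `steps` consecutive batches,
    assuming sequential slicing with no wrap-around (an assumption)."""
    sizes = []
    for step in range(steps):
        start = step * batch_size
        end = min(start + batch_size, num_samples)
        # Once `start` runs past the data, the slice is empty.
        sizes.append(max(end - start, 0))
    return sizes


def find_empty_batches(num_samples, batch_size, steps):
    """Return the indices of steps whose batch would contain zero samples."""
    sizes = batch_sizes(num_samples, batch_size, steps)
    return [i for i, n in enumerate(sizes) if n == 0]


# Hypothetical numbers for illustration only.
num_samples, batch_size = 10, 5
expected_steps = math.ceil(num_samples / batch_size)

# With the expected number of steps, no batch is empty.
print(find_empty_batches(num_samples, batch_size, expected_steps))
# Requesting one step too many yields an empty final batch.
print(find_empty_batches(num_samples, batch_size, expected_steps + 1))
```

If the step count passed to training is larger than what the data can fill, the last step of an epoch would get an empty batch. That alone would not explain why the first epoch succeeded, so this only narrows down where to look.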