TensorFlow object detection: Faster R-CNN fails randomly

Time: 2017-07-10 08:36:14

Tags: python tensorflow object-detection

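I am trying to train on a custom dataset using the new Object Detection API in TensorFlow 1.2 with the sample Faster R-CNN config. The error I get is related to some tensor shapes, but it seems to occur at a random point during training, and the exact shapes change as well.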

INFO:tensorflow:global step 132: loss = 63.3741 (0.262 sec/step)
INFO:tensorflow:global step 133: loss = 33.7362 (0.292 sec/step)
INFO:tensorflow:global step 134: loss = 18.0165 (0.264 sec/step)
INFO:tensorflow:global step 135: loss = 40.5577 (0.266 sec/step)
INFO:tensorflow:global step 136: loss = 24.1086 (0.266 sec/step)
2017-07-10 10:23:49.066345: W tensorflow/core/framework/op_kernel.cc:1165] Invalid argument: Incompatible shapes: [1,60,4] vs. [1,64,4]
     [[Node: gradients/Loss/BoxClassifierLoss/Loss/sub_grad/BroadcastGradientArgs = BroadcastGradientArgs[T=DT_INT32, _device="/job:localhost/replica:0/task:0/gpu:0"](gradients/Loss/BoxClassifierLoss/Loss/sub_grad/Shape, gradients/Loss/BoxClassifierLoss/Loss/sub_grad/Shape_1)]]
2017-07-10 10:23:49.066475: W tensorflow/core/framework/op_kernel.cc:1165] Invalid argument: Incompatible shapes: [1,60,4] vs. [1,64,4]
     [[Node: gradients/Loss/BoxClassifierLoss/Loss/sub_grad/BroadcastGradientArgs = BroadcastGradientArgs[T=DT_INT32, _device="/job:localhost/replica:0/task:0/gpu:0"](gradients/Loss/BoxClassifierLoss/Loss/sub_grad/Shape, gradients/Loss/BoxClassifierLoss/Loss/sub_grad/Shape_1)]]
2017-07-10 10:23:49.066509: W tensorflow/core/framework/op_kernel.cc:1165] Invalid argument: Incompatible shapes: [1,60,4] vs. [1,64,4]
     [[Node: gradients/Loss/BoxClassifierLoss/Loss/sub_grad/BroadcastGradientArgs = BroadcastGradientArgs[T=DT_INT32, _device="/job:localhost/replica:0/task:0/gpu:0"](gradients/Loss/BoxClassifierLoss/Loss/sub_grad/Shape, gradients/Loss/BoxClassifierLoss/Loss/sub_grad/Shape_1)]]
INFO:tensorflow:Error reported to Coordinator: <class 'tensorflow.python.framework.errors_impl.InvalidArgumentError'>, Incompatible shapes: [1,60,4] vs. [1,64,4]
     [[Node: gradients/Loss/BoxClassifierLoss/Loss/sub_grad/BroadcastGradientArgs = BroadcastGradientArgs[T=DT_INT32, _device="/job:localhost/replica:0/task:0/gpu:0"](gradients/Loss/BoxClassifierLoss/Loss/sub_grad/Shape, gradients/Loss/BoxClassifierLoss/Loss/sub_grad/Shape_1)]]
     [[Node: gradients/FirstStageFeatureExtractor/resnet_v1_50/resnet_v1_50/block1/unit_1/bottleneck_v1/conv3/convolution_grad/tuple/control_dependency_1/_2621 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_13108_gradients/FirstStageFeatureExtractor/resnet_v1_50/resnet_v1_50/block1/unit_1/bottleneck_v1/conv3/convolution_grad/tuple/control_dependency_1", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]

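As you can see, training runs correctly for a variable number of steps and then fails with Invalid argument: Incompatible shapes: [1,60,4] vs. [1,64,4]. I don't understand why this error is triggered or where the incompatible shapes come from, since they also change between runs.

I am not sure whether I made a mistake when converting the dataset to the TF record format. However, I have successfully trained on the same dataset for several days using the SSD implementation, so I think it is fair to say the data is formatted correctly.

Edit: The label map file is here. I want to point out again that the same dataset works perfectly with SSD.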

4 Answers:

Answer 0: (score: 1)

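The TensorFlow Object Detection API assumes that the '0' label is reserved for 'none_of_the_above', so the immediate thing to do is to add 1 to each label index in your label map.

It is unclear why things fail (in a hard way) for Faster R-CNN and not for SSD (probably something for us to dig into), but I would be a bit surprised if you were getting very good results from SSD with that label map.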
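As a rough illustration of the "add 1 to each label index" step, here is a minimal sketch that shifts every id in a label map given as text. The helper name shift_label_ids and the 'kite' entry are made up for this example, and the regex assumes the simple `id: N` layout of a typical label map file; it is not part of the Object Detection API.

```python
import re


def shift_label_ids(label_map_text, offset=1):
    """Return the label map text with every `id: N` changed to `id: N + offset`.

    Hypothetical helper: it only handles the plain `id: N` layout.
    """
    def bump(match):
        # group(1) is the "id: " prefix, group(2) is the numeric id.
        return "{}{}".format(match.group(1), int(match.group(2)) + offset)

    return re.sub(r"(\bid:\s*)(\d+)", bump, label_map_text)


# Example label map whose ids start at 0 (the situation described above).
zero_based = """item {
  id: 0
  name: 'balloon'
}
item {
  id: 1
  name: 'kite'
}
"""

one_based = shift_label_ids(zero_based)
print(one_based)
```

The class ids stored in the TF record files would presumably need to be regenerated with the same shifted ids, since the records and the label map have to agree.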

Answer 1: (score: 0)

You can try to start your class id from 1 instead of 0.

item {
  id: 1
  name: 'balloon'
}

It worked for me.

Answer 2: (score: 0)

You have to set num_classes = xx in the faster_rcnn_resnet101.config file.
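One way to double-check this setting is to compare the num_classes value in the config with the number of items in the label map. The sketch below does this on inline strings; the helper names and sample contents are assumptions for illustration, and it relies on the config being protobuf text, where the field is written as `num_classes: N`.

```python
import re


def count_label_map_items(label_map_text):
    """Count the `item { ... }` blocks in a label map given as text."""
    return len(re.findall(r"\bitem\s*\{", label_map_text))


def read_num_classes(config_text):
    """Return the first num_classes value found in the config text, or None."""
    match = re.search(r"\bnum_classes:\s*(\d+)", config_text)
    return int(match.group(1)) if match else None


# Sample contents, invented for this example.
label_map_text = """item {
  id: 1
  name: 'balloon'
}
item {
  id: 2
  name: 'kite'
}
"""

config_text = """model {
  faster_rcnn {
    num_classes: 2
  }
}
"""

n_items = count_label_map_items(label_map_text)
n_classes = read_num_classes(config_text)
if n_items == n_classes:
    print("num_classes matches the label map:", n_classes)
else:
    print("mismatch: label map has", n_items, "items, config says", n_classes)
```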

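Answer 3: (score: 0)

You are reading sequence examples from tf.train.batch with allow_smaller_final_batch=True. The error is probably caused by the final batch being smaller than the others, which makes its batch size incompatible with the expected shapes.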
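To see how a smaller final batch can arise, here is a plain-Python illustration (not TensorFlow code) of the batch sizes produced when the number of examples is not a multiple of the batch size. The function batch_sizes is a made-up helper that only mimics the effect of the allow_smaller_final_batch flag.

```python
def batch_sizes(num_examples, batch_size, allow_smaller_final_batch):
    """Return the size of each batch drawn from num_examples items."""
    full, remainder = divmod(num_examples, batch_size)
    sizes = [batch_size] * full
    # The leftover items form one extra, smaller batch only if allowed;
    # otherwise they are dropped.
    if remainder and allow_smaller_final_batch:
        sizes.append(remainder)
    return sizes


print(batch_sizes(10, 4, True))   # [4, 4, 2]
print(batch_sizes(10, 4, False))  # [4, 4]
```

If any downstream op expects a fixed leading dimension, the last batch in the first case would not match it.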