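I'm trying to train on a custom dataset using the new Object Detection API in TensorFlow 1.2 with the sample Faster R-CNN config. The error I get is related to some tensor shapes, but it seems to occur at random points during training, and the exact shapes also vary.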
INFO:tensorflow:global step 132: loss = 63.3741 (0.262 sec/step)
INFO:tensorflow:global step 133: loss = 33.7362 (0.292 sec/step)
INFO:tensorflow:global step 134: loss = 18.0165 (0.264 sec/step)
INFO:tensorflow:global step 135: loss = 40.5577 (0.266 sec/step)
INFO:tensorflow:global step 136: loss = 24.1086 (0.266 sec/step)
2017-07-10 10:23:49.066345: W tensorflow/core/framework/op_kernel.cc:1165] Invalid argument: Incompatible shapes: [1,60,4] vs. [1,64,4]
[[Node: gradients/Loss/BoxClassifierLoss/Loss/sub_grad/BroadcastGradientArgs = BroadcastGradientArgs[T=DT_INT32, _device="/job:localhost/replica:0/task:0/gpu:0"](gradients/Loss/BoxClassifierLoss/Loss/sub_grad/Shape, gradients/Loss/BoxClassifierLoss/Loss/sub_grad/Shape_1)]]
2017-07-10 10:23:49.066475: W tensorflow/core/framework/op_kernel.cc:1165] Invalid argument: Incompatible shapes: [1,60,4] vs. [1,64,4]
[[Node: gradients/Loss/BoxClassifierLoss/Loss/sub_grad/BroadcastGradientArgs = BroadcastGradientArgs[T=DT_INT32, _device="/job:localhost/replica:0/task:0/gpu:0"](gradients/Loss/BoxClassifierLoss/Loss/sub_grad/Shape, gradients/Loss/BoxClassifierLoss/Loss/sub_grad/Shape_1)]]
2017-07-10 10:23:49.066509: W tensorflow/core/framework/op_kernel.cc:1165] Invalid argument: Incompatible shapes: [1,60,4] vs. [1,64,4]
[[Node: gradients/Loss/BoxClassifierLoss/Loss/sub_grad/BroadcastGradientArgs = BroadcastGradientArgs[T=DT_INT32, _device="/job:localhost/replica:0/task:0/gpu:0"](gradients/Loss/BoxClassifierLoss/Loss/sub_grad/Shape, gradients/Loss/BoxClassifierLoss/Loss/sub_grad/Shape_1)]]
INFO:tensorflow:Error reported to Coordinator: <class 'tensorflow.python.framework.errors_impl.InvalidArgumentError'>, Incompatible shapes: [1,60,4] vs. [1,64,4]
[[Node: gradients/Loss/BoxClassifierLoss/Loss/sub_grad/BroadcastGradientArgs = BroadcastGradientArgs[T=DT_INT32, _device="/job:localhost/replica:0/task:0/gpu:0"](gradients/Loss/BoxClassifierLoss/Loss/sub_grad/Shape, gradients/Loss/BoxClassifierLoss/Loss/sub_grad/Shape_1)]]
[[Node: gradients/FirstStageFeatureExtractor/resnet_v1_50/resnet_v1_50/block1/unit_1/bottleneck_v1/conv3/convolution_grad/tuple/control_dependency_1/_2621 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_13108_gradients/FirstStageFeatureExtractor/resnet_v1_50/resnet_v1_50/block1/unit_1/bottleneck_v1/conv3/convolution_grad/tuple/control_dependency_1", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]
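As you can see, it runs correctly for a variable number of steps and then fails with Invalid argument: Incompatible shapes: [1,60,4] vs. [1,64,4]. I don't understand why this error is triggered or where the incompatible shapes come from, since they also change between runs.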
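As a note on reading the message: the node name (`.../Loss/sub_grad/BroadcastGradientArgs`) suggests the failure occurs while computing the gradient of a subtraction in the box classifier loss, where two operands disagree in their second dimension (60 vs. 64). The sketch below is plain Python, not TensorFlow code. It only illustrates the usual broadcasting rule under which those two shapes are rejected; the function name is hypothetical.

```python
def broadcast_compatible(shape_a, shape_b):
    """Return True if two shapes can be broadcast together.

    Rule (as used by NumPy and TensorFlow): compare dimensions from the
    trailing end; each pair must be equal, or one of them must be 1.
    """
    for dim_a, dim_b in zip(reversed(shape_a), reversed(shape_b)):
        if dim_a != dim_b and dim_a != 1 and dim_b != 1:
            return False
    return True


# The shapes from the error message differ in the middle dimension.
print(broadcast_compatible([1, 60, 4], [1, 64, 4]))  # False
print(broadcast_compatible([1, 64, 4], [1, 64, 4]))  # True
```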
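I'm not sure whether the problem lies in how I converted the dataset to the TF format. However, I have successfully trained on the same dataset for several days using the SSD implementation, so I think it is fair to say the data is formatted correctly.
Edit: The label map file is here. I'd like to point out again that the same dataset works perfectly with SSD.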
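Answer 0 (score: 1)
The TensorFlow Object Detection API assumes that the '0' label is reserved for 'none_of_the_above', so the immediate thing to do is to add 1 to each of the label indices in your label map.
It's not yet clear why things fail (and fail hard) for Faster R-CNN but not for SSD (probably something for us to dig into), but I would be a bit surprised if you got very good results from SSD with that label map.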
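A minimal sketch of the "add 1 to every index" step, assuming the label map is a text file made of `item { id: N name: '...' }` entries like the one shown elsewhere in this thread. The function name and the example class names are hypothetical. The class labels stored in the training records would presumably need the same shift so that they still match the label map.

```python
import re


def shift_label_ids(label_map_text, offset=1):
    """Add `offset` to every `id:` field in a label map given as text."""
    return re.sub(
        r"(\bid\s*:\s*)(\d+)",
        lambda m: m.group(1) + str(int(m.group(2)) + offset),
        label_map_text,
    )


# Hypothetical label map whose ids start at 0.
example = """item {
  id: 0
  name: 'balloon'
}
item {
  id: 1
  name: 'kite'
}
"""

shifted = shift_label_ids(example)
print(shifted)  # ids are now 1 and 2
```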
Answer 1 (score: 0)
You can try to start your class id from 1 instead of 0.
item {
  id: 1
  name: 'balloon'
}
It worked for me.
Answer 2 (score: 0)
You have to set num_classes = xx in the faster_rcnn_resnet101.config file.
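One way to sanity-check this is to compare the configured value against the number of entries in the label map. The sketch below assumes the pipeline config stores the value as a `num_classes: N` line and the label map uses `item { ... }` entries. The helper names and the sample strings are hypothetical, and the regex parsing is a shortcut rather than a proper protobuf parse.

```python
import re


def count_label_map_items(label_map_text):
    """Count `item { ... }` entries in a label map given as text."""
    return len(re.findall(r"\bitem\s*\{", label_map_text))


def read_num_classes(config_text):
    """Return the first `num_classes` value in a config, or None if absent."""
    match = re.search(r"\bnum_classes\s*:\s*(\d+)", config_text)
    return int(match.group(1)) if match else None


# Hypothetical file contents for illustration only.
label_map = "item {\n  id: 1\n  name: 'balloon'\n}\n"
config = "model {\n  faster_rcnn {\n    num_classes: 1\n  }\n}\n"

if read_num_classes(config) != count_label_map_items(label_map):
    print("num_classes does not match the label map")
else:
    print("num_classes matches the label map")
```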
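Answer 3 (score: 0)
You are reading sequence examples from tf.train.batch with allow_smaller_final_batch=True. The error may come from the final batch being smaller, which makes the batch size incompatible with the expected shape.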
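To illustrate the idea, here is a plain-Python model (not TensorFlow code) of how the batch sizes come out for a single pass over the data, with and without a smaller final batch. The function name and the numbers are made up for the example.

```python
def batch_sizes(num_examples, batch_size, allow_smaller_final_batch):
    """Model the sizes of the batches produced from one pass over the data."""
    full_batches, remainder = divmod(num_examples, batch_size)
    sizes = [batch_size] * full_batches
    # Leftover examples form a short batch only if that is allowed.
    if remainder and allow_smaller_final_batch:
        sizes.append(remainder)
    return sizes


print(batch_sizes(10, 4, True))   # [4, 4, 2]
print(batch_sizes(10, 4, False))  # [4, 4]
```

Any code that assumes a fixed batch dimension would see a different shape on the short final batch.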