使用tensorflow对象检测API微调mask_rcnn_inception_resnet_v2_atrous_coco模型时断言失败错误

时间:2018-04-24 19:47:28

标签: tensorflow object-detection object-detection-api

我试图使用tensorflow对象检测API来微调mask_rcnn_inception_resnet_v2_atrous_coco模型并使用它来训练MIO-TCD数据集。我将MIO-TCD数据集转换为TFRecord。

但是,我遇到了以下InvalidArgumentError:

INFO:tensorflow:Error reported to Coordinator: assertion failed: [] [Condition x == y did not hold element-wise:] [x (Loss/BoxClassifierLoss/assert_equal_2/x:0) = ] [0] [y (Loss/BoxClassifierLoss/assert_equal_2/y:0) = ] [5]
         [[Node: Loss/BoxClassifierLoss/assert_equal_2/Assert/Assert = Assert[T=[DT_STRING, DT_STRING, DT_STRING, DT_INT32, DT_STRING, DT_INT32], summarize=3, _device="/job:localhost/replica:0/task:0/device:CPU:0"](Loss/BoxClassifierLoss/assert_equal_2/All/_155, Loss/RPNLoss/assert_equal/Assert/Assert/data_0, Loss/RPNLoss/assert_equal/Assert/Assert/data_1, Loss/BoxClassifierLoss/assert_equal_2/Assert/Assert/data_2, Loss/BoxClassifierLoss/assert_equal_2/x/_157, Loss/BoxClassifierLoss/assert_equal_2/Assert/Assert/data_4, Loss/RPNLoss/ones_1/shape/_147)]]
         [[Node: FirstStageFeatureExtractor/InceptionResnetV2/Mixed_5b/Branch_2/Conv2d_0a_1x1/BatchNorm/moving_mean/read/_225 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_2248_FirstStageFeatureExtractor/InceptionResnetV2/Mixed_5b/Branch_2/Conv2d_0a_1x1/BatchNorm/moving_mean/read", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"]()]]

Caused by op 'Loss/BoxClassifierLoss/assert_equal_2/Assert/Assert', defined at:
  File "train.py", line 167, in <module>
    tf.app.run()
  File "C:\Users\hedey\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\platform\app.py", line 124, in run
    _sys.exit(main(argv))
  File "train.py", line 163, in main
    worker_job_name, is_chief, FLAGS.train_dir)
  File "C:\Users\hedey\models\research\object_detection\trainer.py", line 246, in train
    clones = model_deploy.create_clones(deploy_config, model_fn, [input_queue])
  File "C:\Users\hedey\models\research\deployment\model_deploy.py", line 193, in create_clones
    outputs = model_fn(*args, **kwargs)
  File "C:\Users\hedey\models\research\object_detection\trainer.py", line 181, in _create_losses
    losses_dict = detection_model.loss(prediction_dict, true_image_shapes)
  File "C:\Users\hedey\models\research\object_detection\meta_architectures\faster_rcnn_meta_arch.py", line 1580, in loss
    groundtruth_masks_list,
  File "C:\Users\hedey\models\research\object_detection\meta_architectures\faster_rcnn_meta_arch.py", line 1813, in _loss_box_classifier
    groundtruth_boxlists, groundtruth_masks_list)
  File "C:\Users\hedey\models\research\object_detection\core\target_assigner.py", line 447, in batch_assign_targets
    anchors, gt_boxes, gt_class_targets, gt_weights)
  File "C:\Users\hedey\models\research\object_detection\core\target_assigner.py", line 151, in assign
    groundtruth_boxes.get())[:1])
  File "C:\Users\hedey\models\research\object_detection\utils\shape_utils.py", line 279, in assert_shape_equal
    return tf.assert_equal(shape_a, shape_b)
  File "C:\Users\hedey\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\ops\check_ops.py", line 392, in assert_equal
    return control_flow_ops.Assert(condition, data, summarize=summarize)
  File "C:\Users\hedey\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\util\tf_should_use.py", line 118, in wrapped
    return _add_should_use_warning(fn(*args, **kwargs))
  File "C:\Users\hedey\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\ops\control_flow_ops.py", line 169, in Assert
    condition, data, summarize, name="Assert")
  File "C:\Users\hedey\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\ops\gen_logging_ops.py", line 48, in _assert
    name=name)
  File "C:\Users\hedey\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "C:\Users\hedey\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\framework\ops.py", line 3160, in create_op
    op_def=op_def)
  File "C:\Users\hedey\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\framework\ops.py", line 1625, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

InvalidArgumentError (see above for traceback): assertion failed: [] [Condition x == y did not hold element-wise:] [x (Loss/BoxClassifierLoss/assert_equal_2/x:0) = ] [0] [y (Loss/BoxClassifierLoss/assert_equal_2/y:0) = ] [5]
         [[Node: Loss/BoxClassifierLoss/assert_equal_2/Assert/Assert = Assert[T=[DT_STRING, DT_STRING, DT_STRING, DT_INT32, DT_STRING, DT_INT32], summarize=3, _device="/job:localhost/replica:0/task:0/device:CPU:0"](Loss/BoxClassifierLoss/assert_equal_2/All/_155, Loss/RPNLoss/assert_equal/Assert/Assert/data_0, Loss/RPNLoss/assert_equal/Assert/Assert/data_1, Loss/BoxClassifierLoss/assert_equal_2/Assert/Assert/data_2, Loss/BoxClassifierLoss/assert_equal_2/x/_157, Loss/BoxClassifierLoss/assert_equal_2/Assert/Assert/data_4, Loss/RPNLoss/ones_1/shape/_147)]]
         [[Node: FirstStageFeatureExtractor/InceptionResnetV2/Mixed_5b/Branch_2/Conv2d_0a_1x1/BatchNorm/moving_mean/read/_225 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_2248_FirstStageFeatureExtractor/InceptionResnetV2/Mixed_5b/Branch_2/Conv2d_0a_1x1/BatchNorm/moving_mean/read", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"]()]]
Traceback (most recent call last):
  File "C:\Users\hedey\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\client\session.py", line 1350, in _do_call
    return fn(*args)
  File "C:\Users\hedey\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\client\session.py", line 1329, in _run_fn
    status, run_metadata)
  File "C:\Users\hedey\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\framework\errors_impl.py", line 473, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: assertion failed: [] [Condition x == y did not hold element-wise:] [x (Loss/BoxClassifierLoss/assert_equal_2/x:0) = ] [0] [y (Loss/BoxClassifierLoss/assert_equal_2/y:0) = ] [5]
         [[Node: Loss/BoxClassifierLoss/assert_equal_2/Assert/Assert = Assert[T=[DT_STRING, DT_STRING, DT_STRING, DT_INT32, DT_STRING, DT_INT32], summarize=3, _device="/job:localhost/replica:0/task:0/device:CPU:0"](Loss/BoxClassifierLoss/assert_equal_2/All/_155, Loss/RPNLoss/assert_equal/Assert/Assert/data_0, Loss/RPNLoss/assert_equal/Assert/Assert/data_1, Loss/BoxClassifierLoss/assert_equal_2/Assert/Assert/data_2, Loss/BoxClassifierLoss/assert_equal_2/x/_157, Loss/BoxClassifierLoss/assert_equal_2/Assert/Assert/data_4, Loss/RPNLoss/ones_1/shape/_147)]]
         [[Node: FirstStageFeatureExtractor/InceptionResnetV2/Mixed_5b/Branch_2/Conv2d_0a_1x1/BatchNorm/moving_mean/read/_225 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_2248_FirstStageFeatureExtractor/InceptionResnetV2/Mixed_5b/Branch_2/Conv2d_0a_1x1/BatchNorm/moving_mean/read", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"]()]]

我发现其他人在多个github问题上发布了同样的问题。以下是一个示例,我已经在那里评论过,并建议在stackoverflow上发布它: https://github.com/tensorflow/models/issues/3972#issuecomment-381535604 https://github.com/tensorflow/models/issues/3972#issuecomment-381535604

2 个答案:

答案 0 :(得分:0)

将MIO-TCD数据集转换为TFRecord时,应该像这样设置include_masks参数。

--include_masks=True

你可以试试。

答案 1 :(得分:0)

问题出在使用create_pet_tf_record.py程序创建的tfrecord文件中,您需要使用以下arg --faces_only设置为false来创建它,因为如果将其保留为True(默认值),则不会进行分段提供,而这就是您要训练的东西。

检查:https://github.com/tensorflow/models/issues/3972