Question

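I am following this tutorial to train an object detection model on the COCO dataset. The tutorial includes steps for downloading the COCO dataset and its annotations and converting them to TFRecord.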

I need to train on my own custom data, which I annotated with the labelImg tool. It generates XML files containing (w, h, xmin, ymin, xmax, ymax) for each image.

However, the COCO dataset uses a JSON format with an image segmentation field that is used when creating the TFRecord.

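Is segmentation required for training ResNet or RetinaNet?

So, can anyone guide me through a procedure for creating JSON annotations from my XML annotations, without the segmentation values?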

XML:

<annotation>
    <folder>frames</folder>
    <filename>83.jpg</filename>
    <path>/home/tdadmin/Downloads/large/f/frames/83.jpg</path>
    <source>
        <database>Unknown</database>
    </source>
    <size>
        <width>640</width>
        <height>480</height>
        <depth>3</depth>
    </size>
    <segmented>0</segmented>
    <object>
        <name>person</name>
        <pose>Unspecified</pose>
        <truncated>0</truncated>
        <difficult>0</difficult>
        <bndbox>
            <xmin>246</xmin>
            <ymin>48</ymin>
            <xmax>350</xmax>
            <ymax>165</ymax>
        </bndbox>
    </object>
</annotation>

Answer 1

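The annotation format doesn't actually matter. I have created tfrecord files from txt files before. To create a custom tfrecord, you have to write your own create_custom_tf_record.py, just like the other scripts shown in this folder.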

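But since you are using COCO-like annotations, you can make use of the file create_coco_tf_record.py. The important thing you need to implement yourself is annotations_list. annotations_list is just a dictionary, so your goal is to parse the XML file into a dictionary of key-value pairs, pass the correct values to feature_dict, and then construct a tf.train.Example from feature_dict. Once the tf.train.Example is created, you can easily create the tfrecord.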
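If you would rather feed create_coco_tf_record.py a JSON file, as the question asks, the XML can be converted to a COCO-style structure first. The following is only a rough sketch using the standard library: voc_to_coco, SAMPLE_XML and the choice to start ids at 1 are my own assumptions, and it assumes the conversion script does not need a segmentation field when masks are not requested.

```python
import json
import xml.etree.ElementTree as ET

# Shortened copy of the XML from the question.
SAMPLE_XML = """<annotation>
    <filename>83.jpg</filename>
    <size><width>640</width><height>480</height><depth>3</depth></size>
    <object>
        <name>person</name>
        <bndbox><xmin>246</xmin><ymin>48</ymin><xmax>350</xmax><ymax>165</ymax></bndbox>
    </object>
</annotation>"""


def voc_to_coco(xml_strings, class_names):
    """Build a COCO-style dict (no segmentation) from labelImg XML strings."""
    # Category and image ids start at 1 here; that is an assumption.
    categories = [{"id": i + 1, "name": n} for i, n in enumerate(class_names)]
    name_to_id = {c["name"]: c["id"] for c in categories}
    images, annotations = [], []
    for image_id, xml_text in enumerate(xml_strings, start=1):
        root = ET.fromstring(xml_text)
        images.append({
            "id": image_id,
            "file_name": root.findtext("filename"),
            "width": int(root.findtext("size/width")),
            "height": int(root.findtext("size/height")),
        })
        for obj in root.findall("object"):
            box = obj.find("bndbox")
            xmin = float(box.findtext("xmin"))
            ymin = float(box.findtext("ymin"))
            xmax = float(box.findtext("xmax"))
            ymax = float(box.findtext("ymax"))
            w, h = xmax - xmin, ymax - ymin
            annotations.append({
                "id": len(annotations) + 1,
                "image_id": image_id,
                "category_id": name_to_id[obj.findtext("name")],
                "bbox": [xmin, ymin, w, h],  # COCO boxes are [x, y, width, height]
                "area": w * h,
                "iscrowd": 0,
            })
    return {"images": images, "annotations": annotations, "categories": categories}


coco = voc_to_coco([SAMPLE_XML], ["person"])
coco_json = json.dumps(coco, indent=2)  # write this string to a .json file
```

Note that the box changes from corner form (xmin, ymin, xmax, ymax) to COCO's (x, y, width, height).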

So, for your exact example, first parse the XML file:

    import xml.etree.ElementTree as ET

    tree = ET.parse('annotations.xml')

Then construct annotations_list from tree like this:

    annotations_list = {}
    # Map each tag name to its text. Repeated tags (for example several
    # <object> entries) overwrite earlier ones.
    for element in tree.iter():
        annotations_list[str(element.tag)] = element.text

Then you can create feature_dict from annotations_list:

    from object_detection.utils import dataset_util

    feature_dict = {
        # XML text is a string, so convert it before building an int64 feature.
        'image/height': dataset_util.int64_feature(int(annotations_list['height'])),
        'image/width': dataset_util.int64_feature(...),
        'image/filename': dataset_util.bytes_feature(...),
        'image/source_id': dataset_util.bytes_feature(...),
        'image/key/sha256': dataset_util.bytes_feature(...),
        'image/encoded': dataset_util.bytes_feature(...),
        'image/format': dataset_util.bytes_feature(...),
        'image/object/bbox/xmin': dataset_util.float_list_feature(...),
        'image/object/bbox/xmax': dataset_util.float_list_feature(...),
        'image/object/bbox/ymin': dataset_util.float_list_feature(...),
        'image/object/bbox/ymax': dataset_util.float_list_feature(...),
        'image/object/class/text': dataset_util.bytes_list_feature(...),
        'image/object/is_crowd': dataset_util.int64_list_feature(...),
        'image/object/area': dataset_util.float_list_feature(...),
    }

Just make sure each feature_dict field corresponds to the correct field from annotations_list and label_map.

You may wonder why exactly these fields in feature_dict are necessary. According to the official documentation (using your own dataset), some of these fields are required and the others are optional.
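One caveat with building the dictionary by iterating over every tag: a repeated tag keeps only its last value, so an image with several <object> entries loses boxes. Below is a minimal sketch of a parse that keeps one list entry per object. parse_labelimg_xml and TWO_OBJECT_XML are hypothetical names of mine, the second object is made up for illustration, and the division by width and height assumes the box features are stored as normalized coordinates, as the Object Detection API documentation describes.

```python
import xml.etree.ElementTree as ET

# Illustrative input: the question's box plus a made-up second object.
TWO_OBJECT_XML = """<annotation>
    <filename>83.jpg</filename>
    <size><width>640</width><height>480</height><depth>3</depth></size>
    <object>
        <name>person</name>
        <bndbox><xmin>246</xmin><ymin>48</ymin><xmax>350</xmax><ymax>165</ymax></bndbox>
    </object>
    <object>
        <name>dog</name>
        <bndbox><xmin>64</xmin><ymin>240</ymin><xmax>320</xmax><ymax>480</ymax></bndbox>
    </object>
</annotation>"""


def parse_labelimg_xml(xml_text):
    """Return image size plus per-object lists ready for the list features."""
    root = ET.fromstring(xml_text)
    width = int(root.findtext("size/width"))
    height = int(root.findtext("size/height"))
    parsed = {
        "filename": root.findtext("filename"),
        "width": width,
        "height": height,
        "xmins": [], "xmaxs": [], "ymins": [], "ymaxs": [],
        "classes_text": [],
    }
    for obj in root.findall("object"):
        box = obj.find("bndbox")
        # Normalize pixel coordinates to the [0, 1] range (assumption above).
        parsed["xmins"].append(float(box.findtext("xmin")) / width)
        parsed["xmaxs"].append(float(box.findtext("xmax")) / width)
        parsed["ymins"].append(float(box.findtext("ymin")) / height)
        parsed["ymaxs"].append(float(box.findtext("ymax")) / height)
        # Bytes features need encoded strings.
        parsed["classes_text"].append(obj.findtext("name").encode("utf8"))
    return parsed


parsed = parse_labelimg_xml(TWO_OBJECT_XML)
```

The lists in parsed can then be passed to the float_list_feature and bytes_list_feature entries of feature_dict, and width and height to the int64 entries.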

Answer 2

What you are doing now is similar to a project I completed earlier, so I have some suggestions for you.

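When I trained my Mask RCNN model, I used the VGG Image Annotator (you can easily find that on Google). With that tool it is easy to create a JSON annotation file, which you can then plug into your training.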

I hope this helps. If you still have questions, feel free to leave a comment.

Rowen

How do I prepare my images and annotations for RetinaNet training?
