TensorFlow object detection fine-tuning results in wrong precision values

Date: 2019-05-04 11:20:31

Tags: tensorflow object-detection object-detection-api transfer-learning finetunning

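I am using the TensorFlow Object Detection API and want to take a Faster R-CNN ResNet101 model pre-trained on KITTI image data and fine-tune it on Cityscapes image data. I downloaded the pre-trained model here.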

I used this script to create tfrecord files from the Cityscapes (CS) images.

I then used the CS tfrecords to fine-tune the pre-trained ResNet model. For this task I used this command:

    python3.5 model_main.py --pipeline_config_path={Path to config file in ../samples/configs/} --model_dir={Output directory} --num_train_steps={Train Steps} --sample_1_of_n_eval_examples=1 --alsologtostderr
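
The pipeline config passed with --pipeline_config_path names a label map through label_map_path. Since the tfrecords store label 1 for Car and 2 for Person (class ID + 1, with 0 reserved for background), that label map should use the same ids, and num_classes in the config should be 2. A minimal sketch that renders such a label map in the label_map.pbtxt text format; make_label_map is a hypothetical helper, not part of the Object Detection API:

```python
def make_label_map(class_names):
    """Render a label map in the text format used by label_map.pbtxt files.

    ids start at 1 because id 0 is reserved for the background class.
    """
    entries = []
    for label_id, name in enumerate(class_names, start=1):
        entries.append("item {\n  id: %d\n  name: '%s'\n}\n" % (label_id, name))
    return "".join(entries)


if __name__ == "__main__":
    # Order follows the annotation class IDs: 0 -> Car (label 1), 1 -> Person (label 2).
    print(make_label_map(["Car", "Person"]), end="")
```

Writing the printed text to a file and pointing label_map_path at it keeps the ids in the tfrecords, the label map, and num_classes consistent.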

Using only CS training and validation data, the COCO precision is -1.000:

    Average Precision (AP) @[ IoU=0.5:0.95 | area=all | maxDets=100 ] = -1.000
    ....
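
An AP of exactly -1.000 is a sentinel rather than a low score. In pycocotools, the COCO evaluation library behind these metrics, the precision array starts out filled with -1, entries for categories without ground truth are never overwritten, and the summary averages only entries greater than -1, returning -1 when there are none. A -1.000 in every row therefore suggests that the evaluator found no ground-truth boxes in the CS evaluation records. The sketch below mimics that summary step on a flattened list (the real code works on a multi-dimensional array; summarize is an illustrative name):

```python
def summarize(precisions):
    """Average the valid (> -1) precision entries, as the pycocotools summary does.

    Entries that were never filled in (no ground truth for that category or
    area range) keep their initial value of -1 and are left out of the mean.
    """
    valid = [p for p in precisions if p > -1]
    if not valid:
        return -1.0  # nothing to average: no ground-truth boxes were found
    return sum(valid) / len(valid)


print(summarize([-1.0, -1.0, -1.0]))  # -1.0
print(summarize([0.02, 0.0, -1.0]))   # 0.01
```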

I tried different approaches:

  1. Training on CS data and validating on KITTI data. This gives a COCO precision that is not -1.000 but very low: between 0.01 and 1.5% (after 10,000 training steps).

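  2. I looked at the TensorBoard visualizations. During the first 1,500 iterations the loss drops from 0.05 to 0.01; over the last 8,500 iterations it stays around 2.5e-4 and barely changes. (I would upload images if I could.)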

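  3. Fine-tuning the pre-trained model with manipulated KITTI data. I changed the content of the KITTI tfrecord files created with the KITTI tfrecord script: I removed all unneeded features from the tfrecord data (such as the 3D annotations) so that their content matches the CS tfrecords I created (see the code below). Fine-tuning on this manipulated KITTI data gives a validation accuracy that looks normal (around 70-80%). I therefore assume the error is not caused by missing attributes in the tfrecords.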

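  4. Running inference with the pre-trained ResNet model on CS data gives an accuracy of around 20%, which is what I expected. Inference accuracy on KITTI is around 85%.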
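
When comparing the percentages above, note that the headline COCO row averages precision over ten IoU thresholds (0.50, 0.55, ..., 0.95), so detections that overlap their ground truth only loosely score poorly on most of them. As a small illustration of the matching criterion only (not the full precision/recall computation), here is an IoU function for normalized boxes; the (ymin, xmin, ymax, xmax) ordering and the helper names are assumptions for this sketch:

```python
def iou(box_a, box_b):
    """Intersection over union of two (ymin, xmin, ymax, xmax) boxes."""
    ya1, xa1, ya2, xa2 = box_a
    yb1, xb1, yb2, xb2 = box_b
    inter_h = max(0.0, min(ya2, yb2) - max(ya1, yb1))
    inter_w = max(0.0, min(xa2, xb2) - max(xa1, xb1))
    inter = inter_h * inter_w
    area_a = (ya2 - ya1) * (xa2 - xa1)
    area_b = (yb2 - yb1) * (xb2 - xb1)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# The ten IoU thresholds behind "IoU=0.5:0.95".
COCO_IOU_THRESHOLDS = [round(0.5 + 0.05 * i, 2) for i in range(10)]


def thresholds_passed(iou_value):
    """How many of the ten COCO IoU thresholds a match with this IoU clears."""
    return sum(iou_value >= t for t in COCO_IOU_THRESHOLDS)


# A detection covering the left half of its ground-truth box:
overlap = iou((0.0, 0.0, 1.0, 1.0), (0.0, 0.0, 1.0, 0.5))
print(overlap, thresholds_passed(overlap))  # 0.5 1
```

A box with IoU 0.5 counts as a hit at only one of the ten thresholds, which is why the 0.5:0.95 average can be much lower than an IoU=0.5 figure for the same detections.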

Each image in the CS tfrecords contains the following:


    tf_example = tf.train.Example(features=tf.train.Features(feature={
    'image/height': dataset_util.int64_feature(height),
    'image/width': dataset_util.int64_feature(width),
    'image/filename': dataset_util.bytes_feature(filename.encode('utf8')),
    'image/source_id': dataset_util.bytes_feature(filename.encode('utf8')),
    'image/encoded': dataset_util.bytes_feature(encoded_image_data),
    'image/format': dataset_util.bytes_feature(image_format.encode('utf8')),
    'image/object/bbox/xmin': dataset_util.float_list_feature(xmins),
    'image/object/bbox/xmax': dataset_util.float_list_feature(xmaxs),
    'image/object/bbox/ymin': dataset_util.float_list_feature(ymins),
    'image/object/bbox/ymax': dataset_util.float_list_feature(ymaxs),
    'image/object/difficult': dataset_util.int64_list_feature(difficult_obj),
    'image/object/class/text': dataset_util.bytes_list_feature(classes_text),
    'image/object/class/label': dataset_util.int64_list_feature(classes),
    }))
    return tf_example
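
All per-box lists in this Example must have the same length, the coordinates must be normalized to [0, 1] with min < max, and labels start at 1. A box appended without a matching label, or an image with no boxes at all, is easy to miss, so a quick check before serialising may help. A minimal pure-Python sketch; check_example_lists is a hypothetical helper, not part of dataset_util:

```python
def check_example_lists(xmins, xmaxs, ymins, ymaxs, classes, classes_text):
    """Return a list of human-readable problems found in one image's box lists."""
    problems = []

    # Every per-box feature must have one entry per box.
    lengths = {len(xmins), len(xmaxs), len(ymins), len(ymaxs),
               len(classes), len(classes_text)}
    if len(lengths) != 1:
        problems.append("list lengths differ: %s" % sorted(lengths))

    # Coordinates must be normalized and describe a non-empty box.
    for i, (x0, x1, y0, y1) in enumerate(zip(xmins, xmaxs, ymins, ymaxs)):
        if not (0.0 <= x0 < x1 <= 1.0 and 0.0 <= y0 < y1 <= 1.0):
            problems.append("box %d not normalized or empty: %r"
                            % (i, (x0, y0, x1, y1)))

    # Label 0 is reserved for the background class.
    for i, label in enumerate(classes):
        if label < 1:
            problems.append("box %d has label %d" % (i, label))

    if not classes:
        problems.append("image has no boxes")
    return problems


print(check_example_lists([0.1], [0.5], [0.2], [0.6], [1], [b"Car"]))  # []
```

Calling it with xmins, xmaxs, ymins, ymaxs, classes and classes_text right before building tf_example and printing any non-empty result would flag inconsistent images without changing what gets written.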

The images are encoded with this code:

    with tf.gfile.GFile(os.path.join(image_path, '{}'.format(currentImageName)), 'rb') as fid:
        encoded_image_data = fid.read()

    encoded_image_io = io.BytesIO(encoded_image_data)
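
The raw file bytes go into image/encoded unchanged, while image/format, image/width and image/height are written separately. One cheap way to rule out an encoding mismatch is to check the bytes themselves: a PNG starts with a fixed 8-byte signature, and its first chunk, IHDR, stores the width and height as big-endian 32-bit integers. A sketch using only the standard library (the function names are illustrative):

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
JPEG_SIGNATURE = b"\xff\xd8\xff"


def sniff_image_format(data):
    """Guess the container format from the leading magic bytes."""
    if data.startswith(PNG_SIGNATURE):
        return "png"
    if data.startswith(JPEG_SIGNATURE):
        return "jpeg"
    return None


def png_size(data):
    """Return (width, height) from a PNG's IHDR chunk, which must come first.

    Layout: 8-byte signature, 4-byte chunk length, b"IHDR", width, height.
    """
    if not data.startswith(PNG_SIGNATURE) or data[12:16] != b"IHDR":
        raise ValueError("not a PNG with a leading IHDR chunk")
    return struct.unpack(">II", data[16:24])
```

Inside create_tf_example one could, for example, assert that sniff_image_format(encoded_image_data) == image_format and png_size(encoded_image_data) == (width, height) before building the Example.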

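Could the data encoding be the cause? Or is there another source of error? As mentioned, I tried several approaches, but none gave the expected results. Fine-tuning shouldn't be that hard, or am I missing something?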

As described in point 4, I tested inference and the tf_record files, so I expected to be able to fine-tune the model.

In general, I would not expect the precision to be close to 0% after 10,000 iterations.

Everything seems a bit strange and I don't know where the error is, so I would appreciate any hints, remarks, or solutions to this problem.


Edit: the full script I use to create the tfrecord files:


    import glob
    import io
    import os

    import tensorflow as tf
    from PIL import Image

    from object_detection.utils import dataset_util

    flags = tf.app.flags
    flags.DEFINE_string('anno_path', '', 'Directory containing the .txt annotation files')
    flags.DEFINE_string('image_path', '', 'Directory containing the .png images')
    flags.DEFINE_string('output_path_Training', '', 'Output TFRecord path for the training split')
    flags.DEFINE_string('output_path_Valid', '', 'Output TFRecord path for the validation split')
    flags.DEFINE_string('output_path_Test', '', 'Output TFRecord path for the test split')
    FLAGS = flags.FLAGS

    # Annotation class IDs -> class names; the stored label is class ID + 1 (0 is background).
    CLASS_NAMES = {0: 'Car', 1: 'Person'}


    def create_tf_example(currentName, anno_path, image_path):

        currentImageName = currentName.split('.')[0] + '.png'
        filename = os.path.join(image_path, currentImageName)
        image_format = 'png'  # b'jpeg' or b'png'

        with tf.gfile.GFile(filename, 'rb') as fid:
            encoded_image_data = fid.read()

        # Decode only to read the image size; the raw PNG bytes are stored as-is.
        width, height = Image.open(io.BytesIO(encoded_image_data)).size

        xmins = []  # normalized left x coordinates, 1 per box
        xmaxs = []  # normalized right x coordinates, 1 per box
        ymins = []  # normalized top y coordinates, 1 per box
        ymaxs = []  # normalized bottom y coordinates, 1 per box
        classes_text = []  # class name of each box, as bytes
        classes = []  # integer class label of each box

        # Each annotation line: xmin ymin xmax ymax classID (pixel coordinates).
        with open(os.path.join(anno_path, currentName)) as file:
            for line in file:
                fields = line.split()
                if not fields:
                    continue  # ignore blank lines
                classID = int(fields[4])
                if classID not in CLASS_NAMES:
                    # Skip the whole box so that every per-box list keeps the same length.
                    print('Error with Image Annotations in {}'.format(currentName))
                    continue
                xmins.append(float(fields[0]) / width)
                xmaxs.append(float(fields[2]) / width)
                ymins.append(float(fields[1]) / height)
                ymaxs.append(float(fields[3]) / height)
                classes_text.append(CLASS_NAMES[classID].encode('utf8'))
                classes.append(classID + 1)  # add 1 because class 0 is always reserved for 'background'

        difficult_obj = [0] * len(xmins)

        tf_example = tf.train.Example(features=tf.train.Features(feature={
            'image/height': dataset_util.int64_feature(height),
            'image/width': dataset_util.int64_feature(width),
            'image/filename': dataset_util.bytes_feature(filename.encode('utf8')),
            'image/source_id': dataset_util.bytes_feature(filename.encode('utf8')),
            'image/encoded': dataset_util.bytes_feature(encoded_image_data),
            'image/format': dataset_util.bytes_feature(image_format.encode('utf8')),
            'image/object/bbox/xmin': dataset_util.float_list_feature(xmins),
            'image/object/bbox/xmax': dataset_util.float_list_feature(xmaxs),
            'image/object/bbox/ymin': dataset_util.float_list_feature(ymins),
            'image/object/bbox/ymax': dataset_util.float_list_feature(ymaxs),
            'image/object/difficult': dataset_util.int64_list_feature(difficult_obj),
            'image/object/class/text': dataset_util.bytes_list_feature(classes_text),
            'image/object/class/label': dataset_util.int64_list_feature(classes),
        }))
        return tf_example


    def main(_):
        # Each writer writes to the output path with the matching name.
        writer_training = tf.python_io.TFRecordWriter(FLAGS.output_path_Training)
        writer_valid = tf.python_io.TFRecordWriter(FLAGS.output_path_Valid)
        writer_test = tf.python_io.TFRecordWriter(FLAGS.output_path_Test)

        # Sorted annotation file names, listed without changing the working directory.
        allAnnotationFiles = sorted(
            os.path.basename(path)
            for path in glob.glob(os.path.join(FLAGS.anno_path, '*.txt')))

        # Half-open index ranges, so every file up to index 3475 lands in exactly one split.
        splits = [
            (writer_training, allAnnotationFiles[:2411]),
            (writer_valid, allAnnotationFiles[2411:2972]),
            (writer_test, allAnnotationFiles[2972:3476]),
        ]
        for writer, names in splits:
            for currentName in names:
                tf_example = create_tf_example(currentName, FLAGS.anno_path, FLAGS.image_path)
                writer.write(tf_example.SerializeToString())

        writer_training.close()
        writer_valid.close()
        writer_test.close()


    if __name__ == '__main__':
        tf.app.run()
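
An evaluation file with no records, or one assigned to the wrong split, is an easy way to end up without ground truth during evaluation. In TensorFlow 1.x, sum(1 for _ in tf.python_io.tf_record_iterator(path)) counts the records in a file; the sketch below does the same without TensorFlow by walking the TFRecord framing (8-byte little-endian length, 4-byte length CRC, payload, 4-byte payload CRC). frame_record writes zeroed CRCs and exists only for the self-check, since count_records never verifies them:

```python
import io
import os
import struct
import sys


def count_records(stream):
    """Count TFRecord entries in a binary stream by following the length headers.

    Each record is framed as: uint64 length (little-endian), uint32 CRC of the
    length, `length` payload bytes, uint32 CRC of the payload. CRCs are skipped,
    not verified.
    """
    count = 0
    while True:
        header = stream.read(8)
        if not header:
            return count
        if len(header) != 8:
            raise ValueError("truncated length header after %d records" % count)
        (length,) = struct.unpack("<Q", header)
        rest = stream.read(4 + length + 4)
        if len(rest) != 4 + length + 4:
            raise ValueError("truncated record %d" % count)
        count += 1


def frame_record(payload):
    """Frame a payload like a TFRecord, with zeroed CRCs (enough for count_records)."""
    return struct.pack("<Q", len(payload)) + b"\0" * 4 + payload + b"\0" * 4


if __name__ == "__main__":
    # Self-check on a synthetic two-record buffer.
    print(count_records(io.BytesIO(frame_record(b"abc") + frame_record(b""))))  # 2
    # Then count the real files, e.g. the train/valid/test outputs.
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            print(os.path.basename(path), count_records(f))
```

Running it with the three output paths as arguments shows how many examples each split actually received.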

