Question

I'm fine-tuning an SSD object detector on the Open Images Dataset using the TensorFlow Object Detection API. My training data contains imbalanced classes, e.g.

top (5K images)
dress (50K images)
etc...

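I would like to add class weights to the classification loss to improve performance. How do I do that? The following section of the config file seems relevant: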

loss {
  classification_loss {
    weighted_sigmoid {
    }
  }
  localization_loss {
    weighted_smooth_l1 {
    }
  }
 ...
  classification_weight: 1.0
  localization_weight: 1.0
}

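How do I change the config file to increase the classification loss weight for each class? If this can't be done through the config file, what is the recommended approach?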

Answer 1

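The API expects weights to be assigned per object (bbox), directly in the annotation data. Because of this requirement, the options for using class weights seem to be:

1) If you have a custom dataset, you can modify the annotation of each object (bbox) to include a weight field, 'object/weight'.

2) If you don't want to modify the annotations, you can recreate the tf_records file so that it includes the weights of the bboxes.

3) Modify the code of the API (this seems quite tricky to me).

I decided to go with #2, so here is the code that generates a weighted TFRecord file for two classes ("top" and "dress") with weights (1.0, 0.1), given a folder of XML annotations: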

import glob
import hashlib
import io
import random
import xml.etree.ElementTree as ET

import tensorflow as tf
from PIL import Image
from object_detection.utils import dataset_util

# Define the class names and their weights (same order in both lists;
# label ids are assigned as index + 1, so 'top' -> 1, 'dress' -> 2)
class_names = ['top', 'dress', ...]
class_weights = [1.0, 0.1, ...]


def create_example(xml_file):
    # Parse one PASCAL VOC-style XML annotation file
    tree = ET.parse(xml_file)
    root = tree.getroot()
    image_name = root.find('filename').text
    image_path = root.find('path').text
    file_name = image_name.encode('utf8')
    size = root.find('size')
    width = int(size.find('width').text)
    height = int(size.find('height').text)
    xmin = []
    ymin = []
    xmax = []
    ymax = []
    classes = []
    classes_text = []
    truncated = []
    poses = []
    difficult_obj = []
    weights = []  # Important line: one weight per object (bbox)

    for member in root.findall('object'):
        class_name = member.find('name').text
        if class_name not in class_names:
            # Skip the object entirely so all per-object lists keep the same length
            print('E: class {} not recognized!'.format(class_name))
            continue
        class_id = class_names.index(class_name)

        # Box coordinates are normalized to [0, 1]
        bndbox = member.find('bndbox')
        xmin.append(float(bndbox.find('xmin').text) / width)
        ymin.append(float(bndbox.find('ymin').text) / height)
        xmax.append(float(bndbox.find('xmax').text) / width)
        ymax.append(float(bndbox.find('ymax').text) / height)

        classes_text.append(class_name.encode('utf8'))
        classes.append(class_id + 1)  # label map ids start at 1
        weights.append(class_weights[class_id])  # per-class weight for this box
        difficult_obj.append(0)
        truncated.append(0)
        poses.append('Unspecified'.encode('utf8'))

    # Read the encoded image; the Example stores the raw JPEG bytes
    with tf.io.gfile.GFile(image_path, 'rb') as fid:
        encoded_jpg = fid.read()
    image = Image.open(io.BytesIO(encoded_jpg))
    if image.format != 'JPEG':
        raise ValueError('Image format not JPEG')
    key = hashlib.sha256(encoded_jpg).hexdigest()

    # Create the TFRecord Example
    example = tf.train.Example(features=tf.train.Features(feature={
        'image/height': dataset_util.int64_feature(height),
        'image/width': dataset_util.int64_feature(width),
        'image/filename': dataset_util.bytes_feature(file_name),
        'image/source_id': dataset_util.bytes_feature(file_name),
        'image/key/sha256': dataset_util.bytes_feature(key.encode('utf8')),
        'image/encoded': dataset_util.bytes_feature(encoded_jpg),
        'image/format': dataset_util.bytes_feature('jpeg'.encode('utf8')),
        'image/object/bbox/xmin': dataset_util.float_list_feature(xmin),
        'image/object/bbox/xmax': dataset_util.float_list_feature(xmax),
        'image/object/bbox/ymin': dataset_util.float_list_feature(ymin),
        'image/object/bbox/ymax': dataset_util.float_list_feature(ymax),
        'image/object/class/text': dataset_util.bytes_list_feature(classes_text),
        'image/object/class/label': dataset_util.int64_list_feature(classes),
        'image/object/difficult': dataset_util.int64_list_feature(difficult_obj),
        'image/object/truncated': dataset_util.int64_list_feature(truncated),
        'image/object/view': dataset_util.bytes_list_feature(poses),
        'image/object/weight': dataset_util.float_list_feature(weights),  # Important line
    }))
    return example


def main():
    weighted_tf_records_output = 'name_of_records_file.record'  # output file
    annotations_path = '/path/to/annotations/folder/*.xml'  # input annotations

    # Collect and shuffle the annotation files
    xml_files = glob.glob(annotations_path)
    random.shuffle(xml_files)

    with tf.io.TFRecordWriter(weighted_tf_records_output) as writer_train:
        for xml_file in xml_files:
            print('-> Processing {}'.format(xml_file))
            example = create_example(xml_file)
            writer_train.write(example.SerializeToString())

    print('-> Successfully converted dataset to TFRecord.')


if __name__ == '__main__':
    main()

If you have a different annotation format, the code will be very similar, but unfortunately this exact script won't work as-is.
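The script above hard-codes weights of (1.0, 0.1) for 'top' (5K images) and 'dress' (50K images). The answer does not say how those numbers were chosen, but they match inverse-frequency weighting normalized so that the rarest class gets 1.0. A minimal sketch, assuming that scheme; the helper `inverse_frequency_weights` is hypothetical, and the counts are the image counts from the question (counting boxes per class would arguably fit per-object weights better):

```python
def inverse_frequency_weights(class_counts):
    """Return one weight per class, proportional to 1 / count.

    Weights are normalized so that the rarest class gets 1.0 and more
    frequent classes get proportionally smaller weights.
    """
    rarest = min(class_counts.values())
    return {name: rarest / count for name, count in class_counts.items()}


# Image counts taken from the question (top: 5K, dress: 50K).
counts = {'top': 5000, 'dress': 50000}
weights_by_class = inverse_frequency_weights(counts)

# Produce the two parallel lists used by the conversion script.
class_names = list(weights_by_class)
class_weights = [weights_by_class[name] for name in class_names]
print(class_names, class_weights)  # ['top', 'dress'] [1.0, 0.1]
```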

Answer 2

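The Object Detection API losses are defined here: https://github.com/tensorflow/models/blob/master/research/object_detection/core/losses.py

In particular, the following loss classes have been implemented:

Classification losses:

WeightedSigmoidClassificationLoss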
SigmoidFocalClassificationLoss
WeightedSoftmaxClassificationLoss
WeightedSoftmaxClassificationAgainstLogitsLoss
BootstrappedSigmoidClassificationLoss

Localization losses:

WeightedL2LocalizationLoss
WeightedSmoothL1LocalizationLoss
WeightedIOULocalizationLoss

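The weights parameter is used to balance anchors (prior boxes) and has shape [batch_size, num_anchors]; this is in addition to hard negative mining. Alternatively, the focal loss down-weights well-classified examples and focuses on hard examples.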
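To make the two mechanisms concrete, here is a minimal pure-Python sketch of a per-anchor weighted sigmoid cross-entropy and of the focal-loss modulating factor. It is not the API's implementation (which operates on tensors in `losses.py`); it assumes a single class, so `logits`, `labels` and `weights` are nested lists of shape [batch_size, num_anchors], `alpha` and `gamma` follow the usual focal-loss definition, and the function names are hypothetical:

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def weighted_sigmoid_ce(logits, labels, weights):
    """Per-anchor sigmoid cross-entropy scaled by per-anchor weights.

    All arguments are nested lists of shape [batch_size, num_anchors];
    a weight of 0.0 removes an anchor from the loss entirely.
    """
    losses = []
    for row_logits, row_labels, row_weights in zip(logits, labels, weights):
        row = []
        for x, z, w in zip(row_logits, row_labels, row_weights):
            p = sigmoid(x)
            ce = -(z * math.log(p) + (1 - z) * math.log(1 - p))
            row.append(w * ce)
        losses.append(row)
    return losses


def focal_loss(logits, labels, weights, alpha=0.25, gamma=2.0):
    """Weighted cross-entropy scaled by alpha_t * (1 - p_t) ** gamma.

    Well-classified anchors (p_t close to 1) get a factor close to 0,
    so the loss concentrates on hard examples.
    """
    ce = weighted_sigmoid_ce(logits, labels, weights)
    losses = []
    for row_ce, row_logits, row_labels in zip(ce, logits, labels):
        row = []
        for c, x, z in zip(row_ce, row_logits, row_labels):
            p = sigmoid(x)
            p_t = p if z == 1 else 1 - p
            alpha_t = alpha if z == 1 else 1 - alpha
            row.append(alpha_t * (1 - p_t) ** gamma * c)
        losses.append(row)
    return losses


# One image, three anchors: an easy negative, a hard negative, an easy positive.
logits = [[-4.0, 2.0, 4.0]]
labels = [[0, 0, 1]]
weights = [[1.0, 1.0, 1.0]]
print(weighted_sigmoid_ce(logits, labels, weights))
print(focal_loss(logits, labels, weights))
```

Here the hard negative (logit 2.0 on a negative anchor) keeps more than half of its loss under focal loss, while the two easy anchors are scaled down by more than three orders of magnitude. In the pipeline config, this kind of loss is selected with the `weighted_sigmoid_focal` classification loss, which takes `alpha` and `gamma` fields.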

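The main class imbalance comes from the much larger number of negative examples (boxes containing no object of interest) compared with the very few positive examples (boxes containing an object class). That seems to be why class imbalance among the positive examples (i.e. an uneven distribution of positive class labels) is not handled as part of the object detection losses.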
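The negative/positive imbalance is what hard negative mining targets: after anchors are matched, all positives are kept but only the highest-loss negatives are, capped at a fixed multiple of the number of positives. A simplified per-image sketch in plain Python, assuming a ratio of 3 negatives per positive (a common SSD choice); the function and parameter names are hypothetical, and it ignores the box-overlap (NMS) filtering and other options of the API's `HardExampleMiner`:

```python
def mine_hard_negatives(losses, is_positive, max_negatives_per_positive=3,
                        min_negatives=0):
    """Return sorted indices of the anchors that contribute to the loss.

    All positive anchors are kept; among negative anchors only those with
    the largest loss are kept, at most
    max(min_negatives, max_negatives_per_positive * num_positives).
    """
    positives = [i for i, pos in enumerate(is_positive) if pos]
    negatives = [i for i, pos in enumerate(is_positive) if not pos]
    num_neg = max(min_negatives, max_negatives_per_positive * len(positives))
    # Hardest negatives first (highest loss), truncated to the allowed count.
    hardest = sorted(negatives, key=lambda i: losses[i], reverse=True)[:num_neg]
    return sorted(positives + hardest)


# 1 positive anchor among 10: only the 3 hardest negatives are kept.
losses = [0.9, 0.1, 0.05, 2.0, 0.3, 0.02, 1.5, 0.01, 0.7, 0.2]
is_positive = [True] + [False] * 9
print(mine_hard_negatives(losses, is_positive))  # [0, 3, 6, 8]
```

Nine of the ten anchors are negatives, yet only three of them reach the loss; per-object class weights, if any, act on top of this on the matched (positive) boxes.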

Class weights for balancing data in TensorFlow Object Detection API
