Question

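I am trying to implement and train YOLO myself, based on this implementation: https://github.com/allanzelener/YAD2K/. The problem I am running into is that the width/height values in my prediction tensor are exploding, and I never see an IOU above 0 between my predicted objects and the ground-truth objects. All of this goes wrong within the first few mini-batches of the first epoch. The loss and most of my predicted widths/heights are nan.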

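My images are 416x416, I am using 5 anchors, and there are 5 classes. I divide each image into a 13x13 grid, which gives a prediction tensor of shape [batch_size, 13, 13, 5, 10]. The ground truth for each batch has shape [batch_size, 13, 13, 5, 5], with the class stored as an integer label rather than one-hot encoded.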
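As a quick check of the shapes described above, here is a minimal NumPy sketch. The per-anchor layout (x, y, w, h, objectness, then class scores) is read off the slicing in predict_transform; batch_size and the flat raw_output array are illustrative assumptions, not part of the original code.

```python
import numpy as np

# Sizes taken from the question; batch_size is an arbitrary illustrative value.
batch_size = 2
grid_h, grid_w = 13, 13          # 416 / 13 = 32 pixels per grid cell
num_anchors = 5
num_classes = 5

# Per-anchor prediction vector: x, y, w, h, objectness, then one score per class.
pred_vec_len = 4 + 1 + num_classes

# Assumed flat network output, reshaped into the prediction grid.
raw_output = np.zeros(
    (batch_size, grid_h * grid_w * num_anchors * pred_vec_len), dtype=np.float32
)
predictions = raw_output.reshape(-1, grid_h, grid_w, num_anchors, pred_vec_len)

# Ground truth per anchor: x, y, w, h, integer class label (not one-hot).
true_box_grid = np.zeros((batch_size, grid_h, grid_w, num_anchors, 5), dtype=np.float32)

print(predictions.shape)    # (2, 13, 13, 5, 10)
print(true_box_grid.shape)  # (2, 13, 13, 5, 5)
```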

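Below is my loss function (based on https://github.com/allanzelener/YAD2K/blob/master/yad2k/models/keras_yolo.py#L152). It passes the images through my model and then calls predict_transform, which reshapes the tensor and transforms the coordinates.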

def loss_custom(true_box_grid, x):
    # training=training is needed only if there are layers with different
    # behavior during training versus inference (e.g. Dropout).
    y_ = model(x, training=training)
    # (batch, rows, cols, anchors, vals)
    center_coords, wh_coords, obj_scores, class_probs = DetectNet.predict_transform(y_)
    detector_mask = create_mask(true_box_grid)
    total_loss = 0    

    pred_wh_half = wh_coords / 2.
    # bottom left corner
    pred_mins = center_coords - pred_wh_half
    # top right corner
    pred_maxes = center_coords + pred_wh_half

    true_xy = true_box_grid[..., 0:2]
    true_wh = true_box_grid[..., 2:4]
    true_wh_half = true_wh / 2.
    true_mins = true_xy - true_wh_half
    true_maxes = true_xy + true_wh_half

    # max bottom left corner
    intersect_mins = tf.math.maximum(pred_mins, true_mins)
    # min top right corner
    intersect_maxes = tf.math.minimum(pred_maxes, true_maxes)    
    intersect_wh = tf.math.maximum(intersect_maxes - intersect_mins, 0.)
    # product of difference between x max and x min, y max and y min
    intersect_areas = intersect_wh[..., 0] * intersect_wh[..., 1]


    pred_areas = wh_coords[..., 0] * wh_coords[..., 1]
    true_areas = true_wh[..., 0] * true_wh[..., 1]

    union_areas = pred_areas + true_areas - intersect_areas
    iou_scores = intersect_areas / union_areas

    # Best IOUs for each location.
    iou_scores = tf.expand_dims(iou_scores, 4)
    best_ious = tf.keras.backend.max(iou_scores, axis=4)  # Best IOU scores.
    best_ious = tf.expand_dims(best_ious, 4)

    # A detector has found an object if IOU > thresh for some true box.
    object_detections = tf.keras.backend.cast(best_ious > 0.6, dtype=tf.float32)

    no_obj_weights = params.noobj_loss_weight * (1 - object_detections) * (1 - detector_mask[...,:1])
    no_obj_loss = no_obj_weights * tf.math.square(obj_scores)

    # could use weight here on obj loss
    obj_conf_loss = params.obj_loss_weight * detector_mask[...,:1] * tf.math.square(1 - obj_scores)
    conf_loss = no_obj_loss + obj_conf_loss

    matching_classes = tf.cast(true_box_grid[...,4], tf.int32)
    matching_classes = tf.one_hot(matching_classes, params.num_classes)
    class_loss = detector_mask[..., :1] * tf.math.square(matching_classes - class_probs)

    # keras_yolo does a sigmoid on center_coords here but they should already be between 0 and 1 from predict_transform
    pred_boxes = tf.concat([center_coords, wh_coords], axis=-1)

    matching_boxes = true_box_grid[..., :4]
    coord_loss = params.coord_loss_weight * detector_mask[..., :1] * tf.math.square(matching_boxes - pred_boxes)

    confidence_loss_sum = tf.keras.backend.sum(conf_loss)
    classification_loss_sum = tf.keras.backend.sum(class_loss)
    coordinates_loss_sum = tf.keras.backend.sum(coord_loss)

    # not sure why .5 is here, maybe to make sure numbers don't get too large
    total_loss = 0.5 * (confidence_loss_sum + classification_loss_sum + coordinates_loss_sum)            

    return total_loss
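The IOU part of the loss above can be checked in isolation on a single pair of boxes. The following is a minimal NumPy sketch of the same min/max-corner formula, assuming boxes in (center_x, center_y, width, height) form; the function name iou_center_format and the sample boxes are hypothetical.

```python
import numpy as np

def iou_center_format(box_a, box_b):
    """IoU of two boxes given as (center_x, center_y, width, height)."""
    a = np.asarray(box_a, dtype=np.float64)
    b = np.asarray(box_b, dtype=np.float64)
    # Convert center/size to min and max corners.
    a_min, a_max = a[:2] - a[2:] / 2.0, a[:2] + a[2:] / 2.0
    b_min, b_max = b[:2] - b[2:] / 2.0, b[:2] + b[2:] / 2.0
    # Overlap along each axis, clamped at zero when the boxes do not touch.
    inter_wh = np.maximum(np.minimum(a_max, b_max) - np.maximum(a_min, b_min), 0.0)
    inter = inter_wh[0] * inter_wh[1]
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

same = iou_center_format((0.5, 0.5, 0.2, 0.2), (0.5, 0.5, 0.2, 0.2))      # 1.0 up to rounding
shifted = iou_center_format((0.5, 0.5, 0.2, 0.2), (0.6, 0.5, 0.2, 0.2))   # about 0.333
disjoint = iou_center_format((0.2, 0.2, 0.2, 0.2), (0.8, 0.8, 0.2, 0.2))  # 0.0
# A box whose width and height are negative has its corners swapped,
# so the clamped intersection is zero under this formula.
negative = iou_center_format((0.5, 0.5, 0.2, 0.2), (0.5, 0.5, -0.2, -0.2))
print(same, shifted, disjoint, negative)
```

Running known box pairs through a helper like this is one way to confirm that the predicted and ground-truth boxes fed to the loss are on the same scale and in the same encoding.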

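Below is predict_transform (based on https://github.com/allanzelener/YAD2K/blob/master/yad2k/models/keras_yolo.py#L66), which reshapes the prediction tensor into a grid so that it can be compared with the ground-truth objects. For the center coordinates, objectness scores, and class probabilities, it applies a sigmoid or a softmax.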

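For the width and height coordinates, it applies an exponential (to make them positive) and then multiplies them by the anchors. This seems to be where they start to explode.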

def predict_transform(predictions):
        predictions = tf.reshape(predictions, [-1, params.grid_height, params.grid_width, params.num_anchors, params.pred_vec_len])

        conv_dims = predictions.shape[1:3]
        conv_height_index = tf.keras.backend.arange(0, stop=conv_dims[0])
        conv_width_index = tf.keras.backend.arange(0, stop=conv_dims[1])
        conv_height_index = tf.tile(conv_height_index, [conv_dims[1]]) # (169,) tensor with 0-12 repeating
        conv_width_index = tf.tile(tf.expand_dims(conv_width_index, 0), [conv_dims[0], 1]) # (13, 13) tensor with x offset in each row
        conv_width_index = tf.keras.backend.flatten(tf.transpose(conv_width_index)) # (169,) tensor with 13 0's followed by 13 1's, etc (y offsets)
        conv_index = tf.transpose(tf.stack([conv_height_index, conv_width_index])) # (169, 2)
        conv_index = tf.reshape(conv_index, [1, conv_dims[0], conv_dims[1], 1, 2]) # y offset, x offset
        conv_dims = tf.cast(tf.reshape(conv_dims, [1, 1, 1, 1, 2]), tf.float32) # grid_height x grid_width, max dims of anchors

        # makes the center coordinate between 0 and 1, each grid cell is normalized to 1 x 1
        center_coords = tf.math.sigmoid(predictions[...,:2])
        conv_index = tf.cast(conv_index, tf.float32)
        center_coords = (center_coords + conv_index) / conv_dims

        # makes the objectness score a probability between 0 and 1
        obj_scores = tf.math.sigmoid(predictions[...,4:5])

        anchors = DetectNet.get_anchors()
        anchors = tf.reshape(anchors, [1, 1, 1, params.num_anchors, 2])
        # exp to make width and height positive then multiply by anchor dims to resize box to anchor
        # should fit close to anchor, normalizing by conv_dims should make it between 0 and approx 1
        wh_coords = (tf.math.exp(predictions[...,2:4])*anchors) / conv_dims

        # apply softmax to class scores to make them probabilities
        class_probs = tf.keras.activations.softmax(predictions[..., 5 : 5 + params.num_classes])

        # (batch, rows, cols, anchors, vals)
        return center_coords, wh_coords, obj_scores, class_probs
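The width/height decoding above can be reproduced on its own to see how quickly the exponential grows. This is a minimal NumPy sketch, assuming a hypothetical anchor of 3 x 4 grid cells; it only illustrates float32 behavior and may or may not match what happens in the actual training run.

```python
import numpy as np

# Hypothetical anchor (width, height) in grid-cell units and the 13x13 grid size.
anchor = np.array([3.0, 4.0], dtype=np.float32)
grid = np.array([13.0, 13.0], dtype=np.float32)

def decode_wh(raw_wh):
    """exp makes the raw output positive, the anchor scales it, the grid normalizes it."""
    with np.errstate(over="ignore"):  # silence the overflow warning for the demo
        return np.exp(np.asarray(raw_wh, dtype=np.float32)) * anchor / grid

print(decode_wh([0.0, 0.0]))      # anchor / grid, about [0.23 0.31]
print(decode_wh([10.0, 10.0]))    # thousands of image widths
print(decode_wh([100.0, 100.0]))  # [inf inf]: exp overflows float32 above roughly 88.7

# Multiplying an infinite value by a zero mask gives nan, not zero,
# so a masked-out cell can still turn a summed loss into nan.
with np.errstate(invalid="ignore"):
    masked = np.float32(np.inf) * np.float32(0.0)
print(masked)  # nan
```

Logging the minimum and maximum of predictions[..., 2:4] before the exponential is a cheap way to see whether the raw outputs ever reach this range.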

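I have another doubt about how the ground-truth data is created, based on https://github.com/allanzelener/YAD2K/blob/master/yad2k/models/keras_yolo.py#L352. In the code below, box[0] and box[1] are the center coordinates, i and j are the grid-cell coordinates (between 0 and 13), and box[2] and box[3] are the width and height.

All of them have been normalized to grid coordinates (0 to 13). The code places each object in the ground-truth grid at its best-matching anchor. box[0] - j and box[1] - i ensure that the center coordinates are between 0 and 1.

However, I don't understand np.log(box[2] / anchors[best_anchor][0]): the anchors are also on the grid-coordinate scale, so the quotient can be less than 1, which yields a negative number after the log. During training I frequently see negative widths and heights in the ground-truth data and don't know what to do about them.

My model is also shallow, because I lack compute resources. The ground-truth encoding in question is: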

if best_iou > 0:
    adjusted_box = np.array(
        [
            box[0] - j,  # center should be between 0 and 1, like prediction will be
            box[1] - i,
            np.log(box[2] / anchors[best_anchor][0]),  # quotient might be less than one, not sure why log is used
            np.log(box[3] / anchors[best_anchor][1]),
            box[4],  # class label
        ],
        dtype=np.float32
    )
    true_box_grid[i, j, best_anchor] = adjusted_box
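To make the log term concrete, here is a minimal NumPy sketch of the encoding and its inverse. The anchor and box sizes are hypothetical values in grid-cell units, and encode_wh and decode_offset are illustrative helper names rather than functions from YAD2K.

```python
import numpy as np

# Hypothetical anchor and box sizes, both in grid-cell units (0 to 13).
anchor_wh = np.array([3.0, 4.0])
small_box_wh = np.array([1.5, 2.0])  # half the anchor size
large_box_wh = np.array([6.0, 8.0])  # twice the anchor size

def encode_wh(box_wh, anchor_wh):
    """Log-space offset of a box relative to its anchor."""
    return np.log(box_wh / anchor_wh)

def decode_offset(offset, anchor_wh):
    """Inverse of encode_wh: exp undoes the log, the anchor restores the scale."""
    return np.exp(offset) * anchor_wh

small_offset = encode_wh(small_box_wh, anchor_wh)  # log(0.5), negative
large_offset = encode_wh(large_box_wh, anchor_wh)  # log(2.0), positive
print(small_offset, large_offset)
print(decode_offset(small_offset, anchor_wh))  # recovers [1.5 2.0] up to rounding
```

A negative entry here is a log-space offset meaning "smaller than the anchor", not a negative width; applying exp and multiplying by the anchor returns the original positive size.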

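I would like to know how to keep the predicted widths and heights, and therefore the loss, from exploding. The exponential makes sense as a way to guarantee that they are positive. I could also apply a sigmoid to them, but I don't want to restrict them to between 0 and 1. In the YOLO paper, the authors mention that they pretrain the network, so the layer weights are already initialized when YOLO training begins. Is this a problem of initializing the network properly?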

TensorFlow YOLO object detection loss exploding

0 answers: