有人可以向我解释YOLO如何在对象周围绘制边界框吗?

时间:2019-02-04 09:47:03

标签: conv-neural-network object-detection yolo

在本文中,约瑟夫·雷德蒙(Joseph Redmon)只看一次:统一的实时对象检测,据说使用YOLO可以检测到对象及其类别概率。有人可以解释一下YOLO如何借助以下代码在对象周围绘制边界框以进行对象检测吗?

def custom_loss(y_true, y_pred):
mask_shape = tf.shape(y_true)[:4]

cell_x = tf.to_float(tf.reshape(tf.tile(tf.range(GRID_W), [GRID_H]), (1, GRID_H, GRID_W, 1, 1)))
cell_y = tf.transpose(cell_x, (0,2,1,3,4))

cell_grid = tf.tile(tf.concat([cell_x,cell_y], -1), [BATCH_SIZE, 1, 1, 5, 1])

coord_mask = tf.zeros(mask_shape)
conf_mask  = tf.zeros(mask_shape)
class_mask = tf.zeros(mask_shape)

seen = tf.Variable(0.)

total_AP = tf.Variable(0.)

"""
Adjust prediction
"""
### adjust x and y      
pred_box_xy = tf.sigmoid(y_pred[..., :2]) + cell_grid

### adjust w and h
pred_box_wh = tf.exp(y_pred[..., 2:4]) * np.reshape(ANCHORS, [1,1,1,BOX,2])

### adjust confidence
pred_box_conf = tf.sigmoid(y_pred[..., 4])

### adjust class probabilities
pred_box_class = y_pred[..., 5:]

"""
Adjust ground truth
"""
### adjust x and y
true_box_xy = y_true[..., 0:2] # relative position to the containing cell

### adjust w and h
true_box_wh = y_true[..., 2:4] # number of cells across, horizontally and vertically

### adjust confidence
true_wh_half = true_box_wh / 2.
true_mins    = true_box_xy - true_wh_half
true_maxes   = true_box_xy + true_wh_half

pred_wh_half = pred_box_wh / 2.
pred_mins    = pred_box_xy - pred_wh_half
pred_maxes   = pred_box_xy + pred_wh_half       

intersect_mins  = tf.maximum(pred_mins,  true_mins)
intersect_maxes = tf.minimum(pred_maxes, true_maxes)
intersect_wh    = tf.maximum(intersect_maxes - intersect_mins, 0.)
intersect_areas = intersect_wh[..., 0] * intersect_wh[..., 1]

true_areas = true_box_wh[..., 0] * true_box_wh[..., 1]
pred_areas = pred_box_wh[..., 0] * pred_box_wh[..., 1]

union_areas = pred_areas + true_areas - intersect_areas
iou_scores  = tf.truediv(intersect_areas, union_areas)

true_box_conf = iou_scores * y_true[..., 4]

### adjust class probabilities
true_box_class = tf.to_int32(y_true[..., 5])

"""
Determine the masks
"""
### coordinate mask: simply the position of the ground truth boxes (the predictors)
coord_mask = tf.expand_dims(y_true[..., 4], axis=-1) * COORD_SCALE

### confidence mask: penalize predictors + penalize boxes with low IOU
# penalize the confidence of the boxes, which have IOU with some ground truth box < 0.6
true_xy = true_boxes[..., 0:2]
true_wh = true_boxes[..., 2:4]

true_wh_half = true_wh / 2.
true_mins    = true_xy - true_wh_half
true_maxes   = true_xy + true_wh_half

pred_xy = tf.expand_dims(pred_box_xy, 4)
pred_wh = tf.expand_dims(pred_box_wh, 4)

pred_wh_half = pred_wh / 2.
pred_mins    = pred_xy - pred_wh_half
pred_maxes   = pred_xy + pred_wh_half    

intersect_mins  = tf.maximum(pred_mins,  true_mins)
intersect_maxes = tf.minimum(pred_maxes, true_maxes)
intersect_wh    = tf.maximum(intersect_maxes - intersect_mins, 0.)
intersect_areas = intersect_wh[..., 0] * intersect_wh[..., 1]

true_areas = true_wh[..., 0] * true_wh[..., 1]
pred_areas = pred_wh[..., 0] * pred_wh[..., 1]

union_areas = pred_areas + true_areas - intersect_areas
iou_scores  = tf.truediv(intersect_areas, union_areas)

best_ious = tf.reduce_max(iou_scores, axis=4)
conf_mask = conf_mask + tf.to_float(best_ious < 0.6) * (1 - y_true[..., 4]) * NO_OBJECT_SCALE

# penalize the confidence of the boxes, which are responsible for corresponding ground truth box
conf_mask = conf_mask + y_true[..., 4] * OBJECT_SCALE

### class mask: simply the position of the ground truth boxes (the predictors)
class_mask = y_true[..., 4] * tf.gather(CLASS_WEIGHTS, true_box_class) * CLASS_SCALE       

"""
Warm-up training
"""
no_boxes_mask = tf.to_float(coord_mask < COORD_SCALE/2.)
seen = tf.assign_add(seen, 1.)

true_box_xy, true_box_wh, coord_mask = tf.cond(tf.less(seen, WARM_UP_BATCHES), 
                      lambda: [true_box_xy + (0.5 + cell_grid) * no_boxes_mask, 
                               true_box_wh + tf.ones_like(true_box_wh) * np.reshape(ANCHORS, [1,1,1,BOX,2]) * no_boxes_mask, 
                               tf.ones_like(coord_mask)],
                      lambda: [true_box_xy, 
                               true_box_wh,
                               coord_mask])

"""
Finalize the loss
"""
nb_coord_box = tf.reduce_sum(tf.to_float(coord_mask > 0.0))
nb_conf_box  = tf.reduce_sum(tf.to_float(conf_mask  > 0.0))
nb_class_box = tf.reduce_sum(tf.to_float(class_mask > 0.0))

loss_xy    = tf.reduce_sum(tf.square(true_box_xy-pred_box_xy)     * coord_mask) / (nb_coord_box + 1e-6) / 2.
loss_wh    = tf.reduce_sum(tf.square(true_box_wh-pred_box_wh)     * coord_mask) / (nb_coord_box + 1e-6) / 2.
loss_conf  = tf.reduce_sum(tf.square(true_box_conf-pred_box_conf) * conf_mask)  / (nb_conf_box  + 1e-6) / 2.
loss_class = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=true_box_class, logits=pred_box_class)
loss_class = tf.reduce_sum(loss_class * class_mask) / (nb_class_box + 1e-6)

loss = loss_xy + loss_wh + loss_conf + loss_class

nb_true_box = tf.reduce_sum(y_true[..., 4])
nb_pred_box = tf.reduce_sum(tf.to_float(true_box_conf > 0.5) * tf.to_float(pred_box_conf > OBJ_THRESHOLD))

total_AP = tf.assign_add(total_AP, nb_pred_box/nb_true_box) 

loss = tf.Print(loss, [loss_xy, loss_wh, loss_conf, loss_class, loss, total_AP/seen], message='DEBUG', summarize=1000)

return loss

1 个答案:

答案 0 :(得分:1)

Here是一个很好的解释:

  

YOLO将每个图像划分为S x S的网格,并且每个网格预测N个边界框和置信度。置信度反映了边界框的准确性以及边界框是否实际包含一个对象(与类无关)。 YOLO还可以预测训练中每个班级每个盒子的分类得分。您可以组合两个类来计算每个类出现在预测框中的概率。

查看代码:

### adjust x and y      
pred_box_xy = tf.sigmoid(y_pred[..., :2]) + cell_grid

### adjust w and h
pred_box_wh = tf.exp(y_pred[..., 2:4]) * np.reshape(ANCHORS, [1,1,1,BOX,2])

在这里,cell_grid是默认边界框(锚点)的坐标的均匀分布矩阵。 y_pred[..., :2]包含锚点xy坐标的偏移量预测。 y_pred[..., 2:4]预测每个锚点的宽度和高度大小。选择了具有高预测可信度的锚点之后,YOLO将锚点的默认位置与为其预测的偏移量结合在一起-在这里,您便获得了边界框坐标。

请注意,锚点相对较小(左侧图像上的网格单元),因此,为了检测大物体,YOLO会为靠近对象中心某处的锚点预测相当大的偏移量。

enter image description here