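I'm trying to implement and train YOLO myself, based on this implementation: https://github.com/allanzelener/YAD2K/. The problem is that the width/height values in my prediction tensor are exploding, and I never see an IOU above 0 between predicted objects and ground-truth objects. Everything goes wrong within the first few mini-batches of the first epoch: the loss and most of my predicted widths/heights are `nan`.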
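My images are 416x416, I'm using 5 anchors, and there are 5 classes. I divide the image into a 13x13 grid, so my prediction tensor is `[batch_size, 13, 13, 5, 10]`. The ground truth for each batch is `[batch_size, 13, 13, 5, 5]`, and the class labels are not one-hot encoded.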
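As a quick check of these shapes, here is a minimal NumPy sketch. It assumes the channel layout that `predict_transform` below uses (x, y, w, h at indices 0 to 3, objectness at index 4, class scores at indices 5 to 9) and the 416 / 13 = 32 pixel stride; the array contents are placeholders and only the shapes matter.

```python
import numpy as np

IMAGE_SIZE = 416
GRID = 13
NUM_ANCHORS = 5
NUM_CLASSES = 5

# Each grid cell covers a 32x32 pixel patch of the 416x416 input.
STRIDE = IMAGE_SIZE // GRID

# Per anchor: 4 box values + 1 objectness score + one score per class.
PRED_VEC_LEN = 4 + 1 + NUM_CLASSES

batch_size = 2
predictions = np.zeros((batch_size, GRID, GRID, NUM_ANCHORS, PRED_VEC_LEN), dtype=np.float32)
# Ground truth: 4 box values + 1 integer class label (not one-hot).
true_box_grid = np.zeros((batch_size, GRID, GRID, NUM_ANCHORS, 5), dtype=np.float32)

# Slicing the last axis the same way predict_transform does.
xy = predictions[..., 0:2]
wh = predictions[..., 2:4]
obj = predictions[..., 4:5]
classes = predictions[..., 5:5 + NUM_CLASSES]

print(STRIDE, predictions.shape, classes.shape)
```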
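Below is my loss function (based on https://github.com/allanzelener/YAD2K/blob/master/yad2k/models/keras_yolo.py#L152). It passes the images through my model and then calls `predict_transform` to reshape the tensor and transform the coordinates.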
```python
import tensorflow as tf

# model, params, DetectNet and create_mask are defined elsewhere in my project.
def loss_custom(true_box_grid, x, training=True):
    # training=training is needed only if there are layers with different
    # behavior during training versus inference (e.g. Dropout).
    y_ = model(x, training=training)
    # (batch, rows, cols, anchors, vals)
    center_coords, wh_coords, obj_scores, class_probs = DetectNet.predict_transform(y_)
    detector_mask = create_mask(true_box_grid)
    total_loss = 0
    pred_wh_half = wh_coords / 2.
    # bottom left corner
    pred_mins = center_coords - pred_wh_half
    # top right corner
    pred_maxes = center_coords + pred_wh_half
    true_xy = true_box_grid[..., 0:2]
    true_wh = true_box_grid[..., 2:4]
    true_wh_half = true_wh / 2.
    true_mins = true_xy - true_wh_half
    true_maxes = true_xy + true_wh_half
    # max bottom left corner
    intersect_mins = tf.math.maximum(pred_mins, true_mins)
    # min top right corner
    intersect_maxes = tf.math.minimum(pred_maxes, true_maxes)
    intersect_wh = tf.math.maximum(intersect_maxes - intersect_mins, 0.)
    # product of difference between x max and x min, y max and y min
    intersect_areas = intersect_wh[..., 0] * intersect_wh[..., 1]
    pred_areas = wh_coords[..., 0] * wh_coords[..., 1]
    true_areas = true_wh[..., 0] * true_wh[..., 1]
    union_areas = pred_areas + true_areas - intersect_areas
    iou_scores = intersect_areas / union_areas
    # Best IOUs for each location.
    iou_scores = tf.expand_dims(iou_scores, 4)
    best_ious = tf.keras.backend.max(iou_scores, axis=4)  # Best IOU scores.
    best_ious = tf.expand_dims(best_ious, 4)
    # A detector has found an object if IOU > thresh for some true box.
    object_detections = tf.keras.backend.cast(best_ious > 0.6, dtype=tf.float32)
    no_obj_weights = params.noobj_loss_weight * (1 - object_detections) * (1 - detector_mask[..., :1])
    no_obj_loss = no_obj_weights * tf.math.square(obj_scores)
    # could use weight here on obj loss
    obj_conf_loss = params.obj_loss_weight * detector_mask[..., :1] * tf.math.square(1 - obj_scores)
    conf_loss = no_obj_loss + obj_conf_loss
    matching_classes = tf.cast(true_box_grid[..., 4], tf.int32)
    matching_classes = tf.one_hot(matching_classes, params.num_classes)
    class_loss = detector_mask[..., :1] * tf.math.square(matching_classes - class_probs)
    # keras_yolo does a sigmoid on center_coords here but they should already be between 0 and 1 from predict_transform
    pred_boxes = tf.concat([center_coords, wh_coords], axis=-1)
    matching_boxes = true_box_grid[..., :4]
    coord_loss = params.coord_loss_weight * detector_mask[..., :1] * tf.math.square(matching_boxes - pred_boxes)
    confidence_loss_sum = tf.keras.backend.sum(conf_loss)
    classification_loss_sum = tf.keras.backend.sum(class_loss)
    coordinates_loss_sum = tf.keras.backend.sum(coord_loss)
    # not sure why .5 is here, maybe to make sure numbers don't get too large
    total_loss = 0.5 * (confidence_loss_sum + classification_loss_sum + coordinates_loss_sum)
    return total_loss
```
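The IOU part of this loss can be checked in isolation. Below is a pure-Python sketch of the same corner-based computation for two boxes given as (x_center, y_center, width, height). The helper name `iou_center_format` and the `eps` term (which avoids 0/0 when both areas are zero) are illustrative additions, not part of the code above. The third usage line shows that IOU is only meaningful when both boxes are in the same coordinate frame: in that example, a box in normalized image coordinates and a box in 13x13 grid units do not overlap at all, so the IOU is 0.

```python
def iou_center_format(box_a, box_b, eps=1e-9):
    """IOU of two boxes given as (x_center, y_center, width, height).

    Both boxes must use the same coordinate frame (e.g. both normalized to
    [0, 1] image coordinates). eps avoids 0/0 when both areas are zero.
    """
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # Convert centers to min/max corners, as the loss function does.
    a_min = (ax - aw / 2, ay - ah / 2)
    a_max = (ax + aw / 2, ay + ah / 2)
    b_min = (bx - bw / 2, by - bh / 2)
    b_max = (bx + bw / 2, by + bh / 2)
    # Overlap along each axis, clamped at 0 when the boxes are disjoint.
    inter_w = max(min(a_max[0], b_max[0]) - max(a_min[0], b_min[0]), 0.0)
    inter_h = max(min(a_max[1], b_max[1]) - max(a_min[1], b_min[1]), 0.0)
    intersection = inter_w * inter_h
    union = aw * ah + bw * bh - intersection
    return intersection / (union + eps)


print(iou_center_format((0.5, 0.5, 0.2, 0.2), (0.5, 0.5, 0.2, 0.2)))  # identical boxes
print(iou_center_format((0.5, 0.5, 0.2, 0.2), (0.6, 0.5, 0.2, 0.2)))  # half-shifted boxes
print(iou_center_format((0.5, 0.5, 0.1, 0.1), (6.5, 6.5, 1.3, 1.3)))  # mixed frames
```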
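Below is `predict_transform` (based on https://github.com/allanzelener/YAD2K/blob/master/yad2k/models/keras_yolo.py#L66), which reshapes the prediction tensor into a grid so it can be compared with the ground-truth objects. It applies a sigmoid to the center coordinates and objectness scores, and a softmax to the class probabilities.
For the width/height coordinates, it exponentiates them (which makes them positive) and then multiplies them by the anchors. This seems to be where they start exploding.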
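To see why the `exp` step is a likely place for values to blow up, here is a small NumPy sketch in float32 (TensorFlow's default) with hypothetical raw outputs. `exp(x)` exceeds the largest float32 value (about 3.4e38) once `x` is above roughly 88.7 and becomes `inf`, and an `inf` in the box corners or areas then propagates as `inf` or `nan` through the loss. Clipping the raw values before `exp` is one common guard; it is a swapped-in mitigation, not taken from the YAD2K code, and the bound `MAX_LOG_SCALE` is a hypothetical choice. In TensorFlow the same idea would apply `tf.clip_by_value` to `predictions[..., 2:4]` before `tf.math.exp`.

```python
import numpy as np

# Hypothetical raw network outputs for the width/height channels (float32).
raw_wh = np.array([-2.0, 0.0, 5.0, 89.0], dtype=np.float32)

# exp overflows float32 (max ~3.4e38) for inputs above ~88.7, producing inf.
with np.errstate(over="ignore"):
    unclipped = np.exp(raw_wh)

# Assumed mitigation: clip the raw log-scale values before exponentiating,
# so the anchor scale factor exp(t) stays within [exp(-4), exp(4)] (~0.018x to ~55x).
MAX_LOG_SCALE = 4.0  # hypothetical bound
clipped = np.exp(np.clip(raw_wh, -MAX_LOG_SCALE, MAX_LOG_SCALE))

print(unclipped)
print(clipped)
```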
```python
import tensorflow as tf

# params and DetectNet are defined elsewhere in my project.
def predict_transform(predictions):
    predictions = tf.reshape(predictions, [-1, params.grid_height, params.grid_width, params.num_anchors, params.pred_vec_len])
    conv_dims = predictions.shape[1:3]
    conv_height_index = tf.keras.backend.arange(0, stop=conv_dims[0])
    conv_width_index = tf.keras.backend.arange(0, stop=conv_dims[1])
    conv_height_index = tf.tile(conv_height_index, [conv_dims[1]])  # (169,) tensor with 0-12 repeating
    conv_width_index = tf.tile(tf.expand_dims(conv_width_index, 0), [conv_dims[0], 1])  # (13, 13) tensor with x offset in each row
    conv_width_index = tf.keras.backend.flatten(tf.transpose(conv_width_index))  # (169,) tensor with 13 0's followed by 13 1's, etc (y offsets)
    conv_index = tf.transpose(tf.stack([conv_height_index, conv_width_index]))  # (169, 2)
    conv_index = tf.reshape(conv_index, [1, conv_dims[0], conv_dims[1], 1, 2])  # (x offset, y offset) = (col, row) for the cell at [row, col]
    conv_dims = tf.cast(tf.reshape(conv_dims, [1, 1, 1, 1, 2]), tf.float32)  # grid_height x grid_width, max dims of anchors
    # makes the center coordinate between 0 and 1, each grid cell is normalized to 1 x 1
    center_coords = tf.math.sigmoid(predictions[..., :2])
    conv_index = tf.cast(conv_index, tf.float32)
    center_coords = (center_coords + conv_index) / conv_dims
    # makes the objectness score a probability between 0 and 1
    obj_scores = tf.math.sigmoid(predictions[..., 4:5])
    anchors = DetectNet.get_anchors()
    anchors = tf.reshape(anchors, [1, 1, 1, params.num_anchors, 2])
    # exp to make width and height positive then multiply by anchor dims to resize box to anchor
    # should fit close to anchor, normalizing by conv_dims should make it between 0 and approx 1
    wh_coords = (tf.math.exp(predictions[..., 2:4]) * anchors) / conv_dims
    # apply softmax to class scores to make them probabilities
    class_probs = tf.keras.activations.softmax(predictions[..., 5:5 + params.num_classes])
    # (batch, rows, cols, anchors, vals)
    return center_coords, wh_coords, obj_scores, class_probs
```
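The tile/transpose construction of `conv_index` above is easy to misread, so here is a NumPy sketch that reproduces it and compares it against a more explicit `np.meshgrid` version. It shows that, at grid position `[row, col]`, the last axis holds `(col, row)`, i.e. the x offset first and the y offset second. The 13x13 grid size and the sample cell (row 3, column 7) are assumptions for illustration.

```python
import numpy as np

grid_h, grid_w = 13, 13

# Reproduce the tile/transpose construction from predict_transform in NumPy.
height_index = np.tile(np.arange(grid_h), grid_w)  # (169,): 0..12 repeating
width_index = np.tile(np.arange(grid_w)[None, :], (grid_h, 1)).T.flatten()  # (169,): 13 zeros, 13 ones, ...
conv_index = np.stack([height_index, width_index]).T.reshape(1, grid_h, grid_w, 1, 2)

# Equivalent explicit construction: channel 0 = column (x), channel 1 = row (y).
rows, cols = np.meshgrid(np.arange(grid_h), np.arange(grid_w), indexing="ij")
explicit_index = np.stack([cols, rows], axis=-1).reshape(1, grid_h, grid_w, 1, 2)

assert np.array_equal(conv_index, explicit_index)

# Decoding a center: a sigmoid output of 0.5 in cell (row 3, col 7) maps to
# the middle of that cell in normalized image coordinates.
offset = np.array([0.5, 0.5])
center = (offset + explicit_index[0, 3, 7, 0]) / np.array([grid_w, grid_h])
print(center)
```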
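I have another question about creating the ground-truth data, which is based on https://github.com/allanzelener/YAD2K/blob/master/yad2k/models/keras_yolo.py#L352. In the code below, `box[0]` and `box[1]` are the center coordinates, `i` and `j` are the grid-cell coordinates (between 0 and 13), and `box[2]` and `box[3]` are the width and height. All of them are normalized to grid coordinates (0 to 13). The code places each object in the ground-truth grid under its best-matching anchor. `box[0] - j` and `box[1] - i` ensure that the center coordinates are between 0 and 1.
However, I don't understand `np.log(box[2] / anchors[best_anchor][0])`: the anchors are also on the grid-coordinate scale, so the quotient can be less than 1, which produces a negative number after the `log`. During training I often see negative widths and heights in my ground-truth data, and I don't know what to make of them.
My model is also a watered-down version, because I lack computational resources.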
```python
if best_iou > 0:
    adjusted_box = np.array(
        [
            box[0] - j,  # center should be between 0 and 1, like prediction will be
            box[1] - i,
            np.log(box[2] / anchors[best_anchor][0]),  # quotient might be less than one, not sure why log is used
            np.log(box[3] / anchors[best_anchor][1]),
            box[4]  # class label
        ],
        dtype=np.float32
    )
    true_box_grid[i, j, best_anchor] = adjusted_box
```
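The `log` in this snippet is the inverse of the `exp(...) * anchors` step in `predict_transform` (before the division by `conv_dims`): a target `t = log(w / anchor_w)` is negative whenever the box is smaller than its anchor, zero when they match, and `exp(t) * anchor_w` always gives back a positive width. In that sense the negative values are log-scale offsets relative to the anchor rather than widths. A small pure-Python sketch with hypothetical numbers (the helper names `encode_wh` and `decode_wh` are illustrative):

```python
import math


def encode_wh(box_wh, anchor_wh):
    """Encode a box size relative to an anchor, as the ground-truth snippet does."""
    return tuple(math.log(b / a) for b, a in zip(box_wh, anchor_wh))


def decode_wh(t_wh, anchor_wh):
    """Invert encode_wh the way predict_transform does (exp, then scale by anchor)."""
    return tuple(math.exp(t) * a for t, a in zip(t_wh, anchor_wh))


anchor = (3.0, 2.0)     # hypothetical anchor size in grid units
small_box = (1.5, 1.0)  # half the anchor size in both dimensions
large_box = (6.0, 4.0)  # twice the anchor size in both dimensions

t_small = encode_wh(small_box, anchor)  # both values negative (log 0.5)
t_large = encode_wh(large_box, anchor)  # both values positive (log 2)
print(t_small, decode_wh(t_small, anchor))
print(t_large, decode_wh(t_large, anchor))
```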
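I'd like to know how to keep the predicted widths and heights, and therefore the loss, from exploding. The exponential ensures that they are positive, which makes sense. I could also apply a sigmoid to them, but I don't want to restrict them to the range 0 to 1. The YOLO paper mentions that the network was pretrained, so the layer weights were already initialized when YOLO training began. Is this a matter of initializing the network correctly?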
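On the initialization question: because the head's raw w/h outputs feed straight into `exp`, their spread at the start of training matters. The NumPy sketch below models a 1x1 convolution head as a matrix multiply over hypothetical 1024-channel, unit-variance features and compares two weight scales. A small-std initialization for the final layer (in Keras, for example, `kernel_initializer=tf.keras.initializers.RandomNormal(stddev=0.01)` on the last `Conv2D`) keeps the initial scale factors `exp(t)` close to 1, so predicted boxes start near their anchors. This is an assumed, commonly used option, not the ImageNet pretraining the YOLO paper describes, and the feature statistics here are made up.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical backbone features feeding a 1x1 conv head: 13*13 cells, 1024 channels.
features = rng.standard_normal((13 * 13, 1024)).astype(np.float32)


def wh_logits(weight_std):
    """Raw w/h outputs of a 1x1 conv head (a per-cell matrix multiply), 5 anchors x 2 values."""
    weights = rng.normal(0.0, weight_std, size=(1024, 5 * 2)).astype(np.float32)
    return features @ weights


small = wh_logits(0.01)  # std of about 0.01 * sqrt(1024) = 0.32, so exp(t) stays near 1
large = wh_logits(1.0)   # std of about sqrt(1024) = 32, so exp(t) can overflow float32

with np.errstate(over="ignore"):
    print("small init:", np.abs(small).max(), np.exp(small).max())
    print("large init:", np.abs(large).max(), np.isinf(np.exp(large)).any())
```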