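The code attached below computes the loss from the YOLO output and the ground truth. While training the model, however, I am getting a NaN loss. Debugging the function showed that the first NaN appears in the object loss. The object loss is computed as:

x = (IOU(pred_localisation, ground_localisation) - Sigmoid(pred_conf)) ** 2
y = mask * x

Debugging further, the NaN first appears when I multiply x by mask. This confuses me because I checked both inputs to the multiplication, and neither mask nor x contains any NaN values beforehand. The mask contains only 1s and 0s. So how can multiplying the mask (0 or 1) by x (some real value) produce NaN?

Here is the code I wrote for the loss calculation.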
import sys

import tensorflow as tf

# no_object_scale, object_scale, class_scale and coord_scale are loss weights
# assumed to be defined elsewhere (not shown here).

def yolo_loss(net_out, ground_truth):  # ground_truth shape = 1,13,13,5,7
    gt_conf = ground_truth[..., :1]
    gt_loc = ground_truth[..., 1:5]
    gt_classes = ground_truth[..., 5:]
    pred_conf, pred_loc, pred_classes = get_conf_loc_classes(net_out)

    # get the mask: 1 where a cell/anchor holds an object, 0 elsewhere
    condition = tf.equal(gt_conf, tf.constant(1, dtype=tf.float32))
    mask = tf.where(condition, tf.ones_like(gt_conf), tf.zeros_like(gt_conf))

    # no_object_loss
    no_object_loss = tf.reduce_sum((1 - mask) * tf.square(0 - tf.nn.sigmoid(pred_conf)))
    no_object_loss = tf.multiply(no_object_loss, no_object_scale)

    # object_loss
    object_loss_init = tf.square(IOU(gt_loc, pred_loc) - tf.nn.sigmoid(pred_conf))

    # debug: log the indices of NaN entries in both inputs to the multiplication
    is_nan = tf.math.is_nan(object_loss_init)
    is_nan_m = tf.math.is_nan(mask)
    is_nan_index = tf.where(is_nan)
    is_nan_m_index = tf.where(is_nan_m)
    init_print = tf.print("nan is present in object_loss_init :", is_nan_index, output_stream="file:///home/yogeesh/yogeesh/yolo_v2/nan.txt")
    mask_nan_print = tf.print("nan is present in mask :", is_nan_m_index, output_stream="file:///home/yogeesh/yogeesh/yolo_v2/mask_nan.txt")
    with tf.control_dependencies([init_print, mask_nan_print]):
        object_loss_masked = tf.multiply(mask, object_loss_init)

    # debug: log the non-zero and NaN indices after the multiplication
    zero_mask = tf.constant(0, tf.float32)
    where = tf.not_equal(object_loss_masked, zero_mask)
    indexes = tf.where(where)
    is_nan_masked = tf.math.is_nan(object_loss_masked)
    is_nan_masked_index = tf.where(is_nan_masked)
    print_op_2 = tf.print("mask multiply_non_zero_indexes : ", indexes, output_stream="file:///home/yogeesh/yogeesh/yolo_v2/log.txt")
    print_op_3 = tf.print("mask multiply_nan_index : ", is_nan_masked_index, output_stream="file:///home/yogeesh/yogeesh/yolo_v2/nan_2.txt")
    with tf.control_dependencies([print_op_2, print_op_3]):
        object_loss = tf.reduce_sum(object_loss_masked)
    print_op_object_loss = tf.print("reduce sum: ", object_loss)
    with tf.control_dependencies([print_op_object_loss]):
        object_loss_final = tf.multiply(object_loss, object_scale)

    # class_loss
    class_loss = tf.reduce_sum(mask * tf.square(gt_classes - tf.nn.softmax(pred_classes, axis=-1)))
    class_loss = tf.multiply(class_loss, class_scale)

    # localization_loss
    loc_loss = tf.reduce_sum(mask * tf.square(gt_loc - pred_loc))
    loc_loss = tf.multiply(loc_loss, coord_scale)

    print_op = tf.print("losses : ", no_object_loss, object_loss_final, class_loss, loc_loss, output_stream=sys.stdout)
    with tf.control_dependencies([print_op]):
        total_loss = tf.math.add_n([no_object_loss, object_loss_final, class_loss, loc_loss])
    return total_loss
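The print statements are how I tried to trace the source of the NaN values. get_conf_loc_classes simply returns the confidence, location, and class values split out of the raw network output.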
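One thing the prints do not rule out is an infinite value in x: an is_nan check does not flag inf, and under IEEE 754 arithmetic 0 * inf is NaN. The sketch below is plain Python rather than TensorFlow, uses a hypothetical scalar helper named masked_square_error, and assumes purely for illustration that IOU returned inf (for example from a division by zero). Whether that actually happens in my IOU function is not confirmed.

```python
import math


def masked_square_error(mask, iou, confidence_logit):
    """Hypothetical scalar stand-in for one element of x and y = mask * x."""
    sigmoid = 1.0 / (1.0 + math.exp(-confidence_logit))
    diff = iou - sigmoid
    x = diff * diff          # squared error, same role as object_loss_init
    y = mask * x             # masked value, same role as object_loss_masked
    return x, y


# Assumption for illustration only: the IOU for a masked-out cell is infinite.
x, y = masked_square_error(mask=0.0, iou=float("inf"), confidence_logit=0.3)

print("x is NaN:", math.isnan(x))   # an is_nan check on x finds nothing
print("x is inf:", math.isinf(x))   # but x is not finite
print("y is NaN:", math.isnan(y))   # 0 * inf yields NaN
```

If this is the cause, an analogous check in the graph would be tf.math.is_inf (or tf.math.is_finite) on object_loss_init, next to the existing is_nan prints.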
Can anyone help?