Multiplying a tensor of 0s and 1s by another tensor of real values gives NaN values in the loss function

Time: 2019-10-07 07:23:02

Tags: tensorflow deep-learning computer-vision nan yolo

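The code attached below computes the loss from the YOLO output and the ground truth. While training the model, however, I get a NaN loss. Debugging the function showed that the first NaN appears in the object loss, which is computed as:

    x = (IOU(pred_localisation, ground_localisation) - Sigmoid(pred_conf)) ** 2
    y = mask * x

On further debugging, the NaN first appears when I multiply x by mask. This confuses me, because I checked both inputs to the multiplication, and neither mask nor x contains any NaN values beforehand. The mask contains only 1s and 0s. So how can multiplying mask (0 or 1) by x (some real value) produce NaN?

Here is the code I wrote for the loss calculation.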

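import sys

import tensorflow as tf

# IOU, get_conf_loc_classes and the *_scale weights are defined elsewhere (not shown in the post).
def yolo_loss(net_out , ground_truth):   # ground_truth shape = (1, 13, 13, 5, 7)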
    gt_conf = ground_truth[... , :1]
    gt_loc = ground_truth[... , 1:5]
    gt_classes = ground_truth[... , 5:]
    pred_conf , pred_loc , pred_classes = get_conf_loc_classes(net_out)

    #get the mask
    condition = tf.equal(gt_conf , tf.constant(1 , dtype = tf.float32))
    mask = tf.where(condition , tf.ones_like(gt_conf) , tf.zeros_like(gt_conf))

    #no_object_loss
    no_object_loss = tf.reduce_sum((1 - mask) * (tf.square(0 - tf.nn.sigmoid(pred_conf))))
    no_object_loss = tf.multiply(no_object_loss , no_object_scale)

    #object_loss
    object_loss_init = tf.square(IOU(gt_loc , pred_loc) - tf.nn.sigmoid(pred_conf))
    is_nan = tf.math.is_nan(object_loss_init)
    is_nan_m = tf.math.is_nan(mask)
    is_nan_index = tf.where(is_nan)
    is_nan_m_index = tf.where(is_nan_m)
    init_print = tf.print("nan is present in object_loss_init :" , is_nan_index , output_stream = "file:///home/yogeesh/yogeesh/yolo_v2/nan.txt")
    mask_nan_print = tf.print("nan is present in mask :" , is_nan_m_index , output_stream = "file:///home/yogeesh/yogeesh/yolo_v2/mask_nan.txt")
    with tf.control_dependencies([init_print , mask_nan_print]):
        object_loss_masked = tf.multiply(mask , object_loss_init)


    zero_mask = tf.constant(0 , tf.float32)
    where = tf.not_equal(object_loss_masked , zero_mask)
    indexes = tf.where(where)
    is_nan_masked = tf.math.is_nan(object_loss_masked)
    is_nan_masked_index = tf.where(is_nan_masked)
    print_op_2 = tf.print("mask multiply_non_zero_indexes : " , indexes , output_stream = "file:///home/yogeesh/yogeesh/yolo_v2/log.txt")
    print_op_3 = tf.print("mask multiply_nan_index : " , is_nan_masked_index , output_stream = "file:///home/yogeesh/yogeesh/yolo_v2/nan_2.txt")
    with tf.control_dependencies([print_op_2,  print_op_3]):
        object_loss = tf.reduce_sum(object_loss_masked)

    print_op_object_loss = tf.print("reduce sum: " , object_loss)
    with tf.control_dependencies([print_op_object_loss]):
        object_loss_final = tf.multiply(object_loss , object_scale)

    #class_loss
    class_loss = tf.reduce_sum(mask * tf.square(gt_classes - tf.nn.softmax(pred_classes , axis = -1)))
    class_loss = tf.multiply(class_loss , class_scale)

    #localization_loss
    loc_loss = tf.reduce_sum(mask * tf.square((gt_loc - pred_loc)))
    loc_loss = tf.multiply(loc_loss , coord_scale)

    print_op = tf.print("losses : " , no_object_loss , object_loss_final , class_loss , loc_loss , output_stream = sys.stdout)
    with tf.control_dependencies([print_op]):
        total_loss = tf.math.add_n([no_object_loss , object_loss_final , class_loss , loc_loss])

    return total_loss

The print statements are how I traced the source of the NaN values. get_conf_loc_classes simply returns the separated values from the raw output.
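
The checks in the code above test only for NaN, which leaves one case uncovered: an infinite value in x. The post does not confirm that this is what happens here, so the following is only a minimal sketch of something worth ruling out. It is written in plain Python rather than TensorFlow, on the assumption that float32 tensors follow the same IEEE 754 rules. The helper names `masked_sum_multiply` and `masked_sum_select` and the sample values are hypothetical.

```python
import math


def masked_sum_multiply(mask, values):
    """Mask by multiplication, as in y = mask * x, then sum."""
    return sum(m * v for m, v in zip(mask, values))


def masked_sum_select(mask, values):
    """Mask by selection: masked-out entries are never multiplied."""
    return sum(v if m == 1.0 else 0.0 for m, v in zip(mask, values))


# Hypothetical values: the second entry has overflowed to inf
# and sits at a position where the mask is 0.
mask = [1.0, 0.0, 0.0]
x = [0.25, float("inf"), 4.0]

# A NaN-only check finds nothing, because inf is not NaN.
print(any(math.isnan(v) for v in x))             # False
# A finiteness check does flag the bad entry.
print(all(math.isfinite(v) for v in x))          # False
# 0 * inf is NaN under IEEE 754, so the masked sum becomes NaN.
print(math.isnan(masked_sum_multiply(mask, x)))  # True
# Selecting instead of multiplying keeps the sum finite.
print(masked_sum_select(mask, x))                # 0.25
```

If this is the cause, the TensorFlow counterparts of the finiteness check would be `tf.math.is_finite` or `tf.debugging.check_numerics`, applied to the output of IOU and to pred_loc before the multiplication. Selection only repairs the forward value, so it is probably better to find where the non-finite entry is produced than to mask it afterwards.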

Can anyone help?

0 answers:

No answers