I am trying to implement a Feature Pyramid Network for object detection on a gland dataset (basically replacing the single feature map in Faster R-CNN with pyramid features). Since the RPN is the first stage of Faster R-CNN, I want to make sure it works on its own, so I trained only the RPN. I trained for 40001 iterations with an initial lr of 0.001 and a ResNet_V2_50 backbone, but I have run into several problems:
I found that after about 5000 iterations the weights essentially stop updating and the gradients are really close to zero (weird weights). I tried a lower lr, and the epsilon I use in tf.train.AdamOptimizer is 0.0001, but the weights still do not seem to update.
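For reference, this is roughly how I set up the optimizer and how I monitor the gradients (a simplified sketch, not my exact training code; total_loss stands in for my RPN loss):

optimizer = tf.train.AdamOptimizer(learning_rate = 0.001, epsilon = 0.0001)
grads_and_vars = optimizer.compute_gradients(total_loss)  # total_loss is a placeholder name for my RPN loss
train_op = optimizer.apply_gradients(grads_and_vars)
# Log the global gradient norm so I can watch it approach zero in TensorBoard
grad_norm = tf.global_norm([g for g, v in grads_and_vars if g is not None])
tf.summary.scalar("global_gradient_norm", grad_norm)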
As for the predictions, I think the bounding box regression is actually quite good (green is the ground truth, red is a predicted box; the predicted boxes are decoded from anchors whose overlap with a ground-truth box is greater than 0.7) (predicted bounding box), which means at least the bounding box regression loss can push the predictions close to the ground truth. But the rpn_cls_loss classifies far too many wrong proposals as objects (wrong proposals). I considered the class imbalance problem (in the gland cell dataset the average number of positive anchors per image is only about 20, and there are only 85 training images), so I randomly sample 128 positive and 128 negative anchors (if there are fewer than 128 positive anchors, I random_shuffle the indices so a positive anchor can appear several times; see the sketch below), but it still does not work well.

One thing I do not understand: when we compute the loss we only consider this mini-batch of anchors, but at the same time the network classifies every anchor we generated as object or non-object. What about the anchors that contribute nothing to the loss? Can the network still make the right decision for those anchors?
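Concretely, the positive/negative sampling looks roughly like this (a simplified numpy sketch, not my exact code; rpn_match is my per-anchor label array with 1 = positive, -1 = negative, 0 = neutral, which is the convention I assume here):

import numpy as np

def sample_anchors(rpn_match, num_per_class = 128):
    # Pick num_per_class positive and negative anchor indices.
    # Positives are repeated (sampled with replacement) when fewer than num_per_class exist.
    # Assumes the image contains at least one positive and one negative anchor.
    pos_ids = np.where(rpn_match == 1)[0]
    neg_ids = np.where(rpn_match == -1)[0]
    if len(pos_ids) < num_per_class:
        pos_ids = np.tile(pos_ids, int(np.ceil(num_per_class / len(pos_ids))))
    np.random.shuffle(pos_ids)
    np.random.shuffle(neg_ids)
    return pos_ids[:num_per_class], neg_ids[:num_per_class]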
Here is the code I use to extract the pyramid feature maps:
P5 = conv_layer_obj(conv5_3,"fpn_c5p5",[1,256], training_state = training_state, activation_func = None, bias_state = True, padding = 'valid')
P4_ = conv_layer_obj(conv4_3, "fpn_c4p4", [1,256], training_state = training_state, activation_func = None, bias_state = True, padding = 'valid')
P3_ = conv_layer_obj(conv3_3, "fpn_c3p3", [1,256], training_state = training_state, activation_func = None, bias_state = True, padding = 'valid')
P2_ = conv_layer_obj(conv2_2, "fpn_c2p2", [1,256], training_state = training_state, activation_func = None, bias_state = True, padding = 'valid')
#P4 = tf.add(tf.image.resize_bilinear(P5, [P4_.shape.as_list()[1], P4_.shape.as_list()[2]]), P4_)
P4 = tf.add(utils.nearest_neighbor_upsampling(P5, scale = 2), P4_)
P3 = tf.add(utils.nearest_neighbor_upsampling(P4, scale = 2),P3_)
P2 = tf.add(utils.nearest_neighbor_upsampling(P3, scale = 2),P2_)
P5 = conv_layer_obj(P5,"fpn_p5", [3,256], training_state = training_state, activation_func = None, bias_state = True)
P4 = conv_layer_obj(P4,"fpn_p4", [3,256], training_state = training_state, activation_func = None, bias_state = True)
P3 = conv_layer_obj(P3,"fpn_p3", [3,256], training_state = training_state, activation_func = None, bias_state = True)
P2 = conv_layer_obj(P2,"fpn_p2", [3,256], training_state = training_state, activation_func = None, bias_state = True)
#P6 is used for the 5th anchor scale in the RPN; it is generated by subsampling P5 with stride 2,
#so it is half the spatial size of P5.
P6 = tf.nn.max_pool(P5,[1,2,2,1],strides = [1,2,2,1], padding = 'VALID', name = "fpn_p6")
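The rpn_feature_maps list that the RPN code below iterates over is assembled from these levels; the exact line is omitted above, but it is essentially:

rpn_feature_maps = [P2, P3, P4, P5, P6]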
Here is the code for the RPN:
anchors = utils.generate_pyramid_anchors(config.RPN_ANCHOR_SCALES, config.RPN_ANCHOR_RATIOS, config.BACKBONE_SHAPES,
                                         config.BACKBONE_STRIDES, config.RPN_ANCHOR_STRIDE)
#Basically, in this step the anchors we generate are all the possible anchors in the image. RPN_ANCHOR_SCALES is
#(32, 64, 128, 256, 512), i.e. 5 levels, and the returned anchors are arranged in that order. If RPN_ANCHOR_RATIOS
#is 1, an anchor at the first scale covers 32*32 pixels. This is similar in spirit to selective search: we list
#all the possible anchors.
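#As a rough sanity check (assuming a 512x512 input and BACKBONE_STRIDES = (4, 8, 16, 32, 64), which are the
#values I assume here), the five feature maps are 128, 64, 32, 16 and 8 pixels on a side, so with 3 ratios and
#RPN_ANCHOR_STRIDE = 1 this gives 3*(128^2 + 64^2 + 32^2 + 16^2 + 8^2) = 65472 anchors per image, of which
#only ~20 are positive in my dataset, hence the heavy class imbalance.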
layer_outputs = [] #list of lists
#rpn_graph(feature_map, anchor_per_location, anchor_stride, reuse_state, training_state):
for index, single_feature in enumerate(rpn_feature_maps):
    if index == 0:
        layer_outputs.append(rpn_graph(single_feature, len(config.RPN_ANCHOR_RATIOS), config.RPN_ANCHOR_STRIDE,
                                       reuse_state = False, training_state = training_state))
    else:
        layer_outputs.append(rpn_graph(single_feature, len(config.RPN_ANCHOR_RATIOS), config.RPN_ANCHOR_STRIDE,
                                       reuse_state = True, training_state = training_state))
#Then concatenate the layer_outputs from [[a1,b1,c1],[a2,b2,c2]] to [[a1,a2],[b1,b2],[c1,c2]]
output_names = ["rpn_class_logits","rpn_class","rpn_bbox"]
outputs = list(zip(*layer_outputs))
output_concate = []
for o, n in zip(outputs, output_names):
    output_concate.append(tf.concat(list(o), axis = 1, name = n))
rpn_class_logits, rpn_class_prob, rpn_bbox = output_concate
#Then we need to filter out the bounding boxes that do not satisfy the criterion.
proposal_count = tf.cond(training_state,
                         lambda: config.POST_NMS_ROIS_TRAINING,
                         lambda: config.POST_NMS_ROIS_INFERENCE)
#proposal_count is the number of boxes kept after NMS; we assume roughly 2000 boxes per image at most.
proposallayer = ProposalLayer(proposal_count, config.RPN_NMS_THRESHOLD, anchors = anchors, config = config)
rpn_rois = proposallayer.call([rpn_class_prob, rpn_bbox])
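For completeness, this is the idea behind the classification loss over the sampled anchors (a simplified sketch, not my exact loss code; positive_ids, negative_ids and anchor_labels are placeholders for what my data pipeline provides, and I assume batch size 1 here):

# Gather only the sampled anchors; all other anchors contribute nothing to the loss.
sampled_ids = tf.concat([positive_ids, negative_ids], axis = 0)   # indices from the sampling step
sampled_logits = tf.gather(rpn_class_logits[0], sampled_ids)      # [256, 2], assuming batch size 1
sampled_labels = tf.gather(anchor_labels, sampled_ids)            # 1 = object, 0 = background
rpn_cls_loss = tf.reduce_mean(
    tf.nn.sparse_softmax_cross_entropy_with_logits(labels = sampled_labels, logits = sampled_logits))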
rpn_graph is:
def rpn_graph(feature_map, anchor_per_location, anchor_stride, reuse_state, training_state):
    """This function builds the region proposal graph.
    Args:
        feature_map: One of the pyramid feature maps. They have different heights and widths, but the same batch size and number of channels.
        anchor_per_location: The number of anchors per pixel of the feature map.
        anchor_stride: Controls the density of the anchors; most of the time we set it to 1, and in our case it is 1.
            anchor_stride is a single int.
        training_state: Since the training processes for the RPN and Fast R-CNN are different, the training state is a dynamic variable.
    Returns:
        rpn_class_logit. Shape [Batch_Size, Num_of_Anchors, 2]
        rpn_class_prob. Shape [Batch_Size, Num_of_Anchors, 2]
        rpn_bbox. Shape [Batch_Size, Num_of_Anchors, 4] (dx, dy, log(dw), log(dh))
    """
    #First, build the shared feature map; it is used for both rpn_class_logit and rpn_bbox.
    #conv_layer_obj(bottom, name, shape, training_state, strides = (1,1), activation_func = tf.nn.relu, padding='same', dilation_rate = (1,1), bias_state = True, reuse_state = False):
    shared = conv_layer_obj(feature_map, 'rpn_conv_shared', shape = [3,512], strides = anchor_stride, bias_state = True,
                            activation_func = tf.nn.relu, padding = 'same', reuse_state = reuse_state, training_state = training_state)
    #Then the rpn_class_logit; the output depth of the classifier is anchor_per_location*2,
    #2 because each anchor is either an object or background.
    x = conv_layer_obj(shared, 'rpn_class_classifier', shape = [1, anchor_per_location*2], strides = (1,1), bias_state = True,
                       activation_func = None, padding = 'same', reuse_state = reuse_state, training_state = training_state)
    rpn_class_logit = tf.reshape(x, [x.shape.as_list()[0], -1, 2])
    rpn_class_prob = tf.nn.softmax(rpn_class_logit)
    #Then the bounding box refinement; the output depth is anchor_per_location*4,
    #4 for the refinement deltas (dx, dy, log(dw), log(dh)).
    x = conv_layer_obj(shared, 'rpn_box_classifier', shape = (1, anchor_per_location*4), strides = (1,1), bias_state = True,
                       activation_func = None, padding = 'same', reuse_state = reuse_state, training_state = training_state)
    rpn_bbox = tf.reshape(x, [x.shape.as_list()[0], -1, 4])
    return [rpn_class_logit, rpn_class_prob, rpn_bbox]
I would really appreciate any help; thanks in advance!