I am trying to implement a Feature Pyramid Network for object detection on a gland dataset (basically replacing the single feature map in Faster R-CNN with pyramid features). Since the RPN is the first stage of Faster R-CNN, I want to make sure it works on its own, so I trained only the RPN. I trained for 40001 iterations with an initial lr of 0.001 and a ResNet_V2_50 backbone, but I have run into several problems:
I found that after about 5000 iterations the weights essentially stop updating and the gradients are really close to zero (weird weights). I tried a lower lr, and the epsilon I use in tf.train.AdamOptimizer is 0.0001, but the weights still do not seem to update.
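For reference, this is roughly how I set up the optimizer and how I monitor the gradients (a simplified sketch, not my exact training code; total_loss stands in for my RPN loss):

optimizer = tf.train.AdamOptimizer(learning_rate = 0.001, epsilon = 0.0001)
grads_and_vars = optimizer.compute_gradients(total_loss)  # total_loss is a placeholder name for my RPN loss
train_op = optimizer.apply_gradients(grads_and_vars)
# Log the global gradient norm so I can watch it approach zero in TensorBoard
grad_norm = tf.global_norm([g for g, v in grads_and_vars if g is not None])
tf.summary.scalar("global_gradient_norm", grad_norm)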
As for the predictions, I think the bounding box regression is actually quite good (green is the ground truth, red is a predicted box; the predicted boxes are decoded from anchors whose overlap with a ground-truth box is greater than 0.7) (predicted bounding box), which means at least the bounding box regression loss can push the predictions close to the ground truth. But the rpn_cls_loss classifies far too many wrong proposals as objects (wrong proposals). I considered the class imbalance problem (in the gland cell dataset the average number of positive anchors per image is only about 20, and there are only 85 training images), so I randomly sample 128 positive and 128 negative anchors (if there are fewer than 128 positive anchors, I random_shuffle the indices so a positive anchor can appear several times; see the sketch below), but it still does not work well.

One thing I do not understand: when we compute the loss we only consider this mini-batch of anchors, but at the same time the network classifies every anchor we generated as object or non-object. What about the anchors that contribute nothing to the loss? Can the network still make the right decision for those anchors?
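Concretely, the positive/negative sampling looks roughly like this (a simplified numpy sketch, not my exact code; rpn_match is my per-anchor label array with 1 = positive, -1 = negative, 0 = neutral, which is the convention I assume here):

import numpy as np

def sample_anchors(rpn_match, num_per_class = 128):
    # Pick num_per_class positive and negative anchor indices.
    # Positives are repeated (sampled with replacement) when fewer than num_per_class exist.
    # Assumes the image contains at least one positive and one negative anchor.
    pos_ids = np.where(rpn_match == 1)[0]
    neg_ids = np.where(rpn_match == -1)[0]
    if len(pos_ids) < num_per_class:
        pos_ids = np.tile(pos_ids, int(np.ceil(num_per_class / len(pos_ids))))
    np.random.shuffle(pos_ids)
    np.random.shuffle(neg_ids)
    return pos_ids[:num_per_class], neg_ids[:num_per_class]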
Here is the code I use to extract the pyramid feature maps:
P5 = conv_layer_obj(conv5_3,"fpn_c5p5",[1,256], training_state = training_state, activation_func = None, bias_state = True, padding = 'valid')
P4_ = conv_layer_obj(conv4_3, "fpn_c4p4", [1,256], training_state = training_state, activation_func = None, bias_state = True, padding = 'valid')
P3_ = conv_layer_obj(conv3_3, "fpn_c3p3", [1,256], training_state = training_state, activation_func = None, bias_state = True, padding = 'valid')
P2_ = conv_layer_obj(conv2_2, "fpn_c2p2", [1,256], training_state = training_state, activation_func = None, bias_state = True, padding = 'valid')
#P4 = tf.add(tf.image.resize_bilinear(P5, [P4_.shape.as_list()[1], P4_.shape.as_list()[2]]), P4_)
P4 = tf.add(utils.nearest_neighbor_upsampling(P5, scale = 2), P4_)
P3 = tf.add(utils.nearest_neighbor_upsampling(P4, scale = 2),P3_)
P2 = tf.add(utils.nearest_neighbor_upsampling(P3, scale = 2),P2_)
P5 = conv_layer_obj(P5,"fpn_p5", [3,256], training_state = training_state, activation_func = None, bias_state = True)
P4 = conv_layer_obj(P4,"fpn_p4", [3,256], training_state = training_state, activation_func = None, bias_state = True)
P3 = conv_layer_obj(P3,"fpn_p3", [3,256], training_state = training_state, activation_func = None, bias_state = True)
P2 = conv_layer_obj(P2,"fpn_p2", [3,256], training_state = training_state, activation_func = None, bias_state = True)
#P6 is used for the 5th anchor scale in the RPN; it is generated by subsampling P5 with stride 2,
#so it is half the spatial size of P5.
P6 = tf.nn.max_pool(P5,[1,2,2,1],strides = [1,2,2,1], padding = 'VALID', name = "fpn_p6")
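The rpn_feature_maps list that the RPN code below iterates over is assembled from these levels; the exact line is omitted above, but it is essentially:

rpn_feature_maps = [P2, P3, P4, P5, P6]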
Here is the code for the RPN:
anchors = utils.generate_pyramid_anchors(config.RPN_ANCHOR_SCALES, config.RPN_ANCHOR_RATIOS, config.BACKBONE_SHAPES,
                                         config.BACKBONE_STRIDES, config.RPN_ANCHOR_STRIDE)
#Basically, in this step the anchors we generate are all the possible anchors in the image. RPN_ANCHOR_SCALES is
#(32, 64, 128, 256, 512), i.e. 5 levels, and the returned anchors are arranged in that order. If RPN_ANCHOR_RATIOS
#is 1, an anchor at the first scale covers 32*32 pixels. This is similar in spirit to selective search: we list
#all the possible anchors.
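#As a rough sanity check (assuming a 512x512 input and BACKBONE_STRIDES = (4, 8, 16, 32, 64), which are the
#values I assume here), the five feature maps are 128, 64, 32, 16 and 8 pixels on a side, so with 3 ratios and
#RPN_ANCHOR_STRIDE = 1 this gives 3*(128^2 + 64^2 + 32^2 + 16^2 + 8^2) = 65472 anchors per image, of which
#only ~20 are positive in my dataset, hence the heavy class imbalance.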
layer_outputs = [] #list of lists
#rpn_graph(feature_map, anchor_per_location, anchor_stride, reuse_state, training_state):
for index, single_feature in enumerate(rpn_feature_maps):
    if index == 0:
        layer_outputs.append(rpn_graph(single_feature, len(config.RPN_ANCHOR_RATIOS), config.RPN_ANCHOR_STRIDE,
                                       reuse_state = False, training_state = training_state))
    else:
        layer_outputs.append(rpn_graph(single_feature, len(config.RPN_ANCHOR_RATIOS), config.RPN_ANCHOR_STRIDE,
                                       reuse_state = True, training_state = training_state))
#Then concatenate the layer_outputs from [[a1,b1,c1],[a2,b2,c2]] to [[a1,a2],[b1,b2],[c1,c2]]
output_names = ["rpn_class_logits","rpn_class","rpn_bbox"]
outputs = list(zip(*layer_outputs))
output_concate = []
for o, n in zip(outputs, output_names):
    output_concate.append(tf.concat(list(o), axis = 1, name = n))
rpn_class_logits, rpn_class_prob, rpn_bbox = output_concate
#Then we need to filter out the bounding boxes that do not satisfy the criterion.
proposal_count = tf.cond(training_state,
                         lambda: config.POST_NMS_ROIS_TRAINING,
                         lambda: config.POST_NMS_ROIS_INFERENCE)
#proposal_count is the number of boxes kept after NMS; we assume roughly 2000 boxes per image at most.
proposallayer = ProposalLayer(proposal_count, config.RPN_NMS_THRESHOLD, anchors = anchors, config = config)
rpn_rois = proposallayer.call([rpn_class_prob, rpn_bbox])
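For completeness, this is the idea behind the classification loss over the sampled anchors (a simplified sketch, not my exact loss code; positive_ids, negative_ids and anchor_labels are placeholders for what my data pipeline provides, and I assume batch size 1 here):

# Gather only the sampled anchors; all other anchors contribute nothing to the loss.
sampled_ids = tf.concat([positive_ids, negative_ids], axis = 0)   # indices from the sampling step
sampled_logits = tf.gather(rpn_class_logits[0], sampled_ids)      # [256, 2], assuming batch size 1
sampled_labels = tf.gather(anchor_labels, sampled_ids)            # 1 = object, 0 = background
rpn_cls_loss = tf.reduce_mean(
    tf.nn.sparse_softmax_cross_entropy_with_logits(labels = sampled_labels, logits = sampled_logits))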
rpn_graph is:
def rpn_graph(feature_map, anchor_per_location, anchor_stride, reuse_state, training_state):
    """This function builds the region proposal graph.
    Args:
        feature_map: One of the pyramid feature maps. They have different heights and widths, but the same batch size and number of channels.
        anchor_per_location: The number of anchors per pixel of the feature map.
        anchor_stride: Controls the density of the anchors; most of the time we set it to 1, and in our case it is 1.
            anchor_stride is a single int.
        training_state: Since the training processes for the RPN and Fast R-CNN are different, the training state is a dynamic variable.
    Returns:
        rpn_class_logit. Shape [Batch_Size, Num_of_Anchors, 2]
        rpn_class_prob. Shape [Batch_Size, Num_of_Anchors, 2]
        rpn_bbox. Shape [Batch_Size, Num_of_Anchors, 4] (dx, dy, log(dw), log(dh))
    """
    #First, build the shared feature map; it is used for both rpn_class_logit and rpn_bbox.
    #conv_layer_obj(bottom, name, shape, training_state, strides = (1,1), activation_func = tf.nn.relu, padding='same', dilation_rate = (1,1), bias_state = True, reuse_state = False):
    shared = conv_layer_obj(feature_map, 'rpn_conv_shared', shape = [3,512], strides = anchor_stride, bias_state = True,
                            activation_func = tf.nn.relu, padding = 'same', reuse_state = reuse_state, training_state = training_state)
    #Then the rpn_class_logit; the output depth of the classifier is anchor_per_location*2,
    #2 because each anchor is either an object or background.
    x = conv_layer_obj(shared, 'rpn_class_classifier', shape = [1, anchor_per_location*2], strides = (1,1), bias_state = True,
                       activation_func = None, padding = 'same', reuse_state = reuse_state, training_state = training_state)
    rpn_class_logit = tf.reshape(x, [x.shape.as_list()[0], -1, 2])
    rpn_class_prob = tf.nn.softmax(rpn_class_logit)
    #Then the bounding box refinement; the output depth is anchor_per_location*4,
    #4 for the refinement deltas (dx, dy, log(dw), log(dh)).
    x = conv_layer_obj(shared, 'rpn_box_classifier', shape = (1, anchor_per_location*4), strides = (1,1), bias_state = True,
                       activation_func = None, padding = 'same', reuse_state = reuse_state, training_state = training_state)
    rpn_bbox = tf.reshape(x, [x.shape.as_list()[0], -1, 4])
    return [rpn_class_logit, rpn_class_prob, rpn_bbox]
I would really appreciate any help; thanks in advance!