PyTorch rpn_box_reg loss is NaN

Time: 2019-07-15 13:07:32

Tags: python deep-learning pytorch

I am trying to run a Faster R-CNN model on a custom dataset, following the torchvision example.

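However, I noticed that during training the rpn_box_reg loss becomes NaN whenever xmax is smaller than xmin. Here xmax and ymax denote the top-left corner, and xmin and ymin denote the bottom-right corner. Below is a summary of the output I get when printing the bounding boxes, ending in the error: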

tensor([[ 44., 108.,  49., 224.],
        [ 29.,  73., 210., 230.],
        [ 31.,  58., 139., 228.],
        [ 22.,  43., 339., 222.]], device='cuda:0')
Epoch: [0]  [   0/1173]  eta: 0:09:46  lr: 0.000000  loss: 9.3683 (9.3683)  loss_classifier: 1.7522 (1.7522)  loss_box_reg: 0.0755 (0.0755)  loss_objectness: 6.1522 (6.1522)  loss_rpn_box_reg: 1.3884 (1.3884)  time: 0.4997  data: 0.1162  max mem: 5696
tensor([[  0.,   0., 640., 512.]], device='cuda:0')
tensor([[ 28.,  57., 197., 220.]], device='cuda:0')
tensor([[ 23.,  46., 281., 222.]], device='cuda:0')
tensor([[ 20.,  28., 328., 210.]], device='cuda:0')
tensor([[ 37.,  45.,  47., 161.],
        [ 31.,  39., 111., 154.]], device='cuda:0')
tensor([[  0.,   0., 640., 512.]], device='cuda:0')
tensor([[ 33.,  85., 546., 222.],
        [ 31.,  85., 527., 213.]], device='cuda:0')
tensor([[ 40.,  76.,  29., 211.],
        [ 64.,  51.,  26., 206.],
        [ 40.,  77.,   1., 221.]], device='cuda:0')
Loss is nan, stopping training
{'loss_classifier': tensor(1.78, device='cuda:0', grad_fn=<NllLossBackward>), 'loss_box_reg': tensor(0., device='cuda:0', grad_fn=<DivBackward0>), 'loss_objectness': tensor(16.28, device='cuda:0', grad_fn=<BinaryCrossEntropyWithLogitsBackward>), 'loss_rpn_box_reg': tensor(nan, device='cuda:0', grad_fn=<DivBackward0>)}
An exception has occurred, use %tb to see the full traceback
As you can see, each box is given as [xmin, ymin, xmax, ymax].
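
For context, here is a sketch of why the coordinate order could matter. It assumes the RPN uses the standard Faster R-CNN box parameterization; the symbols $w^*$ (ground-truth width) and $w_a$ (anchor width) are introduced here for illustration only. Under that assumption, the width regression target is the logarithm of a width ratio, which is undefined when the ground-truth width is negative:

```latex
\[
  t_w^{*} = \log\frac{w^{*}}{w_a}, \qquad w^{*} = x_{\max} - x_{\min}
\]
% Reading the printed box [40, 76, 29, 211] as [xmin, ymin, xmax, ymax]:
\[
  w^{*} = 29 - 40 = -11 < 0
  \quad\Longrightarrow\quad
  \log\frac{w^{*}}{w_a} \text{ is undefined for every } w_a > 0 .
\]
```

In floating-point arithmetic the logarithm of a negative number evaluates to NaN, which would be consistent with loss_rpn_box_reg being the only term that turns into NaN in the output above.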

I tried adjusting the learning rate, but I still get the same error:


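import torch

# params: the model's trainable parameters (defined earlier, not shown here)
optimizer = torch.optim.SGD(params, lr=0.00001,
                            momentum=0.9, weight_decay=0.0005)

# Multiply the learning rate by 0.1 after every epoch
lr_scheduler = torch.optim.lr_scheduler.StepLR(optimizer,
                                               step_size=1,
                                               gamma=0.1)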

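The problem seems to be that the rpn_box_reg loss becomes NaN when the x1 (xmax) value is smaller than the x2 (xmin) value. For example, for the image below the bounding box is tensor([[53., 89., 7., 226.]]), i.e. [x2, y2, x1, y1]. When x1 is smaller than x2 the loss becomes NaN, whereas when x1 > x2 the loss is fine and the model in fact trains well. As you can see, the values are correct, because the cyclist gets the right bounding box from them. I hope this makes the problem I am facing clearer.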
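
One way to test this hypothesis is to reorder each box's corners before building the target, so that every box is [xmin, ymin, xmax, ymax] with xmin < xmax and ymin < ymax. The following is only a minimal sketch in plain Python. The helper name `normalize_boxes` is hypothetical and not part of torchvision, and the sketch assumes the annotations merely have their corners swapped rather than being wrong in some other way:

```python
def normalize_boxes(boxes):
    """Reorder corners to [xmin, ymin, xmax, ymax]; drop zero-area boxes.

    `boxes` is a list of 4-element lists. This is a hypothetical helper
    for illustration, not a torchvision function.
    """
    fixed = []
    for x1, y1, x2, y2 in boxes:
        # Take the smaller coordinate as the min corner, the larger as the max.
        xmin, xmax = min(x1, x2), max(x1, x2)
        ymin, ymax = min(y1, y2), max(y1, y2)
        # Keep only boxes with strictly positive width and height.
        if xmax > xmin and ymax > ymin:
            fixed.append([xmin, ymin, xmax, ymax])
    return fixed


# The three boxes printed just before "Loss is nan, stopping training".
boxes = [
    [40.0, 76.0, 29.0, 211.0],
    [64.0, 51.0, 26.0, 206.0],
    [40.0, 77.0, 1.0, 221.0],
]
fixed = normalize_boxes(boxes)
print(fixed)
```

If a box is dropped because it has zero area, the corresponding label would have to be dropped as well, which this sketch does not handle. The resulting list can be converted with `torch.as_tensor(fixed, dtype=torch.float32)` before it is stored in the target dictionary.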

Example Image

0 answers:

No answers yet