Training loss explodes on the first training example and then outputs NaN

Asked: 2019-08-17 11:13:00

Tags: deep-learning computer-vision conv-neural-network pytorch pose-estimation

I am new to deep learning. I have built some basic CNNs, but this time I am trying to build an FCN (fully convolutional network) similar to YOLOv3. My network has 32 layers, with LeakyReLU as the activation function and the Adam optimizer. There are 680 data samples, and the input image size is 416x416, the same as the YOLO model. I have included some code snippets below.

I am using PyTorch 1.1 with CUDA 9. I have tried the different learning rates suggested in many blog posts, such as 0.0001, 0.000001, and 0.0000001, and also different betas, e.g. (0.9, 0.999), (0.5, 0.999), etc. I have also tried training for a long time, up to 200 epochs.
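For reference, my optimizer setup looks roughly like this (the model here is a tiny stand-in, not my real 32-layer FCN):

```python
import torch
import torch.nn as nn

# Tiny stand-in model; my real network is the 32-layer FCN shown below
model = nn.Conv2d(3, 8, 3, padding=1)

# One of the hyperparameter combinations I tried:
# lr in {0.0001, 0.000001, 0.0000001}, betas in {(0.9, 0.999), (0.5, 0.999)}
optimizer = torch.optim.Adam(model.parameters(), lr=0.0001, betas=(0.9, 0.999))

print(optimizer.param_groups[0]["lr"])
```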

Output

    Running loss : 59102027776.0  0%| | 1/630 [00:19<3:24:03, 19.46s/it]
    Running loss : nan            0%| | 2/630 [00:23<2:34:23, 14.75s/it]
    Running loss : nan            0%| | 3/630 [00:25<1:53:32, 10.87s/it]
    Running loss : nan

Loss function formula

Please consider only equations 7 and 8 in the image. Click the link to the formula picture.

Loss function code

    masked_pose_loss = torch.mean(
        torch.sum(mask * torch.sum(torch.mul(pred - true, pred - true),
                                   dim=[1, 2]),
                  dim=[1, 2, 3]))
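As a sanity check, here is a simplified, dimension-consistent version of the same masked squared-error idea with dummy tensors (the shapes here are assumptions, not my real ones), plus a finiteness guard that could catch the first NaN before it poisons the weights:

```python
import torch

# Dummy tensors standing in for my real pred/true/mask; shapes are assumptions
pred = torch.randn(2, 17, 13, 13)             # (batch, channels, H, W)
true = torch.randn(2, 17, 13, 13)
mask = torch.randint(0, 2, (2, 1, 13, 13)).float()  # 1 where a target exists

sq_err = torch.mul(pred - true, pred - true)  # elementwise squared error
masked = mask * sq_err                        # zero out cells with no target
loss = torch.mean(torch.sum(masked, dim=[1, 2, 3]))  # sum per sample, mean over batch

# Guard: stop as soon as the loss goes non-finite instead of training on NaN
assert not torch.isnan(loss), "loss went NaN"
print(loss.item())
```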

FCN

    # conv1
    self.conv1_1 = nn.Conv2d(3, 32, 3, stride=1, padding=1)  # assumed: 3-channel RGB input into the 32 channels conv1_2 expects
    self.relu1_1 = nn.LeakyReLU(inplace=True)
    self.conv1_2 = nn.Conv2d(32, 64, 3, stride=1,padding=1)
    self.relu1_2 = nn.LeakyReLU(inplace=True)
    self.pool1 = nn.MaxPool2d(2, stride=2, ceil_mode=True)

    # conv2

    self.conv2_1 = nn.Conv2d(64, 128, 3, stride=1, padding=1)
    self.relu2_1 = nn.LeakyReLU(inplace=True)
    self.conv2_2 = nn.Conv2d(128, 64, 1, stride=1)
    self.relu2_2 = nn.LeakyReLU(inplace=True)
    self.pool2 = nn.MaxPool2d(2, stride=2, ceil_mode=True)

    # conv3
    self.conv3_1 = nn.Conv2d(64, 128, 3, stride=1, padding=1)
    self.relu3_1 = nn.LeakyReLU(inplace=True)
    self.conv3_2 = nn.Conv2d(128, 256, 3, stride=1, padding=1)
    self.relu3_2 = nn.LeakyReLU(inplace=True)
    self.conv3_3 = nn.Conv2d(256, 128, 1, stride=1)
    self.relu3_3 = nn.LeakyReLU(inplace=True)
    self.pool3 = nn.MaxPool2d(2, stride=2, ceil_mode=True)

    # conv4
    self.conv4_1 = nn.Conv2d(128, 256, 3, stride=1, padding=1)
    self.relu4_1 = nn.LeakyReLU(inplace=True)
    self.conv4_2 = nn.Conv2d(256, 512, 3, stride=1, padding=1)
    self.relu4_2 = nn.LeakyReLU(inplace=True)
    self.conv4_3 = nn.Conv2d(512, 256, 1, stride=1)
    self.relu4_3 = nn.LeakyReLU(inplace=True)
    self.pool4 = nn.MaxPool2d(2, stride=2, ceil_mode=True)

    # conv5
    self.conv5_1 = nn.Conv2d(256, 512, 3, stride=1, padding=1)
    self.relu5_1 = nn.LeakyReLU(inplace=True)
    self.conv5_2 = nn.Conv2d(512, 256, 1, stride=1)
    self.relu5_2 = nn.LeakyReLU(inplace=True)
    self.conv5_3 = nn.Conv2d(256, 512, 3, stride=1, padding=1)
    self.relu5_3 = nn.LeakyReLU(inplace=True)
    self.pool5 = nn.MaxPool2d(2, stride=2, ceil_mode=True)

    # fc6

    self.conv6_1 = nn.Conv2d(512, 1024, 3, stride=1, padding=1)
    self.relu6_1 = nn.LeakyReLU(inplace=True)
    self.conv6_2 = nn.Conv2d(1024, 512, 1, stride=1)
    self.relu6_2 = nn.LeakyReLU(inplace=True)
    self.conv6_3 = nn.Conv2d(512, 1024, 3, stride=1, padding=1)
    self.relu6_3 = nn.LeakyReLU(inplace=True)
    self.conv7_1 = nn.Conv2d(1024, 512, 1, stride=1)
    self.relu7_1 = nn.LeakyReLU(inplace=True)
    self.conv7_2 = nn.Conv2d(512, 1024, 3, stride=1, padding=1)
    self.relu7_2 = nn.LeakyReLU(inplace=True)
    self.conv7_3 = nn.Conv2d(1024, 1024, 3, stride=1, padding=1)
    self.relu7_3 = nn.LeakyReLU(inplace=True)
    self.conv8_1 = nn.Conv2d(1024, 1024, 3, stride=1, padding=1)
    self.relu8_1 = nn.LeakyReLU(inplace=True)
    self.conv8_2 = nn.Conv2d(1024, 1024, 3, stride=1)
    self.relu8_2 = nn.ReLU(inplace=True)
    self.conv_rout16 = nn.Conv2d(512, 64, 1, stride=1)
    self.relu_rout16 = nn.ReLU(inplace=True)


    # The last two layers of the network:
    self.convf_1 = nn.Conv2d(1280, 1024, 3, stride=1, padding=1)
    self.reluf_1 = nn.LeakyReLU(inplace=True)

    self.convf_2 = nn.Conv2d(1024, self.target_channel_size, 1, stride=1)

    self.reluf_2 = nn.LeakyReLU(inplace=True)
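To double-check the downsampling, here is a quick shape test for one block of the network above (just conv1_2 → relu1_2 → pool1, with the layer parameters copied from my layer list):

```python
import torch
import torch.nn as nn

# One block from the network above: conv1_2 -> relu1_2 -> pool1
block = nn.Sequential(
    nn.Conv2d(32, 64, 3, stride=1, padding=1),
    nn.LeakyReLU(inplace=True),
    nn.MaxPool2d(2, stride=2, ceil_mode=True),
)

x = torch.randn(1, 32, 416, 416)  # 416x416 input, same as YOLO
with torch.no_grad():
    y = block(x)
print(tuple(y.shape))  # each pool halves the spatial size
```
The 3x3 convolution with padding 1 and stride 1 keeps the spatial size, and each 2x2 max pool halves it, so five pools take 416 down to 13.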

Thank you very much for any reply.

Note: This is my first post. If I have missed any necessary information, please let me know.

0 Answers