PyTorch tutorial: loss does not decrease as expected

Time: 2019-11-19 18:33:36

Tags: python pytorch

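I am reimplementing the PyTorch CIFAR10 tutorial.

However, I want to use a different model. I do not want to use fully connected layers (nn.Linear in PyTorch), and I want to add batch normalization.

My model looks like this: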

import torch
import torchvision
import torchvision.transforms as transforms
import matplotlib.pyplot as plt
import numpy as np
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

class Net(nn.Module):
  def __init__(self):
      super(Net, self).__init__()
      self.pool = nn.MaxPool2d(2,2)
      self.conv1 = nn.Conv2d(in_channels=3,out_channels=16,kernel_size=3, padding=1, padding_mode='zeros')
      self.conv1_bn = nn.BatchNorm2d(16)
      self.conv2 = nn.Conv2d(in_channels=16,out_channels=32,kernel_size=3, padding=1, padding_mode='zeros')
      self.conv2_bn = nn.BatchNorm2d(32)
      self.conv3 = nn.Conv2d(in_channels=32,out_channels=64,kernel_size=3, padding=1, padding_mode='zeros')
      self.conv3_bn = nn.BatchNorm2d(64)
      self.conv4 = nn.Conv2d(64,64,3, padding=1, padding_mode='zeros')
      self.conv4_bn = nn.BatchNorm2d(64)
      self.conv5 = nn.Conv2d(64,10,2,padding=0)
  def forward(self, x): # x has shape (4,3,32,32), PyTorch uses channels-first layout
      x = self.pool(F.relu(self.conv1_bn(self.conv1(x)))) # feature map resolution is now 16*16
      x = self.pool(F.relu(self.conv2_bn(self.conv2(x)))) # resolution now 8*8
      x = self.pool(F.relu(self.conv3_bn(self.conv3(x)))) # resolution now 4*4
      x = self.pool(F.relu(self.conv4_bn(self.conv4(x)))) # now 2*2
      x = F.relu(self.conv5(x)) # the output shape is (batchsize,10,1,1)

      return x

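The batch size is 4 and the image resolution is 32*32, so the input has shape (4, 3, 32, 32); PyTorch puts the channel dimension second. Because of the padding, the convolutional layers do not reduce the resolution of the feature maps; each max-pool layer halves it. conv5 therefore receives an input of shape (4, 64, 2, 2). There I use a filter size of 2 and no padding to get a resolution of 1*1. I have 10 classes, so I use 10 filters, and each of these final filters should predict its corresponding class. The output now has shape (4, 10, 1, 1). However, when I try to train this model, the loss does not decrease. The tutorial model and my network have roughly the same number of parameters, about 62k.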
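The shape and parameter-count claims above can be checked by hand. The following is a minimal sketch in plain Python, assuming the standard output-size formula for convolution and pooling layers; the helper names `conv_out`, `conv_params` and `bn_params` are illustrative and not part of PyTorch.

```python
def conv_out(size, kernel, padding=0, stride=1):
    """Spatial output size of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1

size = 32
sizes = [size]
for _ in range(4):
    size = conv_out(size, kernel=3, padding=1)  # 3x3 conv with padding=1 keeps the size
    size = conv_out(size, kernel=2, stride=2)   # 2x2 max-pool halves it
    sizes.append(size)
size = conv_out(size, kernel=2)                 # conv5: 2x2 kernel, no padding
sizes.append(size)
print(sizes)  # [32, 16, 8, 4, 2, 1]

def conv_params(c_in, c_out, k):
    return c_in * c_out * k * k + c_out  # weights plus one bias per filter

def bn_params(c):
    return 2 * c  # learnable scale and shift per channel

total = (conv_params(3, 16, 3) + bn_params(16)
         + conv_params(16, 32, 3) + bn_params(32)
         + conv_params(32, 64, 3) + bn_params(64)
         + conv_params(64, 64, 3) + bn_params(64)
         + conv_params(64, 10, 2))
print(total)  # 63434
```

The count comes out at roughly 63k learnable parameters, which is close to the figure quoted for the tutorial model.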

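Here is the rest of the code. It is the same as the code in the tutorial, except that I have to reshape the output to make it fit (the tutorial's output has shape (4, 10), mine has shape (4, 10, 1, 1)).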

transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])
trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=4,
                                          shuffle=True, num_workers=2)

testset = torchvision.datasets.CIFAR10(root='./data', train=False,
                                       download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=4,
                                         shuffle=False, num_workers=2)

net = Net()
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.01, momentum=0.9)
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
net.to(device)
for epoch in range(2):  # loop over the dataset multiple times

  running_loss = 0.0
  for i, data in enumerate(trainloader, 0):
      # get the inputs; data is a list of [inputs, labels]
      inputs, labels = data[0].to(device), data[1].to(device)

      # zero the parameter gradients
      optimizer.zero_grad()

      # forward + backward + optimize
      outputs = net(inputs) # outputs has shape (4,10,1,1)
      outputs_reshaped = outputs.reshape(4,10)
      loss = criterion(outputs_reshaped, labels)
      loss.backward()
      optimizer.step()
      running_loss +=loss.item()
      if i % 2000 == 1999:    # print every 2000 mini-batches
          print('[%d, %5d] loss: %.3f' %
                (epoch + 1, i + 1, running_loss / 2000))
          running_loss = 0.0 
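
As a side note, the hard-coded `reshape(4,10)` in the loop above only works while every batch holds exactly four samples. A minimal sketch of a batch-size-independent alternative, assuming the network output has shape (N, 10, 1, 1); the random tensor is only a stand-in for `net(inputs)`:

```python
import torch

# Stand-in for net(inputs) with a batch of 3 samples (assumed shape N x 10 x 1 x 1).
outputs = torch.randn(3, 10, 1, 1)

# Collapse every dimension after the batch dimension into one.
outputs_flat = torch.flatten(outputs, start_dim=1)
print(outputs_flat.shape)  # torch.Size([3, 10])
```

The flattened tensor can then be passed to `criterion` together with the labels, whatever the batch size is.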

My loss looks like this:

[1,  2000] loss: 2.348
[1,  2000] loss: 2.477
[1,  4000] loss: 2.482
[1,  6000] loss: 2.468
[1,  8000] loss: 2.471
[1, 10000] loss: 2.482
[1, 12000] loss: 2.485
[2,  2000] loss: 2.486
[2,  4000] loss: 2.470
[2,  6000] loss: 2.479
[2,  8000] loss: 2.481
[2, 10000] loss: 2.474
[2, 12000] loss: 2.470

My model does not seem to learn anything. Does anyone know why this happens?
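
For reference, a classifier that assigns the same score to all 10 classes has a cross-entropy of ln(10), about 2.303, so the losses printed above are no better than a uniform guess. A minimal sketch in plain Python; the helper `cross_entropy` is illustrative and only mirrors what nn.CrossEntropyLoss computes for a single sample:

```python
import math

def cross_entropy(logits, target):
    """Cross-entropy of one sample: -log(softmax(logits)[target])."""
    m = max(logits)  # subtract the maximum for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]

# Ten identical scores give a uniform prediction over the classes.
chance_loss = cross_entropy([0.0] * 10, target=3)
print(round(chance_loss, 3))  # 2.303
```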

2 answers:

Answer 0: (score: 2)

For such a small batch size, your combination of learning rate and momentum is too large. Try either of the following:

optimizer = optim.SGD(net.parameters(), lr=0.01, momentum=0.0)
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)

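Update: I just realized that another problem is that you apply a relu activation at the end of the network. If you look at the documentation of CrossEntropyLoss, there is a note:

The input is expected to contain raw, unnormalized scores for each class.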

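Try training your network with the last relu after conv5 removed, keeping lr = 0.01 and momentum = 0.9. A relu in front of the cross-entropy loss clamps all negative class scores to zero and therefore discards information about them.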
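
The effect of the final relu can be illustrated on a single sample. This is a minimal sketch in plain Python with made-up scores for three classes; `cross_entropy` and `relu` are illustrative helpers, not PyTorch functions:

```python
import math

def cross_entropy(logits, target):
    """Cross-entropy of one sample: -log(softmax(logits)[target])."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]

def relu(logits):
    return [max(0.0, z) for z in logits]

# Hypothetical raw scores: class 1 is correct, the others are pushed negative.
raw = [-3.0, 5.0, -2.0]
loss_raw = cross_entropy(raw, target=1)
loss_relu = cross_entropy(relu(raw), target=1)  # negative scores clamped to 0
print(loss_raw < loss_relu)  # True

# If every pre-activation is negative, relu outputs all zeros:
# the prediction is uniform and the loss equals ln(number of classes).
dead = relu([-1.0, -0.5, -2.0])
loss_dead = cross_entropy(dead, target=0)
print(round(loss_dead, 3))  # 1.099
```

With the relu in place, the wrong classes can never score below zero, and wherever the relu outputs zero it also passes no gradient back to conv5.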

Answer 1: (score: 1)

So, in case you run into a similar problem: I changed the optimizer to

 optimizer = optim.Adam(net.parameters(),0.001)

My last line in forward() was

      x = F.relu(self.conv5(x))

I removed the relu, so it is now

      x = self.conv5(x)

Now the loss decreases as expected (much faster than in the tutorial, with the same number of parameters).