Why does my CNN produce nan as the loss when using the MSE loss function

Time: 2019-04-14 16:57:10

Tags: python conv-neural-network linear-regression pytorch mse

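I have a CNN for facial keypoint detection. I can plot the ground-truth points, and I can get the predicted keypoints to display. However, the predictions cluster around the center of the image and do not move during or after training.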

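I was using nll_loss with an SGD optimizer. With that, I got the loss values I expected, for example 0.00364755. I read that MSE is a good choice for regression problems, so I changed the loss function to MSE, and now all I get during training is nan for the loss. I have methodically varied the parameters, for example lr from 0.1 to 0.0000001, momentum from 0.1 to 0.9, and batch sizes of 1, 16, 32, 64 and 128.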
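For reference, here is a minimal sketch of what each of the two losses expects as input. The batch size of 4 and the 136 outputs (68 keypoints with an x and a y each) are assumed values for illustration, not taken from the actual dataset, and the random tensors only stand in for the model output and the labels.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Assumed sizes for illustration: 68 key points * (x, y) = 136 outputs.
batch_size, num_outputs = 4, 136

output = torch.randn(batch_size, num_outputs)  # stand-in for model(data)
target = torch.randn(batch_size, num_outputs)  # stand-in for flattened key points

# MSELoss takes a float prediction and a float target of the same shape
# and returns the mean squared difference as a scalar.
mse = nn.MSELoss()(output, target)

# nll_loss takes log-probabilities plus one integer class index per sample,
# so it is a classification loss rather than a regression loss.
class_index = torch.randint(0, num_outputs, (batch_size,))
nll = F.nll_loss(F.log_softmax(output, dim=1), class_index)

mse_is_finite = torch.isfinite(mse).item()
nll_is_finite = torch.isfinite(nll).item()
print(mse.shape, mse_is_finite, nll_is_finite)
```

Because the two losses measure different things, the value 0.00364755 seen with nll_loss is not directly comparable to an MSE value on the same keypoints.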

Here is some of the code. Thanks as always for your help.

    import torch
    import torch.nn as nn

    criterion = nn.MSELoss()


    def train(args, model, device, train_loader, criterion, optimizer, epoch):
        running_loss = 0.0
        model.train()
        for batch_idx, batch in enumerate(train_loader):
            # Add a channel dimension and move the images to the device as float32.
            data = batch['image'].unsqueeze(1).to(device).float()
            # Flatten the key points to (batch_size, num_coordinates).
            target = batch['key_points']
            target = target.view(target.size(0), -1).to(device).float()

            output = model(data)  # forward pass
            train_loss = criterion(output, target)
            optimizer.zero_grad()
            train_loss.backward()
            optimizer.step()

            # print loss
            running_loss += train_loss.item()
            if batch_idx % 10 == 9:  # print every 10 batches
                # Report the average over the last 10 batches, then reset.
                print('Epoch: {}, Batch: {}, Avg. Loss: {}'.format(
                    epoch + 1, batch_idx + 1, running_loss / 10))
                running_loss = 0.0

        print('Finished Training')


    def test(args, model, device, criterion, val_loader):
        model.eval()
        total_loss = 0.0
        with torch.no_grad():
            for batch_idx, batch in enumerate(val_loader):
                # Same preprocessing as in train(); this assumes val_loader
                # yields batches in the same format as train_loader.
                data = batch['image'].unsqueeze(1).to(device).float()
                target = batch['key_points']
                target = target.view(target.size(0), -1).to(device).float()

                output = model(data)
                total_loss += criterion(output, target).item()

        # MSELoss already averages within a batch, so average over batches.
        avg_loss = total_loss / len(val_loader)
        # Accuracy is not meaningful for key point regression,
        # so only the loss is reported.
        print('\nTest set: Average loss: {:.4f}\n'.format(avg_loss))
        return avg_loss
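One way to narrow down where the nan first appears in a loop like the one above is to check each tensor before it is used. The following is only a sketch under assumptions: the helper name `find_non_finite` is hypothetical, the tiny `nn.Linear` model and random tensors stand in for the real CNN and data, and the nan written into the target merely simulates a missing keypoint label, which may or may not be the cause here. The `max_norm=1.0` passed to gradient clipping is likewise an arbitrary illustrative value.

```python
import torch
import torch.nn as nn


def find_non_finite(**tensors):
    """Return the names of the tensors that contain nan or inf values."""
    return [name for name, t in tensors.items()
            if not torch.isfinite(t).all()]


# Toy stand-ins for the real model and data (assumptions for illustration).
torch.manual_seed(0)
model = nn.Linear(8, 4)
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

data = torch.randn(2, 8)
target = torch.randn(2, 4)
target[0, 1] = float('nan')  # simulate a missing key point in the labels

output = model(data)
loss = criterion(output, target)

# A single nan in the target is enough to make the whole MSE loss nan.
bad = find_non_finite(data=data, target=target, output=output, loss=loss)
print(bad)  # ['target', 'loss']

if not bad:
    optimizer.zero_grad()
    loss.backward()
    # Clipping bounds the gradient norm in case large errors cause overflow.
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    optimizer.step()
```

If the check reports `data` or `target`, the problem lies in the dataset or preprocessing; if only `output` and `loss` are reported, the weights have most likely already diverged in an earlier step.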

0 answers:

No answers