When I only train without validating, the model trains fine, but during evaluation it runs out of memory. I don't understand why this should be a problem, especially since I'm using torch.no_grad()?
def test(epoch, net, testloader, optimizer):
    net.eval()
    test_loss = 0
    correct = 0
    total = 0
    idx = 0
    features_all = []
    for batch_idx, (inputs, targets) in enumerate(testloader):
        with torch.no_grad():
            idx = batch_idx
            # inputs, targets = inputs.cpu(), targets.cpu()
            if use_cuda:
                inputs, targets = inputs.cuda(), targets.cuda()
            inputs, targets = Variable(inputs), Variable(targets)
            save_features, out, ce_loss = net(inputs, targets)

            test_loss += ce_loss.item()
            _, predicted = torch.max(out.data, 1)
            total += targets.size(0)
            correct += predicted.eq(targets.data).cpu().sum().item()
            features_all.append((save_features, predicted, targets.data))

    test_acc = 100. * correct / total
    test_loss = test_loss / (idx + 1)
    logging.info('test, test_acc = %.4f, test_loss = %.4f' % (test_acc, test_loss))
    print('test, test_acc = %.4f, test_loss = %.4f' % (test_acc, test_loss))
    return features_all, test_acc
Answer 0 (score: 1)
features_all.append((save_features, predicted, targets.data))
This line keeps references to tensors that live in GPU memory, so that memory cannot be released when the loop moves on to the next iteration (and this eventually exhausts GPU memory). torch.no_grad() only prevents autograd from recording the graph; it does not free tensors you keep referencing. Move the tensors to the CPU (with .cpu()) before saving them.
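A minimal sketch of the fixed loop body, assuming save_features is a single tensor returned by net (if it is a tuple or list of tensors, move each element individually):

        with torch.no_grad():
            idx = batch_idx
            if use_cuda:
                inputs, targets = inputs.cuda(), targets.cuda()
            save_features, out, ce_loss = net(inputs, targets)

            test_loss += ce_loss.item()          # .item() already returns a Python float
            _, predicted = torch.max(out, 1)
            total += targets.size(0)
            correct += predicted.eq(targets).cpu().sum().item()

            # detach() drops any autograd references and .cpu() copies the data
            # to host memory, so the GPU copies can be freed on the next iteration.
            features_all.append((
                save_features.detach().cpu(),
                predicted.cpu(),
                targets.cpu(),
            ))

With this change, features_all only holds CPU copies, so GPU memory usage stays roughly constant across batches instead of growing with every iteration.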