GPU out of memory during evaluation: PyTorch

Date: 2020-07-08 05:02:10

Tags: pytorch

When I only train without validating, the model trains fine, but it runs out of memory during evaluation. I don't understand why this would be a problem, especially since I am using torch.no_grad():

def test(epoch,net,testloader,optimizer):
    net.eval()
    test_loss = 0
    correct = 0
    total = 0
    idx = 0
    features_all = []
    for batch_idx, (inputs, targets) in enumerate(testloader):
        with torch.no_grad():
            idx = batch_idx
            # inputs, targets = inputs.cpu(), targets.cpu()
            if use_cuda:
                inputs, targets = inputs.cuda(), targets.cuda()
            inputs, targets = Variable(inputs), Variable(targets)
            save_features, out, ce_loss = net(inputs,targets)
            test_loss += ce_loss.item()
            _, predicted = torch.max(out.data, 1)
            total += targets.size(0)
            correct += predicted.eq(targets.data).cpu().sum().item()
            features_all.append((save_features, predicted, targets.data))
    test_acc = 100.*correct/total
    test_loss = test_loss/(idx+1)
    logging.info('test, test_acc = %.4f,test_loss = %.4f' % (test_acc,test_loss))
    print('test, test_acc = %.4f,test_loss = %.4f' % (test_acc,test_loss))
    return features_all, test_acc

1 Answer:

Answer 0 (score: 1):

features_all.append((save_features, predicted, targets.data))

This line keeps references to tensors that live in GPU memory, so the CUDA memory cannot be released when the loop moves on to the next iteration (which eventually exhausts GPU memory). torch.no_grad() only prevents the autograd graph from being built; it does not free tensors you are still holding references to. Move the tensors to the CPU (with .cpu()) before saving them.
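A minimal sketch of the fix: detach each tensor and copy it to the CPU before appending, so the GPU buffers from earlier batches can be freed. The helper name `collect_features` and the tuple layout mirror the question's `features_all.append(...)` line but are otherwise illustrative.

```python
import torch

def collect_features(batches):
    """Accumulate per-batch outputs without retaining GPU references.

    `batches` yields (save_features, predicted, targets) tuples, matching
    the tensors appended in the question's test() loop.
    """
    features_all = []
    with torch.no_grad():
        for save_features, predicted, targets in batches:
            # .detach() drops any autograd history; .cpu() copies the data
            # to host memory so the GPU-side tensors can be garbage-collected.
            features_all.append(
                (save_features.detach().cpu(), predicted.cpu(), targets.cpu())
            )
    return features_all

# Tiny CPU-only demo (on a GPU machine the inputs would be CUDA tensors):
batches = [(torch.randn(2, 4), torch.tensor([0, 1]), torch.tensor([0, 0]))]
out = collect_features(batches)
print(out[0][0].device)  # cpu
```

The same one-line change inside the original loop (`features_all.append((save_features.detach().cpu(), predicted.cpu(), targets.cpu()))`) is enough; the rest of test() can stay as-is.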