PyTorch - unable to achieve reproducibility

Time: 2019-12-26 11:09:01

Tags: image pytorch random-seed resnet binary-reproducibility

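I am training an image classifier model with PyTorch. I cannot get a fixed seed to take effect during training: I have tried every option I could find, but the results still differ from run to run. Please help.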

I am using the following settings, but my model's results are still inconsistent.

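    import numpy as np
    import torch
    import torch.nn as nn
    import torch.optim as optim
    from torchvision import models

    # Seed the CPU, GPU and NumPy random number generators
    torch.manual_seed(1)
    torch.cuda.manual_seed(1)
    np.random.seed(1)
    # Ask cuDNN for deterministic kernels and disable its auto-tuner
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")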

    model = models.resnet50(pretrained=True)
    num_ftrs = model.fc.in_features
    model.fc = nn.Linear(num_ftrs, 10)

    # Define loss function & optimizer

    loss_function = nn.CrossEntropyLoss()
    optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
    lrscheduler = optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode='max', patience=3, threshold = 0.9)
    model = model.to(device)


    # Train model

    model.train()
    for epoch in range(num_epochs):
        for i, (images, labels) in enumerate(train_loader):
            images, labels = images.to(device), labels.to(device)

            optimizer.zero_grad()
            outputs = model(images)

            _, predicted = torch.max(outputs.data, 1)
            loss = loss_function(outputs, labels)

            loss.backward()
            optimizer.step()

            train_acc = (labels==predicted).sum().item() / images.size(0)

            if (i+1) % 2 == 0:
                print('Epoch [%d/%d], Step [%d/%d], Loss: %.4f, Acc: %.4f'
                      % (epoch+1, num_epochs, i+1, len(train_dset)//batch_size,
                         loss.item(), train_acc))

            if (i+1) % 5 == 0:
                model.eval()

                with torch.no_grad():

                    num_correct, num_total = 0, 0

                    for (images, labels) in val_loader:
                        images, labels = images.to(device), labels.to(device)

                        outputs = model(images)
                        _, predicted = torch.max(outputs.data, 1)

                        num_correct += (labels==predicted).sum().item()
                        num_total += labels.size(0)

                    val_acc = 1. * num_correct / num_total

                    print('Epoch [%d/%d], Step [%d/%d], Val Acc: %.4f'
                          %(epoch+1, num_epochs, i+1, len(train_dset)//batch_size,
                            val_acc))

                model.train()

1 answer:

Answer 0 (score: 0)

I use the following code to make my results reproducible, and it seems to work :)

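    import random

    import numpy as np
    import torch

    seed = 1  # any fixed integer; the value itself is arbitrary

    torch.manual_seed(seed)
    np.random.seed(seed)
    random.seed(seed)
    # for cuda
    torch.cuda.manual_seed_all(seed)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
    # Disables cuDNN entirely, which is usually slower
    torch.backends.cudnn.enabled = False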
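
One possible remaining source of variation is the data pipeline, which the question does not show. With `shuffle=True`, the batch order is drawn from the global torch RNG unless the `DataLoader` gets its own generator, so it can change whenever other code consumes random numbers before iteration starts. Below is a minimal sketch, assuming the loader is where the runs diverge. `seed_everything`, `seed_worker`, `make_loader` and `first_epoch_labels` are hypothetical helper names, and the 20-sample `TensorDataset` is a toy stand-in for the real image dataset.

```python
import random

import numpy as np
import torch
from torch.utils.data import DataLoader, TensorDataset


def seed_everything(seed: int) -> None:
    """Seed the Python, NumPy and PyTorch RNGs (hypothetical helper)."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)  # silently ignored when CUDA is unavailable
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False


def seed_worker(worker_id: int) -> None:
    """Derive the NumPy/random seeds of a DataLoader worker from torch's seed."""
    worker_seed = torch.initial_seed() % 2**32
    np.random.seed(worker_seed)
    random.seed(worker_seed)


def make_loader(dataset, batch_size: int, seed: int) -> DataLoader:
    """Build a shuffling DataLoader whose batch order depends only on `seed`."""
    generator = torch.Generator()
    generator.manual_seed(seed)
    return DataLoader(
        dataset,
        batch_size=batch_size,
        shuffle=True,
        num_workers=0,               # raise this for real data; seed_worker then runs per worker
        worker_init_fn=seed_worker,
        generator=generator,
    )


def first_epoch_labels(seed: int) -> list:
    """Return the labels of every batch in one epoch over a toy dataset."""
    seed_everything(seed)
    data = TensorDataset(torch.arange(20).float().unsqueeze(1), torch.arange(20))
    loader = make_loader(data, batch_size=4, seed=seed)
    return [labels.tolist() for _, labels in loader]


if __name__ == "__main__":
    # Two runs with the same seed should yield the same batch order.
    print(first_epoch_labels(1) == first_epoch_labels(1))
```

If the batch order matches but the losses still differ, calling `torch.use_deterministic_algorithms(True)` (available in newer PyTorch releases) makes PyTorch raise an error on operations that have no deterministic implementation, which can help locate the culprit.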