pytorch中的ReduceLrOnPlateau调度程序可以使用测试集度量标准来降低学习率吗?

时间:2019-08-06 11:40:16

标签: deep-learning pytorch scheduler

嗨,我目前正在学习在pytroch的深度学习中使用调度程序。我遇到了以下代码:

import torch
import torch.nn as nn
import torchvision.transforms as transforms
import torchvision.datasets as dsets

# Set seed
torch.manual_seed(0)

# Where to add a new import
from torch.optim.lr_scheduler import ReduceLROnPlateau

'''
STEP 1: LOADING DATASET
'''

train_dataset = dsets.MNIST(root='./data', 
                            train=True, 
                            transform=transforms.ToTensor(),
                            download=True)

test_dataset = dsets.MNIST(root='./data', 
                           train=False, 
                           transform=transforms.ToTensor())

'''
STEP 2: MAKING DATASET ITERABLE
'''

batch_size = 100
n_iters = 6000
num_epochs = n_iters / (len(train_dataset) / batch_size)
num_epochs = int(num_epochs)

train_loader = torch.utils.data.DataLoader(dataset=train_dataset, 
                                           batch_size=batch_size, 
                                           shuffle=True)

test_loader = torch.utils.data.DataLoader(dataset=test_dataset, 
                                          batch_size=batch_size, 
                                          shuffle=False)

'''
STEP 3: CREATE MODEL CLASS
'''
class FeedforwardNeuralNetModel(nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim):
        super(FeedforwardNeuralNetModel, self).__init__()
        # Linear function
        self.fc1 = nn.Linear(input_dim, hidden_dim) 
        # Non-linearity
        self.relu = nn.ReLU()
        # Linear function (readout)
        self.fc2 = nn.Linear(hidden_dim, output_dim)  

    def forward(self, x):
        # Linear function
        out = self.fc1(x)
        # Non-linearity
        out = self.relu(out)
        # Linear function (readout)
        out = self.fc2(out)
        return out
'''
STEP 4: INSTANTIATE MODEL CLASS
'''
input_dim = 28*28
hidden_dim = 100
output_dim = 10

model = FeedforwardNeuralNetModel(input_dim, hidden_dim, output_dim)

'''
STEP 5: INSTANTIATE LOSS CLASS
'''
criterion = nn.CrossEntropyLoss()


'''
STEP 6: INSTANTIATE OPTIMIZER CLASS
'''
learning_rate = 0.1

optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate, momentum=0.9, nesterov=True)

'''
STEP 7: INSTANTIATE STEP LEARNING SCHEDULER CLASS
'''
# lr = lr * factor 
# mode='max': look for the maximum validation accuracy to track
# patience: number of epochs - 1 where loss plateaus before decreasing LR
        # patience = 0, after 1 bad epoch, reduce LR
# factor = decaying factor
scheduler = ReduceLROnPlateau(optimizer, mode='max', factor=0.1, patience=0, verbose=True)

'''
STEP 7: TRAIN THE MODEL
'''
iter = 0
for epoch in range(num_epochs):
    for i, (images, labels) in enumerate(train_loader):
        # Load images as Variable
        images = images.view(-1, 28*28).requires_grad_()

        # Clear gradients w.r.t. parameters
        optimizer.zero_grad()

        # Forward pass to get output/logits
        outputs = model(images)

        # Calculate Loss: softmax --> cross entropy loss
        loss = criterion(outputs, labels)

        # Getting gradients w.r.t. parameters
        loss.backward()

        # Updating parameters
        optimizer.step()

        iter += 1

        if iter % 500 == 0:
            # Calculate Accuracy         
            correct = 0
            total = 0
            # Iterate through test dataset
            for images, labels in test_loader:
                # Load images to a Torch Variable
                images = images.view(-1, 28*28)

                # Forward pass only to get logits/output
                outputs = model(images)

                # Get predictions from the maximum value
                _, predicted = torch.max(outputs.data, 1)

                # Total number of labels
                total += labels.size(0)

                # Total correct predictions
                # Without .item(), it is a uint8 tensor which will not work when you pass this number to the scheduler
                correct += (predicted == labels).sum().item()

            accuracy = 100 * correct / total

            # Print Loss
            # print('Iteration: {}. Loss: {}. Accuracy: {}'.format(iter, loss.data[0], accuracy))

    # Decay Learning Rate, pass validation accuracy for tracking at every epoch
    print('Epoch {} completed'.format(epoch))
    print('Loss: {}. Accuracy: {}'.format(loss.item(), accuracy))
    print('-'*20)
    scheduler.step(accuracy)

我正在使用以上策略。我唯一不了解的是,他们如何利用测试数据来通过调度程序提高测试的准确性并降低学习率?这是代码的最后一行。我们可以在训练过程中向调度员展示测试准确性并要求它降低学习率吗?我在github resnet main.py上发现了类似的情况 太。有人可以澄清一下吗?

1 个答案:

答案 0 :(得分:1)

我认为这里的 test 一词可能有些混乱。

测试数据与验证数据之间的差异

该代码实际通过测试引用的是 validation 集合,而不是实际的测试集合。不同之处在于,在训练过程中使用了验证集,以查看模型的推广程度。通常,人们只是切断一部分训练数据并将其用于验证。在我看来,您的代码似乎正在使用相同的数据进行训练和验证,但这只是我的假设,因为我不知道./data的样子。

要以严格的科学方式工作,您的模型在训练过程中应该永远不要看到实际的测试数据,而应该只训练和验证。通过这种方式,我们可以评估模型在训练后对未见数据进行概括的实际能力。

基于验证准确性降低学习率

之所以使用验证数据(在您的情况下称为测试数据)来降低学习率,可能是因为,如果您使用实际的训练数据和训练准确性来进行训练,则该模型更可能过拟合。为什么?当您处于训练准确性的平稳状态时,并不一定意味着它是验证准确性的平稳状态,反之亦然。这意味着您可能会朝着关于验证准确性的一个有希望的方向迈进(并因此朝着一个可以很好地推广的参数的方向),突然间您会降低或提高学习率,因为训练准确性存在(或没有)稳定期。