I am training a NN on MNIST data in PyTorch. The model starts off well, improves, and reaches good accuracy on both the training and test data; after staying stable for a while, both the test and training accuracy collapse, as shown in the base results graph below.
For MNIST I use 60,000 training images, 10,000 test images, a training batch size of 100 and a learning rate of 0.01. The neural network consists of two fully connected hidden layers with 100 nodes each, using ReLU activations. F.cross_entropy is used for the loss and SGD for the gradient updates.
This is not an overfitting problem, since both the training and test accuracy collapse. I suspected it was related to the learning rate being too high. In the base case I used 0.01, but when I lower it to 0.001 the whole pattern simply repeats later, as shown in the following graph (note the change of the x-axis scale: the pattern occurs roughly 10 times later, which is intuitive). Similar results were obtained with even lower learning rates.
I tried unit testing, checking individual parts and shrinking the model. Here are the results I get when I use only 6 data points in the training set with a batch size of 2. Unsurprisingly, the model can fit the training data perfectly (the test accuracy here is, as expected, clearly different), yet it still collapses from 100% down to 1/6, i.e. no better than a random pick. Can someone tell me what has to happen for the network to un-train itself from a scenario in which it fits the training data perfectly?
Here is the structure of the network (the relevant libraries are imported beforehand), although I hope the symptoms above are enough to recognize the problem without it:
class Network(nn.Module):
    def __init__(self):
        # call to the super class Module from nn
        super(Network, self).__init__()
        # fc stands for 'fully connected'
        self.fc1 = nn.Linear(in_features=28*28, out_features=100)
        self.fc2 = nn.Linear(in_features=100, out_features=100)
        self.out = nn.Linear(in_features=100, out_features=10)

    def forward(self, t):
        # (1) input layer (redundant)
        t = t

        # (2) hidden linear layer
        # As my t consists of 28*28 bit pictures, I need to flatten them:
        t = t.reshape(-1, 28*28)
        # Now having this reshaped input, add it to the linear layer
        t = self.fc1(t)
        # Again, apply ReLU as the activation function
        t = F.relu(t)

        # (3) hidden linear layer
        # As above, but reshaping is not needed now
        t = self.fc2(t)
        t = F.relu(t)

        # (4) output layer
        t = self.out(t)
        t = F.softmax(t, dim=1)

        return t
Main execution code:
for b in range(epochs):
    print('***** EPOCH NO. ', b+1)
    # getting a batch iterator
    batch_iterator = iter(batch_train_loader)
    # For loop for a single epoch, based on the length of the training set and the batch size
    for a in range(round(train_size/b_size)):
        print(a+1)
        # get one batch for the iteration
        batch = next(batch_iterator)
        # decomposing a batch
        images, labels = batch[0].to(device), batch[1].to(device)
        # to get a prediction, as with individual layers, we need to equate it to the network with the samples as input:
        preds = network(images)
        # with the predictions, we will use F to get the loss as cross_entropy
        loss = F.cross_entropy(preds, labels)
        # function for counting the number of correct predictions
        get_num_correct(preds, labels)
        # calculate the gradients needed for update of weights
        loss.backward()
        # with the known gradients, we will update the weights according to stochastic gradient descent
        optimizer = optim.SGD(network.parameters(), lr=learning_rate)
        # with the known weights, step in the direction of correct estimation
        optimizer.step()
        # check if the whole data check should be performed (for taking full training/test data checks only in evenly spaced intervals on the log scale, pre-calculated later)
        if counter in X_log:
            # get the result on the whole train data and record them
            full_train_preds = network(full_train_images)
            full_train_loss = F.cross_entropy(full_train_preds, full_train_labels)
            # Record train loss
            a_train_loss.append(full_train_loss.item())
            # Get a proportion of correct estimates, to make them comparable between train and test data
            full_train_num_correct = get_num_correct(full_train_preds, full_train_labels)/train_size
            # Record train accuracy
            a_train_num_correct.append(full_train_num_correct)
            print('Correct predictions of the dataset:', full_train_num_correct)

            # Repeat for test predictions
            # get the results for the whole test data
            full_test_preds = network(full_test_images)
            full_test_loss = F.cross_entropy(full_test_preds, full_test_labels)
            a_test_loss.append(full_test_loss.item())
            full_test_num_correct = get_num_correct(full_test_preds, full_test_labels)/test_size
            a_test_num_correct.append(full_test_num_correct)
        # update counter
        counter = counter + 1
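(`get_num_correct` is not shown here; it just counts how many predictions match the labels. A minimal sketch, assuming the standard argmax comparison:)

def get_num_correct(preds, labels):
    # count how many of the argmax predictions equal the true labels
    return preds.argmax(dim=1).eq(labels).sum().item()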
I have Googled and checked answers to questions like this, but people either have an overfitting problem, or their NN never improves accuracy on the training set at all (i.e. it simply does not work), rather than reaching a good training fit and then losing it completely, on the training set too. I hope I am not posting something obvious; I am relatively new to NNs, but I have done my best to research the topic before posting it here. Thank you for your help and understanding!
Answer 0 (score: 0)
So my take on this is that you have used too many epochs and over-trained the model (not over-fitted). At some point, as you keep refreshing the biases/weights, they can no longer distinguish the values from noise.
I suggest you check https://machinelearningmastery.com/early-stopping-to-avoid-overtraining-neural-network-models/ and see whether it matches what you are seeing, as that was my first thought.
Also take a look at this post: https://stats.stackexchange.com/questions/198629/difference-between-overtraining-and-overfitting (not saying it is a duplicate).
And this publication: Overtraining in back-propagation neural networks: A CRT color calibration example https://onlinelibrary.wiley.com/doi/pdf/10.1002/col.10027
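If over-training is the issue, one common safeguard is to keep the weights from the evaluation with the best test accuracy and stop once it has not improved for a while. A minimal early-stopping sketch on top of your existing evaluation block (the names `patience`, `best_acc`, `best_state` and `no_improve` are mine, not from your code):

import copy

patience = 5          # how many evaluations to tolerate without improvement
best_acc = 0.0
best_state = None
no_improve = 0

# ... inside the block where full_test_num_correct is computed:
if full_test_num_correct > best_acc:
    best_acc = full_test_num_correct
    best_state = copy.deepcopy(network.state_dict())  # remember the best weights
    no_improve = 0
else:
    no_improve += 1
    # stop training (e.g. break out of the loops) once no_improve >= patience

# after training, restore the best weights seen
if best_state is not None:
    network.load_state_dict(best_state)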
Answer 1 (score: 0)
The reason is a bug in the code. We need to add optimizer.zero_grad() at the beginning of the training loop, and create the optimizer once, before the outer training loop, i.e.:
optimizer = optim.SGD(...)
for b in range(epochs):
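In other words, a minimal sketch of the corrected loop structure (evaluation and logging omitted; the variable names are the ones from the question):

# create the optimizer once, before the outer training loop
optimizer = optim.SGD(network.parameters(), lr=learning_rate)

for b in range(epochs):
    batch_iterator = iter(batch_train_loader)
    for a in range(round(train_size/b_size)):
        images, labels = next(batch_iterator)
        images, labels = images.to(device), labels.to(device)
        # clear the gradients left over from the previous iteration,
        # otherwise loss.backward() keeps accumulating them
        optimizer.zero_grad()
        preds = network(images)
        loss = F.cross_entropy(preds, labels)
        loss.backward()
        optimizer.step()

Without zero_grad(), the gradients of every batch are summed together, so the effective update grows without bound and eventually destroys the weights, which matches the sudden collapse you observe.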