PyTorch model.train() and model.eval() behaving unexpectedly

Asked: 2019-10-22 10:34:01

Tags: pytorch batch-normalization dropout

My model is a CNN-based model with several batch-normalization (BN) and dropout (DO) layers. Initially, I accidentally placed model.train() outside the loop, like this:

model.train()
for e in range(num_epochs):
    # train model
    model.eval()
    # eval model

For the record, the code above trained fine and performed well on the validation set:

[CV:02][E:001][I:320/320] avg. Loss: 0.460897, avg. Acc: 0.742746, test. acc: 0.708046(max: 0.708046)
[CV:02][E:002][I:320/320] avg. Loss: 0.389883, avg. Acc: 0.798791, test. acc: 0.823563(max: 0.823563)
[CV:02][E:003][I:320/320] avg. Loss: 0.319034, avg. Acc: 0.825559, test. acc: 0.834914(max: 0.834914)
[CV:02][E:004][I:320/320] avg. Loss: 0.301322, avg. Acc: 0.834254, test. acc: 0.834052(max: 0.834914)
[CV:02][E:005][I:320/320] avg. Loss: 0.292184, avg. Acc: 0.839575, test. acc: 0.835201(max: 0.835201)
[CV:02][E:006][I:320/320] avg. Loss: 0.285467, avg. Acc: 0.842266, test. acc: 0.837931(max: 0.837931)
[CV:02][E:007][I:320/320] avg. Loss: 0.279607, avg. Acc: 0.844917, test. acc: 0.829885(max: 0.837931)
[CV:02][E:008][I:320/320] avg. Loss: 0.275252, avg. Acc: 0.846443, test. acc: 0.827874(max: 0.837931)
[CV:02][E:009][I:320/320] avg. Loss: 0.270719, avg. Acc: 0.848150, test. acc: 0.822989(max: 0.837931)

However, while reviewing the code, I realized I had made a mistake: the code above switches the BN and DO layers to evaluation mode after the first epoch and leaves them there, so every later epoch effectively trains with BN and DO turned off.
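
For context, model.train() and model.eval() simply flip a persistent training flag on every submodule; nothing resets it between epochs. A minimal sketch with a throwaway model (not my actual network) shows the flag sticking:

import torch.nn as nn

# Throwaway model purely to illustrate the mode flag.
net = nn.Sequential(nn.Linear(8, 8), nn.BatchNorm1d(8), nn.Dropout(0.5))

net.train()
print(net.training)     # True: BN uses batch stats, Dropout is active
net.eval()
print(net.training)     # False, and it stays False until train() is called again
print(net[1].training)  # False: the flag propagates to every submodule (the BN layer here)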

So I moved the model.train() call inside the loop:

for e in range(num_epochs):
    model.train()
    # train model
    model.eval()
    # eval model
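
For completeness, this is the standard pattern. A slightly fuller sketch of one epoch (criterion, optimizer, and the loaders are placeholders, not my exact code) looks like:

import torch

for e in range(num_epochs):
    model.train()              # BN: batch stats (+ running-stat updates); Dropout: on
    for x, y in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()

    model.eval()               # BN: running stats; Dropout: off
    with torch.no_grad():      # evaluation also skips gradient bookkeeping
        for x, y in val_loader:
            preds = model(x).argmax(dim=1)
            # accumulate accuracy here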

At this point, the model learned relatively poorly. As the output below shows, it looks overfit: training accuracy is high, but validation accuracy is noticeably lower (which is where things start getting weird, considering the usual effect of BN and DO):

[CV:02][E:001][I:320/320] avg. Loss: 0.416946, avg. Acc: 0.750477, test. acc: 0.689080(max: 0.689080)
[CV:02][E:002][I:320/320] avg. Loss: 0.329121, avg. Acc: 0.798992, test. acc: 0.690948(max: 0.690948)
[CV:02][E:003][I:320/320] avg. Loss: 0.305688, avg. Acc: 0.829053, test. acc: 0.719540(max: 0.719540)
[CV:02][E:004][I:320/320] avg. Loss: 0.290048, avg. Acc: 0.840539, test. acc: 0.741954(max: 0.741954)
[CV:02][E:005][I:320/320] avg. Loss: 0.279873, avg. Acc: 0.848872, test. acc: 0.745833(max: 0.745833)
[CV:02][E:006][I:320/320] avg. Loss: 0.270934, avg. Acc: 0.854274, test. acc: 0.742960(max: 0.745833)
[CV:02][E:007][I:320/320] avg. Loss: 0.263515, avg. Acc: 0.856945, test. acc: 0.741667(max: 0.745833)
[CV:02][E:008][I:320/320] avg. Loss: 0.256854, avg. Acc: 0.858672, test. acc: 0.734483(max: 0.745833)
[CV:02][E:009][I:320/320] avg. Loss: 0.252013, avg. Acc: 0.861363, test. acc: 0.723707(max: 0.745833)
[CV:02][E:010][I:320/320] avg. Loss: 0.245525, avg. Acc: 0.865519, test. acc: 0.711494(max: 0.745833)

So I thought to myself, "the BN and DO layers must be hurting my model," and removed them. However, with BN and DO removed, the model performed poorly as well (in fact, it did not seem to learn anything at all):

[CV:02][E:001][I:320/320] avg. Loss: 0.552687, avg. Acc: 0.749071, test. acc: 0.689080(max: 0.689080)
[CV:02][E:002][I:320/320] avg. Loss: 0.506234, avg. Acc: 0.749071, test. acc: 0.689080(max: 0.689080)
[CV:02][E:003][I:320/320] avg. Loss: 0.503373, avg. Acc: 0.749071, test. acc: 0.689080(max: 0.689080)
[CV:02][E:004][I:320/320] avg. Loss: 0.502966, avg. Acc: 0.749071, test. acc: 0.689080(max: 0.689080)
[CV:02][E:005][I:320/320] avg. Loss: 0.502870, avg. Acc: 0.749071, test. acc: 0.689080(max: 0.689080)
[CV:02][E:006][I:320/320] avg. Loss: 0.502832, avg. Acc: 0.749071, test. acc: 0.689080(max: 0.689080)
[CV:02][E:007][I:320/320] avg. Loss: 0.502800, avg. Acc: 0.749071, test. acc: 0.689080(max: 0.689080)
[CV:02][E:008][I:320/320] avg. Loss: 0.502765, avg. Acc: 0.749071, test. acc: 0.689080(max: 0.689080)

At this point I was thoroughly confused. I went a step further and ran another experiment: I put the BN and DO layers back into the model and tested the following:

for e in range(num_epochs):
    model.eval()
    # train model
    # eval model
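
One thing worth noting about this last experiment: the model does still train. model.eval() only changes the behavior of layers such as BN and Dropout; it does not disable gradient computation (torch.no_grad() does that). A minimal sketch with a throwaway layer illustrating the distinction:

import torch
import torch.nn as nn

layer = nn.Linear(4, 1)  # throwaway layer, only for illustration
layer.eval()             # eval mode...

out = layer(torch.randn(2, 4)).sum()
out.backward()
print(layer.weight.grad is not None)  # True: gradients still flow in eval mode

with torch.no_grad():    # ...whereas no_grad() actually disables autograd
    out2 = layer(torch.randn(2, 4)).sum()
print(out2.requires_grad)             # False: no graph was built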

This, too, did not work well:

[CV:02][E:001][I:320/320] avg. Loss: 0.562196, avg. Acc: 0.744774, test. acc: 0.689080(max: 0.689080)
[CV:02][E:002][I:320/320] avg. Loss: 0.506071, avg. Acc: 0.749071, test. acc: 0.689080(max: 0.689080)
[CV:02][E:003][I:320/320] avg. Loss: 0.503234, avg. Acc: 0.749071, test. acc: 0.689080(max: 0.689080)
[CV:02][E:004][I:320/320] avg. Loss: 0.502916, avg. Acc: 0.749071, test. acc: 0.689080(max: 0.689080)
[CV:02][E:005][I:320/320] avg. Loss: 0.502859, avg. Acc: 0.749071, test. acc: 0.689080(max: 0.689080)
[CV:02][E:006][I:320/320] avg. Loss: 0.502838, avg. Acc: 0.749071, test. acc: 0.689080(max: 0.689080)

I ran the experiments above multiple times, and the results did not stray far from the outputs posted here. (The data I am working with is fairly simple.)

To summarize, the model performs best under a very particular setup (see the sanity-check sketch after the list):

  1. Batch normalization and dropout are added to the model. (This is fine.)
  2. Train the model with model.train() for the first epoch only. (Weird, especially combined with 3.)
  3. Train the model with model.eval() for all remaining epochs. (Also weird.)
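
For anyone reproducing this, a quick sanity check of which mode each phase actually runs in is the training flag that train()/eval() toggle (model and num_epochs are placeholders for my actual variables):

for e in range(num_epochs):
    model.train()
    assert model.training       # BN uses batch stats, Dropout is active
    # train model
    model.eval()
    assert not model.training   # BN uses running stats, Dropout is off
    # eval model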

To be honest, I would never set up a training procedure like this (and I doubt anyone would), but for some reason it works well. Has anyone experienced anything similar? Or, if you can explain why the model behaves this way, I would greatly appreciate it!

Thanks in advance!

0 Answers