Keras: frozen layers do not give consistent output during training

Asked: 2017-06-28 14:37:33

Tags: python keras

I am trying to fine-tune a model with Keras following this guide: https://keras.io/applications/#inceptionv3
However, during training I noticed that the network's output for the same input does not stay the same after training (even though all relevant layers are frozen), which is not what I want.

I built the following toy example to investigate this:

import keras.applications.resnet50 as resnet50
from keras.layers import Dense, Flatten, Input
from keras.models import Model
from keras.utils import to_categorical
from keras import optimizers
from keras.preprocessing.image import ImageDataGenerator
import numpy as np

# data  
i = np.random.rand(1,224,224,3)
X = np.random.rand(32,224,224,3)
y = to_categorical(np.random.randint(751, size=32), num_classes=751)

# model
base_model = resnet50.ResNet50(weights='imagenet', include_top=False, input_tensor=Input(shape=(224,224,3)))
layer = base_model.output
layer = Flatten(name='myflatten')(layer)
layer = Dense(751, activation='softmax', name='fc751')(layer)
model = Model(inputs=base_model.input, outputs=layer)

# freeze all layers
for layer in model.layers:
    layer.trainable = False
model.compile(optimizer='rmsprop', loss='categorical_crossentropy')

# features and predictions before training
feat0 = base_model.predict(i)
pred0 = model.predict(i)
weights0 = model.layers[-1].get_weights()

# before training output is consistent
feat00 = base_model.predict(i)
pred00 = model.predict(i)
print(np.allclose(feat0, feat00)) # True
print(np.allclose(pred0, pred00)) # True

# train
model.fit(X, y, batch_size=2, epochs=3, shuffle=False)

# features and predictions after training
feat1 = base_model.predict(i)
pred1 = model.predict(i)
weights1 = model.layers[-1].get_weights()

# these are not the same
print(np.allclose(feat0, feat1)) # False
# Optionally: printing shows they are in fact very different
# print(feat0)
# print(feat1)

# these are not the same
print(np.allclose(pred0, pred1)) # False
# Optionally: printing shows they are in fact very different
# print(pred0)
# print(pred1)

# these are the same and loss does not change during training
# so layers were actually frozen
print(np.allclose(weights0[0], weights1[0])) # True

# Check again that all layers were in fact untrainable
for layer in model.layers:
    assert layer.trainable == False  # All succeed
# Being overly cautious, also check base_model
for layer in base_model.layers:
    assert layer.trainable == False  # All succeed

Since I froze all the layers, I fully expected both the predictions and the features to stay identical, but surprisingly they do not.

So I am probably making some mistake, but I can't figure out what... Any suggestions would be greatly appreciated!

2 Answers:

Answer 0 (score: 1)

So the problem seems to be that the model contains batch normalization layers, which update their internal state (i.e. their weights) based on the data seen during training. This happens even when their trainable flag is set to False, and as their weights are updated, the outputs change as well. You can verify this by taking the code from the question and changing the following lines:

weights0 = model.layers[-1].get_weights()
weights1 = model.layers[-1].get_weights()

to

weights0 = model.layers[2].get_weights()
weights1 = model.layers[2].get_weights()

(or to the index of any other batch normalization layer), because the following check will then no longer hold:
print(np.allclose(weights0, weights1)) # Now this is False
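
Instead of picking the batch normalization layer's index by hand, you can also check every such layer at once. A minimal sketch (not part of the original answer; it assumes the model, X and y from the question are already defined):

from keras.layers import BatchNormalization
import numpy as np

# snapshot the weights of every batch norm layer (gamma, beta, moving mean, moving variance)
bn_layers = [l for l in model.layers if isinstance(l, BatchNormalization)]
before = [[w.copy() for w in l.get_weights()] for l in bn_layers]

model.fit(X, y, batch_size=2, epochs=1, shuffle=False)

# gamma and beta stay fixed (trainable=False), but the moving statistics drift
after = [l.get_weights() for l in bn_layers]
changed = [not all(np.allclose(b, a) for b, a in zip(bw, aw))
           for bw, aw in zip(before, after)]
print(sum(changed), 'of', len(bn_layers), 'batch norm layers changed their weights')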

As far as I know there is currently no real solution for this..
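
If you need the frozen part to stay exactly fixed, one way to sidestep the problem (not suggested in the answer; a sketch assuming only the new top layers need training) is to run the frozen base once as a feature extractor and train a separate top model on the cached features, so the batch normalization layers are never executed inside fit():

from keras.layers import Dense, Flatten, Input
from keras.models import Model

# precompute features with the frozen base; predict() runs in inference mode,
# so the batch norm statistics are used but never updated
features = base_model.predict(X, batch_size=2)

# small top model trained on the cached features only
inp = Input(shape=features.shape[1:])
top_out = Dense(751, activation='softmax')(Flatten()(inp))
top = Model(inputs=inp, outputs=top_out)
top.compile(optimizer='rmsprop', loss='categorical_crossentropy')
top.fit(features, y, batch_size=2, epochs=3)

# base_model.predict(i) now stays constant, because neither its weights
# nor its batch norm statistics are ever touched during training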

See also this issue on Keras' GitHub page.

Answer 1 (score: 0)

Another reason your training may be unstable is that you are using a very small batch size, namely batch_size=2. Use at least batch_size=32. This value is too small for batch normalization to reliably estimate the training distribution statistics (mean and variance). These mean and variance values are first used to normalize the distribution, after which the beta and gamma parameters (the actual output distribution) are learned.
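
To see why a batch of 2 gives unreliable statistics, you can compare per-batch estimates of the mean and variance against the statistics of the full data. A small illustrative numpy sketch (made-up data, not from the answer):

import numpy as np

np.random.seed(0)
acts = np.random.randn(9984) * 2.0 + 5.0  # pretend activations: true mean 5, true std 2

for batch_size in (2, 32, 256):
    batches = acts.reshape(-1, batch_size)
    # spread of the per-batch estimates around the true mean/variance;
    # the smaller the batch, the noisier the estimates batch norm has to work with
    print(batch_size,
          round(batches.mean(axis=1).std(), 3),
          round(batches.var(axis=1).std(), 3))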

See the following links for more details:

  1. In the Introduction and Related Work sections the authors criticize BatchNorm; also check Figure 1: https://arxiv.org/pdf/1803.08494.pdf

  2. A good article on the "Curse of Batch Normalization": https://towardsdatascience.com/curse-of-batch-normalization-8e6dd20bc304