Following this guide, I am trying to fine-tune a model with Keras: https://keras.io/applications/#inceptionv3
However, during training I found that the network's output does not stay the same after training on identical input (even though all the relevant layers were frozen), which is not what I want.
I built the following toy example to investigate this:
import keras.applications.resnet50 as resnet50
from keras.layers import Dense, Flatten, Input
from keras.models import Model
from keras.utils import to_categorical
from keras import optimizers
from keras.preprocessing.image import ImageDataGenerator
import numpy as np
# data
i = np.random.rand(1,224,224,3)
X = np.random.rand(32,224,224,3)
y = to_categorical(np.random.randint(751, size=32), num_classes=751)
# model
base_model = resnet50.ResNet50(weights='imagenet', include_top=False, input_tensor=Input(shape=(224,224,3)))
layer = base_model.output
layer = Flatten(name='myflatten')(layer)
layer = Dense(751, activation='softmax', name='fc751')(layer)
model = Model(inputs=base_model.input, outputs=layer)
# freeze all layers
for layer in model.layers:
    layer.trainable = False
model.compile(optimizer='rmsprop', loss='categorical_crossentropy')
# features and predictions before training
feat0 = base_model.predict(i)
pred0 = model.predict(i)
weights0 = model.layers[-1].get_weights()
# before training output is consistent
feat00 = base_model.predict(i)
pred00 = model.predict(i)
print(np.allclose(feat0, feat00)) # True
print(np.allclose(pred0, pred00)) # True
# train
model.fit(X, y, batch_size=2, epochs=3, shuffle=False)
# features and predictions after training
feat1 = base_model.predict(i)
pred1 = model.predict(i)
weights1 = model.layers[-1].get_weights()
# these are not the same
print(np.allclose(feat0, feat1)) # False
# Optionally: printing shows they are in fact very different
# print(feat0)
# print(feat1)
# these are not the same
print(np.allclose(pred0, pred1)) # False
# Optionally: printing shows they are in fact very different
# print(pred0)
# print(pred1)
# these are the same and loss does not change during training
# so layers were actually frozen
print(np.allclose(weights0[0], weights1[0])) # True
# Check again if all layers were in fact untrainable
for layer in model.layers:
    assert layer.trainable == False  # All succeed
# Being overly cautious: also check base_model
for layer in base_model.layers:
    assert layer.trainable == False  # All succeed
Since I have frozen all the layers, I fully expected both the predictions and the features to be identical, but surprisingly they are not.
So I am probably making some mistake, but I cannot figure out what... Any suggestion would be greatly appreciated!
Answer 0 (score: 1)
So the problem seems to be that the model uses batch normalization layers, which update their internal state (i.e. their weights) based on the data seen during training. This happens even when their trainable flag is set to False, and as their weights are updated, the output changes as well. You can check this by taking the code from the question and changing the following lines:
this
weights0 = model.layers[-1].get_weights()
to
weights0 = model.layers[2].get_weights()
and this
weights1 = model.layers[-1].get_weights()
to
weights1 = model.layers[2].get_weights()
or to the index of any other batch normalization layer,
because then the following assertion will no longer hold:
print(np.allclose(weights0, weights1)) # Now this is False
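To check this across all batch normalization layers at once, here is a minimal sketch (assuming the model, X and y from the question are in scope; the variable names and the weight-ordering comment are illustrative, based on the standard Keras BatchNormalization layer):
# Sketch: compare every BatchNormalization layer's internal state
# before and after training (reuses model, X, y from the question).
import numpy as np
from keras.layers import BatchNormalization

bn_layers = [l for l in model.layers if isinstance(l, BatchNormalization)]

# For a standard BN layer, get_weights() returns
# [gamma, beta, moving_mean, moving_variance]
before = [l.get_weights() for l in bn_layers]
model.fit(X, y, batch_size=2, epochs=3, shuffle=False)
after = [l.get_weights() for l in bn_layers]

for l, w0, w1 in zip(bn_layers, before, after):
    # gamma and beta stay fixed (trainable=False), but the moving
    # mean/variance are still updated by fit()
    unchanged = all(np.allclose(a, b) for a, b in zip(w0, w1))
    print(l.name, 'unchanged' if unchanged else 'CHANGED')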
As far as I know, there is currently no solution to this...
See also this issue on Keras' GitHub page.
Answer 1 (score: 0)
Another reason your training may be unstable is that you are using a very small batch size, i.e. batch_size=2. At the very least, use batch_size=32. That value is too small for batch normalization to compute reliable estimates of the training distribution's statistics (mean and variance). Those mean and variance values are first used to normalize the distribution, after which the beta and gamma parameters (the actual distribution) are learned.
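As a rough illustration (not part of the original answer; the data and numbers here are made up), the following sketch shows how much noisier a per-batch mean estimate is with a batch of 2 than with a batch of 32:
# Sketch: noise in per-batch mean estimates for small vs. larger batches.
import numpy as np

np.random.seed(0)
activations = np.random.randn(10000)   # stand-in for one channel's activations
true_mean = activations.mean()

def mean_estimate_error(batch_size, n_batches=1000):
    # Average absolute error of the per-batch mean vs. the full-data mean
    errors = []
    for _ in range(n_batches):
        batch = np.random.choice(activations, size=batch_size, replace=False)
        errors.append(abs(batch.mean() - true_mean))
    return np.mean(errors)

print('batch_size=2 :', mean_estimate_error(2))    # much larger error
print('batch_size=32:', mean_estimate_error(32))   # noticeably smaller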
Check out the following links for more details:
In the introduction and related work, the authors criticize BatchNorm; also check Figure 1: https://arxiv.org/pdf/1803.08494.pdf
A good article on the "Curse of Batch Normalization": https://towardsdatascience.com/curse-of-batch-normalization-8e6dd20bc304