I am trying to save a CNN model I implemented and use it for transfer learning (TL).
I would like to clarify the following four points.
1. (CNN code) Is the model saved correctly?
2. (TL code) Is the model loaded correctly?
3. (TL code) How is the `trainable` attribute of a loaded model usually set?
4. (TL code) Are the pretrained model and the subsequent layers combined correctly (sizes, etc.)?
Below are the model part of the CNN and the transfer-learning code.
Both are regression models that predict two numbers. The input is an image (plus supervised labels).
#CNN
import numpy as np
from tensorflow.keras.layers import (Input, Conv2D, MaxPooling2D, Dropout,
                                     Flatten, Dense)
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam
input = Input(shape=(100,100,3))
conv_0 = Conv2D(32,kernel_size=3,activation='relu')(input)
pool_0 = MaxPooling2D(pool_size=(2,2))(conv_0)
pool_0 = Dropout(0.25)(pool_0)
conv_1 = Conv2D(64,kernel_size=3,activation='relu')(pool_0)
pool_1 = MaxPooling2D(pool_size=(2,2))(conv_1)
pool_1 = Dropout(0.25)(pool_1)
conv_2 = Conv2D(32,kernel_size=3,activation='relu')(pool_1)
pool_2 = MaxPooling2D(pool_size=(2,2))(conv_2)
pool_2 = Dropout(0.25)(pool_2)
conv_3 = Conv2D(16,kernel_size=3,activation='relu')(pool_2)
pool_3 = MaxPooling2D(pool_size=(2,2))(conv_3)
conv_4 = Conv2D(8,kernel_size=3,activation='relu')(pool_3)
pool_4 = MaxPooling2D(pool_size=(2,2))(conv_4)
flat = Flatten()(pool_4)
denseL = Dense(64,activation='relu')(flat)
denseL = Dropout(0.25)(denseL)
A_output = Dense(1,name="a")(denseL)
B_output = Dense(1,name="b")(denseL)
model = Model(inputs=input, outputs=[A_output,B_output])
model.compile(Adam(learning_rate=0.001),
loss = {'a':'mae','b':'mae'} ,
metrics = {'a':'mae','b':'mae'})
history = model.fit([np.array(Img_train)],[np.array(LabelA_train),np.array(LabelB_train)],
epochs=100, batch_size=16,
validation_data=([np.array(Img_test)],[np.array(LabelA_test),np.array(LabelB_test)]))
model.save('forTransferL.h5')
"""
Output of summary()
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_1 (InputLayer) [(None, 100, 100, 3) 0
__________________________________________________________________________________________________
conv2d (Conv2D) (None, 98, 98, 32) 896 input_1[0][0]
__________________________________________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 49, 49, 32) 0 conv2d[0][0]
__________________________________________________________________________________________________
dropout (Dropout) (None, 49, 49, 32) 0 max_pooling2d[0][0]
__________________________________________________________________________________________________
conv2d_1 (Conv2D) (None, 47, 47, 64) 18496 dropout[0][0]
__________________________________________________________________________________________________
max_pooling2d_1 (MaxPooling2D) (None, 23, 23, 64) 0 conv2d_1[0][0]
__________________________________________________________________________________________________
dropout_1 (Dropout) (None, 23, 23, 64) 0 max_pooling2d_1[0][0]
__________________________________________________________________________________________________
conv2d_2 (Conv2D) (None, 21, 21, 32) 18464 dropout_1[0][0]
__________________________________________________________________________________________________
max_pooling2d_2 (MaxPooling2D) (None, 10, 10, 32) 0 conv2d_2[0][0]
__________________________________________________________________________________________________
dropout_2 (Dropout) (None, 10, 10, 32) 0 max_pooling2d_2[0][0]
__________________________________________________________________________________________________
conv2d_3 (Conv2D) (None, 8, 8, 16) 4624 dropout_2[0][0]
__________________________________________________________________________________________________
max_pooling2d_3 (MaxPooling2D) (None, 4, 4, 16) 0 conv2d_3[0][0]
__________________________________________________________________________________________________
conv2d_4 (Conv2D) (None, 2, 2, 8) 1160 max_pooling2d_3[0][0]
__________________________________________________________________________________________________
max_pooling2d_4 (MaxPooling2D) (None, 1, 1, 8) 0 conv2d_4[0][0]
__________________________________________________________________________________________________
flatten (Flatten) (None, 8) 0 max_pooling2d_4[0][0]
__________________________________________________________________________________________________
dense (Dense) (None, 64) 576 flatten[0][0]
__________________________________________________________________________________________________
dropout_3 (Dropout) (None, 64) 0 dense[0][0]
__________________________________________________________________________________________________
a (Dense) (None, 1) 65 dropout_3[0][0]
__________________________________________________________________________________________________
b (Dense) (None, 1) 65 dropout_3[0][0]
=========================================================================================
"""
#TL
model = load_model('forTransferL.h5')
model.layers[0].trainable = False
x = model.layers[10].output
# The following is the same as part of the CNN model.
conv_3 = Conv2D(16,kernel_size=3,activation='relu')(x)
pool_3 = MaxPooling2D(pool_size=(2,2))(conv_3)
conv_4 = Conv2D(8,kernel_size=3,activation='relu')(pool_3)
pool_4 = MaxPooling2D(pool_size=(2,2))(conv_4)
flat = Flatten()(pool_4)
denseL = Dense(64,activation='relu')(flat)
denseL = Dropout(0.25)(denseL)
A_output = Dense(1,name="a")(denseL)
B_output = Dense(1,name="b")(denseL)
model=Model(inputs=model.input,outputs=[A_output,B_output])
Just in case, I have also included the error text that currently appears below,
but I think understanding the basic implementation is more important than fixing the error.
Thank you for your cooperation.
#error message
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "c:/Users/userABC/OneDrive/Document/StudyAI/transferMymodel.py", line 158, in <module>
pool_m4 = MaxPooling2D(pool_size=(2,2))(conv_m4)
File "C:\Users\userABC\anaconda3\lib\site-packages\keras\engine\base_layer.py", line 1006, in __call__
outputs = call_fn(inputs, *args, **kwargs)
... (omitted due to the character limit)
ValueError: Negative dimension size caused by subtracting 2 from 1 for '{{node tf.compat.v1.nn.max_pool_1/MaxPool}} = MaxPool[T=DT_FLOAT, data_format="NHWC", explicit_paddings=[], ksize=[1, 2, 2, 1], padding="VALID", strides=[1, 2, 2, 1]](Placeholder)' with input shapes: [?,1,1,8].
Answer 0: (score: 1)
For points 1 and 2, your code looks correct enough, although for models with custom layers I personally prefer loading the model's weights rather than the model itself. That distinction does not matter in your case, though.
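The two saving/loading options mentioned above can be sketched as follows. This is a minimal sketch, not the asker's actual model: `build_cnn` is a hypothetical helper with toy shapes, standing in for the question's architecture so the round-trip can be shown compactly.

```python
import numpy as np
from tensorflow.keras.layers import Input, Conv2D, Flatten, Dense
from tensorflow.keras.models import Model, load_model

def build_cnn():
    # Hypothetical helper: a tiny stand-in for the question's architecture.
    inp = Input(shape=(8, 8, 3))
    x = Conv2D(4, kernel_size=3, activation='relu')(inp)
    x = Flatten()(x)
    out = Dense(1, name='a')(x)
    return Model(inp, out)

model = build_cnn()

# Option 1 (the question's approach): save and load the whole model.
model.save('forTransferL.h5')
loaded = load_model('forTransferL.h5')

# Option 2 (the answerer's preference): save only the weights and load
# them into an architecture rebuilt in code.
model.save_weights('cnn.weights.h5')
rebuilt = build_cnn()
rebuilt.load_weights('cnn.weights.h5')
```

Either way, both models should produce identical predictions, since they carry the same weights.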
For point 3, see https://keras.io/guides/transfer_learning/:
> If you set trainable = False on a model or on any layer that has sublayers, all children layers become non-trainable as well.
This means that in your case, setting `model.trainable = False` will freeze all the weights of the loaded part of the model (preventing them from being updated).
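A small sketch of the difference, again with a toy stand-in model (hypothetical shapes) rather than the asker's network. Note that `model.layers[0]` is just the `InputLayer`, which holds no weights, so the question's `model.layers[0].trainable = False` freezes nothing in practice:

```python
from tensorflow.keras.layers import Input, Conv2D, Flatten, Dense
from tensorflow.keras.models import Model

# Tiny stand-in for the loaded model (hypothetical shapes).
inp = Input(shape=(8, 8, 3))
x = Conv2D(4, kernel_size=3, activation='relu')(inp)
x = Flatten()(x)
out = Dense(1)(x)
base = Model(inp, out)

# The question's line freezes only layers[0], the weightless InputLayer,
# so all four weight tensors (Conv2D/Dense kernels and biases) stay trainable.
base.layers[0].trainable = False
print(len(base.trainable_weights))   # still 4

# Setting trainable on the model itself freezes every sublayer recursively.
base.trainable = False
print(len(base.trainable_weights))   # 0
```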
For point 4, the pretrained model and the subsequent layers look correctly combined, but the cause of the error is that you did not set padding='same' in the Conv2D layers, e.g.:
conv_3 = Conv2D(16, kernel_size=3, activation='relu', padding='same')(x)
This matters because, without 'same' padding, each Conv2D layer shrinks the image's height and width by 2. Given that your pretrained model's output shape is (8, 8, 16), the first convolution produces an output of shape (6, 6, 16), the first max pool produces (3, 3, 16), and the second convolution produces (1, 1, 8). At that point the second max pool can no longer pool the features, because the height and width are smaller than (2, 2).
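The shape arithmetic above can be verified directly. In this sketch the (8, 8, 16) input stands in for the output of `model.layers[10]` in the question; with padding='same', each Conv2D preserves height and width, so both pooling steps remain valid:

```python
from tensorflow.keras.layers import Input, Conv2D, MaxPooling2D

# With padding='same', each Conv2D keeps the spatial size, so pooling
# never runs out of height/width.
inp = Input(shape=(8, 8, 16))  # stands in for model.layers[10].output
x = Conv2D(16, kernel_size=3, activation='relu', padding='same')(inp)  # (8, 8, 16)
x = MaxPooling2D(pool_size=(2, 2))(x)                                  # (4, 4, 16)
x = Conv2D(8, kernel_size=3, activation='relu', padding='same')(x)     # (4, 4, 8)
x = MaxPooling2D(pool_size=(2, 2))(x)                                  # (2, 2, 8)
```

With 'valid' padding (the default), the same stack would shrink to (1, 1, 8) before the last pool and raise the negative-dimension error from the traceback.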