I am taking a pretrained CNN model and trying to build a CNN-LSTM from parallel CNNs, where every CNN shares the same weights from pretraining.
import keras

# load in CNN
weightsfile = 'final_weights.h5'
modelfile = '2dcnn_model.json'
# load the model architecture from json, then the pretrained weights
with open(modelfile, 'r') as json_file:
    loaded_model_json = json_file.read()
fixed_cnn_model = keras.models.model_from_json(loaded_model_json)
fixed_cnn_model.load_weights(weightsfile)
# remove the last 2 dense FC layers and freeze it
fixed_cnn_model.pop()
fixed_cnn_model.pop()
fixed_cnn_model.trainable = False
# summary() prints the table itself, so no print() wrapper is needed
fixed_cnn_model.summary()
This will produce the summary:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_1 (InputLayer) (None, 32, 32, 4) 0
_________________________________________________________________
conv2d_1 (Conv2D) (None, 30, 30, 32) 1184
_________________________________________________________________
conv2d_2 (Conv2D) (None, 28, 28, 32) 9248
_________________________________________________________________
conv2d_3 (Conv2D) (None, 26, 26, 32) 9248
_________________________________________________________________
conv2d_4 (Conv2D) (None, 24, 24, 32) 9248
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 12, 12, 32) 0
_________________________________________________________________
conv2d_5 (Conv2D) (None, 10, 10, 64) 18496
_________________________________________________________________
conv2d_6 (Conv2D) (None, 8, 8, 64) 36928
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 4, 4, 64) 0
_________________________________________________________________
conv2d_7 (Conv2D) (None, 2, 2, 128) 73856
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 1, 1, 128) 0
_________________________________________________________________
flatten_1 (Flatten) (None, 128) 0
_________________________________________________________________
dropout_1 (Dropout) (None, 128) 0
_________________________________________________________________
dense_1 (Dense) (None, 512) 66048
=================================================================
Total params: 224,256
Trainable params: 0
Non-trainable params: 224,256
_________________________________________________________________
Now I add it to a new model and compile, and the summary shows that all of the non-trainable parameters have become trainable.
from keras.models import Sequential, Model
from keras.layers import TimeDistributed, LSTM, Dense

# create sequential model to get this all before the LSTM
# initialize loss function, Adam optimizer and metrics
loss = 'binary_crossentropy'
optimizer = keras.optimizers.Adam(lr=1e-4,
                                  beta_1=0.9,
                                  beta_2=0.999,
                                  epsilon=1e-08,
                                  decay=0.0)
metrics = ['accuracy']
currmodel = Sequential()
currmodel.add(TimeDistributed(fixed_cnn_model, input_shape=(num_timewins, imsize, imsize, n_colors)))
currmodel.add(LSTM(units=size_mem,
                   activation='relu',
                   return_sequences=False))
currmodel.add(Dense(1024, activation='relu'))
currmodel.add(Dense(2, activation='softmax'))
currmodel = Model(inputs=currmodel.input, outputs=currmodel.output)
# compile() returns None, so its result is not stored
currmodel.compile(optimizer=optimizer, loss=loss, metrics=metrics)
currmodel.summary()
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
time_distributed_3_input (In (None, 5, 32, 32, 4) 0
_________________________________________________________________
time_distributed_3 (TimeDist (None, 5, 512) 224256
_________________________________________________________________
lstm_3 (LSTM) (None, 50) 112600
_________________________________________________________________
dropout_1 (Dropout) (None, 50) 0
_________________________________________________________________
dense_1 (Dense) (None, 1024) 52224
_________________________________________________________________
dropout_2 (Dropout) (None, 1024) 0
_________________________________________________________________
dense_2 (Dense) (None, 2) 2050
=================================================================
Total params: 391,130
Trainable params: 391,130
Non-trainable params: 0
_________________________________________________________________
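How should I freeze the layers in this situation? I am almost 100% sure that code in this form worked for me in an earlier Keras version. It looks like the right approach, since you define a model and declare whether certain layers are trainable.
Layers added afterwards are trainable by default. However, this appears to turn every layer into a trainable one.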
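As a sanity check on the two summaries, the parameter counts can be reproduced by hand. This is a minimal sketch, assuming 3x3 kernels (inferred from each Conv2D shrinking the feature map by 2 in each dimension) and the usual four-gate LSTM parameterisation; the helper names are hypothetical. Under those assumptions, a CNN that stayed frozen would leave 224,256 parameters non-trainable and only the LSTM and Dense head trainable.

```python
def conv2d_params(in_ch, out_ch, k=3):
    # k*k*in_ch weights per filter, plus one bias per filter
    return k * k * in_ch * out_ch + out_ch


def dense_params(n_in, n_out):
    # weight matrix plus bias vector
    return n_in * n_out + n_out


def lstm_params(n_in, units):
    # four gates, each with input weights, recurrent weights and a bias
    return 4 * (units * (n_in + units) + units)


# Layers of the truncated CNN, read off the first summary
cnn_params = (
    conv2d_params(4, 32)
    + 3 * conv2d_params(32, 32)
    + conv2d_params(32, 64)
    + conv2d_params(64, 64)
    + conv2d_params(64, 128)
    + dense_params(128, 512)
)

# Layers added on top, read off the second summary
head_params = lstm_params(512, 50) + dense_params(50, 1024) + dense_params(1024, 2)

print(cnn_params)                # 224256, should be non-trainable
print(head_params)               # 166874, should be trainable
print(cnn_params + head_params)  # 391130, total in the second summary
```

The total matches the second summary, so the 391,130 trainable parameters reported there include the whole CNN.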
Answer 0 (score: 1)
Try adding:
for layer in currmodel.layers[:5]:
    layer.trainable = False
Answer 1 (score: 0)
First, print the index of each layer in your network:
for i, layer in enumerate(currmodel.layers):
    print(i, layer.name)
Now check which layers are trainable and which are not:
for i, layer in enumerate(currmodel.layers):
    print(i, layer.name, layer.trainable)
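Now you can set the `trainable` attribute on whichever layers you want. Suppose you want to train only the last 2 layers out of 6 in total (numbering starts at 0); then you can write something like this: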
# freeze layers 0-3, leave the last two (4 and 5) trainable
for layer in currmodel.layers[:4]:
    layer.trainable = False
for layer in currmodel.layers[4:]:
    layer.trainable = True
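The slice boundary is easy to get off by one. A small plain-Python sketch, using a hypothetical list of six layer names in place of the model's `layers` list, shows that training only the last two of six layers puts the split at index 4:

```python
# Hypothetical layer names standing in for a six-layer model
layer_names = ["conv_a", "conv_b", "conv_c", "flatten", "lstm", "dense"]

# Keep the last two trainable, freeze everything before them
split = len(layer_names) - 2
frozen = layer_names[:split]
trainable = layer_names[split:]

print(split)
print(frozen)
print(trainable)
```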
To cross-check, print the layers again and you should see the settings you wanted.
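One way to confirm the freeze actually held is to count weights rather than read the summary. The following is a rough sketch, assuming a recent TensorFlow with its bundled Keras; the tiny CNN, the layer sizes and the `n_params` helper are made up for illustration and are not the asker's model. It freezes every layer of the inner CNN before wrapping it in `TimeDistributed`, and compiles only afterwards, since Keras picks up changes to `trainable` at compile time.

```python
import math

from tensorflow import keras
from tensorflow.keras import layers


def n_params(weights):
    # Hypothetical helper: total number of scalars in a list of weight variables
    return sum(math.prod(tuple(w.shape)) for w in weights)


# Small stand-in for the pretrained CNN (sizes are arbitrary)
cnn = keras.Sequential([
    keras.Input(shape=(32, 32, 4)),
    layers.Conv2D(8, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(16, activation="relu"),
])

# Freeze each inner layer explicitly, then the container itself
for layer in cnn.layers:
    layer.trainable = False
cnn.trainable = False

# Wrap the frozen CNN so it is applied to each of the 5 time steps
model = keras.Sequential([
    keras.Input(shape=(5, 32, 32, 4)),
    layers.TimeDistributed(cnn),
    layers.LSTM(10),
    layers.Dense(2, activation="softmax"),
])

# Compile after setting trainable; recompile if the flags change later
model.compile(optimizer="adam", loss="categorical_crossentropy")

frozen = n_params(model.non_trainable_weights)
trainable = n_params(model.trainable_weights)
print("frozen:", frozen, "trainable:", trainable)
```

If `frozen` equals the CNN's own parameter count, only the LSTM and Dense layers will be updated during training.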