Keras - freeze a model, then add trainable layers

Asked: 2018-03-05 14:41:36

Tags: tensorflow keras

I am taking a pretrained CNN model and then trying to implement a CNN-LSTM with parallel CNNs, all of the CNNs sharing the same weights from the pretraining.

import keras

# load in CNN
weightsfile = 'final_weights.h5'
modelfile = '2dcnn_model.json'

# load model architecture from json, then load the pretrained weights
json_file = open(modelfile, 'r')
loaded_model_json = json_file.read()
json_file.close()
fixed_cnn_model = keras.models.model_from_json(loaded_model_json)
fixed_cnn_model.load_weights(weightsfile)

# remove the last 2 dense FC layers and freeze the model
fixed_cnn_model.pop()
fixed_cnn_model.pop()
fixed_cnn_model.trainable = False

fixed_cnn_model.summary()

This will produce the summary:

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_1 (InputLayer)         (None, 32, 32, 4)         0         
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 30, 30, 32)        1184      
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 28, 28, 32)        9248      
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 26, 26, 32)        9248      
_________________________________________________________________
conv2d_4 (Conv2D)            (None, 24, 24, 32)        9248      
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 12, 12, 32)        0         
_________________________________________________________________
conv2d_5 (Conv2D)            (None, 10, 10, 64)        18496     
_________________________________________________________________
conv2d_6 (Conv2D)            (None, 8, 8, 64)          36928     
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 4, 4, 64)          0         
_________________________________________________________________
conv2d_7 (Conv2D)            (None, 2, 2, 128)         73856     
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 1, 1, 128)         0         
_________________________________________________________________
flatten_1 (Flatten)          (None, 128)               0         
_________________________________________________________________
dropout_1 (Dropout)          (None, 128)               0         
_________________________________________________________________
dense_1 (Dense)              (None, 512)               66048     
=================================================================
Total params: 224,256
Trainable params: 0
Non-trainable params: 224,256
_________________________________________________________________

Now I add on to it and compile, and the summary shows that everything that was non-trainable has become trainable.

# create a Sequential model to run the frozen CNN over each time step before the LSTM
from keras.models import Sequential, Model
from keras.layers import TimeDistributed, LSTM, Dense

# initialize loss function, Adam optimizer and metrics
loss = 'binary_crossentropy'
optimizer = keras.optimizers.Adam(lr=1e-4,
                                  beta_1=0.9,
                                  beta_2=0.999,
                                  epsilon=1e-08,
                                  decay=0.0)
metrics = ['accuracy']

currmodel = Sequential()
currmodel.add(TimeDistributed(fixed_cnn_model, input_shape=(num_timewins, imsize, imsize, n_colors)))
currmodel.add(LSTM(units=size_mem,
                   activation='relu',
                   return_sequences=False))
currmodel.add(Dense(1024, activation='relu'))
currmodel.add(Dense(2, activation='softmax'))

currmodel = Model(inputs=currmodel.input, outputs=currmodel.output)
currmodel.compile(optimizer=optimizer, loss=loss, metrics=metrics)
currmodel.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
time_distributed_3_input (In (None, 5, 32, 32, 4)      0         
_________________________________________________________________
time_distributed_3 (TimeDist (None, 5, 512)            224256    
_________________________________________________________________
lstm_3 (LSTM)                (None, 50)                112600    
_________________________________________________________________
dropout_1 (Dropout)          (None, 50)                0         
_________________________________________________________________
dense_1 (Dense)              (None, 1024)              52224     
_________________________________________________________________
dropout_2 (Dropout)          (None, 1024)              0         
_________________________________________________________________
dense_2 (Dense)              (None, 2)                 2050      
=================================================================
Total params: 391,130
Trainable params: 391,130
Non-trainable params: 0
_________________________________________________________________

How should I freeze the layers in this case? I am almost 100% certain that I had code in this format working in an earlier Keras version. It seems like the right direction, since you define a model and declare whether certain layers are trainable.

You then add layers, which are trainable by default. However, this appears to have made every layer trainable.

2 Answers:

Answer 0 (score: 1)

Try adding:

    for layer in currmodel.layers[:5]:
        layer.trainable = False
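
Note that "trainable" flags in Keras are only applied when the model is compiled, so the model has to be recompiled for the freeze to take effect. A minimal sketch, reusing the optimizer, loss, and metrics defined in the question:

    # recompile so the new trainable flags take effect, then verify
    currmodel.compile(optimizer=optimizer, loss=loss, metrics=metrics)
    currmodel.summary()  # "Non-trainable params" should now be non-zero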

Answer 1 (score: 0)

First, print the layer indices in your network:

for i, layer in enumerate(currmodel.layers):
    print(i, layer.name)

Now check which layers are trainable and which are not:

for i, layer in enumerate(currmodel.layers):
    print(i, layer.name, layer.trainable)

Now you can set the "trainable" parameter for the layers you want. Say you want to train only the last 2 out of a total of 6 layers (numbering starts from 0); then you can write something like this:

# freeze the first 4 layers (0-3) and train the last 2 (4 and 5)
for layer in currmodel.layers[:4]:
    layer.trainable = False
for layer in currmodel.layers[4:]:
    layer.trainable = True
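
In this particular question the pretrained CNN is nested inside a TimeDistributed wrapper, so it shows up as a single layer of currmodel. To freeze the pretrained convolutional weights specifically, the same loop can be applied to the inner model's own layers; a sketch, assuming the fixed_cnn_model variable from the question is still in scope:

# freeze every layer of the nested pretrained CNN; with a nested model,
# setting the flag on each inner layer is more reliable than setting it
# once on the wrapping model in older Keras versions
for layer in fixed_cnn_model.layers:
    layer.trainable = False
# recompile the outer model so the flags are picked up
currmodel.compile(optimizer=optimizer, loss=loss, metrics=metrics)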

To cross-check, try printing again, and you will see the desired settings.
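
Beyond printing the per-layer flags, the trainable and non-trainable parameter counts can also be computed directly; a small sketch using the Keras backend:

import numpy as np
from keras import backend as K

# sum parameter counts over the trainable and non-trainable weight tensors
trainable_count = int(np.sum([K.count_params(w) for w in currmodel.trainable_weights]))
non_trainable_count = int(np.sum([K.count_params(w) for w in currmodel.non_trainable_weights]))
print('Trainable params:', trainable_count)
print('Non-trainable params:', non_trainable_count)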