How to freeze some layers when fine-tuning ResNet50

Time: 2017-10-06 17:09:54

Tags: neural-network keras resnet

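I am trying to fine-tune ResNet50 with Keras. When I freeze all the layers in ResNet50, everything works fine. However, I want to freeze only some of the ResNet50 layers, not all of them, and when I do that I get an error. Here is my code: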

base_model = ResNet50(include_top=False, weights="imagenet", input_shape=(input_size, input_size, input_channels))
model = Sequential()
model.add(base_model)
model.add(Flatten())
model.add(Dense(80, activation="softmax"))

#this is where the error happens. The commented code works fine
"""
for layer in base_model.layers:
    layer.trainable = False
"""
for layer in base_model.layers[:-26]:
    layer.trainable = False
model.summary()
optimizer = Adam(lr=1e-4)
model.compile(loss="categorical_crossentropy", optimizer=optimizer, metrics=["accuracy"])

callbacks = [
    EarlyStopping(monitor='val_loss', patience=4, verbose=1, min_delta=1e-4),
    ReduceLROnPlateau(monitor='val_loss', factor=0.1, patience=2, cooldown=2, verbose=1),
    ModelCheckpoint(filepath='weights/renet50_best_weight.fold_' + str(fold_count) + '.hdf5', save_best_only=True,
                    save_weights_only=True)
    ]

model.load_weights(filepath="weights/renet50_best_weight.fold_1.hdf5")
model.fit_generator(generator=train_generator(), steps_per_epoch=len(df_train) // batch_size,  epochs=epochs, verbose=1,
                  callbacks=callbacks, validation_data=valid_generator(), validation_steps = len(df_valid) // batch_size) 

The error is as follows:

Traceback (most recent call last):
File "/home/jamesben/ai_challenger/src/train.py", line 184, in <module> model.load_weights(filepath="weights/renet50_best_weight.fold_" + str(fold_count) + '.hdf5')
File "/usr/local/lib/python3.5/dist-packages/keras/models.py", line 719, in load_weights topology.load_weights_from_hdf5_group(f, layers)
File "/usr/local/lib/python3.5/dist-packages/keras/engine/topology.py", line 3095, in load_weights_from_hdf5_group K.batch_set_value(weight_value_tuples)
File "/usr/local/lib/python3.5/dist-packages/keras/backend/tensorflow_backend.py", line 2193, in batch_set_value get_session().run(assign_ops, feed_dict=feed_dict)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 767, in run run_metadata_ptr)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 944, in _run % (np_val.shape, subfeed_t.name, str(subfeed_t.get_shape())))
ValueError: Cannot feed value of shape (128,) for Tensor 'Placeholder_72:0', which has shape '(3, 3, 128, 128)'

Can anyone help me figure out how many layers of ResNet50 I should freeze?

1 answer:

Answer 0 (score: 6)

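Using load_weights() and save_weights() with nested models is error-prone when the trainable settings differ between saving and loading.

To fix the error, make sure that the same layers are frozen before you call model.load_weights() as were frozen when the weights were saved. That is, if the weight file was saved with all layers frozen, the procedure is: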

  1. Recreate the model
  2. Freeze all layers in base_model
  3. Load the weights
  4. Unfreeze the layers you want to train (in this case, base_model.layers[-26:])

For example,

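    from keras.applications.resnet50 import ResNet50
    from keras.layers import Dense, Flatten
    from keras.models import Sequential

    base_model = ResNet50(include_top=False, input_shape=(224, 224, 3))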
    model = Sequential()
    model.add(base_model)
    model.add(Flatten())
    model.add(Dense(80, activation="softmax"))
    
    for layer in base_model.layers:
        layer.trainable = False
    model.load_weights('all_layers_freezed.h5')
    
    for layer in base_model.layers[-26:]:
        layer.trainable = True
    

Root cause:

When you call model.load_weights(), the weights of each layer are loaded (roughly) through the following steps (in the load_weights_from_hdf5_group() function in topology.py):

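  1. Call layer.weights to get the weight tensors
  2. Match each weight tensor with its corresponding weight value in the hdf5 file
  3. Call K.batch_set_value() to assign the weight values to the weight tensors

If your model is a nested model, you have to be careful about trainable because of step 1.

I will explain this with an example. For the same model as above, model.summary() gives: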

      _________________________________________________________________
      Layer (type)                 Output Shape              Param #
      =================================================================
      resnet50 (Model)             (None, 1, 1, 2048)        23587712
      _________________________________________________________________
      flatten_10 (Flatten)         (None, 2048)              0
      _________________________________________________________________
      dense_5 (Dense)              (None, 80)                163920
      =================================================================
      Total params: 23,751,632
      Trainable params: 11,202,640
      Non-trainable params: 12,548,992
      _________________________________________________________________
      

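While the weights are being loaded, the inner ResNet50 model is treated as a single layer of model. When the layer resnet50 is loaded, calling layer.weights in step 1 is equivalent to calling base_model.weights. A list of the weight tensors of all layers in the ResNet50 model is collected and returned.

The problem is that, when this list of weight tensors is built, trainable weights come before non-trainable weights. In the definition of the Layer class: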

      @property
      def weights(self):
          return self.trainable_weights + self.non_trainable_weights
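
To make the effect of the weights property above concrete, here is a minimal pure-Python sketch. ToyLayer and ToyContainer are hypothetical stand-ins written for this illustration, not Keras classes, and they ignore details such as the BatchNormalization moving statistics. The sketch only assumes that a container collects the trainable weights of all its layers first and the non-trainable ones afterwards.

```python
class ToyLayer:
    """Hypothetical stand-in for a Keras layer: named weights plus a trainable flag."""

    def __init__(self, name, weight_names):
        self.name = name
        self.weight_names = ["%s/%s" % (name, w) for w in weight_names]
        self.trainable = True

    @property
    def trainable_weights(self):
        return list(self.weight_names) if self.trainable else []

    @property
    def non_trainable_weights(self):
        return [] if self.trainable else list(self.weight_names)


class ToyContainer:
    """Hypothetical stand-in for a nested model that is treated as one layer."""

    def __init__(self, layers):
        self.layers = layers

    @property
    def weights(self):
        # Same shape as the Layer.weights property: trainable first, then frozen.
        trainable = [w for layer in self.layers for w in layer.trainable_weights]
        frozen = [w for layer in self.layers for w in layer.non_trainable_weights]
        return trainable + frozen


base = ToyContainer([
    ToyLayer("conv1", ["kernel", "bias"]),
    ToyLayer("res2a", ["kernel", "bias"]),
    ToyLayer("res5c", ["kernel", "bias"]),
])

# Case 1: every layer frozen, so the weights keep the layer order.
for layer in base.layers:
    layer.trainable = False
all_frozen_order = base.weights

# Case 2: unfreeze only the last layer, so its weights jump to the front.
base.layers[-1].trainable = True
partly_frozen_order = base.weights

print(all_frozen_order)
print(partly_frozen_order)
```

Both lists contain the same weights, but the second one starts with the weights of the unfrozen layer.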
      

If all layers in base_model are frozen, the weight tensors are in the following order:

      for layer in base_model.layers:
          layer.trainable = False
      print(base_model.weights)
      
      [<tf.Variable 'conv1/kernel:0' shape=(7, 7, 3, 64) dtype=float32_ref>,
       <tf.Variable 'conv1/bias:0' shape=(64,) dtype=float32_ref>,
       <tf.Variable 'bn_conv1/gamma:0' shape=(64,) dtype=float32_ref>,
       <tf.Variable 'bn_conv1/beta:0' shape=(64,) dtype=float32_ref>,
       <tf.Variable 'bn_conv1/moving_mean:0' shape=(64,) dtype=float32_ref>,
       <tf.Variable 'bn_conv1/moving_variance:0' shape=(64,) dtype=float32_ref>,
       <tf.Variable 'res2a_branch2a/kernel:0' shape=(1, 1, 64, 64) dtype=float32_ref>,
       <tf.Variable 'res2a_branch2a/bias:0' shape=(64,) dtype=float32_ref>,
       ...
       <tf.Variable 'res5c_branch2c/kernel:0' shape=(1, 1, 512, 2048) dtype=float32_ref>,
       <tf.Variable 'res5c_branch2c/bias:0' shape=(2048,) dtype=float32_ref>,
       <tf.Variable 'bn5c_branch2c/gamma:0' shape=(2048,) dtype=float32_ref>,
       <tf.Variable 'bn5c_branch2c/beta:0' shape=(2048,) dtype=float32_ref>,
       <tf.Variable 'bn5c_branch2c/moving_mean:0' shape=(2048,) dtype=float32_ref>,
       <tf.Variable 'bn5c_branch2c/moving_variance:0' shape=(2048,) dtype=float32_ref>]
      

However, if some layers are trainable, the weight tensors of the trainable layers come before those of the frozen layers:

      for layer in base_model.layers[-5:]:
          layer.trainable = True
      print(base_model.weights)
      
      [<tf.Variable 'res5c_branch2c/kernel:0' shape=(1, 1, 512, 2048) dtype=float32_ref>,
       <tf.Variable 'res5c_branch2c/bias:0' shape=(2048,) dtype=float32_ref>,
       <tf.Variable 'bn5c_branch2c/gamma:0' shape=(2048,) dtype=float32_ref>,
       <tf.Variable 'bn5c_branch2c/beta:0' shape=(2048,) dtype=float32_ref>,
       <tf.Variable 'conv1/kernel:0' shape=(7, 7, 3, 64) dtype=float32_ref>,
       <tf.Variable 'conv1/bias:0' shape=(64,) dtype=float32_ref>,
       <tf.Variable 'bn_conv1/gamma:0' shape=(64,) dtype=float32_ref>,
       <tf.Variable 'bn_conv1/beta:0' shape=(64,) dtype=float32_ref>,
       <tf.Variable 'bn_conv1/moving_mean:0' shape=(64,) dtype=float32_ref>,
       <tf.Variable 'bn_conv1/moving_variance:0' shape=(64,) dtype=float32_ref>,
       <tf.Variable 'res2a_branch2a/kernel:0' shape=(1, 1, 64, 64) dtype=float32_ref>,
       <tf.Variable 'res2a_branch2a/bias:0' shape=(64,) dtype=float32_ref>,
       ...
       <tf.Variable 'bn5c_branch2b/moving_mean:0' shape=(512,) dtype=float32_ref>,
       <tf.Variable 'bn5c_branch2b/moving_variance:0' shape=(512,) dtype=float32_ref>,
       <tf.Variable 'bn5c_branch2c/moving_mean:0' shape=(2048,) dtype=float32_ref>,
       <tf.Variable 'bn5c_branch2c/moving_variance:0' shape=(2048,) dtype=float32_ref>]
      

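This change in order is why you get the tensor shape error. In step 2 above, the weight values saved in the hdf5 file are matched to the wrong weight tensors. Everything works when all layers are frozen because the model checkpoint was also saved with all layers frozen, so the order is the same.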
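
The mismatch can be reproduced without Keras. The sketch below is a simplified model of positional matching, not the actual load_weights_from_hdf5_group() code: load_by_position is a hypothetical helper, and only four of the weight shapes listed above are used. It assumes the file was written with all layers frozen and is read back after the last layer has been made trainable.

```python
def load_by_position(weight_slots, saved_values):
    """Pair each (name, shape) slot with the saved entry at the same index.

    Raises ValueError on the first shape mismatch, loosely mimicking the
    error raised when a value is fed to a placeholder of another shape.
    """
    for (name, expected_shape), (_, saved_shape) in zip(weight_slots, saved_values):
        if expected_shape != saved_shape:
            raise ValueError(
                "Cannot feed value of shape %r for %s, which has shape %r"
                % (saved_shape, name, expected_shape)
            )
    return True


# Order at save time: all layers frozen, so weights follow the layer order.
saved = [
    ("conv1/kernel", (7, 7, 3, 64)),
    ("conv1/bias", (64,)),
    ("res5c_branch2c/kernel", (1, 1, 512, 2048)),
    ("res5c_branch2c/bias", (2048,)),
]

# Order at load time: the last layer is trainable, so its weights come first.
current = saved[2:] + saved[:2]

print(load_by_position(saved, saved))  # same order on both sides: loads fine

try:
    load_by_position(current, saved)
except ValueError as err:
    print("load failed:", err)
```

With identical ordering on both sides the pairing succeeds; with the reordered list the very first pair already has different shapes.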

A possibly better solution:

You can use the functional API to avoid nested models. For example, the following code should work without errors:

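      from keras.applications.resnet50 import ResNet50
      from keras.layers import Dense, Flatten
      from keras.models import Model

      # input_size and input_channels are the values defined in the question's script
      base_model = ResNet50(include_top=False, weights="imagenet", input_shape=(input_size, input_size, input_channels))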
      x = Flatten()(base_model.output)
      x = Dense(80, activation="softmax")(x)
      model = Model(base_model.input, x)
      
      for layer in base_model.layers:
          layer.trainable = False
      model.save_weights("all_nontrainable.h5")
      
      # Rebuild the same model, freeze only part of it, then load the saved weights
      base_model = ResNet50(include_top=False, weights="imagenet", input_shape=(input_size, input_size, input_channels))
      x = Flatten()(base_model.output)
      x = Dense(80, activation="softmax")(x)
      model = Model(base_model.input, x)
      
      for layer in base_model.layers[:-26]:
          layer.trainable = False
      model.load_weights("all_nontrainable.h5")