I am doing transfer learning on a pretrained model with my own dataset. Specifically, I am using a pretrained ResNet50 model with a 224x224 input shape. I load the data and the model like this:
train_datagen = ImageDataGenerator(validation_split=0.1,
                                   rescale=1./255,
                                   preprocessing_function=preprocess_input)  # set validation split

img_size = 224
batch_size = 32

train_generator = train_datagen.flow_from_directory(
    train_data_dir,
    target_size=(img_size, img_size),
    batch_size=batch_size,
    color_mode='rgb',
    subset='training')  # set as training data

validation_generator = train_datagen.flow_from_directory(
    train_data_dir,  # same directory as training data
    target_size=(img_size, img_size),
    batch_size=batch_size,
    color_mode='rgb',
    subset='validation')  # set as validation data
model = ResNet50(include_top=False, weights=None, input_shape=(224, 224, 3))
model.load_weights("a trained model weights on 224x224")
model.layers.pop()

for layer in model.layers:
    layer.trainable = False

x = model.layers[-1].output
x = Flatten(name='flatten')(x)
x = Dropout(0.2)(x)
x = Dense(1024, activation='relu')(x)
predictions = Dense(101, activation='softmax', name='pred_age')(x)

top_model = Model(inputs=model.input, outputs=predictions)
top_model.compile(loss='categorical_crossentropy',
                  optimizer='adam',
                  metrics=['accuracy'])
EPOCHS = 100
BATCH_SIZE = 32
STEPS_PER_EPOCH = 4424 // BATCH_SIZE
VALIDATION_STEPS = 466 // BATCH_SIZE

callbacks = [LearningRateScheduler(schedule=Schedule(EPOCHS, initial_lr=lr_rate)),
             ModelCheckpoint(str(output_dir) + "/weights.{epoch:03d}-{val_loss:.3f}-{val_age_mae:.3f}.hdf5",
                             monitor="val_age_mae",
                             verbose=1,
                             save_best_only=False,
                             mode="min")
             ]

hist = top_model.fit_generator(generator=train_generator,
                               epochs=EPOCHS,
                               steps_per_epoch=STEPS_PER_EPOCH,
                               validation_data=validation_generator,
                               validation_steps=VALIDATION_STEPS,
                               verbose=1,
                               callbacks=callbacks)
Total params: 75,020,261
Trainable params: 51,432,549
Non-trainable params: 23,587,712
Epoch 1/100
140/140 [==============================] - 1033s 7s/step - loss: 14.5776 - age_mae: 12.2994 - val_loss: 15.6144 - val_age_mae: 24.8527
Epoch 00001: val_age_mae improved from inf to 24.85268, saving model
Epoch 2/100
140/140 [==============================] - 969s 7s/step - loss: 14.7104 - age_mae: 11.2545 - val_loss: 15.6462 - val_age_mae: 25.1104
Epoch 00002: val_age_mae did not improve from 24.85268
Epoch 3/100
140/140 [==============================] - 769s 5s/step - loss: 14.6159 - age_mae: 13.5181 - val_loss: 15.7551 - val_age_mae: 29.4640
Epoch 00003: val_age_mae did not improve from 24.85268
Epoch 4/100
140/140 [==============================] - 815s 6s/step - loss: 14.6509 - age_mae: 13.0087 - val_loss: 15.9366 - val_age_mae: 18.3581
Epoch 00004: val_age_mae improved from 24.85268 to 18.35811
Epoch 5/100
140/140 [==============================] - 1059s 8s/step - loss: 14.3882 - age_mae: 11.8039 - val_loss: 15.6825 - val_age_mae: 24.6937
Epoch 00005: val_age_mae did not improve from 18.35811
Epoch 6/100
140/140 [==============================] - 1052s 8s/step - loss: 14.4496 - age_mae: 13.6652 - val_loss: 15.4278 - val_age_mae: 24.5045
Epoch 00006: val_age_mae did not improve from 18.35811
I have run this twice, and after the 4th epoch it stops improving. The dataset contains about 5000 images: 4511 images belong to the training set and 476 images to the validation set.
I get the following loss plot.
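(For reference, a plot like this can be regenerated from the history object returned by fit_generator. A minimal sketch, assuming matplotlib is installed; the hard-coded history dict below is a stand-in for hist.history, filled with the per-epoch losses from the log above, so the snippet runs on its own:)

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen, no display needed
import matplotlib.pyplot as plt

# Stand-in for hist.history: each metric name maps to a list of
# per-epoch values. With a real run, use hist.history directly.
history = {
    "loss": [14.5776, 14.7104, 14.6159, 14.6509, 14.3882, 14.4496],
    "val_loss": [15.6144, 15.6462, 15.7551, 15.9366, 15.6825, 15.4278],
}

epochs = range(1, len(history["loss"]) + 1)
plt.plot(epochs, history["loss"], label="training loss")
plt.plot(epochs, history["val_loss"], label="validation loss")
plt.xlabel("epoch")
plt.ylabel("loss")
plt.legend()
plt.savefig("loss_plot.png")
```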
Answer (score: 1)
This problem occurs with pretrained networks that contain BatchNormalization() layers, and it is a real headache. Trust me! The logic is that if the model is not set up correctly, the BatchNormalization() layers keep updating their batch statistics and corrupt all the pretrained weights, so everything is lost during backpropagation. I suggest you try loading the model this way:
model = ResNet50(include_top=False, weights=None, input_shape=(224,224,3))
model.trainable = False
inputs = keras.Input(shape=(224,224,3))
x = model(inputs, training=False)
x = keras.layers.GlobalAveragePooling2D()(x)
x = keras.layers.Dropout(0.5)(x)  # if your model requires one
outputs = keras.layers.Dense(num_classes, activation='softmax')(x)
You can then add fully connected layers of your choice, and assemble the whole model as follows:
model = keras.Model(inputs, outputs)
If you need a more detailed explanation, I suggest you read this link.
Then, depending on your dataset, you can continue training the model, freezing or unfreezing as many layers as you like. Hope this helps.
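As a side note on why the training=False call above matters: batch normalization behaves differently in its two modes. In training mode it normalizes with the statistics of the current batch; in inference mode it uses the moving statistics accumulated during pretraining. A minimal NumPy sketch (an illustration only, not the Keras internals; the values are made up) of the difference:

```python
import numpy as np

def batchnorm(x, moving_mean, moving_var, training, eps=1e-3):
    """Normalize x with batch statistics (training mode)
    or with the stored moving statistics (inference mode)."""
    if training:
        mean, var = x.mean(axis=0), x.var(axis=0)
    else:
        mean, var = moving_mean, moving_var
    return (x - mean) / np.sqrt(var + eps)

# Hypothetical moving statistics learned during pretraining.
moving_mean, moving_var = np.array([5.0]), np.array([4.0])

# A batch from a new dataset with a different distribution.
x = np.array([[0.0], [1.0], [2.0], [3.0]])

train_out = batchnorm(x, moving_mean, moving_var, training=True)
infer_out = batchnorm(x, moving_mean, moving_var, training=False)

# Training mode ignores the pretrained statistics entirely, so the
# frozen layers downstream see very different activations than the
# ones they were trained on.
print(train_out.ravel())
print(infer_out.ravel())
```

Running the frozen base with training=False keeps the pretrained statistics fixed, which is exactly what the answer's code achieves.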