Validation and training curves remain flat

Date: 2021-03-03 16:51:35

Tags: python tensorflow machine-learning deep-learning

I am building a TensorFlow/Keras multi-class classification model to identify 33 different species of birds. I have roughly 33,000 images; I have cleaned the dataset, removed corrupted files, and resized the images from about 4347x3260 down to 512x512 while preserving the aspect ratio.
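The cleaning/resizing step can be sketched roughly like this with Pillow (this is only an illustration, not my exact script; the folder names are placeholders):

from pathlib import Path
from PIL import Image

src_dir = Path('raw_images')      # placeholder input folder
dst_dir = Path('resized_images')  # placeholder output folder
dst_dir.mkdir(exist_ok=True)

for path in src_dir.glob('**/*.jpg'):
    try:
        with Image.open(path) as img:
            img.verify()                 # detect corrupted files
        with Image.open(path) as img:
            img.thumbnail((512, 512))    # resize in place, keeping aspect ratio
            img.convert('RGB').save(dst_dir / path.name, 'JPEG')
    except (OSError, SyntaxError):
        print(f'Skipping corrupted image: {path}')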

I use a batch size of 32 and an input image size of 250x250. I split the data into train/validation/test sets in an 80:10:10 ratio, giving 26,253 training images, 3,266 validation images and 3,311 test images.
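The 80:10:10 split can be done along these lines (again only a sketch, assuming one sub-folder per species; the paths are placeholders matching the train_test_val layout used below):

import random, shutil
from pathlib import Path

random.seed(123)
src = Path('resized_images')   # placeholder
dst = Path('train_test_val')   # placeholder

for species_dir in src.iterdir():
    files = sorted(species_dir.glob('*.jpg'))
    random.shuffle(files)
    n = len(files)
    splits = {'train': files[:int(0.8 * n)],
              'val':   files[int(0.8 * n):int(0.9 * n)],
              'test':  files[int(0.9 * n):]}
    for split, split_files in splits.items():
        out_dir = dst / split / species_dir.name
        out_dir.mkdir(parents=True, exist_ok=True)
        for f in split_files:
            shutil.copy(f, out_dir / f.name)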

I have set a seed for the training image generator. The problem is that whenever I change the number of layers or the architecture, the model stops improving at roughly the same level, around 50% accuracy, and the validation/training curves stay flat. As the plots below show, a gap does start to open between the curves, which is a sign of overfitting, but the gap more or less stays the same size.

[Plot: updated validation/training accuracy]

[Plot: updated validation/training loss]

So I plotted a learning-rate-finder curve, shown below, and it is flat. I chose 1e-4 as my base learning rate and 1e-2 as my maximum learning rate and implemented a cyclical learning rate (CLR) as described by Smith. You can see the results below. This has pushed the validation accuracy up to about 58% so far, and it has not really started to overfit; in fact, apart from the original baseline run, all of the curves show only slight overfitting. Perhaps if I narrowed the learning-rate interval the plot would show the curve more clearly.

[Plot: loss vs. learning rate]
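For context, a learning-rate range test of the kind that produces such a plot can be sketched like this: increase the learning rate exponentially each batch and record the loss (this is a generic version, not necessarily the exact code behind my plot):

import numpy as np
import tensorflow as tf

class LRRangeTest(tf.keras.callbacks.Callback):
    def __init__(self, min_lr=1e-6, max_lr=1e-1, n_steps=500):
        super().__init__()
        self.lrs = np.geomspace(min_lr, max_lr, n_steps)
        self.losses = []
        self.step = 0

    def on_train_batch_begin(self, batch, logs=None):
        lr = self.lrs[min(self.step, len(self.lrs) - 1)]
        tf.keras.backend.set_value(self.model.optimizer.lr, lr)

    def on_train_batch_end(self, batch, logs=None):
        self.losses.append(logs['loss'])
        self.step += 1
        if self.step >= len(self.lrs):
            self.model.stop_training = True

# Example usage: one short run, then plot loss against learning rate.
# lr_test = LRRangeTest()
# model.fit(train_datagen, steps_per_epoch=STEPS_PER_EPOCH, epochs=1, callbacks=[lr_test])
# plt.semilogx(lr_test.lrs[:len(lr_test.losses)], lr_test.losses)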

Is it worth training for longer, or can anyone suggest what to try next? Any advice is welcome. I have added updated plots above.

import os
import tensorflow as tf
from tensorflow.keras import optimizers
from tensorflow.keras.preprocessing.image import ImageDataGenerator

batch_size = 32
epochs = 2000
IMG_HEIGHT = 250
IMG_WIDTH = 250
STEPS_PER_EPOCH = count_data_items(os.path.join(data_dir, 'train')) // batch_size  # count_data_items() and data_dir are defined elsewhere

training_dir = 'train_test_val/train'

# All images will be rescaled by 1./255
train_datagen = ImageDataGenerator(
    rescale=1./255,
    rotation_range=45,
    zoom_range=0.15,
    width_shift_range=0.1,
    height_shift_range=0.1,
    shear_range=0.15,
    horizontal_flip=True,
    vertical_flip=True,
    fill_mode="reflect"
)
print("The number of training images: ")
train_datagen = train_datagen.flow_from_directory(
  directory = training_dir,
  shuffle=True,
  target_size=(IMG_HEIGHT, IMG_WIDTH),
  batch_size=batch_size,
  seed=123,
  class_mode='categorical'
)

test_dir = 'train_test_val/test'
print("The number of test images: ")
# All images will be rescaled by 1./255
test_datagen = ImageDataGenerator(rescale=1./255)
test_datagen = test_datagen.flow_from_directory(
    directory = test_dir,
    target_size=(IMG_HEIGHT, IMG_WIDTH),
    batch_size=batch_size,
    #seed=123,
    class_mode='categorical'
)

val_dir = 'train_test_val/val'
print("The number of validation images: ")
# All images will be rescaled by 1./255
val_datagen = ImageDataGenerator(rescale=1./255)
val_datagen = val_datagen.flow_from_directory(
    directory = val_dir,
    target_size=(IMG_HEIGHT, IMG_WIDTH),
    batch_size=batch_size,
    #seed=123,
    class_mode='categorical'
)

clr_step_size = int(4 * STEPS_PER_EPOCH)  
base_lr = 1e-4    
max_lr = 1e-2     
mode = 'triangular'
# Triangular mode: the learning rate ramps linearly between base_lr and max_lr.

def get_callbacks():
    return [
        tf.keras.callbacks.TensorBoard(log_dir=run_logdir, histogram_freq=0, embeddings_freq=0, update_freq='epoch'),
        tf.keras.callbacks.EarlyStopping(monitor='val_loss', min_delta=0, patience=21, verbose=1),
        #tf.keras.callbacks.ReduceLROnPlateau(monitor='val_loss', factor=0.1, patience=8,verbose=1),
        tf.keras.callbacks.ModelCheckpoint(filepath=checkpoint_path, monitor ='val_loss', save_best_only=True, verbose=1),
        # CyclicLR comes from an external CLR callback implementation (not part of tf.keras)
        CyclicLR(base_lr=base_lr, max_lr=max_lr, step_size=clr_step_size, mode=mode)
        #tf.keras.callbacks.CSVLogger(filename=csv_logger, separator=',', append=True)
    ]

def fit_model(model, n_epochs, initial_epoch=0, batch_size=batch_size):
    print("Fitting model on training data")
    history = model.fit(
        train_datagen,
        steps_per_epoch=STEPS_PER_EPOCH,
        epochs=n_epochs,
        validation_data=val_datagen,
        validation_steps=n_val // batch_size,  # n_val = number of validation images
        callbacks=get_callbacks(),
        verbose=1,
        initial_epoch=initial_epoch
    )
    return history



model = tf.keras.models.Sequential([
        tf.keras.layers.Conv2D(32, (3,3), activation='relu', input_shape=(IMG_HEIGHT, IMG_WIDTH, 3)),
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.MaxPooling2D(pool_size=(2,2)),
        tf.keras.layers.Conv2D(64, (3,3), padding='same', activation='relu'),
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.MaxPooling2D(pool_size=(2,2)),
        tf.keras.layers.Conv2D(128, (3,3), padding='same', activation='relu'),
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.MaxPooling2D(pool_size=(2,2)),
        tf.keras.layers.Conv2D(256, (3,3), padding='same', activation='relu'),
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.MaxPooling2D(pool_size=(2,2)),
        tf.keras.layers.Conv2D(512, (3,3), padding='same', activation='relu'),
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.MaxPooling2D(pool_size=(2,2)),
        tf.keras.layers.Dropout(0.2),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(512, activation='relu'),
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.Dense(256, activation='relu'),
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.Dense(num_classes, activation='softmax')
    ])

model.compile(optimizer=optimizers.RMSprop(),
              loss='categorical_crossentropy',
              metrics=['accuracy'])
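Since the CyclicLR callback used in get_callbacks() above is not part of tf.keras, here is a minimal triangular-mode sketch following Smith's formulation (the implementation I actually use may differ in details). With clr_step_size = 4 * STEPS_PER_EPOCH, each half-cycle lasts four epochs.

import numpy as np
import tensorflow as tf

class SimpleCyclicLR(tf.keras.callbacks.Callback):
    """Triangular cyclical learning rate: the LR ramps linearly from base_lr
    up to max_lr and back down over 2 * step_size batches."""

    def __init__(self, base_lr=1e-4, max_lr=1e-2, step_size=2000):
        super().__init__()
        self.base_lr, self.max_lr, self.step_size = base_lr, max_lr, step_size
        self.iteration = 0

    def on_train_batch_begin(self, batch, logs=None):
        cycle = np.floor(1 + self.iteration / (2 * self.step_size))
        x = np.abs(self.iteration / self.step_size - 2 * cycle + 1)
        lr = self.base_lr + (self.max_lr - self.base_lr) * max(0.0, 1 - x)
        tf.keras.backend.set_value(self.model.optimizer.lr, lr)
        self.iteration += 1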

Edit 20/03/2021: I applied an InceptionV3 transfer-learning model and the accuracy still plateaus around 50%.
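For reference, a typical InceptionV3 transfer-learning setup looks roughly like this; my exact configuration is not shown here, so treat this only as an illustration:

import tensorflow as tf

base = tf.keras.applications.InceptionV3(
    include_top=False, weights='imagenet',
    input_shape=(IMG_HEIGHT, IMG_WIDTH, 3))
base.trainable = False  # freeze the pretrained backbone

transfer_model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(num_classes, activation='softmax')
])

# Note: InceptionV3 was pretrained with inputs scaled to [-1, 1]
# (tf.keras.applications.inception_v3.preprocess_input), not 1./255.
transfer_model.compile(optimizer=tf.keras.optimizers.RMSprop(),
                       loss='categorical_crossentropy',
                       metrics=['accuracy'])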

0 Answers:

No answers yet.