Validation loss lower than training loss, and Keras loss decreasing

Date: 2020-05-23 13:40:13

Tags: python tensorflow machine-learning keras deep-learning

I am training a U-Net model on the TACO dataset and I'm having trouble with my output: my validation loss is much lower than my training loss, and I'm not sure whether that is a good thing. Since TACO is a COCO-format dataset of 1500 images, I split the data by having train_generator contain images 0-1199 and val_generator contain images 1200-1499. I then augment the data with the following function:

import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

def augmentationsGenerator(gen, augGeneratorArgs, seed=None):
    # Initialize the image data generator with args provided
    image_gen = ImageDataGenerator(**augGeneratorArgs)

    # Remove the brightness argument for the mask. Spatial arguments similar to image.
    augGeneratorArgs_mask = augGeneratorArgs.copy()
    _ = augGeneratorArgs_mask.pop('brightness_range', None)
    # Initialize the mask data generator with modified args
    mask_gen = ImageDataGenerator(**augGeneratorArgs_mask)

    np.random.seed(seed if seed is not None else np.random.choice(range(9999)))

    for img, mask in gen:
        seed = np.random.choice(range(9999))
        # keep the seeds synchronized, otherwise the augmentation of the images
        # will end up different from the augmentation of the masks
        # images are scaled to [0, 255] before augmentation and back
        # to [0, 1] afterwards
        g_x = image_gen.flow(255*img,
                             batch_size = img.shape[0],
                             seed = seed,
                             shuffle=True)
        g_y = mask_gen.flow(mask,
                            batch_size = mask.shape[0],
                            seed = seed,
                            shuffle=True)

        img_aug = next(g_x)/255.0
        mask_aug = next(g_y)

        yield img_aug, mask_aug

with the following arguments:

augGeneratorArgs = dict(featurewise_center = False, 
                        samplewise_center = False,
                        rotation_range = 5, 
                        width_shift_range = 0.01, 
                        height_shift_range = 0.01, 
                        brightness_range = (0.8,1.2),
                        shear_range = 0.01,
                        zoom_range = [1, 1.25],  
                        horizontal_flip = True, 
                        vertical_flip = False,
                        fill_mode = 'reflect',
                        data_format = 'channels_last')
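
For reference, the augmented generators passed to model.fit below are created roughly like this (train_generator and val_generator are the COCO generators from the split described above, which yield (image, mask) batches):

train_gen_aug = augmentationsGenerator(train_generator, augGeneratorArgs)
val_gen_aug = augmentationsGenerator(val_generator, augGeneratorArgs)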

My model code is:

import tensorflow as tf
from tensorflow.keras.layers import Input

IMG_WIDTH = 224
IMG_HEIGHT = 224
IMG_CHANNELS = 3
epochs = 25
validation_steps = val_size
steps_per_epoch = train_size
x = 32

##Creating the model

initializer = "he_normal"

###Building U-Net Model

##Input Layer
inputs = Input((IMG_WIDTH, IMG_HEIGHT, IMG_CHANNELS))

##Converting inputs to float
s = tf.keras.layers.Lambda(lambda x: x / 255)(inputs)

##Contraction
c1 = tf.keras.layers.Conv2D(x, (3,3), activation="relu", kernel_initializer=initializer, padding="same")(s)
c1 = tf.keras.layers.Dropout(0.1)(c1)
c1 = tf.keras.layers.Conv2D(x, (3,3), activation="relu", kernel_initializer=initializer, padding="same")(c1)
p1 = tf.keras.layers.MaxPooling2D((2,2))(c1)

c2 = tf.keras.layers.Conv2D(x*2, (3,3), activation="relu", kernel_initializer=initializer, padding="same")(p1)
c2 = tf.keras.layers.Dropout(0.1)(c2)
c2 = tf.keras.layers.Conv2D(x*2, (3,3), activation="relu", kernel_initializer=initializer, padding="same")(c2)
p2 = tf.keras.layers.MaxPooling2D((2,2))(c2)

c3 = tf.keras.layers.Conv2D(x*4, (3,3), activation="relu", kernel_initializer=initializer, padding="same")(p2)
c3 = tf.keras.layers.Dropout(0.2)(c3)
c3 = tf.keras.layers.Conv2D(x*4, (3,3), activation="relu", kernel_initializer=initializer, padding="same")(c3)
p3 = tf.keras.layers.MaxPooling2D((2,2))(c3)

c4 = tf.keras.layers.Conv2D(x*8, (3,3), activation="relu", kernel_initializer=initializer, padding="same")(p3)
c4 = tf.keras.layers.Dropout(0.2)(c4)
c4 = tf.keras.layers.Conv2D(x*8, (3,3), activation="relu", kernel_initializer=initializer, padding="same")(c4)
p4 = tf.keras.layers.MaxPooling2D((2,2))(c4)

c5 = tf.keras.layers.Conv2D(x*16, (3,3), activation="relu", kernel_initializer=initializer, padding="same")(p4)
c5 = tf.keras.layers.Dropout(0.3)(c5)
c5 = tf.keras.layers.Conv2D(x*16, (3,3), activation="relu", kernel_initializer=initializer, padding="same")(c5)

##Expansion
u6 = tf.keras.layers.Conv2DTranspose(x*8, (2,2), strides=(2,2), padding="same")(c5)
u6 = tf.keras.layers.concatenate([u6, c4])
c6 = tf.keras.layers.Conv2D(x*8, (3,3), activation="relu", kernel_initializer=initializer, padding="same")(u6)
c6 = tf.keras.layers.Dropout(0.2)(c6)
c6 = tf.keras.layers.Conv2D(x*8, (3,3), activation="relu", kernel_initializer=initializer, padding="same")(c6)

u7 = tf.keras.layers.Conv2DTranspose(x*4, (2,2), strides=(2,2), padding="same")(c6)
u7 = tf.keras.layers.concatenate([u7, c3])
c7 = tf.keras.layers.Conv2D(x*4, (3,3), activation="relu", kernel_initializer=initializer, padding="same")(u7)
c7 = tf.keras.layers.Dropout(0.2)(c7)
c7 = tf.keras.layers.Conv2D(x*4, (3,3), activation="relu", kernel_initializer=initializer, padding="same")(c7)

u8 = tf.keras.layers.Conv2DTranspose(x*2, (2,2), strides=(2,2), padding="same")(c7)
u8 = tf.keras.layers.concatenate([u8, c2])
c8 = tf.keras.layers.Conv2D(x*2, (3,3), activation="relu", kernel_initializer=initializer, padding="same")(u8)
c8 = tf.keras.layers.Dropout(0.1)(c8)
c8 = tf.keras.layers.Conv2D(x*2, (3,3), activation="relu", kernel_initializer=initializer, padding="same")(c8)

u9 = tf.keras.layers.Conv2DTranspose(x, (2,2), strides=(2,2), padding="same")(c8)
u9 = tf.keras.layers.concatenate([u9, c1], axis=3)
c9 = tf.keras.layers.Conv2D(x, (3,3), activation="relu", kernel_initializer=initializer, padding="same")(u9)
c9 = tf.keras.layers.Dropout(0.1)(c9)
c9 = tf.keras.layers.Conv2D(x, (3,3), activation="relu", kernel_initializer=initializer, padding="same")(c9)

##Output Layer
outputs = tf.keras.layers.Conv2D(61, (1,1), activation="softmax")(c9)

##Defining Model
model = tf.keras.Model(inputs=[inputs], outputs=[outputs])

##Compiling Model
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=[tf.keras.metrics.SparseCategoricalAccuracy()])


##Defining callbacks
callbacks = [
             tf.keras.callbacks.ModelCheckpoint('/content/drive/My Drive/THESIS/taco_2-2_final_retry.h5', verbose=1, save_best_only=True),
             tf.keras.callbacks.EarlyStopping(patience=6, monitor="val_loss"),
             tf.keras.callbacks.TensorBoard(log_dir=logs),
             tf.keras.callbacks.ReduceLROnPlateau(monitor='val_loss', factor=0.2, patience = 2, min_lr = 0.001)]

##Training the model
results = model.fit(x = train_gen_aug, 
                    validation_data = val_gen_aug, 
                    steps_per_epoch = steps_per_epoch, 
                    validation_steps = validation_steps, 
                    epochs = epochs, 
                    callbacks=callbacks,
                    verbose = True)
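
As a sanity check on the shapes: the output layer has 61 softmax channels and the loss is sparse_categorical_crossentropy, so the masks should carry integer class IDs rather than one-hot vectors. A minimal check with dummy data (the shapes here are assumptions based on the model definition above) should run without errors:

import numpy as np

dummy_imgs = np.random.rand(2, 224, 224, 3).astype("float32")  # images in [0, 1]
dummy_masks = np.random.randint(0, 61, size=(2, 224, 224, 1))  # integer class IDs 0..60

preds = model.predict(dummy_imgs)
print(preds.shape)  # expected: (2, 224, 224, 61)
print(model.evaluate(dummy_imgs, dummy_masks, verbose=0))  # [loss, sparse_categorical_accuracy]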

The first few epochs produce the following results:

Epoch 1/25
1200/1200 [==============================] - ETA: 0s - loss: 0.8035 - sparse_categorical_accuracy: 0.9495
Epoch 00001: val_loss improved from inf to 0.22116, saving model to /content/drive/My Drive/THESIS/taco_2-2_final_retry.h5
1200/1200 [==============================] - 7408s 6s/step - loss: 0.8035 - sparse_categorical_accuracy: 0.9495 - val_loss: 0.2212 - val_sparse_categorical_accuracy: 0.9859 - lr: 0.0010
Epoch 2/25
1200/1200 [==============================] - ETA: 0s - loss: 0.7942 - sparse_categorical_accuracy: 0.9501
Epoch 00002: val_loss improved from 0.22116 to 0.21732, saving model to /content/drive/My Drive/THESIS/taco_2-2_final_retry.h5
1200/1200 [==============================] - 6378s 5s/step - loss: 0.7942 - sparse_categorical_accuracy: 0.9501 - val_loss: 0.2173 - val_sparse_categorical_accuracy: 0.9861 - lr: 0.0010

So my question is: is it okay/acceptable that my validation loss is lower than my training loss? And how can I reduce the validation loss further? I am aiming for something around 0.0X. Should I add more dropout layers or increase the dropout rates? Decrease/increase the number of neurons per layer?

1 Answer:

Answer 0 (score: 2):

The training loss is almost always lower than the validation loss, so your result is okay.

As for reducing the validation loss, there are various things you can try, such as:

1) Change the augGeneratorArgs hyperparameters

2) Add more layers, or more neurons per layer

3) Add more dropout to reduce overfitting

4) Increase/decrease the number of epochs

5) Plot the train/val loss curves to check whether the model is overfitting (see the sketch below)
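
A minimal sketch of point 5, assuming results is the History object returned by model.fit above:

import matplotlib.pyplot as plt

# results.history holds the per-epoch metrics recorded by model.fit
plt.plot(results.history['loss'], label='train loss')
plt.plot(results.history['val_loss'], label='val loss')
plt.xlabel('epoch')
plt.ylabel('loss')
plt.legend()
plt.show()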