Large gap between val_acc and training accuracy in a Keras neural network

Posted: 2021-06-16 04:03:23

Tags: python tensorflow keras neural-network

While training I noticed that the metrics on the validation data converge noticeably more slowly than those on the training data, so I replaced the validation data with the training data itself. Even then, the validation loss (val_loss) still falls noticeably more slowly than the training loss. Why is that? Can anyone answer my question?

[Figure: training vs. validation loss/accuracy curves]

Code:

import os

import tensorflow as tf
from tensorflow.keras import applications
from tensorflow.keras.callbacks import LearningRateScheduler
import tensorflow.keras.backend as K

base_model = applications.MobileNetV2(input_shape=(224, 224, 3),
                                      weights='imagenet',
                                      include_top=False)

inputs = tf.keras.layers.Input(shape=(224, 224, 3))
x = base_model(inputs)  # feature maps from MobileNetV2 with its top layers removed
x = tf.keras.layers.GlobalAveragePooling2D()(x)
outputs = tf.keras.layers.Dense(133, activation='softmax')(x)
model = tf.keras.models.Model(inputs=inputs, outputs=outputs)

optimizer = tf.keras.optimizers.SGD(learning_rate=0.001, momentum=0.9)
model.compile(optimizer=optimizer,
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False),
              metrics=['sparse_categorical_accuracy'])

checkpoint_save_path = "./checkpoint/mobilenetv2.ckpt"
if os.path.exists(checkpoint_save_path + '.index'):
    print('-------------load the model-----------------')
    model.load_weights(checkpoint_save_path)
    
def scheduler(epoch):
    # Every 10 epochs, scale the learning rate by 0.95
    if epoch % 10 == 0 and epoch != 0:
        lr = K.get_value(model.optimizer.lr)
        K.set_value(model.optimizer.lr, lr * 0.95)
        print("lr changed to {}".format(lr * 0.95))
    return K.get_value(model.optimizer.lr)
 
reduce_lr = LearningRateScheduler(scheduler)

cp_callback = tf.keras.callbacks.ModelCheckpoint(filepath=checkpoint_save_path,
                                                 save_weights_only=True,
                                                 save_best_only=True)

# Validation deliberately reuses the training data to compare the curves
history = model.fit(train_x, train_y, batch_size=4, epochs=20,
                    validation_data=(train_x, train_y), validation_freq=1,
                    callbacks=[cp_callback, reduce_lr])
# history = model.fit(train_data, batch_size=4, epochs=20,
#                     validation_data=val_data, validation_freq=1,
#                     callbacks=[cp_callback, reduce_lr])
Epoch 1/20
  2/215 [..............................] - ETA: 5s - loss: 5.1054 - sparse_categorical_accuracy: 0.0000e+00WARNING:tensorflow:Callbacks method `on_train_batch_end` is slow compared to the batch time (batch time: 0.0130s vs `on_train_batch_end` time: 0.0379s). Check your callbacks.
215/215 [==============================] - 12s 58ms/step - loss: 4.4875 - sparse_categorical_accuracy: 0.0291 - val_loss: 4.8017 - val_sparse_categorical_accuracy: 0.0105
Epoch 2/20
215/215 [==============================] - 12s 54ms/step - loss: 3.7271 - sparse_categorical_accuracy: 0.0698 - val_loss: 5.4774 - val_sparse_categorical_accuracy: 0.0081
Epoch 3/20
215/215 [==============================] - 11s 53ms/step - loss: 2.9440 - sparse_categorical_accuracy: 0.1802 - val_loss: 4.9705 - val_sparse_categorical_accuracy: 0.0291
Epoch 4/20
215/215 [==============================] - 12s 55ms/step - loss: 2.3213 - sparse_categorical_accuracy: 0.3163 - val_loss: 4.1974 - val_sparse_categorical_accuracy: 0.0558
Epoch 5/20
215/215 [==============================] - 12s 54ms/step - loss: 1.6388 - sparse_categorical_accuracy: 0.5163 - val_loss: 4.2031 - val_sparse_categorical_accuracy: 0.0395
Epoch 6/20
215/215 [==============================] - 12s 54ms/step - loss: 1.0936 - sparse_categorical_accuracy: 0.7035 - val_loss: 3.8308 - val_sparse_categorical_accuracy: 0.0895
Epoch 7/20
215/215 [==============================] - 12s 54ms/step - loss: 0.6046 - sparse_categorical_accuracy: 0.8651 - val_loss: 3.6757 - val_sparse_categorical_accuracy: 0.1407
Epoch 8/20
215/215 [==============================] - 12s 56ms/step - loss: 0.4002 - sparse_categorical_accuracy: 0.9186 - val_loss: 3.3228 - val_sparse_categorical_accuracy: 0.2384
Epoch 9/20
215/215 [==============================] - 12s 56ms/step - loss: 0.2533 - sparse_categorical_accuracy: 0.9593 - val_loss: 3.1674 - val_sparse_categorical_accuracy: 0.2547
Epoch 10/20
215/215 [==============================] - 12s 55ms/step - loss: 0.1698 - sparse_categorical_accuracy: 0.9767 - val_loss: 2.4485 - val_sparse_categorical_accuracy: 0.3686
lr changed to 0.0009500000451225787
Epoch 11/20
215/215 [==============================] - 12s 55ms/step - loss: 0.1416 - sparse_categorical_accuracy: 0.9826 - val_loss: 1.2818 - val_sparse_categorical_accuracy: 0.6721
Epoch 12/20
215/215 [==============================] - 12s 55ms/step - loss: 0.1119 - sparse_categorical_accuracy: 0.9849 - val_loss: 1.1645 - val_sparse_categorical_accuracy: 0.7163
Epoch 13/20
215/215 [==============================] - 12s 55ms/step - loss: 0.0790 - sparse_categorical_accuracy: 0.9965 - val_loss: 0.2560 - val_sparse_categorical_accuracy: 0.9477
Epoch 14/20
215/215 [==============================] - 12s 57ms/step - loss: 0.0718 - sparse_categorical_accuracy: 0.9965 - val_loss: 0.1067 - val_sparse_categorical_accuracy: 0.9814
Epoch 15/20
215/215 [==============================] - 12s 55ms/step - loss: 0.0641 - sparse_categorical_accuracy: 0.9953 - val_loss: 0.1339 - val_sparse_categorical_accuracy: 0.9651
Epoch 16/20
215/215 [==============================] - 12s 54ms/step - loss: 0.0440 - sparse_categorical_accuracy: 0.9977 - val_loss: 0.0210 - val_sparse_categorical_accuracy: 1.0000
Epoch 17/20
215/215 [==============================] - 12s 56ms/step - loss: 0.0322 - sparse_categorical_accuracy: 0.9988 - val_loss: 0.0040 - val_sparse_categorical_accuracy: 1.0000
Epoch 18/20
215/215 [==============================] - 12s 56ms/step - loss: 0.0351 - sparse_categorical_accuracy: 0.9977 - val_loss: 0.0020 - val_sparse_categorical_accuracy: 1.0000
Epoch 19/20
215/215 [==============================] - 12s 54ms/step - loss: 0.0299 - sparse_categorical_accuracy: 0.9988 - val_loss: 0.0091 - val_sparse_categorical_accuracy: 0.9988
Epoch 20/20
215/215 [==============================] - 12s 54ms/step - loss: 0.0402 - sparse_categorical_accuracy: 0.9977 - val_loss: 0.0292 - val_sparse_categorical_accuracy: 0.9965

1 Answer:

Answer 0: (score: 0)

Your figure shows the loss values from the initial epochs. I don't think anything is seriously wrong here; in the early epochs the training loss is expected to be lower than the validation loss. Over time the validation loss should come down and settle close to the training loss. So you should keep training until the validation loss drops.

A few pointers to keep in mind (taken from here):

  • Usually the validation loss should be similar to, but slightly higher than, the training loss.
  • As long as the validation loss is lower than, or even equal to, the training loss, you should keep training.
  • If the training loss is decreasing and the validation loss is not increasing, again keep training; once the validation loss starts to increase, it is time to stop.
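The stopping rule in these pointers can be sketched in plain Python: keep training while val_loss keeps improving, and stop once it has failed to improve for a few consecutive epochs. This is a minimal illustration only; the `patience` value and loss sequence below are made up for the example. Keras packages the same idea as the `tf.keras.callbacks.EarlyStopping` callback (with `monitor='val_loss'` and a `patience` argument), which you could add to the `callbacks` list in `model.fit`.

```python
def epochs_to_run(val_losses, patience=3):
    """Return how many epochs would be trained before early stopping:
    stop after val_loss fails to improve for `patience` epochs in a row."""
    best = float('inf')
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:        # val_loss improved: reset the counter
            best = loss
            wait = 0
        else:                  # no improvement this epoch
            wait += 1
            if wait >= patience:
                return epoch   # patience exhausted: stop here
    return len(val_losses)     # never triggered: train all epochs

# val_loss improves through epoch 4, then stalls for 3 epochs -> stop at 7
print(epochs_to_run([4.8, 3.3, 2.4, 1.2, 1.3, 1.5, 1.4]))  # 7
```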