I'm using ResNet50 for 5-class image classification. The training loss decreases over time and training accuracy improves, but validation accuracy stays stagnant and validation loss hovers around a fairly high value. The code is below — am I doing something wrong here?
I've tried different learning rates and tried adding BatchNorm and Dropout layers. I've also made sure the data I'm feeding in is clean and correctly structured, specified the batch size, and used shuffle=True for validation, but nothing seems to help. Any help here would be much appreciated.
import os  # needed for os.walk below
from tensorflow.python.keras import callbacks, optimizers
from tensorflow.python.keras.applications import ResNet50
from tensorflow.python.keras.applications.resnet50 import preprocess_input  # imported but never used below
from tensorflow.python.keras.callbacks import EarlyStopping, ModelCheckpoint, ReduceLROnPlateau
from tensorflow.python.keras.layers import Dense, Flatten, BatchNormalization, Dropout
from tensorflow.python.keras.models import Sequential
from tensorflow.python.keras.preprocessing.image import ImageDataGenerator
resnet_weights_path = './clean_resnet_data/resnet50_weights_tf_dim_ordering_tf_kernels_notop.h5'

data_generator = ImageDataGenerator(horizontal_flip=True,
                                    vertical_flip=True,
                                    zoom_range=0.3,
                                    rescale=1. / 255)
validation_datagen = ImageDataGenerator(rescale=1. / 255)

image_size = 512
batch_size = 16

train_generator = data_generator.flow_from_directory(
    './org_train',
    target_size=(image_size, image_size),
    # batch_size=batch_size,  # commented out, so Keras' default of 32 applies
    class_mode='categorical')
validation_generator = validation_datagen.flow_from_directory(
    './valid_org',
    target_size=(image_size, image_size),
    batch_size=batch_size,
    shuffle=True,
    class_mode='categorical')

num_classes = len(train_generator.class_indices)
Output:
Found 1204 images belonging to 5 classes.
Found 250 images belonging to 5 classes.
5
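One thing to note about the generators above: `preprocess_input` is imported but never attached to either generator; both pipelines simply rescale by 1/255. The ImageNet ResNet50 weights were trained on "caffe"-style inputs (RGB flipped to BGR, per-channel ImageNet means subtracted), which is roughly what `resnet50.preprocess_input` does, so the network receives a very different value range than its pretrained weights expect. A minimal numpy sketch of the mismatch (the image data here is random, purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
rgb = rng.uniform(0, 255, size=(1, 512, 512, 3)).astype("float32")

# roughly what resnet50.preprocess_input does ("caffe" mode):
bgr = rgb[..., ::-1]                                   # RGB -> BGR
imagenet_means = np.array([103.939, 116.779, 123.68])  # per-channel BGR means
caffe_style = bgr - imagenet_means

# what the generators above actually feed the network:
rescaled = rgb / 255.0

print(rescaled.min(), rescaled.max())        # roughly [0, 1]
print(caffe_style.min(), caffe_style.max())  # roughly [-124, 151]
```

The two preprocessing schemes produce non-overlapping input distributions, which is worth ruling out when pretrained weights underperform.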
model = Sequential()
model.add(ResNet50(include_top=False, pooling='avg', weights=resnet_weights_path))
model.add(Flatten())  # no-op here: pooling='avg' already yields a flat (None, 2048) tensor
model.add(Dense(512, activation='relu'))
model.add(BatchNormalization())
model.add(Dropout(0.5))
model.add(Dense(256, activation='relu'))
model.add(BatchNormalization())
model.add(Dense(num_classes, activation='sigmoid'))
model.layers[0].trainable = False  # freeze the ResNet50 base

model.summary()
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
resnet50 (Model) (None, 2048) 23587712
_________________________________________________________________
flatten (Flatten) (None, 2048) 0
_________________________________________________________________
dense (Dense) (None, 512) 1049088
_________________________________________________________________
batch_normalization (BatchNo (None, 512) 2048
_________________________________________________________________
dropout (Dropout) (None, 512) 0
_________________________________________________________________
dense_1 (Dense) (None, 256) 131328
_________________________________________________________________
batch_normalization_1 (Batch (None, 256) 1024
_________________________________________________________________
dense_2 (Dense) (None, 5) 1285
=================================================================
Total params: 24,772,485
Trainable params: 1,183,237
Non-trainable params: 23,589,248
_________________________________________________________________
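Worth double-checking in the head above: the final `Dense(num_classes)` uses `sigmoid` together with `categorical_crossentropy`, while single-label multi-class setups conventionally use `softmax` so the outputs form a probability distribution over the 5 classes. A small numpy sketch of the difference (the logit values are made up for illustration):

```python
import numpy as np

logits = np.array([2.0, 1.0, 0.5, 0.2, -1.0])  # hypothetical pre-activation outputs

# sigmoid scores every class independently; the scores need not sum to 1
sigmoid_out = 1.0 / (1.0 + np.exp(-logits))

# softmax normalizes across classes into a proper distribution
softmax_out = np.exp(logits) / np.exp(logits).sum()

print(float(sigmoid_out.sum()))  # well above 1 -- not a distribution
print(float(softmax_out.sum()))  # sums to 1
```

Independent per-class sigmoids are the usual choice for multi-*label* problems; with one-hot labels and categorical cross-entropy, softmax is the standard pairing.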
count = sum([len(files) for r, d, files in os.walk("./org_train/")])
steps_in_each_epoch = int(count / batch_size) + 1  # computed but never passed to fit_generator below

# callbacks for model saving, early stopping and LR reduction
model_checkpoint = ModelCheckpoint('resnet50_clean_data.model', monitor='f1',
                                   mode='max', save_best_only=True, verbose=2)
log_dir = './tf-log/newdata_withlr_nodrop'
tb_cb = callbacks.TensorBoard(log_dir=log_dir, histogram_freq=0)
early_stopping = EarlyStopping(monitor='val_f1', mode='max', patience=15, verbose=2)
reduce_lr = ReduceLROnPlateau(monitor='val_f1', mode='max', factor=0.5, patience=3,
                              min_lr=0.00001, verbose=2)
cbks = [early_stopping, reduce_lr]  # note: model_checkpoint and tb_cb are not included

# f1 is a custom metric defined elsewhere (not shown here)
adam = optimizers.Adam(lr=0.003)
model.compile(loss='categorical_crossentropy', optimizer=adam, metrics=["accuracy", f1])
model.fit_generator(train_generator, epochs=150,
                    validation_data=validation_generator, callbacks=cbks)
Output:
Epoch 1/150
38/38 [==============================] - 86s 2s/step - loss: 1.5320 - acc: 0.3364 - f1: 0.3989 - val_loss: 5.5517 - val_acc: 0.2000 - val_f1: 0.2000
Epoch 2/150
38/38 [==============================] - 70s 2s/step - loss: 1.2850 - acc: 0.4567 - f1: 0.4663 - val_loss: 4.4173 - val_acc: 0.2000 - val_f1: 0.2100
Epoch 3/150
38/38 [==============================] - 73s 2s/step - loss: 1.2396 - acc: 0.4583 - f1: 0.4716 - val_loss: 4.7810 - val_acc: 0.2000 - val_f1: 0.2658
Epoch 4/150
38/38 [==============================] - 73s 2s/step - loss: 1.2147 - acc: 0.4973 - f1: 0.4902 - val_loss: 4.2491 - val_acc: 0.2000 - val_f1: 0.2667
Epoch 5/150
38/38 [==============================] - 73s 2s/step - loss: 1.1994 - acc: 0.5082 - f1: 0.4982 - val_loss: 3.5541 - val_acc: 0.2000 - val_f1: 0.2667
Epoch 6/150
38/38 [==============================] - 73s 2s/step - loss: 1.1525 - acc: 0.5284 - f1: 0.5116 - val_loss: 3.8147 - val_acc: 0.2000 - val_f1: 0.2667
Epoch 7/150
38/38 [==============================] - 73s 2s/step - loss: 1.1658 - acc: 0.5014 - f1: 0.5104 - val_loss: 3.4530 - val_acc: 0.1920 - val_f1: 0.2784
Epoch 8/150
38/38 [==============================] - 77s 2s/step - loss: 1.1181 - acc: 0.5222 - f1: 0.5137 - val_loss: 3.1350 - val_acc: 0.2000 - val_f1: 0.2083
Epoch 9/150
38/38 [==============================] - 73s 2s/step - loss: 1.0760 - acc: 0.5543 - f1: 0.5402 - val_loss: 2.5362 - val_acc: 0.2000 - val_f1: 0.2000
Epoch 10/150
37/38 [============================>.] - ETA: 1s - loss: 1.1092 - acc: 0.5331 - f1: 0.5335
Epoch 00010: ReduceLROnPlateau reducing learning rate to 0.001500000013038516.
38/38 [==============================] - 73s 2s/step - loss: 1.1155 - acc: 0.5322 - f1: 0.5329 - val_loss: 2.5705 - val_acc: 0.2000 - val_f1: 0.1993
Epoch 11/150
38/38 [==============================] - 72s 2s/step - loss: 1.0541 - acc: 0.5623 - f1: 0.5438 - val_loss: 2.2404 - val_acc: 0.2000 - val_f1: 0.2180
Epoch 12/150
38/38 [==============================] - 73s 2s/step - loss: 1.0180 - acc: 0.5630 - f1: 0.5421 - val_loss: 2.0331 - val_acc: 0.2480 - val_f1: 0.2733
Epoch 13/150
37/38 [============================>.] - ETA: 1s - loss: 1.0070 - acc: 0.5992 - f1: 0.5508
Epoch 00013: ReduceLROnPlateau reducing learning rate to 0.000750000006519258.
38/38 [==============================] - 73s 2s/step - loss: 1.0071 - acc: 0.6006 - f1: 0.5513 - val_loss: 1.9989 - val_acc: 0.2000 - val_f1: 0.2037
Epoch 14/150
38/38 [==============================] - 76s 2s/step - loss: 1.0195 - acc: 0.5523 - f1: 0.5082 - val_loss: 2.0054 - val_acc: 0.2000 - val_f1: 0.2001
Epoch 15/150
38/38 [==============================] - 73s 2s/step - loss: 1.0127 - acc: 0.5982 - f1: 0.5079 - val_loss: 2.1047 - val_acc: 0.2040 - val_f1: 0.2143
Epoch 16/150
37/38 [============================>.] - ETA: 1s - loss: 0.9787 - acc: 0.5934 - f1: 0.5173
Epoch 00016: ReduceLROnPlateau reducing learning rate to 0.000375000003259629.
38/38 [==============================] - 73s 2s/step - loss: 0.9776 - acc: 0.5918 - f1: 0.5183 - val_loss: 2.2156 - val_acc: 0.2000 - val_f1: 0.2705
Epoch 17/150
38/38 [==============================] - 73s 2s/step - loss: 0.9950 - acc: 0.5956 - f1: 0.4904 - val_loss: 2.2039 - val_acc: 0.2000 - val_f1: 0.2622
Epoch 18/150
38/38 [==============================] - 73s 2s/step - loss: 0.9451 - acc: 0.6174 - f1: 0.5297 - val_loss: 2.2094 - val_acc: 0.1960 - val_f1: 0.2689
Epoch 19/150
37/38 [============================>.] - ETA: 1s - loss: 0.9531 - acc: 0.6182 - f1: 0.5106
Epoch 00019: ReduceLROnPlateau reducing learning rate to 0.0001875000016298145.
38/38 [==============================] - 78s 2s/step - loss: 0.9566 - acc: 0.6168 - f1: 0.5065 - val_loss: 2.1872 - val_acc: 0.2080 - val_f1: 0.2652
Epoch 20/150
38/38 [==============================] - 73s 2s/step - loss: 0.9706 - acc: 0.5898 - f1: 0.4943 - val_loss: 2.1983 - val_acc: 0.2080 - val_f1: 0.2658
Epoch 21/150
38/38 [==============================] - 73s 2s/step - loss: 0.9365 - acc: 0.5958 - f1: 0.4986 - val_loss: 2.1936 - val_acc: 0.1960 - val_f1: 0.2416
Epoch 22/150
37/38 [============================>.] - ETA: 1s - loss: 0.9472 - acc: 0.6068 - f1: 0.4861
Epoch 00022: ReduceLROnPlateau reducing learning rate to 9.375000081490725e-05.
38/38 [==============================] - 73s 2s/step - loss: 0.9457 - acc: 0.6097 - f1: 0.4884 - val_loss: 2.1958 - val_acc: 0.1960 - val_f1: 0.2428
Epoch 00022: early stopping
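As a sanity check on the log above: the 38 steps per epoch are consistent with the train generator falling back to Keras' default `batch_size=32`, since the explicit `batch_size=16` is commented out in the `flow_from_directory` call:

```python
import math

train_images = 1204   # from "Found 1204 images belonging to 5 classes."
default_batch = 32    # Keras default when batch_size is not passed to flow_from_directory
intended_batch = 16   # the batch_size variable defined (but commented out) above

print(math.ceil(train_images / default_batch))   # 38, matching the "38/38" in the log
print(math.ceil(train_images / intended_batch))  # 76, what batch_size=16 would have given
```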
The validation behavior is what confuses me. I don't understand why validation accuracy and validation f1 have barely moved since the first epoch (they stay essentially constant). What could the problem be?