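I'm trying to fix a problem in my CNN model. My dataset is organized as follows:
My dataset is very large, so I'm using ImageDataGenerator to preprocess the images and to load them in batches (which lowers the computational cost). First, I configured ImageDataGenerator as follows: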
from keras.preprocessing.image import ImageDataGenerator
# Define an ImageDataGenerator for each dataset.
# The only preprocessing here is rescaling each image's pixel values by 1/255
datagen_train = ImageDataGenerator(rescale=1./255)
datagen_test = ImageDataGenerator(rescale=1./255)
datagen_valid = ImageDataGenerator(rescale=1./255)
#Define a batch_size parameter
batch_size=32
# .flow_from_directory reads the images from each class subfolder and yields batches of (images, labels)
train_generator = datagen_train.flow_from_directory(
'content/cell_images/train', #Train folder path
target_size=(150,150), #all images will be resized to 150x150
batch_size=batch_size,
class_mode='categorical')  # one-hot labels, needed for categorical_crossentropy loss
test_generator = datagen_test.flow_from_directory(
'content/cell_images/test', #Test folder path
target_size=(150,150), #all images will be resized to 150x150
batch_size=batch_size,
class_mode='categorical')
valid_generator = datagen_valid.flow_from_directory(
'content/cell_images/valid',
target_size=(150,150),
batch_size=batch_size,
class_mode='categorical')
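For reference, `rescale=1./255` multiplies every pixel value by 1/255, so 8-bit values in [0, 255] end up in [0, 1]. The tiny NumPy sketch below shows that arithmetic on a made-up 2x2 image. It is not the post's data and not Keras code.

```python
import numpy as np

# Hypothetical 2x2 RGB image with uint8 pixel values in [0, 255],
# standing in for one image loaded by flow_from_directory.
image = np.array([[[0, 128, 255], [64, 32, 16]],
                  [[255, 255, 255], [0, 0, 0]]], dtype=np.uint8)

# rescale=1./255 multiplies every pixel value by this factor,
# mapping the range [0, 255] onto [0.0, 1.0].
rescale = 1. / 255
scaled = image.astype(np.float32) * rescale

print(scaled.min(), scaled.max())
```

Without `rescale`, the network receives raw values up to 255 instead.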
To fit the model, I used fit_generator with a checkpointer that saves the best weights based on validation accuracy:
from keras.callbacks import ModelCheckpoint
# Number of training epochs
epochs = 10
# Create a checkpointer that saves only the best weights (ModelCheckpoint monitors 'val_loss' by default)
checkpointer = ModelCheckpoint(filepath='cnn_model.weights.best.hdf5',
verbose=1, save_best_only=True)
model.fit_generator(train_generator,
steps_per_epoch=train_generator.samples//batch_size,
epochs=epochs,
callbacks=[checkpointer],
validation_data=valid_generator,
validation_steps=valid_generator.samples//batch_size)
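Both `steps_per_epoch` and `validation_steps` above are computed with `//`, which rounds down. The small sketch below compares how many images each choice actually covers. It assumes a hypothetical set of 1000 images and batch_size=32, because the real sample counts are not given in the post.

```python
import math

def steps_floor(samples, batch_size):
    # What the post uses: samples // batch_size drops the final partial batch.
    return samples // batch_size

def steps_ceil(samples, batch_size):
    # Rounds up so the last partial batch is included and every sample is seen once.
    return math.ceil(samples / batch_size)

def samples_seen(steps, samples, batch_size):
    # Total number of samples consumed by `steps` batches of size `batch_size`
    # (the final batch may be smaller than batch_size).
    return min(steps * batch_size, samples)

samples = 1000  # hypothetical dataset size
batch_size = 32
for steps in (steps_floor(samples, batch_size), steps_ceil(samples, batch_size)):
    print(steps, samples_seen(steps, samples, batch_size))
# prints "31 992", then "32 1000"
```

With rounding down, 8 of the 1000 images are never used in a pass. With `math.ceil`, they are included in a final, smaller batch.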
Finally, I loaded the best weights into the model and evaluated it on test_set:
# load the best weights saved by the checkpointer (best val_loss, its default monitor)
model.load_weights('cnn_model.weights.best.hdf5')
#evaluate and print test accuracy
score = model.evaluate_generator(test_generator,
test_generator.samples//batch_size)
print('\n', 'Test accuracy:', score[1])
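This evaluation call also uses `samples // batch_size` steps, and `flow_from_directory` shuffles by default (`shuffle=True`). That combination is one plausible contributor to the changing scores: each evaluation sees a freshly shuffled order and skips the leftover samples, so a different subset is skipped every time even though the weights are fixed. The sketch below is a NumPy simulation with made-up numbers, not Keras code.

```python
import numpy as np

def shuffled_subset_accuracy(correct, batch_size, rng):
    # `correct` holds 1 where the (fixed) model is right on a sample, else 0.
    # Mimic a shuffled generator: permute the samples for each evaluation...
    order = rng.permutation(len(correct))
    # ...and consume only samples // batch_size full batches, so the
    # leftover samples (a different set each time) are never evaluated.
    steps = len(correct) // batch_size
    seen = order[:steps * batch_size]
    return correct[seen].mean()

rng = np.random.default_rng(0)
# Hypothetical test set of 1000 samples on which the fixed weights are right 910 times.
correct = np.zeros(1000)
correct[:910] = 1
scores = [shuffled_subset_accuracy(correct, 32, rng) for _ in range(5)]
print(scores)          # values can differ between evaluations
print(correct.mean())  # 0.91, the accuracy over the whole set
```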
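However, here is my problem: every time I run only model.evaluate_generator, without training the model again (i.e., keeping the same weights), it returns a different accuracy score.
I've been searching for a solution and have read many articles for insight, and recently I made some progress.
Recently, based on this post, I found that if I set shuffle=False and batch_size=1 in test_generator: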
test_generator = datagen_test.flow_from_directory(
'content/cell_images/test', #Test folder path
target_size=(150,150), #all images will be resized to 150x150
batch_size=1,
class_mode='categorical',
shuffle=False)
and pass steps = test_generator.samples to evaluate_generator:
score = model.evaluate_generator(test_generator, test_generator.samples)
the values no longer change.
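This result matches what you would expect when every sample is covered exactly once in a fixed order. The NumPy sketch below is a simulation of that idea, not Keras itself. It assumes the per-batch accuracies are combined as an average weighted by batch size. Under that assumption, a fixed order with `ceil(samples / batch_size)` steps gives a result that does not depend on batch_size.

```python
import numpy as np

def ordered_full_pass_accuracy(correct, batch_size):
    # shuffle=False: samples are visited in a fixed order.
    # Use ceil(samples / batch_size) steps so the last, smaller batch is included.
    steps = int(np.ceil(len(correct) / batch_size))
    batch_accs, batch_sizes = [], []
    for i in range(steps):
        batch = correct[i * batch_size:(i + 1) * batch_size]
        batch_accs.append(batch.mean())
        batch_sizes.append(len(batch))
    # Weight each batch by its size so the partial last batch counts correctly.
    return np.average(batch_accs, weights=batch_sizes)

# Hypothetical set of 1000 samples, 910 of them predicted correctly.
correct = np.zeros(1000)
correct[:910] = 1
for bs in (1, 32, 100):
    print(bs, ordered_full_pass_accuracy(correct, bs))  # approximately 0.91 each time
```

In this simulation, batch_size=1 is one way to get a complete, repeatable pass, but not the only way.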
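I'm studying the effect of rescaling by 1./255, based on this post. To do so, I used a ModelCheckpoint callback that saves the weights only when the validation score is the best so far. Then, as described above, I loaded the best weights into the model and evaluated it with model.evaluate_generator. To check that the scores are consistent, I also compared validation scores. I checked whether the best validation score reported by the callback matches the value returned by evaluate_generator on the validation set. Before running evaluate_generator on validation_set, I used the same parameters as for the test set: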
valid_generator = datagen_valid.flow_from_directory(
'content/cell_images/valid',
target_size=(150,150),
batch_size=1,
class_mode='categorical',
shuffle=False)
# evaluate and print validation accuracy
score = model.evaluate_generator(valid_generator,
valid_generator.samples)
print('\n', 'Valid accuracy:', score[1])
#evaluate and print test accuracy
score = model.evaluate_generator(test_generator,
test_generator.samples)
print('\n', 'Test accuracy:', score[1])
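An independent cross-check is to compute accuracy directly from predictions, which does not depend on how evaluate_generator averages batches. The sketch below assumes `probs` comes from something like `model.predict_generator` on a generator created with `shuffle=False`. It also assumes the labels come from the generator's `classes` attribute, in the same order. The arrays here are made up for illustration.

```python
import numpy as np

def accuracy_from_probs(probs, classes):
    # probs: (n_samples, n_classes) softmax outputs, assumed to be in the same
    # (unshuffled) order as `classes`.
    # classes: integer class index for each sample.
    predicted = np.argmax(probs, axis=1)
    return float(np.mean(predicted == classes))

probs = np.array([[0.9, 0.1], [0.2, 0.8], [0.6, 0.4], [0.3, 0.7]])
classes = np.array([0, 1, 1, 1])
print(accuracy_from_probs(probs, classes))  # 0.75
```

If this number agrees with the output of evaluate_generator over a full, unshuffled pass, the evaluation setup is probably consistent.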
Strangely, I noticed the following.
When I don't use rescaling (1./255):
datagen_train = ImageDataGenerator()
datagen_test = ImageDataGenerator()
datagen_valid = ImageDataGenerator()
the validation score shown by the callback (0.5) is exactly the same as the one from model.evaluate_generator (0.5). The test set also returns an accuracy of 0.5.
When I use rescaling (1./255):
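An accuracy of exactly 0.5 is what a model would score on a balanced two-class dataset if it predicted the same class for every image. Whether this dataset is binary and balanced is an assumption; the post does not state it. The labels below are hypothetical.

```python
import numpy as np

# Hypothetical balanced binary labels: 500 of class 0, 500 of class 1.
labels = np.array([0] * 500 + [1] * 500)
# A model that outputs the same class for every image.
constant_predictions = np.zeros_like(labels)
print(np.mean(constant_predictions == labels))  # 0.5
```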
datagen_train = ImageDataGenerator(rescale=1./255)
datagen_test = ImageDataGenerator(rescale=1./255)
datagen_valid = ImageDataGenerator(rescale=1./255)
the difference between the validation score shown by the callback (0.9515):
Epoch 7/10
688/688 [==============================] - 67s 97ms/step - loss: 0.2017 - acc: 0.9496 - val_loss: 0.1767 - val_acc: 0.9515
Epoch 00007: val_loss improved from 0.19304 to 0.17671, saving model to cnn_model.weights.best.hdf5
and the score from model.evaluate_generator (Valid accuracy: 0.9466618287373004) is very small. On the test set I get:
Test accuracy: 0.9078374455732946
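Given the small difference between the two validation scores, can I conclude that evaluate_generator is working correctly? And can I conclude that the accuracy on test_set is correct as well? Or is there another way to solve this problem?
This problem is frustrating me. Sorry for the long post; I'm trying to be as thorough as I can.
Thanks!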