我正在与keras一起在预先训练的网络上应用转移学习。我有带有二进制类别标签的图像补丁,并且想使用CNN预测范围为[0; 1]用于看不见的图像补丁。
设置:32个批次,转换大小层:16
结果:经过几个纪元,我的准确度几乎为1,而损失接近0,而在验证数据中,准确度仍为0.5,每个纪元的损失有所不同。最后,CNN会针对所有看不见的补丁预测仅一个类别。
以下策略可以减少过度拟合:
我尝试使用最大为512的批处理大小,并且更改了全连接层的大小,但没有成功。在仅对其余部分进行随机测试之前,我想问一下如何调查出问题的原因,以便找出以上哪种策略最具潜力。
在我的代码下面:
def generate_data(imagePathTraining, imagesize, nBatches):
datagen = ImageDataGenerator(rescale=1./255)
generator = datagen.flow_from_directory\
(directory=imagePathTraining, # path to the target directory
target_size=(imagesize,imagesize), # dimensions to which all images found will be resize
color_mode='rgb', # whether the images will be converted to have 1, 3, or 4 channels
classes=None, # optional list of class subdirectories
class_mode='categorical', # type of label arrays that are returned
batch_size=nBatches, # size of the batches of data
shuffle=True) # whether to shuffle the data
return generator
def create_model(imagesize, nBands, nClasses):
print("%s: Creating the model..." % datetime.now().strftime('%Y-%m-%d_%H-%M-%S'))
# Create pre-trained base model
basemodel = ResNet50(include_top=False, # exclude final pooling and fully connected layer in the original model
weights='imagenet', # pre-training on ImageNet
input_tensor=None, # optional tensor to use as image input for the model
input_shape=(imagesize, # shape tuple
imagesize,
nBands),
pooling=None, # output of the model will be the 4D tensor output of the last convolutional layer
classes=nClasses) # number of classes to classify images into
print("%s: Base model created with %i layers and %i parameters." %
(datetime.now().strftime('%Y-%m-%d_%H-%M-%S'),
len(basemodel.layers),
basemodel.count_params()))
# Create new untrained layers
x = basemodel.output
x = GlobalAveragePooling2D()(x) # global spatial average pooling layer
x = Dense(16, activation='relu')(x) # fully-connected layer
y = Dense(nClasses, activation='softmax')(x) # logistic layer making sure that probabilities sum up to 1
# Create model combining pre-trained base model and new untrained layers
model = Model(inputs=basemodel.input,
outputs=y)
print("%s: New model created with %i layers and %i parameters." %
(datetime.now().strftime('%Y-%m-%d_%H-%M-%S'),
len(model.layers),
model.count_params()))
# Freeze weights on pre-trained layers
for layer in basemodel.layers:
layer.trainable = False
# Define learning optimizer
optimizerSGD = optimizers.SGD(lr=0.01, # learning rate.
momentum=0.0, # parameter that accelerates SGD in the relevant direction and dampens oscillations
decay=0.0, # learning rate decay over each update
nesterov=False) # whether to apply Nesterov momentum
# Compile model
model.compile(optimizer=optimizerSGD, # stochastic gradient descent optimizer
loss='categorical_crossentropy', # objective function
metrics=['accuracy'], # metrics to be evaluated by the model during training and testing
loss_weights=None, # scalar coefficients to weight the loss contributions of different model outputs
sample_weight_mode=None, # sample-wise weights
weighted_metrics=None, # metrics to be evaluated and weighted by sample_weight or class_weight during training and testing
target_tensors=None) # tensor model's target, which will be fed with the target data during training
print("%s: Model compiled." % datetime.now().strftime('%Y-%m-%d_%H-%M-%S'))
return model
def train_model(model, nBatches, nEpochs, imagePathTraining, imagesize, nSamples, valX,valY, resultPath):
history = model.fit_generator(generator=generate_data(imagePathTraining, imagesize, nBatches),
steps_per_epoch=nSamples//nBatches, # total number of steps (batches of samples)
epochs=nEpochs, # number of epochs to train the model
verbose=2, # verbosity mode. 0 = silent, 1 = progress bar, 2 = one line per epoch
callbacks=None, # keras.callbacks.Callback instances to apply during training
validation_data=(valX,valY), # generator or tuple on which to evaluate the loss and any model metrics at the end of each epoch
class_weight=None, # optional dictionary mapping class indices (integers) to a weight (float) value, used for weighting the loss function
max_queue_size=10, # maximum size for the generator queue
workers=32, # maximum number of processes to spin up when using process-based threading
use_multiprocessing=True, # whether to use process-based threading
shuffle=True, # whether to shuffle the order of the batches at the beginning of each epoch
initial_epoch=0) # epoch at which to start training
print("%s: Model trained." % datetime.now().strftime('%Y-%m-%d_%H-%M-%S'))
return history
答案 0 :(得分:0)
这些结果似乎太糟糕了,无法解决过度拟合的问题。相反,我怀疑用于训练和验证的数据存在差异。
我注意到,对于训练数据,您正在使用ImageDataGenerator(rescale=1./255)
,但是对于valX
,我看不到任何此类处理。我建议对验证数据也使用具有相同缩放比例配置的单独ImageDataGenerator。这样差异就尽可能小。
答案 1 :(得分:0)
根据上述建议,我进行了以下修改:
ImageDataGenerator
相同) def generate_data(path, imagesize, nBatches):
datagen = ImageDataGenerator(preprocessing_function=preprocess_input)
generator = datagen.flow_from_directory(directory=path, # path to the target directory
target_size=(imagesize,imagesize), # dimensions to which all images found will be resize
color_mode='rgb', # whether the images will be converted to have 1, 3, or 4 channels
classes=None, # optional list of class subdirectories
class_mode='categorical', # type of label arrays that are returned
batch_size=nBatches, # size of the batches of data
shuffle=True, # whether to shuffle the data
seed=42) # random seed for shuffling and transformations
return generator
def create_model(imagesize, nBands, nClasses):
# Create pre-trained base model
basemodel = VGG19(include_top=False, # exclude final pooling and fully connected layer in the original model
weights='imagenet', # pre-training on ImageNet
input_tensor=None, # optional tensor to use as image input for the model
input_shape=(imagesize, # shape tuple
imagesize,
nBands),
pooling=None, # output of the model will be the 4D tensor output of the last convolutional layer
classes=nClasses) # number of classes to classify images into
# Freeze weights on pre-trained layers
for layer in basemodel.layers:
layer.trainable = False
# Create new untrained layers
x = basemodel.output
x = GlobalAveragePooling2D()(x) # global spatial average pooling layer
x = Dense(1024, activation='relu')(x) # fully-connected layer
x = Dropout(rate=0.8)(x) # dropout layer
y = Dense(nClasses, activation='softmax')(x) # logistic layer making sure that probabilities sum up to 1
# Create model combining pre-trained base model and new untrained layers
model = Model(inputs=basemodel.input,
outputs=y)
# Define learning optimizer
optimizerSGD = optimizers.SGD(lr=0.001, # learning rate.
momentum=0.9, # parameter that accelerates SGD in the relevant direction and dampens oscillations
decay=learningRate/nEpochs, # learning rate decay over each update
nesterov=True) # whether to apply Nesterov momentum
# Compile model
model.compile(optimizer=optimizerSGD, # stochastic gradient descent optimizer
loss='categorical_crossentropy', # objective function
metrics=['accuracy'], # metrics to be evaluated by the model during training and testing
loss_weights=None, # scalar coefficients to weight the loss contributions of different model outputs
sample_weight_mode=None, # sample-wise weights
weighted_metrics=None, # metrics to be evaluated and weighted by sample_weight or class_weight during training and testing
target_tensors=None) # tensor model's target, which will be fed with the target data during training
return model
def train_model(model, nBatches, nEpochs, trainGenerator, valGenerator, resultPath):
history = model.fit_generator(generator=trainGenerator,
steps_per_epoch=trainGenerator.samples // nBatches, # total number of steps (batches of samples)
epochs=nEpochs, # number of epochs to train the model
verbose=2, # verbosity mode. 0 = silent, 1 = progress bar, 2 = one line per epoch
callbacks=None, # keras.callbacks.Callback instances to apply during training
validation_data=valGenerator, # generator or tuple on which to evaluate the loss and any model metrics at the end of each epoch
validation_steps=
valGenerator.samples // nBatches, # number of steps (batches of samples) to yield from validation_data generator before stopping at the end of every epoch
class_weight=None, # optional dictionary mapping class indices (integers) to a weight (float) value, used for weighting the loss function
max_queue_size=10, # maximum size for the generator queue
workers=1, # maximum number of processes to spin up when using process-based threading
use_multiprocessing=False, # whether to use process-based threading
shuffle=True, # whether to shuffle the order of the batches at the beginning of each epoch
initial_epoch=0) # epoch at which to start training
return history, model
通过这些修改,在训练了100个时期后,我达到了以下指标(批次大小为32):
train_acc
:0.831 train_loss
:0.436 val_acc
:0.692 val_loss
:0.568 由于以下原因,我认为这些设置是最佳的:
train_acc
仅在30个纪元后才超过val_acc
train_acc
和val_acc
之间的微小差异)train_loss
和val_loss
不断减少但是,我想知道:
val_acc
为代价,则以过度拟合为代价sklearn.metrics classification_report()
预测中用predict_generator()
导出的 f1-score , precision 和 recall 都在0.5左右,表示没有学习2级分类的方法。
也许,我最好在这些问题上提出一个新问题。
答案 2 :(得分:0)
对于您为减少ResNet50的过拟合而不是将经过预先训练的CNN本身更改为VGG所实施的工作,我表示赞赏。