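I am having trouble understanding how to implement data augmentation with TensorFlow. I have a dataset of images split into two subsets: training and testing. After calling ImageDataGenerator with various parameters, do I need to save the images (for example with flow()), or will TensorFlow augment my data while the model is training?
Here is the code I have implemented: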
# necessary imports
from tensorflow.keras.preprocessing.image import ImageDataGenerator

train_datagen = ImageDataGenerator(
    rescale=1. / 255,
    width_shift_range=0.2,
    height_shift_range=0.2,
    shear_range=0.2,
    zoom_range=0.2,
    brightness_range=(0.3, 1.0),
    horizontal_flip=True,
    vertical_flip=True,
    fill_mode='nearest',
    validation_split=0.2
)

training_directory = '/tmp/dataset/training'
testing_directory = '/tmp/dataset/testing'

training_set = train_datagen.flow_from_directory(
    training_directory,
    target_size=(150, 150),
    batch_size=32,
    class_mode='binary',
    subset='training'
)
test_set = train_datagen.flow_from_directory(
    testing_directory,
    target_size=(150, 150),
    batch_size=32,
    class_mode='binary',
    subset='validation'
)
# creating a sequential model
...
# fitting and data plotting
Model summary:
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
conv2d (Conv2D)              (None, 148, 148, 32)      896
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 74, 74, 32)        0
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 72, 72, 64)        18496
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 36, 36, 64)        0
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 34, 34, 128)       73856
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 17, 17, 128)       0
_________________________________________________________________
dropout (Dropout)            (None, 17, 17, 128)       0
_________________________________________________________________
flatten (Flatten)            (None, 36992)             0
_________________________________________________________________
dense (Dense)                (None, 512)               18940416
_________________________________________________________________
dense_1 (Dense)              (None, 1)                 513
=================================================================
Total params: 19,034,177
Trainable params: 19,034,177
Non-trainable params: 0
_________________________________________________________________
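Answer 0 (score: 2)
You do not have to save the augmented data.
The generator returned by flow() (or flow_from_directory()) applies the random transformations on the fly, each time the model requests a batch.
The augmented images are therefore created in memory during training and fed straight into the model; nothing is written to disk unless you pass save_to_dir.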
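To make "on the fly" concrete, here is a minimal NumPy sketch of the idea. It is not Keras's actual implementation: the function `augmenting_batches` and its single random horizontal flip are hypothetical stand-ins for what a Keras data iterator does with the configured transforms. Each time a batch is requested, fresh random transforms are applied to copies in memory, and the source images are left untouched.

```python
import numpy as np


def augmenting_batches(images, batch_size, rng):
    """Yield batches forever; each batch holds randomly flipped copies.

    Hypothetical stand-in for a Keras data iterator, for illustration only.
    """
    n = len(images)
    while True:
        order = rng.permutation(n)  # reshuffle once per pass over the data
        for start in range(0, n, batch_size):
            # Fancy indexing returns a copy, so the originals stay unchanged.
            batch = images[order[start:start + batch_size]]
            # Decide per image whether to mirror it left-to-right (width axis).
            flip = rng.random(len(batch)) < 0.5
            batch[flip] = batch[flip][:, :, ::-1, :]
            yield batch


rng = np.random.default_rng(0)
images = rng.random((8, 4, 4, 3))  # 8 tiny fake RGB images (made-up data)
original = images.copy()

batches = augmenting_batches(images, batch_size=4, rng=rng)
first = next(batches)   # augmented when requested
second = next(batches)  # augmented again, independently
```

Because the transforms are re-drawn on every request, the model sees a slightly different version of each image in every epoch, without any augmented file ever being stored.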
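Answer 1 (score: 2)
You do not need to save the data. The training and test data generators feed the augmented data (training/testing) directly into the model during the training or evaluation step. Be aware that, because validation_split=0.2 is set, subset='training' reads 80% of the images in the training directory and subset='validation' reads 20% of the images in the testing directory, and the test batches are augmented as well.
Here is the code updated for all steps, using the data generators train_generator and test_generator: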
from tensorflow.keras.preprocessing.image import ImageDataGenerator

datagenerator = ImageDataGenerator(
    rescale=1. / 255,
    width_shift_range=0.2,
    height_shift_range=0.2,
    shear_range=0.2,
    zoom_range=0.2,
    brightness_range=(0.3, 1.0),
    horizontal_flip=True,
    vertical_flip=True,
    fill_mode='nearest',
    validation_split=0.2
)

training_directory = '/tmp/dataset/training'
testing_directory = '/tmp/dataset/testing'

train_generator = datagenerator.flow_from_directory(
    training_directory,
    target_size=(150, 150),
    batch_size=32,
    class_mode='binary',
    subset='training'
)
test_generator = datagenerator.flow_from_directory(
    testing_directory,
    target_size=(150, 150),
    batch_size=32,
    class_mode='binary',
    subset='validation'
)
# Build and compile the model
....
# Get the number of steps per epoch for each of the data generators
train_steps_per_epoch = train_generator.n // train_generator.batch_size
test_steps_per_epoch = test_generator.n // test_generator.batch_size
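# Fit the model (model.fit accepts generators directly; fit_generator is deprecated since TensorFlow 2.1)
model.fit(train_generator, steps_per_epoch=train_steps_per_epoch, epochs=your_nepochs)
# Evaluate the model (model.evaluate replaces the deprecated evaluate_generator)
model.evaluate(test_generator, steps=test_steps_per_epoch)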
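The step counts above use floor division, which skips the final partial batch whenever the number of samples is not a multiple of the batch size. Here is a small sketch of the arithmetic with made-up sample counts (the real values come from `train_generator.n` and `test_generator.n`); `steps_floor` and `steps_ceil` are hypothetical helper names introduced only for this illustration.

```python
import math


def steps_floor(n_samples, batch_size):
    """Number of full batches only; leftover samples are skipped."""
    return n_samples // batch_size


def steps_ceil(n_samples, batch_size):
    """Number of batches needed to cover every sample once."""
    return math.ceil(n_samples / batch_size)


n_samples, batch_size = 1000, 32  # assumed values, for illustration

full_batches = steps_floor(n_samples, batch_size)
all_batches = steps_ceil(n_samples, batch_size)

print(full_batches, full_batches * batch_size)  # 31 992
print(all_batches)                              # 32
```

For evaluation, the ceiling variant is usually the safer choice, since it scores every test image once. As far as I know, the iterators returned by `flow_from_directory` also support `len()`, which gives the same ceiling value.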