How do I add validation data when using datagen.flow in Keras?

Asked: 2018-02-16 04:28:57

Tags: python machine-learning deep-learning keras

This is an extension of a problem I ran into in an earlier post.

I am using the following code in Keras for data augmentation. (I don't want to use model.fit_generator for now, so I loop over datagen.flow manually.)

from keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(
    featurewise_center=False,
    featurewise_std_normalization=False,
    rotation_range=20,
    width_shift_range=0.2,
    height_shift_range=0.2,
    horizontal_flip=True)

# compute quantities required for featurewise normalization
# (std, mean, and principal components if ZCA whitening is applied)
datagen.fit(x_train)


# alternative to model.fit_generator
for e in range(epochs):
    print('Epoch', e)
    batches = 0
    for x_batch, y_batch in datagen.flow(x_train, y_train, batch_size=32):
        model.fit(x_batch, y_batch)
        batches += 1
        if batches >= len(x_train) / 32:
            # we need to break the loop by hand because
            # the generator loops indefinitely
            break

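I would like to incorporate validation data into the model.fit loop I am running. For example, inside the for loop I would like to replace model.fit(x_batch, y_batch) with something like model.fit(x_batch, y_batch, validation_data=(x_val, y_val)).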

I am confused about how to fit this validation step into the for loop that uses datagen.flow. Any insight on how I should proceed is welcome.

2 answers:

Answer 0: (score: 2)

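Even though this post is a few months old, I think this may still be useful: as of Keras version 2.1.5, you can pass the validation_split argument to the ImageDataGenerator constructor and then select the subset when calling the flow and flow_from_directory methods.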

For example:

datagen = ImageDataGenerator(
    featurewise_center=False,
    featurewise_std_normalization=False,
    rotation_range=20,
    width_shift_range=0.2,
    height_shift_range=0.2,
    horizontal_flip=True,
    validation_split=0.2)
datagen.fit(x_train)
model = ...
model.fit_generator(datagen.flow(x_train, y_train, batch_size=batch_size, subset='training'),
                    steps_per_epoch=steps_per_epoch,
                    epochs=epochs,
                    validation_data=datagen.flow(x_train, y_train, batch_size=batch_size, subset='validation'),
                    validation_steps=validation_steps)
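
The example above leaves steps_per_epoch and validation_steps undefined. Here is a minimal sketch of one way to derive them. The helper name steps_for_split is hypothetical, and the sketch assumes that Keras sizes the validation subset as int(n_samples * validation_split) and uses the remaining samples for training, so check this against the Keras version you have installed.

```python
import math


def steps_for_split(n_samples, batch_size, validation_split):
    """Return (steps_per_epoch, validation_steps) for one full pass over each subset.

    Assumption: the validation subset holds int(n_samples * validation_split)
    samples and the training subset holds the rest.
    """
    n_val = int(n_samples * validation_split)
    n_train = n_samples - n_val
    # Round up so the final, smaller batch is still counted.
    steps_per_epoch = math.ceil(n_train / batch_size)
    validation_steps = math.ceil(n_val / batch_size)
    return steps_per_epoch, validation_steps


# Illustrative numbers only: 1000 samples, batches of 32, 20% held out.
steps_per_epoch, validation_steps = steps_for_split(1000, 32, 0.2)
print(steps_per_epoch, validation_steps)
```

In the answer's code this would be called with len(x_train), batch_size and the same validation_split that was passed to ImageDataGenerator.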

Answer 1: (score: 1)

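I assume you have already split your data into training and validation sets. If not, you will need to do so for the following suggestion to work.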

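You can create a second data generator for the validation data, and then simply iterate over this generator at the same time as the training data generator. I have added further help as comments in the code below.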

Here is your code, changed to do this, though you may want to adjust a few things:

# unchanged from your code
tr_datagen = ImageDataGenerator(
    featurewise_center=False,
    featurewise_std_normalization=False,
    rotation_range=20,
    width_shift_range=0.2,
    height_shift_range=0.2,
    horizontal_flip=True)

# create new generator for validation
val_datagen = ImageDataGenerator()    # don't perform augmentation on validation data


# compute quantities required for featurewise normalization
# (std, mean, and principal components if ZCA whitening is applied)

tr_datagen.fit(x_train)    # can leave this out if not standardising or whitening 
val_datagen.fit(x_val)     # can leave this out if not standardising or whitening

# alternative to model.fit_generator
for e in range(epochs):
    print('Epoch', e)
    batches = 0

    # combine both generators, in python 3 using zip()
    for (x_batch, y_batch), (val_x, val_y) in zip(
                                 tr_datagen.flow(x_train, y_train, batch_size=32),
                                 val_datagen.flow(x_val, y_val, batch_size=32)):
        model.fit(x_batch, y_batch, validation_data=(val_x, val_y))
        batches += 1
        if batches >= len(x_train) / 32:
            # we need to break the loop by hand because
            # the generator loops indefinitely
            break
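
Note that the zip() loop above validates each training batch against a single validation batch. If a score on the whole validation set once per epoch is preferable, a possible variation is sketched below. The function name train_with_epoch_validation is hypothetical, and the sketch assumes model is a compiled Keras model (only its train_on_batch and evaluate methods are used) and that the validation data needs no generator-side preprocessing, as is the case for the default ImageDataGenerator() above.

```python
from itertools import islice


def train_with_epoch_validation(model, train_flow, x_val, y_val,
                                epochs, steps_per_epoch):
    """Train batch by batch, then score the full validation set once per epoch.

    train_flow is an endless batch iterator such as the one returned by
    tr_datagen.flow(x_train, y_train, batch_size=32).
    """
    history = []
    for epoch in range(epochs):
        # islice stops the endless generator after steps_per_epoch batches,
        # which replaces the manual counter and break.
        for x_batch, y_batch in islice(train_flow, steps_per_epoch):
            model.train_on_batch(x_batch, y_batch)
        # evaluate() returns the loss, plus any metrics the model was compiled with.
        val_result = model.evaluate(x_val, y_val, verbose=0)
        history.append(val_result)
        print('Epoch', epoch, 'validation:', val_result)
    return history
```

With the names from the code above, this could be called as train_with_epoch_validation(model, tr_datagen.flow(x_train, y_train, batch_size=32), x_val, y_val, epochs, steps_per_epoch), where steps_per_epoch is len(x_train) / 32 rounded up.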