Keras augmentation for training and validation

Date: 2019-04-25 17:33:29

Tags: python keras

I am running augmentation for image classification with Keras, like this:

from keras.preprocessing.image import ImageDataGenerator

# Define Parameters
parameters = {"img_width" : 224,
              "img_height": 224,
              "epochs": 50,
              "batch_size" : 15}

# Define Generators  
train_datagen = ImageDataGenerator(
    rescale = 1. / 255,
    shear_range = 0.2,
    zoom_range = 0.2,
    horizontal_flip = True,
    validation_split = 0.06)

test_datagen = ImageDataGenerator(
    rescale = 1. / 255)

# Define Flows from directories
train_generator = train_datagen.flow_from_directory(
    directory = train_data_dir,
    target_size=(parameters["img_width"], parameters["img_height"]),
    batch_size = parameters["batch_size"],
    class_mode= "categorical", 
    subset = "training", 
    color_mode = "rgb",
    seed = 42)

validation_generator = train_datagen.flow_from_directory(
    directory = train_data_dir,
    target_size = (parameters["img_width"], parameters["img_height"]),
    batch_size = parameters["batch_size"],
    class_mode='categorical',
    subset = "validation",
    color_mode = "rgb",
    seed = 42)

testing_generator = test_datagen.flow_from_dataframe(
        dataframe = testing_df, 
        x_col="path", y_col="label", 
        class_mode="raw", 
        target_size= (parameters["img_width"], parameters["img_height"]), 
        shuffle = False,
        batch_size= parameters["batch_size"])

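This code prints the following for the training, validation and test generators: Found 4911 images belonging to 69 classes. Found 282 images belonging to 69 classes. Found 421 validated image filenames.

However, if I want to use test_datagen instead of train_datagen for the validation data, like this: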

validation_generator = test_datagen.flow_from_directory(
    # Changing Line        
    directory = train_data_dir,
    target_size = (parameters["img_width"], parameters["img_height"]),
    batch_size = parameters["batch_size"],
    class_mode='categorical',
    subset = "validation",
    color_mode = "rgb",
    seed = 42)

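The output I get is: Found 0 images belonging to 69 classes.

How can I fix this? In short, I want to validate on images that look like the ones the model will actually see in use, which is why I want test_datagen, since it only rescales the pixel values.

P.S. train_data_dir is a folder that contains 69 subfolders, each holding the images of one class.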

1 Answer:

Answer 0: (score: 0)

I don't think you should run validation and training from the same directory.
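A likely explanation for the "Found 0 images" message, based on my understanding of how flow_from_directory slices each class folder: the subset argument selects an index range derived from the generator's validation_split, and test_datagen was created without one. The sketch below is a simplified pure-Python model of that slicing, not Keras code; subset_range is a hypothetical helper name.

```python
def subset_range(num_files, validation_split, subset):
    """Return the (start, stop) slice of one class folder's file list.

    Simplified model (an assumption, not the library source): the
    validation subset takes the first fraction of the files and the
    training subset takes the rest.
    """
    if subset == "validation":
        low, high = 0.0, validation_split
    elif subset == "training":
        low, high = validation_split, 1.0
    else:
        raise ValueError("subset must be 'training' or 'validation'")
    return int(low * num_files), int(high * num_files)


if __name__ == "__main__":
    # train_datagen: validation_split = 0.06, a class folder with 75 files
    print(subset_range(75, 0.06, "training"))
    print(subset_range(75, 0.06, "validation"))
    # test_datagen: no validation_split, so the validation slice is empty
    print(subset_range(75, 0.0, "validation"))
```

If this model holds, giving test_datagen the same validation_split = 0.06 as train_datagen should let subset = "validation" select the same files without augmentation, because the range depends only on the split fraction and the order of the file list. Treat that as an assumption and check it against the printed image counts.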

Try pointing to a dedicated validation directory, for example:

validation_generator = test_datagen.flow_from_directory(
    # Changing Line        
    directory = validation_data_dir,
    target_size = (parameters["img_width"], parameters["img_height"]),
    batch_size = parameters["batch_size"],
    class_mode='categorical',
    # no subset here: test_datagen has no validation_split, so
    # subset = "validation" would again select 0 images
    color_mode = "rgb",
    seed = 42)

The directory layout should look like this:

train/
    69 folders
validation/
    69 folders
test/ 
    69 folders
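If the images currently sit in a single folder with one subfolder per class, a layout like the one above can be produced by copying files. The following is a minimal sketch under assumptions: split_dataset, its fractions and the seed are hypothetical choices, not something from the question, and files are copied rather than moved so the source stays intact.

```python
import os
import random
import shutil


def split_dataset(source_dir, dest_dir, fractions=(0.8, 0.1, 0.1), seed=42):
    """Copy class-per-folder images into train/validation/test folders."""
    rng = random.Random(seed)
    split_names = ("train", "validation", "test")
    for class_name in sorted(os.listdir(source_dir)):
        class_dir = os.path.join(source_dir, class_name)
        if not os.path.isdir(class_dir):
            continue  # skip stray files next to the class folders
        files = sorted(os.listdir(class_dir))
        rng.shuffle(files)
        n_train = int(fractions[0] * len(files))
        n_val = int(fractions[1] * len(files))
        chunks = (
            files[:n_train],
            files[n_train:n_train + n_val],
            files[n_train + n_val:],  # whatever remains goes to test
        )
        for split_name, chunk in zip(split_names, chunks):
            out_dir = os.path.join(dest_dir, split_name, class_name)
            os.makedirs(out_dir, exist_ok=True)
            for name in chunk:
                shutil.copy2(os.path.join(class_dir, name),
                             os.path.join(out_dir, name))
```

Each of the three output folders then contains the same class subfolders, so flow_from_directory can be pointed at train/ and validation/ separately.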

For example, the setup I usually use is:

import os

# cwd is assumed to hold the project root, e.g. cwd = os.getcwd()
train_data_dir = os.path.join(str(cwd), 'augmented', 'train')
validation_data_dir = os.path.join(str(cwd), 'augmented', 'validation')

train_datagen = ImageDataGenerator(
    rescale=1. / 255,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True)

test_datagen = ImageDataGenerator(rescale=1. / 255)

train_generator = train_datagen.flow_from_directory(
    train_data_dir,
    target_size=(img_width, img_height),
    batch_size=batch_size,
    class_mode='binary')

validation_generator = test_datagen.flow_from_directory(
    validation_data_dir,
    target_size=(img_width, img_height),
    batch_size=batch_size,
    class_mode='binary')

history = model.fit_generator(
    train_generator,
    steps_per_epoch=nb_train_samples // batch_size,
    epochs=epochs,
    validation_data=validation_generator,
    validation_steps=nb_validation_samples // batch_size)
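One detail of the step counts above: floor division drops the final partial batch, so a few images are skipped per epoch whenever the sample count is not a multiple of the batch size. A small sketch of the arithmetic, using the image counts and batch size from the question; steps_for is a hypothetical helper name, and rounding up is one possible choice rather than a requirement.

```python
import math


def steps_for(num_samples, batch_size):
    """Number of batches needed to cover every sample once."""
    return math.ceil(num_samples / batch_size)


if __name__ == "__main__":
    batch_size = 15
    # Floor division: the leftover images are not visited in that epoch.
    print(4911 // batch_size, 282 // batch_size)                    # 327 18
    # Rounding up: the last, smaller batch is included.
    print(steps_for(4911, batch_size), steps_for(282, batch_size))  # 328 19
```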

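To write augmented images into separate directories you can do the following. Note that this gets somewhat tedious, so I suggest building a loop over a list of classes. In my example I only did binary classification (1 or 0). I took one "original" 0 image and augmented it into the train, validation and test folders, then ran the script again for the 1 image. You have more classes, so I suggest looping over the list.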

from keras.preprocessing.image import ImageDataGenerator, load_img, img_to_array

# rescaling is disabled to allow the images to be viewed
datagen = ImageDataGenerator(
        rotation_range=40,
        width_shift_range=0.2,
        height_shift_range=0.2,
        shear_range=0.2,
        zoom_range=0.2,
        horizontal_flip=True,
        fill_mode='nearest')

# this is a PIL image; pass path + filename
img = load_img(r'path_to_single_image_to_be_augmented')
# this is a Numpy array with shape (height, width, 3) under the default channels_last format
x = img_to_array(img)
# add a leading batch dimension: shape (1, height, width, 3)
x = x.reshape((1,) + x.shape)

# the .flow() command below generates batches of randomly transformed
# images and saves the results to save_to_dir - remember to change prefix
i = 0
for batch in datagen.flow(x, batch_size=1,
                          save_to_dir=(str(cwd) + r'\augmented\test\0'),
                          save_prefix='0', save_format='jpeg'):
    i += 1
    if i > 110:  # change the amount of augmented data you want here
        break  # otherwise the generator would loop indefinitely

# training images (assumed target: the train folder)
i = 0
for batch in datagen.flow(x, batch_size=1,
                          save_to_dir=(str(cwd) + r'\augmented\train\0'),
                          save_prefix='0', save_format='jpeg'):
    i += 1
    if i > 280:  
        break  

i = 0
for batch in datagen.flow(x, batch_size=1,
                          save_to_dir=(str(cwd) + r'\augmented\validation\0'),
                          save_prefix='0', save_format='jpeg'):
    i += 1
    if i > 280: 
        break
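The three near-identical loops can be folded into one helper, which also makes it easier to loop over many classes as suggested above. This is a minimal sketch, not the script I actually ran: augment_to_dirs and the counts dictionary are hypothetical, and datagen is assumed to be an ImageDataGenerator (anything with a compatible flow method works). itertools.islice stops the otherwise endless generator after the requested number of batches.

```python
import os
from itertools import islice


def augment_to_dirs(datagen, x, root, class_name, counts):
    """Save augmented copies of x under root/<split>/<class_name>.

    counts maps a split name to the number of batches to generate,
    e.g. {"train": 280, "validation": 280, "test": 110}.
    """
    for split_name, n_batches in counts.items():
        out_dir = os.path.join(root, split_name, class_name)
        os.makedirs(out_dir, exist_ok=True)
        flow = datagen.flow(x, batch_size=1,
                            save_to_dir=out_dir,
                            save_prefix=class_name,
                            save_format='jpeg')
        # Drawing a batch is what triggers the save; the batch itself is unused.
        for _ in islice(flow, n_batches):
            pass
```

For several classes, call it once per class with that class's source image, for example inside a for loop over a list of (class_name, x) pairs.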