Question

I have the following code, which performs data augmentation using the image paths listed in a file as input:

from keras.preprocessing.image import ImageDataGenerator, array_to_img, img_to_array, load_img
import PIL.Image

def augment(file_images_path, dir_save):

    datagen = ImageDataGenerator(rotation_range=40, width_shift_range=0.2, height_shift_range=0.2, shear_range=0.2, zoom_range=0.2, horizontal_flip=True, fill_mode='nearest')

    with open(file_images_path) as f:
        # One image path per line; strip the trailing newline from each
        images_names = [x.strip() for x in f.readlines()]
        for line in images_names:
            img = PIL.Image.open(line)
            img = img.resize((28, 28))
            x = img_to_array(img)          # shape (28, 28, channels)
            x = x.reshape((1,) + x.shape)  # add a batch axis: (1, 28, 28, channels)
            # the .flow() command below generates batches of randomly transformed
            # images and saves the results to the `dir_save` directory
            i = 0
            for batch in datagen.flow(x, batch_size=1, save_to_dir=dir_save, save_prefix='augmented', save_format='tif'):
                i += 1
                if i > 2:
                    break  # otherwise the generator would loop indefinitely

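I am fairly new to data augmentation in Keras, and I would like to know how many image operations Keras performs on my images per iteration. For example, if I run this code on a list containing 14 images, it generates 126 augmented images. If I run it on a list containing 125 images, it generates 370 augmented images. My question is: why?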

Answer 1

If you use data augmentation in Keras, the data is slightly modified every time a sample is generated.

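Some augmentation steps have only a finite number of options (for example, you can either flip an image or not flip it), so using such a step might double the number of distinct images.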

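Other steps have a (practically) infinite number of options. For example, when you specify rotation_range=40, every generated image is rotated by an angle chosen at random between -40 and 40 degrees.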
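To make the difference between the two kinds of options concrete, here is a minimal sketch in plain Python. It does not call Keras: `sample_transform` is a hypothetical helper that only mimics how a flip (two outcomes) and a rotation angle (a continuous range) might be drawn, assuming the angle is sampled uniformly from the range and the flip is a fair coin toss.

```python
import random

def sample_transform(rng, rotation_range=40.0, horizontal_flip=True):
    """Draw one random (angle, flip) pair.

    Hypothetical stand-in for illustration only, not the Keras API.
    """
    # Assumption: the angle is uniform in [-rotation_range, rotation_range]
    angle = rng.uniform(-rotation_range, rotation_range)
    # Assumption: the flip is applied with probability 0.5 when enabled
    flip = horizontal_flip and rng.random() < 0.5
    return angle, flip

rng = random.Random(0)
samples = [sample_transform(rng) for _ in range(1000)]

distinct_flips = {flip for _, flip in samples}
distinct_angles = {angle for angle, _ in samples}

print(len(distinct_flips))   # never more than 2: flipped or not flipped
print(len(distinct_angles))  # grows with the number of draws
```

The flip contributes at most two variants per image, while the sampled angles essentially never repeat, which is why the number of possible outputs is not bounded in practice.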

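So with the augmentation settings you are using, you can effectively generate infinitely many different images. However, these images will be highly correlated with one another, and that is obviously not as good as actually having infinitely many real images.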
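Because the generator never runs out, the number of images produced is decided entirely by when the calling loop stops. The sketch below replaces `datagen.flow(...)` with a hypothetical endless generator, `endless_batches`, and reuses the `i > 2` stopping rule from the question to count how many batches are drawn per input image. It assumes that one file is written per batch drawn, which is what `batch_size=1` combined with `save_to_dir` suggests.

```python
import itertools

def endless_batches(image_name):
    """Hypothetical stand-in for datagen.flow(...): yields forever."""
    for k in itertools.count():
        yield f"{image_name}-variant-{k}"

def batches_per_image(image_name):
    """Replicate the question's loop and count the batches it consumes."""
    i = 0
    drawn = []
    for batch in endless_batches(image_name):
        drawn.append(batch)
        i += 1
        if i > 2:
            break  # same stopping rule as in the question
    return len(drawn)

per_image = batches_per_image("img0")
print(per_image)        # 3
print(14 * per_image)   # 42
print(125 * per_image)  # 375
```

Under these assumptions a single pass draws three batches per input image, which does not match the totals reported in the question; the original post does not say what accounts for the difference.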
