Question

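I have 4 separate image folders, each with its own label (the images in folder 1 correspond to label 1, and so on).

However, the image dataset is imbalanced: I have too many images with labels 1 and 2, but not enough images for labels 3 and 4.

So I decided to try image augmentation to enlarge my image dataset.

Here is what my code looks like: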

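from tensorflow.keras.preprocessing.image import ImageDataGenerator
# preprocess_input is imported from the keras.applications module of the model being trained

train_datagen = ImageDataGenerator(rotation_range=20, width_shift_range=0.2, height_shift_range=0.2,
                                   preprocessing_function=preprocess_input, horizontal_flip=True)

train_generator = train_datagen.flow_from_directory('/trainImages', target_size=(80, 80), batch_size=32, class_mode='categorical')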

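All the image folders are under the path "/trainImages" (e.g. "/trainImages/1", "/trainImages/2").

The problem with this approach is that the images in folders 1 and 2 are augmented as well (they don't need augmentation).

Is there a way to customize ImageDataGenerator so that it skips the augmentation parameters for the images in folders 1 and 2?

I'm quite new to both Python and Keras...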

Answer 1

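You can create two folder structures:

Folder 1 - a structure containing only the classes that should not be augmented
Folder 2 - a structure containing the classes that should be augmented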
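A minimal sketch of building the two structures from the original `/trainImages/<label>/` layout, assuming labels 3 and 4 are the ones to augment. `split_class_folders`, `/trainPlain` and `/trainAug` are hypothetical names. Besides copying each class folder into one of the two roots, it leaves an empty folder for that class in the other root. The reason: `flow_from_directory` infers the classes from the subfolder names in alphanumeric order, so if each root held only its own two classes, both generators would label them 0 and 1 and the one-hot labels would clash. With identical subfolders in both roots, both generators see the same four classes in the same order.

```python
import os
import shutil


def split_class_folders(src_root, plain_root, aug_root, aug_classes):
    """Hypothetical helper: copy each class folder of src_root into
    aug_root (if listed in aug_classes) or plain_root (otherwise).

    An empty folder with the same class name is created in the other
    root, so both roots expose the same, identically ordered class list.
    """
    classes = sorted(
        d for d in os.listdir(src_root)
        if os.path.isdir(os.path.join(src_root, d))
    )
    for cls in classes:
        src = os.path.join(src_root, cls)
        if cls in aug_classes:
            target, other = aug_root, plain_root
        else:
            target, other = plain_root, aug_root
        # copy the real images into the root where they belong
        shutil.copytree(src, os.path.join(target, cls))
        # empty placeholder so the class indices match in both roots
        os.makedirs(os.path.join(other, cls), exist_ok=True)
    return classes


# Example call (paths are assumptions):
# split_class_folders('/trainImages', '/trainPlain', '/trainAug', {'3', '4'})
```

Copying doubles the disk usage, but it leaves the original `/trainImages` folder untouched.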

Then you create two different generators:

dataGen1 = ImageDataGenerator(...)
dataGen2 = ImageDataGenerator(.... withAugmentation ....)

sequencer1 = dataGen1.flow_from_directory(dir1, ....)
sequencer2 = dataGen2.flow_from_directory(dir2, ....)

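Now create your own generator, which keeps a list of indices into each sequencer.

This code is untested; if it has errors, leave a comment and I'll test it tomorrow.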

import numpy as np

def myGenerator(seq1, seq2, multiplySeq2By):

    generators = [seq1, seq2]

    #here we're creating indices to get data from the generators
    len1 = len(seq1)
    len2 = len(seq2)

    #integer dtype, because these values are used to index the generators
    indices1 = np.zeros((len1, 2), dtype=int)
    indices2 = np.ones((len2, 2), dtype=int)

    indices1[:, 1] = np.arange(len1) #pairs like [0,0], [0,1], [0,2]....
    indices2[:, 1] = np.arange(len2) #pairs like [1,0], [1,1], [1,2]....

    #repeat indices2 to generate more from it (multiplySeq2By must be an int)
    indices2 = [indices2] * multiplySeq2By
    allIndices = np.concatenate([indices1] + indices2, axis=0)

    #now we loop the indices infinitely to get data from the original generators
    while True:
        #randomize the order again at the start of every epoch
        np.random.shuffle(allIndices)
        for g, el in allIndices:
            x, y = generators[g][el]
            yield x, y #when training, or "yield x" when testing

Remember to use steps_per_epoch = len1 + (multiplySeq2By * len2), where len1 = len(sequencer1) and len2 = len(sequencer2).
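`len()` of a `flow_from_directory` iterator is its number of batches, ceil(samples / batch_size), so the value can also be worked out from the image counts. A small sketch with made-up counts (2 x 1000 images in the non-augmented structure, 2 x 150 in the augmented one, batch_size=32, multiplySeq2By=6); `steps_per_epoch_for` is a hypothetical helper name.

```python
import math


def steps_per_epoch_for(n_plain, n_aug, batch_size, multiply_seq2_by):
    """Hypothetical helper: steps for one pass over myGenerator's index list."""
    len1 = math.ceil(n_plain / batch_size)  # batches in sequencer1
    len2 = math.ceil(n_aug / batch_size)    # batches in sequencer2
    return len1 + multiply_seq2_by * len2


# Made-up counts: 2000 plain images, 300 images to augment
steps = steps_per_epoch_for(2000, 300, batch_size=32, multiply_seq2_by=6)
print(steps)  # 123 = 63 + 6 * 10
```

With these numbers, one epoch draws 2000 plain images and about 6 x 300 = 1800 augmented ones, so the two groups end up roughly balanced. Each repeated batch from the augmenting generator is transformed again when it is fetched, so the repeats are not identical copies. Once the iterators exist, `len(sequencer1) + multiplySeq2By * len(sequencer2)` gives the same value directly.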

Image data generator augmentation parameters for specific folders only (Keras)
