Keras flow_from_directory: getting the file names as they are being generated

Time: 2017-01-18 08:59:00

Tags: python machine-learning neural-network keras

Is it possible to get the file names that were loaded using flow_from_directory? I have:

datagen = ImageDataGenerator(
    rotation_range=3,
#     featurewise_std_normalization=True,
    fill_mode='nearest',
    width_shift_range=0.2,
    height_shift_range=0.2,
    horizontal_flip=True
)

train_generator = datagen.flow_from_directory(
        path+'/train',
        target_size=(224, 224),
        batch_size=batch_size,)

I have a custom generator for my multi-output model, like:

a = np.arange(8).reshape(2, 4)
# print(a)

print(train_generator.filenames)

def generate():
    while 1:
        x,y = train_generator.next()
        yield [x] ,[a,y]

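At the moment I am generating random numbers for a, but for real training I want to load a JSON file that contains the bounding box coordinates for my images. For that, I need to get the file names of the images produced by the train_generator.next() call. Once I have them, I can load the file, parse the JSON and pass it instead of a. The ordering of the x variable and of the list of file names I get must also be the same.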
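For illustration, the lookup described above could be sketched as follows. This is only a sketch under assumptions: the JSON layout (a mapping from file name to `[x_min, y_min, x_max, y_max]`) and the helper names `load_boxes` and `boxes_for_batch` are hypothetical and not part of the question.

```python
import json


def load_boxes(json_text):
    """Parse annotations, assuming the (hypothetical) layout
    {"file name": [x_min, y_min, x_max, y_max], ...}."""
    return json.loads(json_text)


def boxes_for_batch(batch_filenames, boxes_by_name):
    """Return one bounding box per file name, in the same order as the names."""
    return [boxes_by_name[name] for name in batch_filenames]


# Hypothetical annotation data, inlined here instead of read from a file.
annotations_json = '{"cats/1.jpg": [10, 20, 110, 220], "dogs/2.jpg": [5, 5, 50, 60]}'
boxes_by_name = load_boxes(annotations_json)

# The file names must be in the same order as the images of the batch.
batch_boxes = boxes_for_batch(["dogs/2.jpg", "cats/1.jpg"], boxes_by_name)
```

Because the list comprehension preserves the order of `batch_filenames`, the boxes line up with the images as long as the file names themselves are in batch order.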

6 answers:

Answer 0: (score: 21)

Yes, this is possible, at least as of version 2.0.4 (I don't know about earlier versions).

The instance returned by ImageDataGenerator().flow_from_directory(...) has an attribute filenames, which is a list of all the files in the order the generator yields them, and also an attribute batch_index. So you can do this:

datagen = ImageDataGenerator()
gen = datagen.flow_from_directory(...)

On each iteration over the generator, you can then get the corresponding file names by slicing gen.filenames, starting at (gen.batch_index - 1) * gen.batch_size and taking gen.batch_size entries.

This gives you the file names of the images in the current batch.
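The slicing this answer describes can be sketched as a small helper. To keep it runnable without Keras or image data, it is exercised on a stand-in object built with `types.SimpleNamespace`; the helper name `current_batch_filenames` and its `n_in_batch` parameter are hypothetical. It assumes shuffle=False (otherwise `filenames` is not in yield order), and it treats `batch_index == 0` as "the last batch of the epoch was just yielded", which is the wrap-around case the other answers here also account for.

```python
from types import SimpleNamespace


def current_batch_filenames(gen, n_in_batch):
    """Return the file names of the batch most recently yielded by `gen`.

    Assumes shuffle=False. `n_in_batch` is the number of images in the
    batch that was just received (the last batch may be smaller).
    """
    if gen.batch_index == 0:
        # batch_index wrapped around: the batch just yielded ended the epoch.
        return gen.filenames[len(gen.filenames) - n_in_batch:]
    start = (gen.batch_index - 1) * gen.batch_size
    return gen.filenames[start:start + n_in_batch]


# Stand-in for a directory iterator: 5 files, batch size 2.
fake_gen = SimpleNamespace(
    filenames=["cats/1.jpg", "cats/2.jpg", "cats/3.jpg", "dogs/4.jpg", "dogs/5.jpg"],
    batch_size=2,
    batch_index=1,  # state after the first batch has been yielded
)
first_batch_names = current_batch_filenames(fake_gen, 2)

fake_gen.batch_index = 0  # state after the third (last, short) batch
last_batch_names = current_batch_filenames(fake_gen, 1)
```

With a real generator, `n_in_batch` would be the length of the image array just returned.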

Answer 1: (score: 2)

Here is an example that also works with shuffle=True and properly handles the last batch. To make one pass over the data:

datagen = ImageDataGenerator().flow_from_directory(...)    
batches_per_epoch = datagen.samples // datagen.batch_size + (datagen.samples % datagen.batch_size > 0)
for i in range(batches_per_epoch):
    batch = next(datagen)
    current_index = ((datagen.batch_index-1) * datagen.batch_size)
    if current_index < 0:
        if datagen.samples % datagen.batch_size > 0:
            current_index = max(0,datagen.samples - datagen.samples % datagen.batch_size)
        else:
            current_index = max(0,datagen.samples - datagen.batch_size)
    index_array = datagen.index_array[current_index:current_index + datagen.batch_size].tolist()
    img_paths = [datagen.filepaths[idx] for idx in index_array]
    #batch[0] - x, batch[1] - y, img_paths - absolute path
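The index arithmetic in this loop can be checked in isolation with plain integers. The sketch below only restates the two expressions from the code above as functions; the names `batches_per_epoch` and `last_batch_start` are hypothetical, and no Keras objects are involved.

```python
def batches_per_epoch(samples, batch_size):
    """Number of batches needed to see every sample once (ceiling division)."""
    return samples // batch_size + (samples % batch_size > 0)


def last_batch_start(samples, batch_size):
    """Start offset of the final batch of an epoch.

    This is the value the loop above falls back to when batch_index has
    wrapped around to 0 and (batch_index - 1) * batch_size is negative.
    """
    remainder = samples % batch_size
    if remainder > 0:
        # The last batch is short and holds only the leftover samples.
        return max(0, samples - remainder)
    # The samples divide evenly, so the last batch is a full one.
    return max(0, samples - batch_size)


# 10 samples in batches of 4: batches start at 0, 4 and 8 (the last has 2 images).
n_batches = batches_per_epoch(10, 4)
uneven_start = last_batch_start(10, 4)

# 8 samples in batches of 4: batches start at 0 and 4.
even_start = last_batch_start(8, 4)
```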

Answer 2: (score: 1)

You can make a pretty minimal subclass that returns an (image, file_path) tuple by inheriting from DirectoryIterator:

import numpy as np
from keras.preprocessing.image import ImageDataGenerator, DirectoryIterator

class ImageWithNames(DirectoryIterator):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.filenames_np = np.array(self.filepaths)
        self.class_mode = None # so that we only get the images back

    def _get_batches_of_transformed_samples(self, index_array):
        return (super()._get_batches_of_transformed_samples(index_array),
                self.filenames_np[index_array])

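In __init__, I added an attribute that is a numpy version of self.filepaths, so that we can easily index into that array to get the paths for each generated batch.

The only other change to the base class is to return a tuple consisting of the image batch super()._get_batches_of_transformed_samples(index_array) and the file paths self.filenames_np[index_array].

With that, you can create your generator like so:

imagegen = ImageDataGenerator()
datagen = ImageWithNames('/data/path', imagegen, target_size=(224,224))

And then check with:

next(datagen)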

Answer 3: (score: 1)

At least in version 2.2.4, you can do this:

datagen = ImageDataGenerator()
gen = datagen.flow_from_directory(...)
for file in gen.filenames:
    print(file)

Or, to get the file paths:

for filepath in gen.filepaths:
    print(filepath)

Answer 4: (score: 1)

The following code might help. It overrides flow_from_directory:

class AugmentingDataGenerator(ImageDataGenerator):
    def flow_from_directory(self, directory, *args, **kwargs):
        # class_mode=None so that the underlying iterator yields images only
        generator = super().flow_from_directory(directory, *args, class_mode=None, **kwargs)
        while True:
            for image_path in generator.filepaths:
                # Get augmented image samples.
                # Note: next(generator) returns a whole batch, so the pairing
                # with image_path is only exact with shuffle=False and batch_size=1.
                image = next(generator)
                # print(image_path)

                yield image, image_path

# Create training generator
train_datagen = AugmentingDataGenerator(  
    rotation_range=10,
    width_shift_range=0.1,
    height_shift_range=0.1,
    rescale=1./255,
    horizontal_flip=True
)
train_generator = train_datagen.flow_from_directory(
    TRAIN_DIRECTORY_PATH, 
    target_size=(256, 256),
    shuffle = False,
    batch_size=BATCH_SIZE
)

# Create testing generator
test_datagen = AugmentingDataGenerator(rescale=1./255)
test_generator = test_datagen.flow_from_directory(
    TEST_DIRECTORY_PATH,  
    target_size=(256, 256),
    shuffle = False, # inorder to get imagepath of the same image
    batch_size=BATCH_SIZE 
)

And check the returned image and file path:

image,file_path = next(test_generator)
# print(file_path)
# plt.imshow(image)

Answer 5: (score: 0)

I needed exactly this, and I developed a simple function that works with both shuffle=True and shuffle=False.

def get_indices_from_keras_generator(gen, batch_size):
    """
    Given a keras data generator, it returns the indices and the filepaths
    corresponding to the current batch.
    :param gen: keras generator.
    :param batch_size: size of the last batch generated.
    :return: tuple with indices and filenames
    """

    idx_left = (gen.batch_index - 1) * batch_size
    idx_right = idx_left + gen.batch_size if idx_left >= 0 else None
    indices = gen.index_array[idx_left:idx_right]
    filenames = [gen.filenames[i] for i in indices]
    return indices, filenames

Then you would use it as follows:

for x, y in gen:
    # len(x) is the size of the batch that was just generated
    indices, filenames = get_indices_from_keras_generator(gen, len(x))