I have a large image dataset that I want to split into a grid of patches, for example a 10×10 grid. For each of these patches I want to train a separate autoencoder, so in the 10×10 case I end up with 100 autoencoders.
My solution so far is to create a new ImageDataGenerator for each patch. But this seems far too inefficient: every image is loaded in full 100 times (once per autoencoder), even though each autoencoder only needs a single patch. In theory, loading each image once should be enough. Is there a better way that I'm not seeing? Thanks in advance!
from keras.preprocessing.image import ImageDataGenerator
import numpy as np

def crop_to_patch_function(patch_x: int, patch_y: int, grid_size: int):
    # grid_size is the edge length of one patch in pixels
    def crop_to_patch(img):
        x, y = patch_x * grid_size, patch_y * grid_size
        return img[y:(y + grid_size), x:(x + grid_size), :]
    return crop_to_patch

def patch_generator(patch_x, patch_y, grid_size):
    datagen = ImageDataGenerator(rescale=1/255)
    train_batches_tmp = datagen.flow_from_directory(
        directory=train_data_dir,
        target_size=(img_height, img_width),
        batch_size=batch_size,
        color_mode='rgb',
        class_mode='input',
    )
    # build the crop closure once instead of once per image
    crop = crop_to_patch_function(patch_x, patch_y, grid_size)
    while True:
        batch_x, _ = next(train_batches_tmp)
        batch_patches = np.zeros((batch_x.shape[0], grid_size, grid_size, 3))
        for i in range(batch_x.shape[0]):
            batch_patches[i] = crop(batch_x[i])
        yield (batch_patches, batch_patches)

# batches of the patch at grid position (2, 4)
patch_x, patch_y = 2, 4
train_patch_batches = patch_generator(patch_x, patch_y, grid_size)
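For context, a minimal sketch of how one of these generators is then consumed (build_autoencoder and num_train_images are placeholders, not part of the code above; with older standalone Keras, use fit_generator instead of fit):

    # hypothetical training loop for the autoencoder at patch (2, 4)
    autoencoder = build_autoencoder(input_shape=(grid_size, grid_size, 3))  # placeholder
    autoencoder.fit(
        train_patch_batches,
        steps_per_epoch=num_train_images // batch_size,  # assumed known
        epochs=10,
    )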
Answer 0 (score: 1)
Would it not work to preprocess the images by creating the patches up front? Save them into separate directories and point each ImageDataGenerator at one of the 100 directories to load the data for each model.
Something like this:
import os

def images_to_patches(images_list):
    for idx, image in enumerate(images_list):
        for patch_x in range(10):
            for patch_y in range(10):
                # crop_patch returns the patch as a PIL image
                # (the PIL analogue of crop_to_patch above)
                patch_img = crop_patch(image, patch_x, patch_y, grid_size)
                img_dir = str(patch_x) + str(patch_y)
                os.makedirs(img_dir, exist_ok=True)
                patch_img.save(os.path.join(img_dir, str(idx) + '.png'))
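Each autoencoder then gets its own generator pointed at the matching patch directory. A minimal sketch, assuming patches_root is the base directory holding the 100 folders created above (note that flow_from_directory expects the images to sit inside a subfolder of the directory it is given, even with class_mode='input'):

    # load pre-cropped patches for grid position (2, 4) from disk
    datagen = ImageDataGenerator(rescale=1/255)
    patch_batches = datagen.flow_from_directory(
        directory=os.path.join(patches_root, '24'),  # patches_root is an assumption
        target_size=(grid_size, grid_size),
        batch_size=batch_size,
        color_mode='rgb',
        class_mode='input',
    )

This way every image is read from disk and cropped exactly once, and each of the 100 generators only loads the small patches it actually needs.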