I am trying to train a model with a dataset split into two parts; for each part, I create a separate ImageDataGenerator using Keras and TensorFlow.
My question is: how can I combine the data from my two generators to train the model? I don't want to use each one separately.
Thanks, everyone.
Answer 0 (score: 0)
Without a minimal code sample it's hard to fully understand what you want to achieve. What do you mean by combining the generators? Alternating between the two?
In any case, you can create your own data generator using Keras Sequence objects.
You can find an example of a custom data generator here.
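As a minimal sketch of what such a custom generator looks like: a Sequence-style class only has to implement __len__ (batches per epoch) and __getitem__ (one batch). The class name and the array data below are hypothetical; in real code you would subclass tf.keras.utils.Sequence so that model.fit can consume it.

```python
import numpy as np

class PairDataset:
    """Minimal sketch of the Sequence interface.

    In practice, subclass tf.keras.utils.Sequence; the arrays
    here are hypothetical placeholders for real image data.
    """

    def __init__(self, images, labels, batch_size):
        self.images = images
        self.labels = labels
        self.batch_size = batch_size

    def __len__(self):
        # Number of batches per epoch (last batch may be smaller)
        return int(np.ceil(len(self.images) / self.batch_size))

    def __getitem__(self, index):
        # Return one (inputs, targets) batch
        lo = index * self.batch_size
        hi = lo + self.batch_size
        return self.images[lo:hi], self.labels[lo:hi]

ds = PairDataset(np.zeros((10, 128, 128, 3)), np.zeros(10), batch_size=4)
```

A merged generator is the same idea: its __getitem__ would pull a sub-batch from each underlying generator and concatenate them.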
Answer 1 (score: 0)
You have split all your data into two different directories, and now you want to train a model with data from both. You can achieve this in two ways:
1. Keras's ImageDataGenerator flow_from_directory method has a follow_links argument, and you can take advantage of it. Create a separate directory with the class structure you need, and inside it create symbolic links pointing into the original data directories. In the layout below, you would use the Data directory as the single input directory.
.
├── Directory1/
│   ├── Class1/
│   └── Class2/
├── Directory2/
│   ├── Class1/
│   └── Class2/
└── Data/
    ├── Class1/
    │   ├── symlink_to_Directory1_Class1
    │   └── symlink_to_Directory2_Class1
    └── Class2/
        ├── symlink_to_Directory1_Class2
        └── symlink_to_Directory2_Class2
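Building the Data/ directory of the tree above can be scripted with os.symlink. This is a sketch under a temporary root directory (the root path is hypothetical; point it at your real data location), after which flow_from_directory("Data", follow_links=True, ...) would see images from both original directories.

```python
import os
import tempfile

root = tempfile.mkdtemp()  # hypothetical root; use your real data root

# Stand-ins for the two original data directories
for d in ("Directory1", "Directory2"):
    for c in ("Class1", "Class2"):
        os.makedirs(os.path.join(root, d, c))

# Build Data/<Class>/ with symlinks into both original directories
for c in ("Class1", "Class2"):
    os.makedirs(os.path.join(root, "Data", c))
    for d in ("Directory1", "Directory2"):
        os.symlink(os.path.join(root, d, c),
                   os.path.join(root, "Data", c, f"symlink_to_{d}_{c}"))
```

Note that ImageDataGenerator only follows the links when follow_links=True is passed to flow_from_directory.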
2. Create two different ImageDataGenerators for the two different directories, then merge them into one. In that case, each sub-generator's batch size has to be set according to the number of images in its directory.
Batch size of a sub-generator:
b = B * n / (sum of n)
where:
b = batch size of that sub-generator
B = desired batch size of the merged generator
n = number of images in that sub-generator's directory
sum of n = total number of images in all directories
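A quick worked example of that formula (the image counts are hypothetical):

```python
B = 32              # desired batch size of the merged generator
n1, n2 = 600, 400   # images in Directory1 and Directory2

b1 = int(B * n1 / (n1 + n2))  # sub-batch size for generator 1
b2 = B - b1                   # remainder goes to generator 2

print(b1, b2)  # → 19 13
```

Giving the second generator the remainder (rather than applying the formula twice) guarantees the sub-batch sizes always sum to B despite integer truncation.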
See the code below:
from keras.preprocessing.image import ImageDataGenerator
from keras.utils import Sequence
import matplotlib.pyplot as plt
import numpy as np
import os


class MergedGenerators(Sequence):
    def __init__(self, batch_size, generators=[], sub_batch_size=[]):
        self.generators = generators
        self.sub_batch_size = sub_batch_size
        self.batch_size = batch_size

    def __len__(self):
        return int(
            sum([(len(self.generators[idx]) * self.sub_batch_size[idx])
                 for idx in range(len(self.sub_batch_size))]) /
            self.batch_size)

    def __getitem__(self, index):
        """Getting items from the generators and packing them"""
        X_batch = []
        Y_batch = []
        for generator in self.generators:
            if generator.class_mode is None:
                x1 = generator[index % len(generator)]
                X_batch = [*X_batch, *x1]
            else:
                x1, y1 = generator[index % len(generator)]
                X_batch = [*X_batch, *x1]
                Y_batch = [*Y_batch, *y1]

        if self.generators[0].class_mode is None:
            return np.array(X_batch)
        return np.array(X_batch), np.array(Y_batch)


def build_datagenerator(dir1=None, dir2=None, batch_size=32):
    n_images_in_dir1 = sum([len(files) for r, d, files in os.walk(dir1)])
    n_images_in_dir2 = sum([len(files) for r, d, files in os.walk(dir2)])

    # The two generators need different batch sizes, as the numbers of
    # images in the two directories are not the same; this equalizes each
    # directory's share of images in the merged batches.
    generator1_batch_size = int((n_images_in_dir1 * batch_size) /
                                (n_images_in_dir1 + n_images_in_dir2))
    generator2_batch_size = batch_size - generator1_batch_size

    generator1 = ImageDataGenerator(
        rescale=1. / 255,
        shear_range=0.2,
        zoom_range=0.2,
        rotation_range=5.,
        horizontal_flip=True,
    )

    # generator2 has different image-augmentation attributes than generator1
    generator2 = ImageDataGenerator(
        rescale=1. / 255,
        zoom_range=0.2,
        horizontal_flip=False,
    )

    generator1 = generator1.flow_from_directory(
        dir1,
        target_size=(128, 128),
        color_mode='rgb',
        class_mode=None,
        batch_size=generator1_batch_size,
        shuffle=True,
        seed=42,
        interpolation="bicubic",
    )
    generator2 = generator2.flow_from_directory(
        dir2,
        target_size=(128, 128),
        color_mode='rgb',
        class_mode=None,
        batch_size=generator2_batch_size,
        shuffle=True,
        seed=42,
        interpolation="bicubic",
    )

    return MergedGenerators(
        batch_size,
        generators=[generator1, generator2],
        sub_batch_size=[generator1_batch_size, generator2_batch_size])


def test_datagen(batch_size=32):
    datagen = build_datagenerator(dir1="./asdf",
                                  dir2="./asdf2",
                                  batch_size=batch_size)

    print("Datagenerator length (Batch count):", len(datagen))

    for batch_count, image_batch in enumerate(datagen):
        if batch_count == 1:
            break
        print("Images: ", image_batch.shape)

        plt.figure(figsize=(10, 10))
        for i in range(image_batch.shape[0]):
            plt.subplot(1, batch_size, i + 1)
            plt.imshow(image_batch[i], interpolation='nearest')
            plt.axis('off')
            plt.tight_layout()


test_datagen(4)