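I have gone through all the "slow fit_generator" posts and applied every fix suggested there, yet my custom generator is still roughly 40 to 50 times slower than a standard fit. With the same configuration, we are talking about 20 s/epoch vs. 800 s/epoch.
With verbose=1, I also noticed that training with the custom generator "pauses" at regular intervals, whereas the standard fit runs through all batches at a steady pace. Edit: I noticed that the pauses occur at multiples of the number of workers. If I set workers=16 and use_multiprocessing=True, it pauses for a long time at every 16th batch. Is there any way to prevent these pauses?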
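One way to picture the pause pattern is a toy model, not a description of Keras internals: assume every worker needs the same time to build one batch, all workers start together, and the training step consumes batches much faster than they are built. The function name `simulate_stalls` and the timings (8.0 s to build a batch, 0.1 s to train on it) are made-up values for illustration only.

```python
def simulate_stalls(num_batches, workers, produce_time, consume_time):
    """Return how long the consumer waits (in seconds) before each batch.

    Toy model: all workers start at t=0 and each needs `produce_time`
    seconds per batch, so batches become ready in groups of `workers`.
    """
    waits = []
    clock = 0.0  # time at which the training loop is free again
    for i in range(num_batches):
        # Batch i is built in round i // workers; a round ends for all workers at once.
        ready_at = (i // workers + 1) * produce_time
        waits.append(max(0.0, ready_at - clock))
        clock = max(clock, ready_at) + consume_time
    return waits


# Illustrative numbers only: 16 workers, slow batch building, fast training step.
waits = simulate_stalls(num_batches=48, workers=16, produce_time=8.0, consume_time=0.1)
stalled = [i for i, w in enumerate(waits) if w > 0]
print(stalled)  # [0, 16, 32]
```

Under these assumptions the training loop stalls exactly once per group of 16 batches, which matches the pattern described above. If the model holds, the pauses would be a symptom of batch building being the bottleneck rather than something to fix on its own.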
Here is a stripped-down version of my custom generator code. How can I optimize it further?
import random

import cv2
import numpy as np
from keras.utils import Sequence  # or tensorflow.keras.utils, depending on the install


class custom_generator(Sequence):
    def __init__(self, <variables passed in here>):
        # Assign self.variables here

    def __len__(self):
        # Number of full batches per epoch
        return int(np.floor(len(self.y) / self.batch_size))

    def on_epoch_end(self):
        if self.shuffle == True:
            # do shuffle

    def image_to_np(self, imagePath, seed):
        # Read the image from disk
        img = cv2.imread(imagePath)
        # Do random data augmentation
        if self.aug == True:
            new_img = self.image_gen.random_transform(img, seed=seed)
            new_img = self.image_gen.standardize(new_img)
            img = np.expand_dims(new_img, axis=0)
        elif self.aug == False:
            img = np.expand_dims(img, axis=0)
        else:
            raise Exception('Specify augmentation True or False')
        return img, imagePath

    def generate_images(self, i):
        img_seed = random.randint(0, 1000)
        # Process frame
        imagePath = self.X[i]
        imageLabel = self.y[i]
        img, path = self.image_to_np(imagePath, img_seed)
        return img, imageLabel

    def __getitem__(self, i, path=False):
        # Initialize empty np arrays
        data = np.empty((self.batch_size, self.dim[1], self.dim[0], self.n_channels), dtype=float)
        labl = np.empty((self.batch_size), dtype=int)
        # Index of the first sample in batch i
        k = i * self.batch_size
        for idx in range(self.batch_size):
            # Generate image
            img, label = self.generate_images(k)
            k += 1
            # Store image
            data[idx] = img
            # Store labels
            labl[idx] = label
        return data, labl
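
To check whether `__getitem__` itself is the slow part, one option is to time it outside of training. The following is a minimal sketch: `time_batches` and `DummySequence` are hypothetical names introduced here, and `DummySequence` only stands in for `custom_generator` so that the snippet runs without Keras or image files.

```python
import time


def time_batches(sequence, num_batches=10):
    """Time sequence[i] for the first few batches; returns a list of seconds."""
    timings = []
    for i in range(min(num_batches, len(sequence))):
        start = time.perf_counter()
        sequence[i]  # calls __getitem__, i.e. builds one full batch
        timings.append(time.perf_counter() - start)
    return timings


class DummySequence:
    """Hypothetical stand-in; only __len__ and __getitem__ are needed."""

    def __len__(self):
        return 5

    def __getitem__(self, i):
        # Pretend batch of 4 samples and 4 labels
        return [i] * 4, [0] * 4


timings = time_batches(DummySequence(), num_batches=3)
print(f"mean seconds per batch: {sum(timings) / len(timings):.6f}")
```

Passing a real `custom_generator` instance instead of `DummySequence()` gives the mean time to build one batch in a single process. Multiplying that by the number of batches per epoch and dividing by the number of workers gives a rough lower bound on the epoch time; if that bound is already far above 20 s, image reading and augmentation are the likely bottleneck.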