I have ~10000k images, which cannot fit in memory. So for now I can only read in 1000 images and train on those...

Here is my code:
import glob
import os
import re

import cv2
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from keras.callbacks import EarlyStopping, ModelCheckpoint, ReduceLROnPlateau
from keras.optimizers import Adam

img_dir = "TrainingSet"  # directory of all images
image_path = os.path.join(img_dir + "/images", '*.bmp')
files = glob.glob(image_path)
images = []
masks = []
contours = []
indexes = []

# Pair each image with its mask and contour via the index in the filename
for f1 in np.sort(files):
    img = cv2.imread(f1)
    result = re.search('original_cropped_(.*).bmp', str(f1))
    idx = result.group(1)
    mask_path = img_dir + "/masks/mask_cropped_" + str(idx) + ".bmp"
    mask = cv2.imread(mask_path, 0)  # flag 0 -> load as grayscale
    contour_path = img_dir + "/contours/contour_cropped_" + str(idx) + ".bmp"
    contour = cv2.imread(contour_path, 0)
    indexes.append(idx)
    images.append(img)
    masks.append(mask)
    contours.append(contour)

train_df = pd.DataFrame({"id": indexes, "masks": masks,
                         "images": images, "contours": contours})
train_df.sort_values(by="id", ascending=True, inplace=True)
print(train_df.shape)

img_size_target = (256, 256)

# Resize everything to 256x256 and split 80/20 into train/validation
ids_train, ids_valid, x_train, x_valid, y_train, y_valid, c_train, c_valid = train_test_split(
    train_df.index.values,
    np.array(train_df.images.apply(lambda x: cv2.resize(x, img_size_target).reshape(img_size_target[0], img_size_target[1], 3))),
    np.array(train_df.masks.apply(lambda x: cv2.resize(x, img_size_target).reshape(img_size_target[0], img_size_target[1], 1))),
    np.array(train_df.contours.apply(lambda x: cv2.resize(x, img_size_target).reshape(img_size_target[0], img_size_target[1], 1))),
    test_size=0.2, random_state=1337)

# Here we define the model architecture...
# .....
# End of model definition

# Training
optimizer = Adam(lr=1e-3, decay=1e-10)
model.compile(loss="binary_crossentropy", optimizer=optimizer, metrics=["accuracy"])

early_stopping = EarlyStopping(patience=10, verbose=1)
model_checkpoint = ModelCheckpoint("./keras.model", save_best_only=True, verbose=1)
reduce_lr = ReduceLROnPlateau(factor=0.5, patience=5, min_lr=0.00001, verbose=1)

epochs = 200
batch_size = 32

history = model.fit(x_train, y_train,
                    validation_data=(x_valid, y_valid),
                    epochs=epochs,
                    batch_size=batch_size,
                    callbacks=[early_stopping, model_checkpoint, reduce_lr])
What I would like to know is how to modify my code so that it trains on small batches of images without loading all of the other 10000 into memory. That is, in each epoch the algorithm would read X images from the directory, train on them, and then move on to the next X until the last one, where X is a reasonable number of images that fits in memory.
Answer 0 (score: 1)
Use fit_generator instead of fit:
def generate_batch_data(batch_size):
    while True:  # Keras expects the generator to loop indefinitely
        # load batch_size images and their targets from disk here
        yield images, targets

model.fit_generator(generate_batch_data(batch_size),
                    steps_per_epoch=10000 // batch_size,  # i.e. samples / batch_size
                    epochs=10)
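Adapted to the directory layout from the question, such a generator might look roughly like this. This is a minimal sketch, not the answerer's code: it reuses the filename pairing and 256x256 resizing from the question, trains on the masks only, and image_mask_generator / n_files are names introduced here for illustration:

import glob
import os
import re

import cv2
import numpy as np

def image_mask_generator(img_dir, batch_size, img_size=(256, 256)):
    files = np.sort(glob.glob(os.path.join(img_dir + "/images", '*.bmp')))
    while True:  # loop forever; Keras draws steps_per_epoch batches per epoch
        for start in range(0, len(files), batch_size):
            x_batch, y_batch = [], []
            for f1 in files[start:start + batch_size]:
                idx = re.search('original_cropped_(.*).bmp', str(f1)).group(1)
                img = cv2.resize(cv2.imread(f1), img_size)
                mask = cv2.imread(img_dir + "/masks/mask_cropped_" + idx + ".bmp", 0)
                mask = cv2.resize(mask, img_size)
                x_batch.append(img)
                y_batch.append(mask.reshape(img_size[0], img_size[1], 1))
            yield np.array(x_batch), np.array(y_batch)

n_files = len(glob.glob(os.path.join("TrainingSet/images", '*.bmp')))
history = model.fit_generator(image_mask_generator("TrainingSet", 32),
                              steps_per_epoch=n_files // 32,
                              epochs=200,
                              callbacks=[early_stopping, model_checkpoint, reduce_lr])

Only one batch at a time is ever held in memory; validation_data can be a second generator built the same way over a held-out set of files.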
Alternatively, you can use train_on_batch instead of fit.
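With train_on_batch you write the epoch loop yourself; a rough sketch, where load_batch is a hypothetical helper that reads the next chunk of files from disk:

for epoch in range(epochs):
    for start in range(0, n_files, batch_size):
        # load_batch (hypothetical) reads files[start:start+batch_size]
        # from disk and returns two numpy arrays
        x_batch, y_batch = load_batch(start, batch_size)
        # with metrics=["accuracy"], train_on_batch returns [loss, acc]
        loss, acc = model.train_on_batch(x_batch, y_batch)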
See the discussion of this topic on GitHub: https://github.com/keras-team/keras/issues/2708
Answer 1 (score: 0)
np.array(train_df.images.apply(lambda x: cv2.resize(x, img_size_target).reshape(img_size_target[0], img_size_target[1], 3)))
You could apply this filter (and the other two as well) to each file up front, in a separate script, saving the results to dedicated folders (images_preproc, masks_preproc, etc.), and then load them back, already prepared, in your current script.

Assuming the actual image size is larger than 256x256, you get a faster algorithm that uses less memory, at the cost of a single preparation stage.
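A one-off preparation script along these lines might look like this (a sketch; the images_preproc folder name follows the answer's suggestion):

import glob
import os

import cv2

src = "TrainingSet/images"
dst = "TrainingSet/images_preproc"
if not os.path.exists(dst):
    os.makedirs(dst)

# Resize every image once and save it under the same filename
for f in glob.glob(os.path.join(src, "*.bmp")):
    img = cv2.resize(cv2.imread(f), (256, 256))
    cv2.imwrite(os.path.join(dst, os.path.basename(f)), img)
# repeat for masks/ and contours/, loading with cv2.imread(f, 0)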