Question

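I am currently loading images, creating an array from each one, and appending it to a list. Sadly, this seems to use up all of my RAM, which cannot hold the number of images I want to load (20k).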

Code:

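# Import path assumed; the question does not show its imports.
from tensorflow.keras.preprocessing.image import img_to_array, load_img

def convert_image_to_array_opt(files, relpath):
    # soll_img_shape (target edge length in pixels) is defined elsewhere in the script.
    images_as_array = []
    len_files = len(files)
    print("---ConvImg2Arr---")
    print("---STARTING---")
    for i, file in enumerate(files):
        # Load, resize, convert to a float array and scale pixel values to [0, 1].
        images_as_array.append(img_to_array(load_img(relpath + file, target_size=(soll_img_shape, soll_img_shape))) / 255)
        # Rough progress report.
        if i == int(len_files * 0.2):
            print("20% done")
        if i == int(len_files * 0.5):
            print("50% done")
        if i == int(len_files * 0.8):
            print("80% done")
    print("---DONE---")
    return images_as_array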

It is called with the X_train returned by train_test_split:

x_train = convert_image_to_array_opt(X_train, rel_path)

What is a more efficient way to load all these images?

Edit:

Using Keras's .flow_from_directory() solved my problem, but I would still like to know how to do it the way I was attempting.

Answer 1

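Assuming load_img is not the bottleneck, the problem is that convert_image_to_array_opt loads all 20k images into memory at once. The flow_from_directory method, by contrast, loads only one batch of images at a time (typical batch sizes are 32, 64, ..., 1024).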
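To make the difference concrete, here is a rough back-of-the-envelope estimate. It assumes 224x224 RGB images (the size given in the question title) stored as float32, which is the default dtype of img_to_array, and it ignores Python list and array overhead.

```python
# Rough memory estimate; image size and dtype are assumptions, see above.
n_images = 20_000
height = width = 224
channels = 3
bytes_per_value = 4  # float32

bytes_per_image = height * width * channels * bytes_per_value
total_bytes = n_images * bytes_per_image
full_gib = total_bytes / 1024**3

batch_size = 32
batch_mib = batch_size * bytes_per_image / 1024**2

print(f"all images: {full_gib:.2f} GiB")
print(f"one batch of {batch_size}: {batch_mib:.3f} MiB")
```

Under these assumptions the full list needs roughly 11.2 GiB, while a single batch of 32 needs under 20 MiB. Converting the list to one NumPy array afterwards would temporarily need about the same amount again.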

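One possible way to redesign convert_image_to_array_opt is to take the batch size as a parameter and turn the function into a generator that loads and yields NumPy arrays holding only batch_size images (along with their labels). During training you then iterate over convert_image_to_array_opt, which hands back one batch_size-sized X and y to train on at each step.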
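A minimal sketch of that idea follows. The names image_batches and load_image_as_array, the labels argument, and the default size of 224 are illustrative assumptions, not part of the original code; the Keras import path is also assumed and is imported lazily so that a different loader can be swapped in.

```python
import numpy as np


def load_image_as_array(path, size):
    """Load one image, resize it to (size, size) and scale pixels to [0, 1]."""
    # Assumed import path; adjust to match the installed Keras/TensorFlow.
    from tensorflow.keras.preprocessing.image import img_to_array, load_img
    return img_to_array(load_img(path, target_size=(size, size))) / 255


def image_batches(files, labels, relpath, batch_size=32, size=224,
                  loader=load_image_as_array):
    """Yield (X, y) batches so only batch_size images are held at a time."""
    for start in range(0, len(files), batch_size):
        batch_files = files[start:start + batch_size]
        # Stack the per-image arrays into shape (n, size, size, channels).
        X = np.stack([loader(relpath + f, size) for f in batch_files])
        y = np.asarray(labels[start:start + batch_size])
        yield X, y
```

Each iteration produces an X of shape (n, 224, 224, 3) with n at most batch_size; the last batch may be smaller. One pass over the generator covers the data once, so training for several epochs means creating it again for each epoch, or wrapping the loop in `while True` and telling the training call how many steps make up an epoch.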

Load a large number of images from a path and convert them to an array of size (n, 224, 224, 3)
