Question

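I'm trying to run a loop that goes through a folder of images and returns two numpy arrays: x, which stores the images as numpy arrays, and y, which stores the labels.

The folder can easily hold more than 40,000 RGB images of size (224, 224). I have about 12 GB of memory, but after some iterations the memory in use suddenly spikes and everything stops.

What can I do to fix this?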

import glob

import cv2
import numpy as np

def create_set(path, quality):
    x_file = glob.glob(path + '*')
    x = []

    for i, img in enumerate(x_file):
        image = cv2.imread(img, cv2.IMREAD_COLOR)
        x.append(np.asarray(image))
        if i % 50 == 0:
            print('{} - {} images processed'.format(path, i))

    x = np.asarray(x)
    x = x/255

    y = np.zeros((x.shape[0], 2))
    if quality == 0:
        y[:,0] = 1
    else:
        y[:,1] = 1 

    return x, y

Answer 1

You can't load that many images into memory. You are trying to load every file in the given path into memory by appending each one to x.
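To put rough numbers on this, here is a back-of-the-envelope sketch using the figures from the question (40,000 RGB images of 224x224). The helper name footprint_gib is made up for illustration, and the estimate ignores the extra temporary copies that np.asarray(x) and x/255 create while the original list is still alive, so the real peak is likely higher.

```python
import numpy as np

# Figures taken from the question: 40,000 RGB images of 224x224 pixels.
n_images, height, width, channels = 40_000, 224, 224, 3

def footprint_gib(dtype):
    """Size in GiB of one array holding every image at the given dtype."""
    n_values = n_images * height * width * channels
    return n_values * np.dtype(dtype).itemsize / 2**30

uint8_gib = footprint_gib(np.uint8)
float32_gib = footprint_gib(np.float32)
float64_gib = footprint_gib(np.float64)

print(f"uint8: {uint8_gib:.1f} GiB, float32: {float32_gib:.1f} GiB, "
      f"float64: {float64_gib:.1f} GiB")
# uint8: 5.6 GiB, float32: 22.4 GiB, float64: 44.9 GiB

# Dividing a uint8 array by 255 produces float64, the largest case above.
print((np.zeros(2, dtype=np.uint8) / 255).dtype)  # float64
```

So even the uint8 version takes up roughly half of a 12 GB machine, and the normalized float64 version cannot fit at all.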

Try processing them in batches or, if you are doing this for a tensorflow application, try writing them to .tfrecords first.

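If you want to save some memory, keep the images as np.uint8 rather than converting them to float (which happens automatically when you normalize them in this line: x = x/255).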
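One way to follow this advice is to store everything as uint8 and normalize only the small slice that is about to be used. The sketch below is an illustration, not part of the original answer: the random array stands in for images loaded with cv2.imread, and normalize is a hypothetical helper. It converts to float32 rather than float64, which halves the cost of the normalized copy.

```python
import numpy as np

# Stand-in for a stack of loaded images; a real run would fill this from cv2.imread.
images = np.random.randint(0, 256, size=(8, 224, 224, 3), dtype=np.uint8)

def normalize(batch):
    """Convert one uint8 batch to float32 in [0, 1] just before it is used."""
    return batch.astype(np.float32) / 255.0

# Only the four images being processed are held as floats.
scaled = normalize(images[:4])

print(images.dtype, images.nbytes)
print(scaled.dtype, scaled.nbytes)
```

The stored array stays at one byte per value, and only the batch in flight pays the four-bytes-per-value price.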

You also don't need np.asarray in your line x.append(np.asarray(image)). image is already an array. np.asarray is for converting lists, tuples, etc. to arrays.
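A quick check of this behavior, using a zero-filled array as a stand-in for what cv2.imread returns on a successful read:

```python
import numpy as np

# Stand-in for a successfully loaded image (cv2.imread returns an ndarray).
image = np.zeros((224, 224, 3), dtype=np.uint8)

# On an existing ndarray, np.asarray hands back the very same object.
same = np.asarray(image)
print(same is image)  # True

# On a list, it builds a new array.
from_list = np.asarray([[1, 2], [3, 4]])
print(type(from_list).__name__, from_list.shape)  # ndarray (2, 2)
```

So the call is harmless in the loop, just redundant.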

Edit

A very rough batching example:

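# other_processing, all_the_globbing_here and process_images are placeholders.
def batching_function(imlist, batchsize):
    # Take the first `batchsize` items and return them with the remainder.
    ims = []
    batch = imlist[:batchsize]
    for image in batch:
        ims.append(image)
        other_processing()
    new_imlist = imlist[batchsize:]
    return ims, new_imlist

def main(batchsize):
    imlist = all_the_globbing_here()
    # Ceiling division so that a final partial batch is processed too.
    for i in range(-(-len(imlist) // batchsize)):
        ims, imlist = batching_function(imlist, batchsize)
        process_images(ims)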
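The same batching idea can also be written as a generator, so that only one batch of images exists in memory at a time. This is a minimal sketch under a few assumptions: iter_batches and fake_load are hypothetical names, and fake_load stands in for a real loader such as cv2.imread so the example runs without any image files.

```python
import numpy as np

def iter_batches(paths, batch_size, load):
    """Yield arrays of at most batch_size images, one batch at a time."""
    for start in range(0, len(paths), batch_size):
        chunk = paths[start:start + batch_size]
        # Stack the loaded images into a single (n, height, width, channels) array.
        yield np.stack([load(p) for p in chunk])

# Hypothetical loader standing in for cv2.imread so the sketch runs without files.
def fake_load(path):
    return np.zeros((224, 224, 3), dtype=np.uint8)

paths = [f"img_{i}.png" for i in range(10)]
shapes = [batch.shape for batch in iter_batches(paths, 4, fake_load)]
print(shapes)
# [(4, 224, 224, 3), (4, 224, 224, 3), (2, 224, 224, 3)]
```

Each batch can be garbage-collected as soon as the loop moves on, so memory use depends on batch_size rather than on the number of files in the folder.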

Out of memory when converting image files to numpy arrays
