Loading a dataset without running out of memory

Time: 2018-11-24 19:13:26

Tags: python image dataset

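I have a dataset of 60,000 images, each 227x227x3. I run out of memory when I load them. I am looking for suggestions on how to load the images without exhausting memory. Below is the Python code I use to load the images. Can anyone tell me how to improve it?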

import numpy as np
from PIL import Image


def loadImages(fnames, is_test):
    # Both branches point at the same directory in the post as written.
    path = '/home/assad/Desktop/grandfinal/grandfinalv2/dataset/test_images/'
    if is_test:
        path = '/home/assad/Desktop/grandfinal/grandfinalv2/dataset/test_images/'
    loadedImages = []
    # loadedImages = np.empty((N, 3, 227, 227), dtype=np.uint8)
    for image in fnames:
        # copy() reads the pixel data, so the file can be closed right away
        with Image.open(path + image) as tmp:
            loadedImages.append(tmp.copy())
    return loadedImages



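def get_pixels(fnames, is_test):
    imgs = loadImages(fnames, is_test)
    pixel_list = []
    for img in imgs:
        # Image.LANCZOS is the current name of the filter once called Image.ANTIALIAS
        img = img.resize((227, 227), Image.LANCZOS)
        arr = np.array(img, dtype="uint8")
        arr = np.rollaxis(arr, 2)  # (height, width, channels) -> (channels, height, width)
        arr = arr.reshape(-1)      # flatten to a 1-D vector
        pixel_list.append(list(arr))
    return np.array(pixel_list)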


def label_from_category(category_id=None):
    label_list = np.zeros(4)
    label_list[category_id]=1
    return list(label_list)
#print(label_from_category())


def features_from_data(data, is_test=True):
    pixels = get_pixels(data.FILENAME, is_test)
    labels = data["CATEGORY_ID"]
    return pixels, labels

# get_data() is not shown in the post
test_data = get_data(is_test=True)

iX_test, iY_test = features_from_data(test_data, is_test=True)
print(iX_test.shape)
iY_test = iY_test.tolist()
print(iY_test)

1 Answer:

Answer 0: (score: 1)

To me, this looks like a textbook use case for a generator.

Change the loadImages function to yield images one at a time instead of loading them all into a list.
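For a sense of scale, here is a rough back-of-the-envelope estimate using only the numbers given in the question (60,000 images of 227x227x3). The float64 line is an assumption added for comparison; nothing in the post says the data is ever stored that way.

```python
# Rough memory estimate for holding the whole dataset in one array.
# The image count and shape come from the question; float64 is only
# an illustrative comparison, not something the post uses.
num_images = 60_000
height, width, channels = 227, 227, 3

values_per_image = height * width * channels   # values in one image
uint8_bytes = num_images * values_per_image    # 1 byte per value
float64_bytes = uint8_bytes * 8                # 8 bytes per value

print("values per image:", values_per_image)
print("uint8 total:   %.2f GiB" % (uint8_bytes / 1024 ** 3))
print("float64 total: %.2f GiB" % (float64_bytes / 1024 ** 3))
```

Even packed as uint8, the full dataset needs about 9.3 GB, and holding the same values in Python lists costs considerably more because every element becomes a separate object.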

Try this:

def loadImages(fnames, is_test):
    path = '/home/assad/Desktop/grandfinal/grandfinalv2/dataset/test_images/'
    if is_test:
        path = '/home/assad/Desktop/grandfinal/grandfinalv2/dataset/test_images/'
    for image in fnames:
        # Open one file at a time; copy() reads the pixel data so the file can be closed
        with Image.open(path + image) as tmp:
            img = tmp.copy()
        yield img

The rest of the code should stay the same.
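One caveat: get_pixels as posted still collects every converted image into pixel_list, so the full dataset ends up in memory again at that point. A possible way to benefit from the generator is to consume it in fixed-size batches. The sketch below is only an illustration under that assumption: iter_batches and fake_images are hypothetical names that do not appear in the post, and fake_images merely stands in for loadImages so the example runs without any image files.

```python
from itertools import islice


def iter_batches(items, batch_size):
    """Lazily yield lists of up to batch_size items from any iterable."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    while True:
        # islice pulls at most batch_size items from the underlying generator
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def fake_images(n):
    """Hypothetical stand-in for loadImages(fnames, is_test)."""
    for i in range(n):
        yield "image-%d" % i


batch_sizes = [len(batch) for batch in iter_batches(fake_images(10), 4)]
print(batch_sizes)
```

With the real generator, the loop would look like `for batch in iter_batches(loadImages(fnames, is_test), 256):`, where 256 is an arbitrary batch size. Each batch can be converted to an array and processed before the next one is loaded, so only one batch of images is held at a time.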