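My system has 16GB of RAM. I am trying to train an image similarity model on 20 million images (10GB in total) using VGG19 features and KNN nearest neighbors. When I try to read the images, I get a memory error. I have also tried training the model on 200,000 images (770MB in total), but the problem is the same. How can I read millions of images to train an ML model?
Ubuntu 18.04.2 LTS, Core™ i7, Intel® HD Graphics 5500 (Broadwell GT2), 64-bit, 16GB RAM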
import os
import skimage.io
import tensorflow as tf
from skimage.transform import resize
import numpy as np
from sklearn.neighbors import NearestNeighbors
import matplotlib.pyplot as plt
from matplotlib import offsetbox
from matplotlib.offsetbox import OffsetImage, AnnotationBbox
from sklearn import manifold
import pickle
skimage.io.use_plugin('matplotlib')
dirPath = 'train_data'
args = [os.path.join(dirPath, filename) for filename in os.listdir(dirPath)]
# Decodes every file up front and keeps all decoded images in RAM at once
imgs_train = [skimage.io.imread(arg, as_gray=False) for arg in args]
shape_img = (130, 130, 3)
model = tf.keras.applications.VGG19(weights='imagenet', include_top=False,
                                    input_shape=shape_img)
model.summary()
shape_img_resize = tuple([int(x) for x in model.input.shape[1:]])
input_shape_model = tuple([int(x) for x in model.input.shape[1:]])
output_shape_model = tuple([int(x) for x in model.output.shape[1:]])
n_epochs = None
def resize_img(img, shape_resized):
    img_resized = resize(img, shape_resized,
                         anti_aliasing=True,
                         preserve_range=True)
    assert img_resized.shape == shape_resized
    return img_resized

def normalize_img(img):
    return img / 255.

def transform_img(img, shape_resize):
    img_transformed = resize_img(img, shape_resize)
    img_transformed = normalize_img(img_transformed)
    return img_transformed

def apply_transformer(imgs, shape_resize):
    imgs_transform = [transform_img(img, shape_resize) for img in imgs]
    return imgs_transform
# All resized float64 images are held in a list, then copied into one large array
imgs_train_transformed = apply_transformer(imgs_train, shape_img_resize)
X_train = np.array(imgs_train_transformed).reshape((-1,) + input_shape_model)
E_train = model.predict(X_train)
E_train_flatten = E_train.reshape((-1, np.prod(output_shape_model)))
knn = NearestNeighbors(n_neighbors=5, metric="cosine")
knn.fit(E_train_flatten)
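A back-of-the-envelope estimate suggests why even 200,000 images fail on a 16GB machine. It assumes every image is resized to 130×130×3 and kept as float64 (what skimage's `resize` returns for uint8 input), and that VGG19 with `include_top=False` reduces a 130×130 input to a 4×4×512 feature map (five 2×2 max-pools: 130→65→32→16→8→4), returned as float32 by `predict`. The helper `footprint_bytes` is hypothetical, written only for this estimate.

```python
import math


def footprint_bytes(n_images, shape, itemsize):
    """Bytes needed to hold n_images arrays of `shape` with `itemsize` bytes per element."""
    return n_images * math.prod(shape) * itemsize


GB = 10**9
# Resized, normalized pixels (assumed float64, 8 bytes per value), i.e. X_train.
pixels_200k = footprint_bytes(200_000, (130, 130, 3), 8)
# Assumed VGG19 (include_top=False) output for a 130x130 input: 4x4x512 float32, i.e. E_train.
feats_200k = footprint_bytes(200_000, (4, 4, 512), 4)
feats_20m = footprint_bytes(20_000_000, (4, 4, 512), 4)

print(f"X_train, 200k images: {pixels_200k / GB:.1f} GB")  # 81.1 GB
print(f"E_train, 200k images: {feats_200k / GB:.1f} GB")   # 6.6 GB
print(f"E_train, 20M images:  {feats_20m / GB:.1f} GB")    # 655.4 GB
```

The decoded and resized pixels, not the 770MB of compressed files on disk, are what exhaust RAM. The list from `apply_transformer` and the copy made by `np.array` also coexist briefly, so the real peak is higher still.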
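Answer 0 (score: 0)
One way to solve this is to read a small number of images at a time, preprocess them as needed, and then pass them to the model as a mini-batch.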
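A minimal sketch of that mini-batch loop, assuming a per-file loader and an embedding function are supplied by the caller. The names `batched`, `embed_in_batches`, `load_fn` and `embed_fn` are hypothetical; only one batch of decoded images is in memory at a time, and the results are written into a preallocated output array that can be disk-backed.

```python
import numpy as np


def batched(items, batch_size):
    """Yield consecutive chunks of `items`, each with at most `batch_size` entries."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def embed_in_batches(paths, load_fn, embed_fn, embed_dim, batch_size=64, out=None):
    """Embed images one mini-batch at a time.

    load_fn(path) -> one preprocessed image array (hypothetical helper)
    embed_fn(x)   -> features for a stacked batch x (e.g. a Keras model call)
    out           -> optional preallocated (len(paths), embed_dim) array,
                     for example a disk-backed memmap
    """
    n = len(paths)
    if out is None:
        out = np.empty((n, embed_dim), dtype=np.float32)
    row = 0
    for batch_paths in batched(paths, batch_size):
        # Only this batch of decoded images lives in memory at once.
        x = np.stack([load_fn(p) for p in batch_paths]).astype(np.float32)
        # Flatten each feature map to one row, like E_train_flatten in the question.
        feats = np.asarray(embed_fn(x)).reshape(len(batch_paths), -1)
        out[row:row + len(batch_paths)] = feats
        row += len(batch_paths)
    return out
```

With the question's objects this might be called with `load_fn=lambda p: transform_img(skimage.io.imread(p), shape_img_resize)`, `embed_fn=model.predict_on_batch` and `embed_dim=int(np.prod(output_shape_model))`. Passing `out=np.lib.format.open_memmap('embeddings.npy', mode='w+', dtype=np.float32, shape=(len(args), embed_dim))` keeps the embeddings on disk instead of in RAM (the file name is only an example).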
Answer 1 (score: 0)
Since Keras works well with generators, you should consider using one of these: python generator tutorial, using a generator with keras (example)
This lets you load the images batch by batch during training.
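A minimal sketch of such a generator, assuming a hypothetical per-file `load_fn` (e.g. `imread` + resize + `/ 255`). Files are read only when the consumer asks for the next batch, so the full dataset is never materialized; `loop=True` repeats the data indefinitely, which training loops with a fixed number of steps per epoch usually expect.

```python
import numpy as np


def image_batch_generator(paths, load_fn, batch_size=32, loop=False):
    """Lazily yield float32 batches of shape (batch, *image_shape).

    `load_fn` is a hypothetical loader/preprocessor for a single file.
    """
    while True:
        for start in range(0, len(paths), batch_size):
            chunk = paths[start:start + batch_size]
            # Decode just this chunk; earlier batches can be garbage-collected.
            yield np.stack([load_fn(p) for p in chunk]).astype(np.float32)
        if not loop:
            return
```

In TF 2.x's `tf.keras`, a finite generator like this can be passed to `model.predict(gen, steps=math.ceil(len(args) / batch_size))` to extract features without building `X_train`; for `model.fit` the generator would instead have to yield `(inputs, targets)` tuples. Check the documentation of your Keras version, since accepted input types have changed between releases.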