How do I load large files in TensorFlow?

Time: 2017-11-30 13:45:37

Tags: python tensorflow

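I am trying to train a neural network to recognize NSFW (not safe for work) images. I have about 13k NSFW images and 13k SFW (safe for work) images. For training I used 8000 images of size 64x64 (50% NSFW, 50% SFW), 6000 for training and 2000 for validation (same ratio), but I was not satisfied with the results. Now I am trying to train with the same number and ratio of images, but at 128x128. However, execution fails with an out-of-memory exception, even though the file containing the images is not even close to 3 GB and I have 16 GB of RAM.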
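As a rough sanity check on these numbers, the size of the decoded image arrays can be estimated from the counts given above. This is only a back-of-the-envelope sketch: the helper name `array_size_gib` is made up for illustration, and the dtypes are assumptions, since the post does not say how the pixel values are stored in memory.

```python
def array_size_gib(n_images, height, width, channels, bytes_per_value):
    """Size in GiB of a dense image array with the given shape and value width."""
    n_values = n_images * height * width * channels
    return n_values * bytes_per_value / 2**30

# 8000 images as in the question; the dtype widths are assumptions.
for name, width_bytes in [("uint8", 1), ("float32", 4), ("float64", 8)]:
    size_64 = array_size_gib(8000, 64, 64, 3, width_bytes)
    size_128 = array_size_gib(8000, 128, 128, 3, width_bytes)
    print(f"{name}: 64x64 -> {size_64:.2f} GiB, 128x128 -> {size_128:.2f} GiB")
```

Doubling the side length quadruples the array size. Whether that alone accounts for the exception depends on how many copies the preprocessing and the framework make, which the post does not show.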

Can I split the data across several files (store1.pckl, store2.pckl, etc.) and load them one after another in my code:

load('store1.pckl')
train(...)
load('store2.pckl')
train(...)
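The load/train pseudocode above might be sketched roughly as follows. It assumes each file holds a pickled `(X, Y)` tuple; `iter_chunks`, `train_on_chunks`, and `train_fn` are hypothetical names, and `train_fn` stands in for whatever actually trains the model on one chunk.

```python
import pickle

def iter_chunks(paths):
    """Yield the unpickled contents of each file, one file at a time."""
    for path in paths:
        with open(path, "rb") as f:
            yield pickle.load(f)

def train_on_chunks(paths, train_fn):
    """Call train_fn(X, Y) once per chunk file, in order; return the chunk count."""
    n_chunks = 0
    for X, Y in iter_chunks(paths):  # assumes each file stores an (X, Y) tuple
        train_fn(X, Y)
        n_chunks += 1
    return n_chunks
```

A call such as `train_on_chunks(['store1.pckl', 'store2.pckl'], train_fn)` would then visit the files in sequence. Note that the previous chunk can still be alive while the next one is being unpickled, so peak memory may be closer to two chunks than one. Whether repeated training calls on the same model behave well is not something this sketch verifies.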

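Or should I train with store1.pckl, save the model, then rerun the code so that it loads the model and trains with store2.pckl, repeating this for every part of the dataset: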

loadModel('image-classifier.tfl')
load('store2.pckl')
train()
saveModel('image-classifier.tfl')

Or is there a way to avoid loading all of these files into memory and instead read them as a stream, the way the native open() does?
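One way to get stream-like reads with plain `open()` and `pickle` is to write each sample with its own `pickle.dump` call and read the samples back lazily. The sketch below assumes the store file is rewritten in that record-per-dump layout, which is not how store.pckl is created in this post; `write_records`, `read_records`, and `batches` are hypothetical helper names. How to feed the resulting batches into tflearn is outside this sketch.

```python
import pickle

def write_records(path, records):
    """Write each record with its own pickle.dump call instead of one big tuple."""
    with open(path, "wb") as f:
        for record in records:
            pickle.dump(record, f)

def read_records(path):
    """Lazily yield records one at a time until the end of the file."""
    with open(path, "rb") as f:
        while True:
            try:
                yield pickle.load(f)
            except EOFError:  # raised by pickle.load when no data is left
                return

def batches(records, batch_size):
    """Group an iterable of records into lists of at most batch_size items."""
    batch = []
    for record in records:
        batch.append(record)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # emit the final, possibly shorter, batch
        yield batch
```

With this layout, `batches(read_records('store.pckl'), 96)` keeps only about one batch of samples in memory at a time, at the cost of re-reading the file on every epoch.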

Sorry for my bad English.

This is how I create the file store.pckl: the image files are converted to binary data and written into the file:

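import pickle
from PIL import Image
# Assumption: shuffle is tflearn.data_utils.shuffle, as imported in the training script.
from tflearn.data_utils import shuffle

def resize_image(input_image_path, size):
    # Open the image, force 3-channel RGB, and resize to size = (width, height).
    original_image = Image.open(input_image_path)
    original_image = original_image.convert('RGB')
    resized_image = original_image.resize(size)
    return resized_image

# Example: label = [(0.0,1.0),(1.0,0.0),......
# listatrain/labelt and listavali/labelv are built elsewhere (not shown in the post).
xt, yt = shuffle(listatrain, labelt)
xv, yv = shuffle(listavali, labelv)
# Use a context manager so the file is flushed and closed after writing.
with open('store.pckl', 'wb') as f:
    pickle.dump((xt, yt, xv, yv), f)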

This is the code:

from __future__ import division, print_function, absolute_import
import tflearn
from tflearn.data_utils import shuffle
from tflearn.layers.core import input_data, dropout, fully_connected
from tflearn.layers.conv import conv_2d, max_pool_2d
from tflearn.layers.estimator import regression
from tflearn.data_preprocessing import ImagePreprocessing
from tflearn.data_augmentation import ImageAugmentation
import pickle
import os

# Close the file handle once the data has been unpickled.
with open("store.pckl", "rb") as f:
    X, Y, X_test, Y_test = pickle.load(f)
X, Y = shuffle(X, Y)

img_prep = ImagePreprocessing()
img_prep.add_featurewise_zero_center()
img_prep.add_featurewise_stdnorm()
img_aug = ImageAugmentation()
img_aug.add_random_flip_leftright()
img_aug.add_random_rotation(max_angle=25.)
img_aug.add_random_blur(sigma_max=3.)

network = input_data(shape=[None, 128, 128, 3],
                 data_preprocessing=img_prep,
                 data_augmentation=img_aug)

network = conv_2d(network, 128, 3, activation='relu')
network = max_pool_2d(network, 2)
network = conv_2d(network, 256, 3, activation='relu')
network = conv_2d(network, 256, 3, activation='relu')
network = max_pool_2d(network, 2)
network = fully_connected(network, 2048, activation='relu')
network = dropout(network, 0.5)
network = fully_connected(network, 2, activation='softmax')
network = regression(network, optimizer='adam',
                 loss='categorical_crossentropy',
                 learning_rate=0.001)
model = tflearn.DNN(network, tensorboard_verbose=0, checkpoint_path='image-classifier.tfl.ckpt')
model.fit(X, Y, n_epoch=100, shuffle=True, validation_set=(X_test, Y_test),
      show_metric=True, batch_size=96,
      snapshot_epoch=True,
      run_id='image-classifier')
model.save("image-classifier.tfl")
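For context on where memory might go besides the input data, the number of weights in the first fully connected layer can be estimated from the layer sizes in the code above. This is a hedged estimate, not a diagnosis: it assumes the convolutions preserve the spatial size ('same' padding) and that each `max_pool_2d(network, 2)` halves it, and `dense_weight_count` is a hypothetical helper.

```python
def dense_weight_count(input_side, n_pools, last_conv_filters, dense_units):
    """Weights plus biases of the first dense layer after flattening the conv output.

    Assumes 'same'-padded convolutions (spatial size unchanged) and 2x2 pooling
    with stride 2 (spatial size halved per pool); both are assumptions here.
    """
    side = input_side // (2 ** n_pools)
    flattened = side * side * last_conv_filters
    return flattened * dense_units + dense_units

for input_side in (64, 128):
    params = dense_weight_count(input_side, n_pools=2,
                                last_conv_filters=256, dense_units=2048)
    print(input_side, params, round(params * 4 / 2**30, 2), "GiB as float32")
```

Under these assumptions the 128x128 input gives this layer four times as many weights as the 64x64 input. Optimizers such as Adam keep additional per-weight state, so the training footprint would be a multiple of the raw weight size.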

0 answers:

No answers