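I have about 60,000 samples of size 200x870, all stored as numpy arrays. I want to build a 4-D tensor from them (with one singleton dimension) and train a CNN on them in TensorFlow. So far I have been working with the portion of the data I can load, creating batches as follows: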
with tf.Graph().as_default():
    data_train = tf.to_float(getInput.data_train)
    phase, lr = tf.placeholder(tf.bool), tf.placeholder(tf.float32)
    global_step = tf.Variable(0, trainable=False)
    image_train, label_train = tf.train.slice_input_producer([data_train, labels_train], num_epochs=args.num_epochs)
    images_train, batch_labels_train = tf.train.batch([image_train, label_train], batch_size=args.bsize)
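Can someone suggest a workaround?
I thought about splitting the dataset into subsets and, within one epoch, training on them one after another, using a queue that holds the paths of these files: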
import scipy.io as sc
import numpy as np
import threading
import time
import tensorflow as tf
from tensorflow.python.client import timeline

def testQueues():
    paths = ['data1', 'data2', 'data3', 'data4', 'data5']
    queue_capacity = 6
    bsize = 10
    num_epochs = 2
    filename_queue = tf.FIFOQueue(
        #min_after_dequeue=0,
        capacity=queue_capacity,
        dtypes=tf.string,
        shapes=[[]]
    )
    filenames_placeholder = tf.placeholder(dtype='string', shape=(None))
    filenames_enqueue_op = filename_queue.enqueue_many(filenames_placeholder)
    data_train, phase = tf.placeholder(tf.float32), tf.placeholder(tf.bool)
    sess = tf.Session()
    sess.run(filenames_enqueue_op, feed_dict={filenames_placeholder: paths})
    for i in range(len(paths)):
        train_set_batch_name = sess.run(filename_queue.dequeue())
        train_set_batch_name = train_set_batch_name.decode('utf-8')
        train_set_batch = np.load(train_set_batch_name + '.npy')
        train_set_batch = tf.cast(train_set_batch, tf.float32)
        init_op = tf.group(tf.initialize_all_variables(), tf.initialize_local_variables())
        sess.run(init_op)
        run_one_epoch(train_set_batch, sess)
        size = sess.run(filename_queue.size())
        print(size)
        print(train_set_batch)

def run_one_epoch(train_set, sess):
    image_train = tf.train.slice_input_producer([train_set], num_epochs=1)
    images_train = tf.train.batch(image_train, batch_size=10)
    x = tf.nn.relu(images_train)
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(sess=sess, coord=coord)
    try:
        while not coord.should_stop():
            sess.run(x)
    except tf.errors.OutOfRangeError:
        pass
    finally:
        # When done, ask the threads to stop.
        coord.request_stop()
        coord.join(threads)

testQueues()
But I get this error:
FailedPreconditionError: Attempting to use uninitialized value input_producer/input_producer/fraction_of_32_full/limit_epochs/epochs
[[Node: input_producer/input_producer/fraction_of_32_full/limit_epochs/CountUpTo = CountUpTo[T=DT_INT64, _class=["loc:@input_producer/input_producer/fraction_of_32_full/limit_epochs/epochs"], limit=1, _device="/job:localhost/replica:0/task:0/cpu:0"](input_producer/input_producer/fraction_of_32_full/limit_epochs/epochs)]]
Also, it seems the feed dictionary accepts only numpy arrays, not tf.Tensor values, and converting the data to a tf.Tensor afterwards is also cumbersome.
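Answer 0 (score: 3)
Take a look at the Dataset API. "The tf.data API enables you to build complex input pipelines from simple, reusable pieces."
With this approach, all you need to do is model the graph so that it handles the data for you and pulls in only a limited amount of data at a time for training your model.
If memory is still a problem, you may want to look into using a generator to create your tf.data.Dataset. A further step could be to prepare TFRecords and create the dataset from them, which can potentially speed up the process.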
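As a rough sketch of the generator idea, assume the data has already been split into several `.npy` shards on disk, each holding an array of shape `(N, 200, 870)`. The shard layout and the helper names `iter_samples` and `make_dataset` are hypothetical and not part of the question, and `output_signature` assumes TensorFlow 2.4 or later.

```python
import numpy as np

SAMPLE_SHAPE = (200, 870, 1)  # height, width, singleton channel (sizes from the question)


def iter_samples(shard_paths):
    """Yield one float32 sample at a time, opening one shard after another."""
    for path in shard_paths:
        # mmap_mode="r" maps the file instead of reading it fully into RAM.
        shard = np.load(path, mmap_mode="r")
        for sample in shard:
            # np.array copies just this sample into memory; then add the channel axis.
            yield np.array(sample, dtype=np.float32)[..., np.newaxis]


def make_dataset(shard_paths, batch_size):
    """Wrap the generator in a batched, prefetching tf.data.Dataset."""
    # Imported here so that iter_samples can be used without TensorFlow installed.
    import tensorflow as tf

    dataset = tf.data.Dataset.from_generator(
        lambda: iter_samples(shard_paths),
        output_signature=tf.TensorSpec(shape=SAMPLE_SHAPE, dtype=tf.float32),
    )
    return dataset.batch(batch_size).prefetch(tf.data.AUTOTUNE)
```

Iterating over `make_dataset(paths, 32)` would then produce batches of shape `(32, 200, 870, 1)`, with a possibly smaller final batch, while only the shard currently being read is mapped. Labels are left out here; a generator that yields `(sample, label)` pairs would need a matching pair of `tf.TensorSpec` entries.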
Follow all the links to learn more, and feel free to comment if something is unclear.
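Answer 1 (score: 1)
For data that does not fit in memory, the standard solution is to use queues. You can set up ops that read directly from files (CSV files, image files) and feed them into TensorFlow - https://www.tensorflow.org/versions/r0.11/how_tos/reading_data/index.html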
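The producer/consumer idea behind queues can be illustrated without TensorFlow's own queue runners. The sketch below deliberately swaps in Python's standard `queue.Queue` and a background thread: the thread loads one `.npy` shard at a time while the consumer works on an earlier one, and the bounded queue keeps the loader from running far ahead. The function name `prefetch_shards` and the shard files are hypothetical.

```python
import queue
import threading

import numpy as np

_DONE = object()  # sentinel telling the consumer that no more shards will arrive


def prefetch_shards(shard_paths, capacity=2):
    """Yield loaded shards while a background thread loads the next ones."""
    buffer = queue.Queue(maxsize=capacity)  # at most `capacity` shards wait in the queue

    def producer():
        try:
            for path in shard_paths:
                buffer.put(np.load(path))  # blocks while the queue is full
        finally:
            # Always signal the end, even if a load fails, so the consumer never hangs.
            buffer.put(_DONE)

    threading.Thread(target=producer, daemon=True).start()

    while True:
        shard = buffer.get()  # blocks until the next shard is ready
        if shard is _DONE:
            return
        yield shard
```

A training loop could then be written as `for shard in prefetch_shards(paths): ...`, running one pass over each shard in turn. Note that one shard being loaded and one being consumed are held in memory in addition to those waiting in the queue.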