I've noticed a big speed difference between loading my training data into memory and feeding it to the graph as a numpy array, versus using a shuffle batch of the same size. My data has ~1000 instances.
With the in-memory data, 1000 iterations take just a few seconds, but with the shuffle batch it takes about 10 minutes. I get that shuffle_batch should be somewhat slower, but this seems far too slow. Why is that?
Added a bounty. Any suggestions on how to make shuffled mini-batches faster?
Here is my code:
shuffle_batch
import numpy as np
import tensorflow as tf
data = np.loadtxt('bounty_training.csv',
                  delimiter=',', skiprows=1,
                  usecols=(0,1,2,3,4,5,6,7,8,9,10,11,12,13,14))
filename = "test.tfrecords"
with tf.python_io.TFRecordWriter(filename) as writer:
    for row in data:
        features, label = row[:-1], row[-1]
        example = tf.train.Example()
        example.features.feature['features'].float_list.value.extend(features)
        example.features.feature['label'].float_list.value.append(label)
        writer.write(example.SerializeToString())
def read_and_decode_single_example(filename):
    filename_queue = tf.train.string_input_producer([filename],
                                                    num_epochs=None)
    reader = tf.TFRecordReader()
    _, serialized_example = reader.read(filename_queue)
    features = tf.parse_single_example(
        serialized_example,
        features={
            'label': tf.FixedLenFeature([], np.float32),
            'features': tf.FixedLenFeature([14], np.float32)})
    pdiff = features['label']
    avgs = features['features']
    return avgs, pdiff
avgs, pdiff = read_and_decode_single_example(filename)
n_features = 14
batch_size = 1000
hidden_units = 7
lr = .001
avgs_batch, pdiff_batch = tf.train.shuffle_batch(
    [avgs, pdiff], batch_size=batch_size,
    capacity=5000,
    min_after_dequeue=2000)
X = tf.placeholder(tf.float32,[None,n_features])
Y = tf.placeholder(tf.float32,[None,1])
W = tf.Variable(tf.truncated_normal([n_features,hidden_units]))
b = tf.Variable(tf.zeros([hidden_units]))
Wout = tf.Variable(tf.truncated_normal([hidden_units,1]))
bout = tf.Variable(tf.zeros([1]))
hidden1 = tf.matmul(X,W) + b
pred = tf.matmul(hidden1,Wout) + bout
loss = tf.reduce_mean(tf.squared_difference(pred,Y))
optimizer = tf.train.AdamOptimizer(lr).minimize(loss)
with tf.Session() as sess:
    init = tf.global_variables_initializer()
    sess.run(init)
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(sess=sess, coord=coord)
    for step in range(1000):
        x_, y_ = sess.run([avgs_batch, pdiff_batch])
        _, loss_val = sess.run([optimizer, loss],
                               feed_dict={X: x_, Y: y_.reshape(batch_size, 1)})
        if step % 100 == 0:
            print(loss_val)
    coord.request_stop()
    coord.join(threads)
Full batch via numpy array
"""
avgs and pdiff loaded into numpy arrays first...
Same model as above
"""
with tf.Session() as sess:
    init = tf.global_variables_initializer()
    sess.run(init)
    for step in range(1000):
        _, loss_value = sess.run([optimizer, loss],
                                 feed_dict={X: avgs, Y: pdiff.reshape(n_instances, 1)})
Answer 0 (score: 3)
In this case you're running the session three times per step: once for avgs_batch.eval, once for pdiff_batch.eval, and once for the actual sess.run call. That doesn't explain the magnitude of the slowdown, but it's definitely something to keep in mind. At the very least, the first two eval calls should be merged into a single sess.run call.
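For reference, a minimal sketch of that merge, reusing the tensor and placeholder names from the question code (the posted loop already does this for the two batch fetches):

x_, y_ = sess.run([avgs_batch, pdiff_batch])  # one run call fetches both batch tensors
_, loss_val = sess.run([optimizer, loss],
                       feed_dict={X: x_, Y: y_.reshape(batch_size, 1)})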
I suspect most of the slowdown comes from the use of TFRecordReader. I won't pretend to understand the inner workings of tensorflow, but you may find my answer here helpful.
Summary
Convert the data to tensors with tensorflow.python.framework.ops.convert_to_tensor; tf.train.slice_input_producer gets tensors for a single example; tf.train.batch groups them together into batches.
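A minimal sketch of that pipeline, assuming the features and labels are available in memory as the `data` array from the question (this illustrates the linked approach, not its exact code):

features_t = tf.convert_to_tensor(data[:, :-1], dtype=tf.float32)
labels_t = tf.convert_to_tensor(data[:, -1], dtype=tf.float32)
# slice_input_producer yields tensors holding one example at a time
single_x, single_y = tf.train.slice_input_producer(
    [features_t, labels_t], shuffle=True)
# tf.train.batch groups the single examples back into mini-batches
x_batch, y_batch = tf.train.batch([single_x, single_y], batch_size=1000)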
Answer 1 (score: 2)
The trick is not to feed single examples into shuffle_batch, but to feed it an (n+1)-dimensional tensor of examples with enqueue_many=True. I found this thread extremely helpful:
TFRecordReader seems extremely slow , and multi-threads reading not working
def get_batch(batch_size):
    # Create one read op per element so each enqueue holds distinct serialized
    # records (filename_queue is the string_input_producer queue defined earlier).
    reader = tf.TFRecordReader()
    batch_list = []
    for i in range(batch_size):
        _, serialized_example = reader.read(filename_queue)
        batch_list.append(serialized_example)
    return [batch_list]

batch_serialized_example = tf.train.shuffle_batch(
    get_batch(batch_size), batch_size=batch_size,
    capacity=100*batch_size,
    min_after_dequeue=batch_size*10,
    num_threads=1,
    enqueue_many=True)
features = tf.parse_example(
    batch_serialized_example,
    features={
        'label': tf.FixedLenFeature([], np.float32),
        'features': tf.FixedLenFeature([14], np.float32)})
batch_pdiff = features['label']
batch_avgs = features['features']
...
Answer 2 (score: 0)
When you feed the data through queues, you shouldn't use feed_dict. Instead, make your graph depend directly on the input data:
Use your feature batch directly:
hidden1 = tf.matmul(avgs_batch, W) + b
Similarly, use the label batch (pdiff_batch) instead of Y when computing the loss.
Finally, keep only the second session.run, computing the loss directly without a feed_dict:
# x_, y_ = sess.run([avgs_batch, pdiff_batch])
# _, loss_val = sess.run([optimizer, loss],
#                        feed_dict={X: x_, Y: y_.reshape(batch_size, 1)})
_, loss_val = sess.run([optimizer, loss])
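Putting that together, the training loop might look like this (a sketch that assumes the same coordinator and queue-runner setup as the question, with the graph now built on avgs_batch and pdiff_batch instead of the placeholders):

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(sess=sess, coord=coord)
    for step in range(1000):
        # the batch tensors are dequeued inside this single run call,
        # so no feed_dict is needed
        _, loss_val = sess.run([optimizer, loss])
        if step % 100 == 0:
            print(loss_val)
    coord.request_stop()
    coord.join(threads)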