I have a TensorFlow DNN model, and I use feed_dict to feed in the training and test data together with their corresponding labels. For simplicity, here is the relevant part of the code:
def feed_dict(train):
    """Make a TensorFlow feed_dict: maps data onto Tensor placeholders."""
    if train:
        xs, ys = next_Training_Batch()
        drop_out_value = 0.9
    else:
        # Run a test
        xs, ys = Testing_Data, Testing_Labels
        drop_out_value = 1.0
    return {x: xs, y_: ys, keep_prob: drop_out_value}
for i in range(max_steps):
    if i % 5 == 0:  # Record summaries and test-set accuracy
        summary, acc = sess.run([merged, accuracy], feed_dict=feed_dict(False))
        test_writer.add_summary(summary, i)
        # print('Accuracy at step %s: %s' % (i, acc))
    else:  # Record train-set summaries and train
        if i % 10 == 0:
            run_options = tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE)
            run_metadata = tf.RunMetadata()
            summary, _ = sess.run([merged, train_steps],
                                  feed_dict=feed_dict(True),
                                  options=run_options,
                                  run_metadata=run_metadata)
            train_writer.add_run_metadata(run_metadata, 'step%03d' % i)
            train_writer.add_summary(summary, i)
        else:
            summary, _ = sess.run([merged, train_steps], feed_dict=feed_dict(True))
            train_writer.add_summary(summary, i)
I have been reading that TF queues are a more efficient way to do this, but I am having a lot of trouble getting them to run. This is what I have so far:
import tensorflow as tf
'''
# Both Training.csv and Test.csv contain the feature values and the labels, laid out as follows:
Feature0 Feature1 Feature2 Feature3 Feature4 Feature5 ....... ClassID(Labels) onehot
0.200985 1.000000 0.064534 0.415348 0.005391 1.000000 1000 1
0.151232 1.000000 0.048849 0.312474 0.007160 1.000000 2001 2
0.061576 1.000000 0.026125 0.127097 0.017450 1.000000 1000 3
...............................................................................
Each file has > 2500 rows
'''
fileNames = ["Training.csv","Test.csv"]
BATCH_SIZE = 20
number_OF_features = 450
def batch_generator(fileNames):
    fileNames_queue = tf.train.string_input_producer(fileNames)
    reader = tf.TextLineReader(skip_header_lines=1)
    key, values = reader.read(fileNames_queue)
    record_defaults = [[1.0] for _ in range(number_OF_features)]
    content = tf.decode_csv(values, record_defaults=record_defaults)
    features = tf.stack(content[:-2])
    labels = content[-1]
    min_after_dequeue = 10 * BATCH_SIZE
    capacity = 20 * BATCH_SIZE
    # shuffle the data
    data_batch, label_batch = tf.train.shuffle_batch([features, labels], batch_size=BATCH_SIZE,
                                                     capacity=capacity, min_after_dequeue=min_after_dequeue)
    return data_batch, label_batch
with tf.Session() as sess:
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(coord=coord)
    for _ in range(100):  # generating 100 batches
        sess.run(batch_generator(fileNames))
        # !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
        # I just don't get how to proceed from this point
        # !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
    coord.request_stop()
    coord.join(threads)
My question is how to "feed" the data for training and for testing. I have been reading the TF documentation, but it hasn't helped.
Reference: the code I wrote is based on these great tutorials.
Answer 0 (score: 1)
What is missing from your code is how you want to use the data_batch and label_batch tensors. According to this piece of code, these tensors have shapes (BATCH_SIZE, number_OF_features) and (BATCH_SIZE,) respectively (where number_OF_features here stands for the number of stacked feature columns, i.e. len(content) - 2):
features = tf.stack(content[:-2])
labels = content[-1]
...
data_batch, label_batch = tf.train.shuffle_batch([features, labels], batch_size=BATCH_SIZE,...)
And from the documentation of tf.train.shuffle_batch (see the docs):

    An input tensor with shape [x, y, z] will be output as a tensor with shape [batch_size, x, y, z].
For example, you could implement a function that takes data_batch and label_batch as arguments, and creates and returns the cross_entropy op. You could then use this op to train your model with a GradientDescentOptimizer.
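A minimal sketch of what such a function might look like, assuming a plain linear softmax model; the name get_cost_op, the feature count of 448 (the 450 CSV columns minus the two trailing label columns) and the number of classes are illustrative assumptions, not something from the original post:

def get_cost_op(data_batch, label_batch, num_features=448, num_classes=10):
    # Variables are created with tf.get_variable so they could be shared
    # across several input pipelines via variable scopes if needed.
    W = tf.get_variable("W", [num_features, num_classes],
                        initializer=tf.zeros_initializer())
    b = tf.get_variable("b", [num_classes], initializer=tf.zeros_initializer())
    logits = tf.matmul(data_batch, W) + b
    labels = tf.cast(label_batch, tf.int32)  # sparse class indices
    return tf.reduce_mean(
        tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels, logits=logits))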
Your cross_entropy op, and therefore your train_op, depend on data_batch and label_batch. So every time you ask the Session to run train_op, it will try to dequeue a new batch from your data queue through data_batch and label_batch.
For example:
# Let's create the batch op
data_batch, label_batch = batch_generator(fileNames)
# Let's use the batch op
cross_entropy = get_cost_op(data_batch, label_batch)
# GradientDescentOptimizer needs a learning rate; 0.01 is just a placeholder
train_op = tf.train.GradientDescentOptimizer(0.01).minimize(cross_entropy)
# We're done with the creation of our model, so let's train it.
# string_input_producer and shuffle_batch already registered their queue
# runners, so we only need to start them below.
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # create a coordinator and launch the queue runner threads
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(coord=coord)
    for step in range(100):  # do 100 iterations
        if coord.should_stop():
            break
        sess.run(train_op)
    coord.request_stop()
    coord.join(threads)
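To also cover the test-set side of the original question, one possible extension is to build a second input pipeline from Test.csv and run the same model on it with shared weights. The following is only a rough sketch under assumptions: it reuses the hypothetical get_cost_op from above, relies on tf.get_variable plus variable scopes for weight sharing, and merely monitors cross-entropy on shuffled test batches rather than computing exact accuracy over the whole test set:

# Illustrative sketch: separate queue pipelines for training and test data.
train_data, train_labels = batch_generator(["Training.csv"])
test_data, test_labels = batch_generator(["Test.csv"])

# Build the model twice with shared variables, once per pipeline.
with tf.variable_scope("model"):
    train_cost = get_cost_op(train_data, train_labels)
with tf.variable_scope("model", reuse=True):
    test_cost = get_cost_op(test_data, test_labels)

# 0.01 is only a placeholder learning rate.
train_op = tf.train.GradientDescentOptimizer(0.01).minimize(train_cost)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(coord=coord)
    for step in range(100):
        if coord.should_stop():
            break
        sess.run(train_op)
        if step % 5 == 0:
            # Cross-entropy on one shuffled test batch, as a rough progress check.
            print('step %d, test cross-entropy: %s' % (step, sess.run(test_cost)))
    coord.request_stop()
    coord.join(threads)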