Tensorflow: using an input pipeline (.csv) as a training feed dict

Date: 2017-07-01 15:47:06

Tags: tensorflow tensorflow-gpu tensor

I am trying to train a model on a .csv dataset (5008 columns, 533 rows). I am using a TextLineReader to parse the data into two tensors, one holding the data to train on [example] and one holding the correct labels [label]:

def read_my_file_format(filename_queue):
    reader = tf.TextLineReader()
    key, record_string = reader.read(filename_queue)
    record_defaults = [[0.5] for _ in range(5008)]

    # Left out most of the columns for obvious reasons
    col1, col2, col3, ..., col5008 = tf.decode_csv(record_string, record_defaults=record_defaults)
    example = tf.stack([col1, col2, col3, ..., col5007])
    label = col5008
    return example, label

def input_pipeline(filenames, batch_size, num_epochs=None):
    filename_queue = tf.train.string_input_producer(filenames, num_epochs=num_epochs, shuffle=True)
    example, label = read_my_file_format(filename_queue)
    min_after_dequeue = 10000
    capacity = min_after_dequeue + 3 * batch_size
    example_batch, label_batch = tf.train.shuffle_batch([example, label], batch_size=batch_size, capacity=capacity, min_after_dequeue=min_after_dequeue)
    return example_batch, label_batch
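Incidentally, tf.decode_csv returns a plain Python list of per-column values, so the feature/label split above does not require naming all 5008 columns; it can be done with slicing. The idea can be sketched in plain Python (`split_row` is a hypothetical helper, shown here on plain floats rather than tensors):

```python
# Sketch of the feature/label split performed in read_my_file_format:
# features are every column except the last, the label is the last column.
def split_row(columns):
    """Return (features, label) from one decoded CSV row."""
    return columns[:-1], columns[-1]

row = [0.1, 0.2, 0.3, 1.0]  # toy stand-in for the 5008 decoded columns
features, label = split_row(row)
print(features, label)  # [0.1, 0.2, 0.3] 1.0
```

With tensors, the same slicing would replace the long `col1, col2, ..., col5008` unpacking.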

This part works correctly when I run the following:

with tf.Session() as sess:
    ex_b, l_b = input_pipeline(["Tensorflow_vectors.csv"], 10, 1)
    print("Test: ",ex_b)

The output is: Test: Tensor("shuffle_batch:0", shape=(10, 5007), dtype=float32)

So far this seems fine to me. Next, I created a simple model consisting of two hidden layers (512 and 256 nodes respectively). The problem arises when I try to train the model:

batch_x, batch_y = input_pipeline(["Tensorflow_vectors.csv"], batch_size)
_, cost = sess.run([optimizer, cost], feed_dict={x: batch_x.eval(), y: batch_y.eval()})

I based this approach on this example that uses the MNIST database. However, when I execute this, Tensorflow hangs even when I use batch_size = 1. If I leave out the .eval() calls that should pull the actual data from the tensors, I get the following response instead:

TypeError: The value of a feed cannot be a tf.Tensor object. Acceptable feed values include Python scalars, strings, lists, or numpy ndarrays.

That error makes sense to me, but I don't understand why the program hangs when I do include the .eval() calls, and I haven't been able to find any information about this problem anywhere.
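For reference, the two-hidden-layer model described above (512 and 256 nodes) can be sketched shape-wise in NumPy. Only the layer sizes come from the question; the weight names, ReLU activations, and single linear output are assumptions made for illustration:

```python
import numpy as np

# Shape-only NumPy sketch of the two-hidden-layer model
# (hypothetical weights; activations are assumed, not from the question).
rng = np.random.default_rng(0)
n_in, n_h1, n_h2, n_out = 5007, 512, 256, 1

W1, b1 = rng.standard_normal((n_in, n_h1)) * 0.01, np.zeros(n_h1)
W2, b2 = rng.standard_normal((n_h1, n_h2)) * 0.01, np.zeros(n_h2)
W3, b3 = rng.standard_normal((n_h2, n_out)) * 0.01, np.zeros(n_out)

def forward(x):
    h1 = np.maximum(0, x @ W1 + b1)   # hidden layer, 512 units
    h2 = np.maximum(0, h1 @ W2 + b2)  # hidden layer, 256 units
    return h2 @ W3 + b3               # linear output

batch = rng.standard_normal((10, n_in))  # one batch of 10 examples
print(forward(batch).shape)              # (10, 1)
```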

EDIT: I included the latest version of my entire script here. The program still hangs, even though I implemented (as far as I know) the correct solution from vijay m.

1 Answer:

Answer 0 (score: 0)

As the error says, you are trying to feed a tensor to feed_dict. You have defined an input_pipeline queue, and its output cannot be passed through feed_dict. Note also that a queue-based reader only yields data once the queue runners have been started, which is why the .eval() calls block forever. The code below shows the proper way to pass the data to the model and train it:

# A queue which will return batches of inputs
batch_x, batch_y = input_pipeline(["Tensorflow_vectors.csv"], batch_size)

# Feed it to your neural network model:
# every time this is evaluated, it pulls fresh data from the queue.
logits = neural_network(batch_x, batch_y, ...)

# Define cost and optimizer
cost = ...
optimizer = ...

# Evaluate the graph in a session:
with tf.Session() as sess:
    init_op = ...
    sess.run(init_op)

    # Start the queue runners
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(sess=sess, coord=coord)

    # Loop through the data and train
    for step in range(num_steps):
        # Assign the fetched value to a new name so the `cost`
        # tensor itself is not overwritten on the first iteration.
        _, cost_val = sess.run([optimizer, cost])

    coord.request_stop()
    coord.join(threads)
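As an aside, what min_after_dequeue controls in tf.train.shuffle_batch can be modelled in a few lines of plain Python: each batch is drawn at random from a buffer that is kept at least min_after_dequeue elements full. This is an illustrative toy, not the TF implementation:

```python
import random

def toy_shuffle_batch(rows, batch_size, min_after_dequeue, seed=0):
    """Toy model of tf.train.shuffle_batch: emit a batch by sampling
    from a buffer whenever it holds min_after_dequeue + batch_size rows."""
    rng = random.Random(seed)
    buf, batches = [], []
    for row in rows:
        buf.append(row)
        if len(buf) >= min_after_dequeue + batch_size:
            batches.append([buf.pop(rng.randrange(len(buf)))
                            for _ in range(batch_size)])
    return batches

batches = toy_shuffle_batch(range(20), batch_size=2, min_after_dequeue=4)
print(len(batches))  # 8 batches of 2; the last 4 rows stay buffered
```

A larger min_after_dequeue means better shuffling but more memory, which is why the docs suggest capacity = min_after_dequeue + 3 * batch_size, as used in the question.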