How to replace feed_dict when using an input pipeline?

Date: 2018-10-01 12:55:26

Tags: python python-3.x tensorflow

Suppose that up to now you have had a network that works with feed_dict to inject data into the graph. Every few epochs, I evaluated the training and test loss by feeding a batch from either dataset into my graph.

Now, for performance reasons, I have decided to switch to an input pipeline. Take a look at this dummy example:

import tensorflow as tf
import numpy as np

dataset_size = 200
batch_size= 5
dimension = 4

# create some training dataset
dataset = tf.data.Dataset.\
    from_tensor_slices(np.random.normal(2.0,size=(dataset_size,dimension)).
    astype(np.float32))

dataset = dataset.batch(batch_size) # take batches

iterator = dataset.make_initializable_iterator()
x = tf.cast(iterator.get_next(),tf.float32)
w = tf.Variable(np.random.normal(size=(1,dimension)).astype(np.float32))

loss_func = lambda x,w: tf.reduce_mean(tf.square(x-w)) # notice that the loss function is a mean!
loss = loss_func(x,w) # this is the loss that will be minimized
train_op = tf.train.GradientDescentOptimizer(0.1).minimize(loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())

    # train one epoch
    sess.run(iterator.initializer)
    for i in range(dataset_size//batch_size):
        # the training step will update the weights based on ONE batch of examples each step
        loss1,_ = sess.run([loss,train_op])
        print('train step {:d}.  batch loss {:f}.'.format(i,loss1))

        # I want to print the loss from another dataset (test set) here

Printing the loss on the training data is no problem, but how do I do this for another dataset (the test set)? When using feed_dict, I simply got a batch from the other dataset and fed it as the value of x.
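
Roughly, this is what the feed_dict approach looked like (a minimal sketch; the placeholder-based setup and the names are illustrative, not my actual network):

import tensorflow as tf
import numpy as np

batch_size = 5
dimension = 4

# placeholder-based input: data is injected from Python on every sess.run
x = tf.placeholder(tf.float32, shape=(None, dimension))
w = tf.Variable(np.random.normal(size=(1, dimension)).astype(np.float32))
loss = tf.reduce_mean(tf.square(x - w))

# some dummy test data
test_data = np.random.normal(2.0, size=(20, dimension)).astype(np.float32)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # evaluating on any dataset is just a matter of what you feed
    test_loss = sess.run(loss, feed_dict={x: test_data[:batch_size]})
    print('test batch loss {:f}.'.format(test_loss))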

1 Answer:

Answer 0 (score: 2):

You can do a couple of things here. One simple option is to have two datasets and two iterators and switch between them with tf.cond. A more powerful approach, however, is to use an iterator that supports this directly. See the guide on how to create iterators for a description of the various iterator types. For example, with a reinitializable iterator you could have something like the following:

import tensorflow as tf
import numpy as np

dataset_size = 200
dataset_test_size = 20
batch_size= 5
dimension = 4

# create some training dataset
dataset = tf.data.Dataset.\
    from_tensor_slices(np.random.normal(2.0,size=(dataset_size,dimension)).
    astype(np.float32))

dataset = dataset.batch(batch_size) # take batches

# create some test dataset
dataset_test = tf.data.Dataset.\
    from_tensor_slices(np.random.normal(2.0,size=(dataset_test_size,dimension)).
    astype(np.float32))

dataset_test = dataset_test.batch(batch_size) # take batches

iterator = tf.data.Iterator.from_structure(dataset.output_types,
                                           dataset.output_shapes)

dataset_init_op = iterator.make_initializer(dataset)
dataset_test_init_op = iterator.make_initializer(dataset_test)

x = tf.cast(iterator.get_next(),tf.float32)
w = tf.Variable(np.random.normal(size=(1,dimension)).astype(np.float32))

loss_func = lambda x,w: tf.reduce_mean(tf.square(x-w)) # notice that the loss function is a mean!
loss = loss_func(x,w) # this is the loss that will be minimized
train_op = tf.train.GradientDescentOptimizer(0.1).minimize(loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())

    # train one epoch
    sess.run(dataset_init_op)
    for i in range(dataset_size//batch_size):
        # the training step will update the weights based on ONE batch of examples each step
        loss1,_ = sess.run([loss,train_op])
        print('train step {:d}.  batch loss {:f}.'.format(i,loss1))

    # print test loss
    sess.run(dataset_test_init_op)
    for i in range(dataset_test_size//batch_size):
        loss1 = sess.run(loss)
        print('test step {:d}.  batch loss {:f}.'.format(i,loss1))

You could do something similar with a feedable iterator, depending on what you find more convenient. I suppose even an initializable iterator could work, for example by building a boolean dataset that you then map to some data with tf.cond, although that would not be a very natural way to do it.
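
For completeness, here is roughly what the feedable iterator variant would look like (a sketch under the same dummy setup, not code from the original answer):

import tensorflow as tf
import numpy as np

dataset_size = 200
dataset_test_size = 20
batch_size = 5
dimension = 4

# same dummy training and test datasets as above
dataset = tf.data.Dataset.from_tensor_slices(
    np.random.normal(2.0, size=(dataset_size, dimension)).astype(np.float32))
dataset = dataset.batch(batch_size)
dataset_test = tf.data.Dataset.from_tensor_slices(
    np.random.normal(2.0, size=(dataset_test_size, dimension)).astype(np.float32))
dataset_test = dataset_test.batch(batch_size)

# a string handle placeholder selects which concrete iterator feeds the graph
handle = tf.placeholder(tf.string, shape=[])
iterator = tf.data.Iterator.from_string_handle(
    handle, dataset.output_types, dataset.output_shapes)
x = tf.cast(iterator.get_next(), tf.float32)

w = tf.Variable(np.random.normal(size=(1, dimension)).astype(np.float32))
loss = tf.reduce_mean(tf.square(x - w))
train_op = tf.train.GradientDescentOptimizer(0.1).minimize(loss)

train_iterator = dataset.make_initializable_iterator()
test_iterator = dataset_test.make_initializable_iterator()

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    train_handle = sess.run(train_iterator.string_handle())
    test_handle = sess.run(test_iterator.string_handle())

    # train one epoch, reading from the training iterator
    sess.run(train_iterator.initializer)
    for i in range(dataset_size//batch_size):
        loss1, _ = sess.run([loss, train_op], feed_dict={handle: train_handle})
        print('train step {:d}.  batch loss {:f}.'.format(i, loss1))

    # print test loss, reading from the test iterator
    sess.run(test_iterator.initializer)
    for i in range(dataset_test_size//batch_size):
        loss1 = sess.run(loss, feed_dict={handle: test_handle})
        print('test step {:d}.  batch loss {:f}.'.format(i, loss1))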


EDIT:

Here is how you can do this with an initializable iterator; it is actually cleaner than what I initially had in mind, so maybe you will actually like this better:

import tensorflow as tf
import numpy as np

dataset_size = 200
dataset_test_size = 20
batch_size= 5
dimension = 4

# create data
data = tf.constant(np.random.normal(2.0,size=(dataset_size,dimension)), tf.float32)
data_test = tf.constant(np.random.normal(2.0,size=(dataset_test_size,dimension)), tf.float32)
# choose data
testing = tf.placeholder_with_default(False, ())
current_data = tf.cond(testing, lambda: data_test, lambda: data)
# create dataset
dataset = tf.data.Dataset.from_tensor_slices(current_data)
dataset = dataset.batch(batch_size)
# create iterator
iterator = dataset.make_initializable_iterator()

x = tf.cast(iterator.get_next(),tf.float32)
w = tf.Variable(np.random.normal(size=(1,dimension)).astype(np.float32))

loss_func = lambda x,w: tf.reduce_mean(tf.square(x-w)) # notice that the loss function is a mean!
loss = loss_func(x,w) # this is the loss that will be minimized
train_op = tf.train.GradientDescentOptimizer(0.1).minimize(loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())

    # train one epoch
    sess.run(iterator.initializer)
    for i in range(dataset_size//batch_size):
        # the training step will update the weights based on ONE batch of examples each step
        loss1,_ = sess.run([loss,train_op])
        print('train step {:d}.  batch loss {:f}.'.format(i,loss1))

    # print test loss
    sess.run(iterator.initializer, feed_dict={testing: True})
    for i in range(dataset_test_size//batch_size):
        loss1 = sess.run(loss)
        print('test step {:d}.  batch loss {:f}.'.format(i,loss1))
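
Note that in this last version both datasets are embedded in the graph as constant tensors, so it is best suited to data that fits comfortably in memory. Also, because current_data is evaluated when iterator.initializer runs, the testing flag only needs to be fed at initialization time; the sess.run calls for loss and train_op themselves need no feed at all.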