I created a TFRecord file for my image data, and I can load it and train my network with it.
import tensorflow as tf

height = 28
width = 28
tfrecords_train_filename = '../train-00000-of-00001'
tfrecords_test_filename = '../test-00000-of-00001'
def read_and_decode(filename_queue):
    # Read one serialized example from the queue and parse its features.
    reader = tf.TFRecordReader()
    _, serialized_example = reader.read(filename_queue)
    features = tf.parse_single_example(
        serialized_example,
        features={
            'image/class/label': tf.FixedLenFeature([], tf.int64),
            'image/encoded': tf.FixedLenFeature([], dtype=tf.string, default_value='')
        })
    image_buffer = features['image/encoded']
    image_label = tf.cast(features['image/class/label'], tf.int32)

    # Decode the JPEG bytes, convert to float32 and collapse to one channel.
    with tf.name_scope('decode_jpeg', [image_buffer], None):
        image = tf.image.decode_jpeg(image_buffer, channels=3)
        image = tf.image.convert_image_dtype(image, dtype=tf.float32)
        image = tf.image.rgb_to_grayscale(image)

    image_shape = tf.stack([height, width, 1])
    image = tf.reshape(image, image_shape)

    return image, image_label
def inputs(filename, batch_size, num_epochs):
    if not num_epochs: num_epochs = None
    with tf.name_scope('input'):
        filename_queue = tf.train.string_input_producer(
            [filename], num_epochs=num_epochs)
        image, label = read_and_decode(filename_queue)

        # Shuffle the examples and collect them into batches.
        images, sparse_labels = tf.train.shuffle_batch(
            [image, label], batch_size=batch_size, num_threads=2,
            capacity=1000 + 3 * batch_size,
            min_after_dequeue=1000)

        return images, sparse_labels
image, label = inputs(filename=tfrecords_train_filename, batch_size=200, num_epochs=None)
image = tf.reshape(image, [-1, 784])
label = tf.one_hot(label - 1, 10)
# Create the model
x = tf.placeholder(tf.float32, [None, 784])
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))
y = tf.matmul(x, W) + b
y_ = tf.placeholder(tf.float32, [None, 10])
cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y))
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(coord=coord)

    correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

    for i in range(1000):
        img, lbl = sess.run([image, label])
        sess.run(train_step, feed_dict={x: img, y_: lbl})

    img, lbl = sess.run([image, label])
    print(sess.run(accuracy, feed_dict={x: img, y_: lbl}))

    coord.request_stop()
    coord.join(threads)
The first function basically loads the TFRecord file and converts the data back into image data. Then, in inputs, the data is shuffled into batches.
I now want to evaluate the network on the test data at regular intervals during training. For that, I would like to have something like test_image, test_label = inputs(filename=tfrecords_test_filename, batch_size=20, num_epochs=None). However, this seems to override the queue I defined before and therefore throws an OutOfRangeError.
I have been reading about doing this with shared variables, but I don't understand how to use them. Is that even the right approach? How can I evaluate the network periodically?
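(For reference, the "shared variables" idea usually means building the model through tf.get_variable inside a tf.variable_scope, so that a second input pipeline can reuse the same weights. A minimal sketch of that pattern; train_images and test_images are hypothetical names, not code from this question:)
# Hypothetical sketch of weight sharing between two input pipelines.
def model(x, reuse=False):
    # W and b live in a named scope so a second call can reuse them.
    with tf.variable_scope('softmax_model', reuse=reuse):
        W = tf.get_variable('W', [784, 10], initializer=tf.zeros_initializer())
        b = tf.get_variable('b', [10], initializer=tf.zeros_initializer())
    return tf.matmul(x, W) + b

train_logits = model(train_images)            # first call creates W and b
test_logits = model(test_images, reuse=True)  # second call reuses them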
Answer 0 (score: 3)
What I ended up doing was merging inputs and read_and_decode into a single function.
The dataset is then processed like this:
def _parse_function(proto):
    # Parse one serialized example and turn it into a flat grayscale
    # image plus a one-hot label, ready for the model.
    features = {
        'image/class/label': tf.FixedLenFeature([], tf.int64),
        'image/encoded': tf.FixedLenFeature([], dtype=tf.string,
                                            default_value='')
    }
    parsed_features = tf.parse_single_example(proto, features)
    image_buffer = parsed_features['image/encoded']
    image_label = tf.cast(parsed_features['image/class/label'], tf.int32)

    with tf.name_scope('decode_jpeg', [image_buffer], None):
        image = tf.image.decode_jpeg(image_buffer, channels=3)
        image = tf.image.convert_image_dtype(image, dtype=tf.float32)
        image = tf.image.rgb_to_grayscale(image)

    image_shape = tf.stack([height, width, 1])
    image = tf.reshape(image, image_shape)
    image = tf.reshape(image, [784])
    image_label = tf.one_hot(image_label - 1, 10)

    return image, image_label
This is a very convenient way to get TFRecord data into a Dataset. During training, I can then simply switch between the two:
# Training dataset
train_dataset = tf.contrib.data.TFRecordDataset(['train'])
# Parse the records into tensors.
train_dataset = train_dataset.map(_parse_function)
train_dataset = train_dataset.shuffle(buffer_size=10000)
train_dataset = train_dataset.batch(200)

# Validation dataset
validation_dataset = tf.contrib.data.TFRecordDataset(['validation'])
validation_dataset = validation_dataset.map(_parse_function)
validation_dataset = validation_dataset.batch(200)

# A feedable iterator: which dataset it draws from is selected at run
# time via the string handle fed into `handle`.
handle = tf.placeholder(tf.string, shape=[])
iterator = tf.contrib.data.Iterator.from_string_handle(
    handle, train_dataset.output_types, train_dataset.output_shapes)
next_element = iterator.get_next()

training_iterator = train_dataset.make_initializable_iterator()
validation_iterator = validation_dataset.make_one_shot_iterator()
What is still missing from this implementation is that I don't run over the entire evaluation set. That is only a small modification, though.
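(For completeness, a sketch of that small modification, not part of the original answer: switch the validation side to make_initializable_iterator() so it can be reset, then consume batches until the iterator is exhausted.)
# Hypothetical sketch: run over the full validation set once per evaluation.
# Assumes validation_iterator = validation_dataset.make_initializable_iterator()
# and that `handle` and `next_element` are defined as above.
validation_handle = sess.run(validation_iterator.string_handle())
sess.run(validation_iterator.initializer)
while True:
    try:
        img, lbl = sess.run(next_element, feed_dict={handle: validation_handle})
        # feed img/lbl to the accuracy op here
    except tf.errors.OutOfRangeError:
        break  # the whole validation set has been consumed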
Answer 1 (score: 1)
Check out the section on feedable iterators here. I think this might be what you're looking for. It uses the Dataset API, but I think it's analogous to the TFRecord API. I'm not positive about that.
The gist, taken largely from the documentation linked above:
# Define training and test datasets with the same structure.
training_data = tf.contrib.data.Dataset.(whatever)
test_data = tf.contrib.data.Dataset.(something_else)

# Feedable iterators use a handle placeholder.
handle = tf.placeholder(tf.string, shape=[])
iterator = tf.contrib.data.Iterator.from_string_handle(
    handle,
    training_data.output_types,
    training_data.output_shapes)
next_element = iterator.get_next()

# You need iterators for each dataset to feed your feedable iterator.
# This gets a little wonky.
training_iterator = training_data.make_one_shot_iterator()
test_iterator = test_data.make_initializable_iterator()

# Use `Iterator.string_handle()` to get the value for your `handle`
# placeholder.
training_handle = sess.run(training_iterator.string_handle())
test_handle = sess.run(test_iterator.string_handle())

# Finally run your training/testing. Say you want to train for 100
# steps, then test for 50 iterations, then repeat 10 times. And you
# want to reset your test iterator with every outer loop.
for _ in range(10):
    for _ in range(100):
        sess.run(next_element, feed_dict={handle: training_handle})
    sess.run(test_iterator.initializer)
    for _ in range(50):
        sess.run(next_element, feed_dict={handle: test_handle})
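(For the TFRecord case specifically, the same pattern should carry over if the two datasets are built the way the first answer builds them; a hedged sketch, with hypothetical file names:)
# Hypothetical: feed the feedable iterator from TFRecord files instead.
training_data = tf.contrib.data.TFRecordDataset(['train']).map(_parse_function).batch(200)
test_data = tf.contrib.data.TFRecordDataset(['test']).map(_parse_function).batch(200)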
Looking at this again, I'm not sure it will help you. I'll leave it up until I hear feedback.