Periodically evaluating a large test set during convolutional neural network training

Time: 2016-06-01 09:38:33

Tags: python machine-learning tensorflow deep-learning

I have created a small convolutional neural network with TensorFlow, and I want to train it.

During training, I want to record several metrics. One of them is the accuracy on a test set that is independent of the training set.

The MNIST example shows me how to do this:

  # Train the model, and also write summaries.
  # Every 10th step, measure test-set accuracy, and write test summaries
  # All other steps, run train_step on training data, & add training summaries

  def feed_dict(train):
    """Make a TensorFlow feed_dict: maps data onto Tensor placeholders."""
    if train or FLAGS.fake_data:
      xs, ys = mnist.train.next_batch(100, fake_data=FLAGS.fake_data)
      k = FLAGS.dropout
    else:
      xs, ys = mnist.test.images, mnist.test.labels
      k = 1.0
    return {x: xs, y_: ys, keep_prob: k}

  for i in range(FLAGS.max_steps):
    if i % 10 == 0:  # Record summaries and test-set accuracy
      summary, acc = sess.run([merged, accuracy], feed_dict=feed_dict(False))
      test_writer.add_summary(summary, i)
      print('Accuracy at step %s: %s' % (i, acc))
  else:  # Record train set summaries, and train
      summary, _ = sess.run([merged, train_step], feed_dict=feed_dict(True))
      train_writer.add_summary(summary, i)

What this does is feed the entire test set into the evaluation every 10 steps and print the resulting accuracy.

That is nice, but my test set is fairly large. I have roughly 2000 "images" of size 30x30x30x8, so feeding all of that into the evaluation at once would blow both my host memory and my GPU memory.
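For scale, the input tensor alone is already in the gigabyte range. A quick estimate, assuming float32 values (an assumption; the question does not state the dtype):

```python
# Rough memory estimate for feeding the full test set at once.
# Assumes float32, i.e. 4 bytes per value (the actual dtype may differ).
n_images = 2000
values_per_image = 30 * 30 * 30 * 8   # 216,000 values per "image"
bytes_total = n_images * values_per_image * 4

print(bytes_total / 1024 ** 3)  # ~1.61 GiB for the input tensor alone
```

And that is before counting the activations of every layer, which for a convolutional network are typically several times larger than the input.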

As a workaround, I have this:

accuracy = mymodel.accuracy(logits, label_placeholder)

test_accuracy_placeholder = tf.placeholder(tf.float32, name="test_accuracy")
test_summary = tf.scalar_summary("accuracy", test_accuracy_placeholder)


# training loop
for batch_idx in range(batches_in_trainset):

    # do training here
    ...

    # check accuracy every 10 batches
    if batch_idx % 10 == 0:

        test_accuracies = []  # start with empty accuracy list

        # inner testing loop
        for test_batch_idx in range(batches_in_testset):
            # get test-set batch
            labels, images = testset.next_batch()

            # make feed dict
            test_feed_dict = {
                # ...
            }

            # calculate accuracy
            test_accuracy_val = sess.run(accuracy, feed_dict=test_feed_dict)

            # append accuracy to the list of test accuracies
            test_accuracies.append(test_accuracy_val)

        # compute and log the average accuracy over all test batches
        summary_str = sess.run(test_summary,
                               feed_dict={
                                   test_accuracy_placeholder: sum(test_accuracies) / len(test_accuracies)})

        test_writer.add_summary(summary_str, batch_idx)

Basically, I first collect all per-batch accuracies over the test set, and then feed them into a second (disconnected) graph that computes the average over those batches.

This "kind of" works, in the sense that I am indeed able to compute the test-set accuracy at the desired interval.
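One subtle bug in the plain mean above: if the last test batch is smaller than the others (which happens whenever the test-set size is not a multiple of the batch size), averaging the per-batch accuracies directly over-weights that final batch. Weighting each batch by its size fixes this; the function name below is illustrative, not from the question:

```python
def weighted_mean_accuracy(batch_accuracies, batch_sizes):
    """Average per-batch accuracies, weighting each batch by its size.

    Equivalent to: total correct predictions / total examples.
    """
    total_correct = sum(acc * n for acc, n in zip(batch_accuracies, batch_sizes))
    total_examples = sum(batch_sizes)
    return total_correct / total_examples

# Two full batches of 100 and a final batch of 20:
accs = [0.90, 0.80, 0.50]
sizes = [100, 100, 20]
print(weighted_mean_accuracy(accs, sizes))  # 180/220 = 0.8181..., vs. naive mean 0.7333...
```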

However, it feels very awkward and has the serious drawback that I cannot record anything other than the test-set accuracy.

For example, I would like to record the loss function value over the whole test set, activation histograms over the whole test set, and possibly some other variables.

Preferably this should work just like in the MNIST example. Have a look at the TensorBoard demo here: https://www.tensorflow.org/tensorboard/index.html#events

In this dashboard, all variables and metrics are evaluated on both the test and training sets. I want that too! But I want it without somehow feeding the full test set into my model.
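The collect-then-average pattern above generalizes to several scalar metrics at once, which would at least cover the test-set loss alongside the accuracy. A minimal, framework-independent sketch (all names here are hypothetical, not part of the question's model): accumulate a dict of per-batch values, then reduce each metric with a batch-size-weighted mean before feeding the results into the summary placeholders:

```python
from collections import defaultdict

class MetricAccumulator:
    """Collects per-batch scalar metrics and reduces them to weighted means."""

    def __init__(self):
        self.values = defaultdict(list)   # metric name -> list of per-batch values
        self.sizes = []                   # batch sizes, shared by all metrics

    def add_batch(self, batch_size, **metrics):
        self.sizes.append(batch_size)
        for name, value in metrics.items():
            self.values[name].append(value)

    def means(self):
        total = sum(self.sizes)
        return {name: sum(v * n for v, n in zip(vals, self.sizes)) / total
                for name, vals in self.values.items()}

# Usage: inside the inner testing loop, record every metric you care about.
acc = MetricAccumulator()
acc.add_batch(100, accuracy=0.90, loss=0.35)
acc.add_batch(100, accuracy=0.80, loss=0.50)
acc.add_batch(20,  accuracy=0.50, loss=1.10)
print(acc.means())  # {'accuracy': 0.8181..., 'loss': 0.4863...}
```

Histograms do not reduce to a single scalar this way, though, so they would still need a different mechanism.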

1 answer:

Answer 0 (score: 0):

It looks like this functionality was since added as streaming metric evaluation (contrib):

https://www.tensorflow.org/api_guides/python/contrib.metrics
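Those contrib metrics (e.g. `tf.contrib.metrics.streaming_accuracy`) avoid holding the whole test set in memory by keeping two internal variables, a running total and a count, and updating them once per batch. A plain-Python sketch of that idea, not the TensorFlow API itself:

```python
class StreamingAccuracy:
    """Running accuracy in the spirit of tf.contrib.metrics.streaming_accuracy:
    keeps a running count of correct predictions and of examples seen."""

    def __init__(self):
        self.total = 0    # correct predictions so far
        self.count = 0    # examples seen so far

    def update(self, predictions, labels):
        """Fold one batch into the running totals; return the current accuracy."""
        self.total += sum(p == l for p, l in zip(predictions, labels))
        self.count += len(labels)
        return self.total / self.count

stream = StreamingAccuracy()
stream.update([1, 0, 1], [1, 1, 1])     # 2 of 3 correct so far
print(stream.update([1, 1], [1, 1]))    # 4 of 5 correct -> 0.8
```

Because each `update` only touches one batch, the test set can be streamed through at any interval during training without ever materializing it whole.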