How do I average summaries over multiple batches?

Time: 2016-11-24 14:22:11

Tags: tensorflow

Suppose I have a bunch of summaries defined like:

loss = ...
tf.scalar_summary("loss", loss)
# ...
summaries = tf.merge_all_summaries()

I can evaluate the summaries tensor every few steps on the training data and pass the result to a SummaryWriter. The result will be noisy summaries, because they are only computed on a single batch.

However, I would like to compute the summaries on the entire validation dataset. Of course, I can't pass the validation dataset in as a single batch, since it is too large. So, I will get summary outputs for each validation batch.
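
Concretely, the per-batch evaluation looks something like this sketch (valid_batches and the x placeholder are hypothetical stand-ins for my input pipeline):

# Hypothetical sketch: this writes one noisy summary per validation batch.
for batch in valid_batches:
    summary_str = sess.run(summaries, feed_dict={x: batch})
    summary_writer.add_summary(summary_str, global_step)  # one noisy point per batch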

Is there a way to average these summaries so that it appears as if the summaries had been computed over the entire validation set?

8 Answers:

Answer 0 (score: 42)

Do the averaging of your measure in Python and create a new Summary object for each mean. Here is what I do (accuracy_op below stands for whatever tensor evaluates your measure on one batch):

accuracies = []

# Calculate your measure over as many batches as you need
for batch in validation_set:
    accuracies.append(sess.run(accuracy_op, feed_dict={inputs: batch}))

# Take the mean of your measure
accuracy = np.mean(accuracies)

# Create a new Summary object with your measure
summary = tf.Summary()
summary.value.add(tag="%sAccuracy" % prefix, simple_value=accuracy)  # prefix: e.g. "Valid"

# Add it to the Tensorboard summary writer
# Make sure to specify a step parameter to get nice graphs over time
summary_writer.add_summary(summary, global_step)

Answer 1 (score: 10)

I would avoid calculating the average outside of the graph.

You can use tf.train.ExponentialMovingAverage:

ema = tf.train.ExponentialMovingAverage(decay=my_decay_value, zero_debias=True)
maintain_ema_op = ema.apply(your_losses_list)

# Create an op that will update the moving averages after each training step.
with tf.control_dependencies([your_original_train_op]):
    train_op = tf.group(maintain_ema_op)

Then, to train, run:

sess.run(train_op)

This will run maintain_ema_op as well, and the control dependency ensures your_original_train_op is executed first.

To get your exponential moving average, use:

moving_average = ema.average(an_item_from_your_losses_list_above)

and retrieve its value with:

value = sess.run(moving_average)

This computes the moving average inside your computation graph.
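
To actually surface the moving average in TensorBoard, you can attach a scalar summary to the averaged tensor. A minimal sketch, assuming loss is one of the items in your_losses_list and a summary_writer already exists:

# Attach a summary to the in-graph moving average of the loss.
loss_ema = ema.average(loss)  # the shadow variable maintained by ema.apply()
tf.summary.scalar("loss_ema", loss_ema)
merged = tf.summary.merge_all()

# Evaluate and write it like any other summary:
summary_str = sess.run(merged)
summary_writer.add_summary(summary_str, global_step)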

Answer 2 (score: 8)

I think it's always better to let tensorflow do the computation.

Have a look at the streaming metrics. They have an update function to feed in the information from your current batch, and a function to read the averaged summary. It looks somewhat like this:

accuracy = ...
streaming_accuracy, streaming_accuracy_update = tf.contrib.metrics.streaming_mean(accuracy)
streaming_accuracy_scalar = tf.summary.scalar('streaming_accuracy', streaming_accuracy)

# set up your session etc.

for i in range(iterations):
    for b in batches:
        sess.run([streaming_accuracy_update], feed_dict={...})

    streaming_summ = sess.run(streaming_accuracy_scalar)
    writer.add_summary(streaming_summ, i)
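
Note that the streaming metrics keep their running totals in local variables, so they have to be initialized before the first update; re-running the initializer is also how you reset the average (a detail the snippet above leaves implicit):

# Streaming metrics accumulate in local variables; run this once before
# the first update, and again whenever you want a fresh average.
sess.run(tf.local_variables_initializer())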

Also see the tensorflow documentation: https://www.tensorflow.org/versions/master/api_guides/python/contrib.metrics

And this question: How to accumulate summary statistics in tensorflow

Answer 3 (score: 3)

You can store the running sum and recompute the average after each batch, e.g.:

loss_sum = tf.Variable(0.)
inc_op = tf.assign_add(loss_sum, loss)   # add the current batch loss to the running sum
clear_op = tf.assign(loss_sum, 0.)       # reset the sum before an evaluation pass
average = loss_sum / batches             # batches = number of batches per pass
tf.scalar_summary("average_loss", average)

sess.run(clear_op)
for i in range(batches):
    sess.run([loss, inc_op])

sess.run(average)
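
Put together as a self-contained toy (the placeholder loss and the fixed batch count are illustrative assumptions, not part of the answer's snippet):

import tensorflow as tf

# Hypothetical stand-in for a real loss tensor.
loss = tf.placeholder(tf.float32, shape=[])
batches = 4  # assumed number of batches per evaluation pass

loss_sum = tf.Variable(0.)
inc_op = tf.assign_add(loss_sum, loss)
clear_op = tf.assign(loss_sum, 0.)
average = loss_sum / batches

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    sess.run(clear_op)  # reset the running sum
    for i in range(batches):
        sess.run(inc_op, feed_dict={loss: float(i)})
    print(sess.run(average))  # 1.5, the mean of 0, 1, 2, 3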

Answer 4 (score: 1)

For future reference, the TensorFlow metrics API now supports this out of the box. For example, take a look at tf.metrics.mean_squared_error:

For estimation of the metric over a stream of data, the function creates an update_op operation that updates these variables and returns the mean_squared_error. Internally, a squared_error operation computes the element-wise square of the difference between predictions and labels. Then update_op increments total with the reduced sum of the product of weights and squared_error, and it increments count with the reduced sum of weights.

These total and count variables are added to the metric variables collection, so in practice you would do something like this:

x_batch = tf.placeholder(...)
y_batch = tf.placeholder(...)
model_output = ...
mse, mse_update = tf.metrics.mean_squared_error(y_batch, model_output)
# This operation resets the metric's internal (total/count) variables to zero
metrics_init = tf.variables_initializer(
    tf.get_default_graph().get_collection(tf.GraphKeys.METRIC_VARIABLES))
with tf.Session() as sess:
    # Train...
    # On evaluation step
    sess.run(metrics_init)
    for x_eval_batch, y_eval_batch in ...:
        mse_value = sess.run(mse_update, feed_dict={x_batch: x_eval_batch, y_batch: y_eval_batch})
    print('Evaluation MSE:', mse_value)
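
A small addition, assuming the snippet above: after the loop you can also fetch the mse tensor itself, which reads the accumulated total/count without performing another update:

# Reading the metric tensor returns the current running mean
# without updating the internal total/count variables.
print('Evaluation MSE:', sess.run(mse))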

Answer 5 (score: 0)

I found a solution myself. I think it's a bit hacky and I hope there is a more elegant solution.

During setup:

valid_loss_placeholder = tf.placeholder(dtype=tf.float32, shape=[])
valid_loss_summary = tf.scalar_summary("valid loss", valid_loss_placeholder)

Or for tensorflow versions after 0.12 (tf.scalar_summary was renamed):

valid_loss_placeholder = tf.placeholder(dtype=tf.float32, shape=[])
valid_loss_summary = tf.summary.scalar("valid loss", valid_loss_placeholder) 

In the training loop:

# Compute valid loss in python by doing sess.run() for each batch
# and averaging
valid_loss = ...

summary = sess.run(valid_loss_summary, {valid_loss_placeholder: valid_loss})
summary_writer.add_summary(summary, step)
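
The per-batch averaging elided above ("valid_loss = ...") could look like the following sketch; valid_batches, x, y and the loss tensor are hypothetical names standing in for your own pipeline, and np is numpy:

# Hypothetical sketch of the per-batch averaging referred to above.
batch_losses = []
for inputs, labels in valid_batches:
    batch_losses.append(sess.run(loss, feed_dict={x: inputs, y: labels}))
valid_loss = np.mean(batch_losses)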

Answer 6 (score: 0)

For quite some time I was only saving summaries once per epoch. I never knew that TensorFlow's summaries would then only save the summary of the last batch that was run.

Shocked, I looked into the problem. Here is the solution I came up with (using the Dataset API):

loss = ...
train_op = ...

loss_metric, loss_metric_update = tf.metrics.mean(loss)
tf.summary.scalar('loss', loss_metric)

merged = tf.summary.merge_all()
train_writer = tf.summary.FileWriter(os.path.join(res_dir, 'train'))
test_writer = tf.summary.FileWriter(os.path.join(res_dir, 'test'))

init_local = tf.initializers.local_variables()
init_global = tf.initializers.global_variables()

sess.run(init_global)

def train_run(epoch):
    sess.run([dataset.train_init_op, init_local]) # train_init_op is the operation that switches to training data
    for i in range(dataset.num_train_batches): # num_train_batches is the number of batches that should be run for the training set
        sess.run([train_op, loss_metric_update])

    summary, cur_loss = sess.run([merged, loss_metric])
    train_writer.add_summary(summary, epoch)

    return cur_loss

def test_run(epoch):
    sess.run([dataset.test_init_op, init_local]) # test_init_op is the operation that switches to test data
    for i in range(dataset.num_test_batches): # num_test_batches is the number of batches that should be run for the test set
        sess.run(loss_metric_update)

    summary, cur_loss = sess.run([merged, loss_metric])
    test_writer.add_summary(summary, epoch)

    return cur_loss

for epoch in range(epochs):
    train_loss = train_run(epoch+1)
    test_loss = test_run(epoch+1)
    print("Epoch: {0:3}, loss: (train: {1:10.10f}, test: {2:10.10f})".format(epoch+1, train_loss, test_loss))

For the summaries, I simply wrap the tensor I'm interested in into tf.metrics.mean(). For each batch run I call the metric update operation. At the end of each epoch the metric tensor returns the correct mean over all batch results.

Don't forget to initialize the local variables every time you switch between training and test data. Otherwise, your training and test metrics will be almost identical.
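
If re-initializing all local variables is too broad for your setup, a narrower alternative (mirroring answer 4 above) would be to reset only the metric accumulators:

# Reset only the metric variables (total/count) instead of all locals.
metric_vars = tf.get_default_graph().get_collection(tf.GraphKeys.METRIC_VARIABLES)
reset_metrics = tf.variables_initializer(metric_vars)
sess.run(reset_metrics)  # run whenever you switch between train and test data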

Answer 7 (score: 0)

I ran into the same problem when I realized I had to iterate over my validation data, with memory space tight and OOM errors flooding in.

As several of the answers say, tf.metrics has this built in, but I don't use tf.metrics in my project. Inspired by that, I made this:

import tensorflow as tf
import numpy as np


def batch_persistent_mean(tensor):
    # Make a variable that keeps track of the sum
    accumulator = tf.Variable(initial_value=tf.zeros_like(tensor), dtype=tf.float32)
    # Keep count of batches in accumulator (needed to estimate mean)
    batch_nums = tf.Variable(initial_value=tf.zeros_like(tensor), dtype=tf.float32)
    # Make an operation for accumulating, increasing batch count
    accumulate_op = tf.assign_add(accumulator, tensor)
    step_batch = tf.assign_add(batch_nums, 1)
    update_op = tf.group([step_batch, accumulate_op])
    eps = 1e-5
    output_tensor = accumulator / (tf.nn.relu(batch_nums - eps) + eps)
    # In regards to the tf.nn.relu, it's a hacky zero_guard:
    # if batch_nums are zero then return eps, else it'll be batch_nums
    # Make an operation to reset
    flush_op = tf.group([tf.assign(accumulator, tf.zeros_like(accumulator)),
                         tf.assign(batch_nums, tf.zeros_like(batch_nums))])
    return output_tensor, update_op, flush_op

# Make a variable that we want to accumulate
X = tf.Variable(0., dtype=tf.float32)
# Make our persistant mean operations
Xbar, upd, flush = batch_persistent_mean(X)

Now you send Xbar to your summary, e.g. tf.scalar_summary("mean_of_x", Xbar), and where you previously would do sess.run(X), you do sess.run(upd). Between epochs, do sess.run(flush).

Testing the behaviour:

### INSERT ABOVE CODE CHUNK IN S.O. ANSWER HERE ###
with tf.Session() as sess:
    sess.run([tf.global_variables_initializer(), tf.local_variables_initializer()])
    # Calculate the mean of 0+1+...+19
    for i in range(20):
        sess.run(upd, {X: i})
    print(sess.run(Xbar), "=", np.mean(np.arange(20)))
    for i in range(40):
        sess.run(upd, {X: i})
    # Now Xbar is the mean of (0+...+19) and (0+...+39) combined:
    print(sess.run(Xbar), "=", np.mean(np.concatenate([np.arange(20), np.arange(40)])))
    # Now flush it
    sess.run(flush)
    print("flushed. Xbar=", sess.run(Xbar))
    for i in range(40):
        sess.run(upd, {X: i})
    print(sess.run(Xbar), "=", np.mean(np.arange(40)))