Suppose I have a bunch of summaries defined like:
loss = ...
tf.scalar_summary("loss", loss)
# ...
summaries = tf.merge_all_summaries()
I can evaluate the summaries tensor every few steps on the training data and pass the result to a SummaryWriter.
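For illustration, here is roughly what that per-batch logging looks like (the loop, the next_train_batch() helper and the logging interval are just placeholders, not my real code):

summary_writer = tf.train.SummaryWriter("/tmp/logs")
for step in range(num_steps):
    batch = next_train_batch()  # hypothetical helper returning a feed_dict
    if step % 100 == 0:
        # evaluate the merged summaries on this single training batch
        _, summary_str = sess.run([train_op, summaries], feed_dict=batch)
        summary_writer.add_summary(summary_str, step)
    else:
        sess.run(train_op, feed_dict=batch)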
The result would be noisy summaries, because they are only computed on a single batch.
However, I would like to compute the summaries on the entire validation dataset. Of course, I can't pass the validation dataset as a single batch, since it is too big. So I would get summary outputs for every validation batch.
Is there a way to average those summaries so that it appears as if the summaries had been computed on the entire validation set?
Answer 0 (score: 42)
Do the averaging of your measure in Python and create a new Summary object for each mean. Here is what I do:
accuracies = []

# Calculate your measure over as many batches as you need
for batch in validation_set:
    accuracies.append(sess.run([training_op]))

# Take the mean of your measure
accuracy = np.mean(accuracies)

# Create a new Summary object with your measure
summary = tf.Summary()
summary.value.add(tag="%sAccuracy" % prefix, simple_value=accuracy)

# Add it to the Tensorboard summary writer
# Make sure to specify a step parameter to get nice graphs over time
summary_writer.add_summary(summary, global_step)
Answer 1 (score: 10)
I would avoid calculating the average outside of the graph.
You can use tf.train.ExponentialMovingAverage:
ema = tf.train.ExponentialMovingAverage(decay=my_decay_value, zero_debias=True)
maintain_ema_op = ema.apply(your_losses_list)

# Create an op that will update the moving averages after each training step.
with tf.control_dependencies([your_original_train_op]):
    train_op = tf.group(maintain_ema_op)
Then, use:
sess.run(train_op)
This will call maintain_ema_op because it is defined as a control dependency.
To get your exponential moving average, use:
moving_average = ema.average(an_item_from_your_losses_list_above)
And retrieve its value with:
value = sess.run(moving_average)
This computes the moving average within your computation graph.
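If you also want this averaged value to show up in TensorBoard, one possible continuation (a sketch that assumes the names above, a summary_writer, a global_step, and the tf.summary API from TensorFlow 0.12+) is to attach it to a scalar summary:

# Hypothetical: expose the in-graph moving average as a scalar summary
loss_ema = ema.average(an_item_from_your_losses_list_above)
loss_ema_summary = tf.summary.scalar("loss_ema", loss_ema)

# Inside the training loop, evaluate and write it like any other summary
summary_str, _ = sess.run([loss_ema_summary, train_op])
summary_writer.add_summary(summary_str, global_step)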
Answer 2 (score: 8)
It is always better to let tensorflow do the calculation, I think.
Have a look at the streaming metrics. They have an update function to feed in the information of your current batch and a function to get the averaged summary. It is going to look somewhat like this:
accuracy = ...
streaming_accuracy, streaming_accuracy_update = tf.contrib.metrics.streaming_mean(accuracy)
streaming_accuracy_scalar = tf.summary.scalar('streaming_accuracy', streaming_accuracy)

# set up your session etc.

for i in iterations:
    for b in batches:
        sess.run([streaming_accuracy_update], feed_dict={...})

    streaming_summ = sess.run(streaming_accuracy_scalar)
    writer.add_summary(streaming_summ, i)
Also see the tensorflow documentation: https://www.tensorflow.org/versions/master/api_guides/python/contrib.metrics
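One thing the snippet above does not show: the streaming metrics keep their running totals in local variables, so if you want a fresh average for every evaluation pass you have to reset them first. A rough sketch (val_batches is a hypothetical iterable of feed dicts):

# Re-initializing local variables resets the streaming metric's internal
# total/count, so the next pass starts its mean from zero.
sess.run(tf.local_variables_initializer())
for val_feed in val_batches:
    sess.run(streaming_accuracy_update, feed_dict=val_feed)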
Answer 3 (score: 3)
You can store the running sum in a variable and recompute the average after each batch, for example:
loss_sum = tf.Variable(0.)
inc_op = tf.assign_add(loss_sum, loss)
clear_op = tf.assign(loss_sum, 0.)
average = loss_sum / batches
tf.scalar_summary("average_loss", average)

# Reset the running sum, accumulate the loss over all batches,
# then evaluate the average.
sess.run(clear_op)
for i in range(batches):
    sess.run([loss, inc_op])
sess.run(average)
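To actually get that average into TensorBoard you also need to keep a handle to the summary op, evaluate it after the accumulation loop, and write it out. A possible continuation (summary_writer and global_step are assumed to exist):

# Hypothetical: keep the summary op returned above and write it after the loop
avg_loss_summary = tf.scalar_summary("average_loss", average)

# ... after the accumulation loop over the validation batches:
summary_str = sess.run(avg_loss_summary)
summary_writer.add_summary(summary_str, global_step)
sess.run(clear_op)  # reset the running sum before the next evaluation pass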
Answer 4 (score: 1)
For future reference, the TensorFlow metrics API now supports this by default. For example, take a look at tf.metrics.mean_squared_error:
For estimation of the metric over a stream of data, the function creates an update_op operation that updates these variables and returns the mean_squared_error. Internally, a squared_error operation computes the element-wise square of the difference between predictions and labels. Then update_op increments total with the reduced sum of the product of weights and squared_error, and it increments count with the reduced sum of weights.
These total and count variables are added to the set of metric variables, so in practice what you would do is something like:
x_batch = tf.placeholder(...)
y_batch = tf.placeholder(...)
model_output = ...
mse, mse_update = tf.metrics.mean_squared_error(y_batch, model_output)
# This operation resets the metric internal variables to zero
metrics_init = tf.variables_initializer(
    tf.get_default_graph().get_collection(tf.GraphKeys.METRIC_VARIABLES))

with tf.Session() as sess:
    # Train...

    # On evaluation step
    sess.run(metrics_init)
    for x_eval_batch, y_eval_batch in ...:
        mse = sess.run(mse_update, feed_dict={x_batch: x_eval_batch, y_batch: y_eval_batch})
    print('Evaluation MSE:', mse)
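If you also want that evaluation value in TensorBoard, one simple option (a sketch in the spirit of the accepted answer; summary_writer and step are assumed to exist) is to wrap the final number in a Summary protobuf yourself:

# Hypothetical: write the evaluation MSE as a scalar summary
eval_summary = tf.Summary()
eval_summary.value.add(tag="eval_mse", simple_value=mse)
summary_writer.add_summary(eval_summary, step)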
Answer 5 (score: 0)
I found a solution myself. I think it's kind of hacky and I hope there is a more elegant solution.
During setup:
valid_loss_placeholder = tf.placeholder(dtype=tf.float32, shape=[])
valid_loss_summary = tf.scalar_summary("valid loss", valid_loss_placeholder)
Or for tensorflow versions after 0.12 (tf.scalar_summary was renamed):
valid_loss_placeholder = tf.placeholder(dtype=tf.float32, shape=[])
valid_loss_summary = tf.summary.scalar("valid loss", valid_loss_placeholder)
Within the training loop:
# Compute valid loss in python by doing sess.run() for each batch
# and averaging
valid_loss = ...
summary = sess.run(valid_loss_summary, {valid_loss_placeholder: valid_loss})
summary_writer.add_summary(summary, step)
Answer 6 (score: 0)
For quite some time I was only saving summaries once per epoch. I never knew that TensorFlow's summaries would then only save the summary of the last batch that was run.
Shocked, I looked into this problem. This is the solution I came up with (using the Dataset API):
loss = ...
train_op = ...

loss_metric, loss_metric_update = tf.metrics.mean(loss)
tf.summary.scalar('loss', loss_metric)

merged = tf.summary.merge_all()
train_writer = tf.summary.FileWriter(os.path.join(res_dir, 'train'))
test_writer = tf.summary.FileWriter(os.path.join(res_dir, 'test'))

init_local = tf.initializers.local_variables()
init_global = tf.initializers.global_variables()
sess.run(init_global)

def train_run(epoch):
    sess.run([dataset.train_init_op, init_local])  # train_init_op is the operation that switches to the training data
    for i in range(dataset.num_train_batches):  # num_train_batches is the number of batches that should be run for the training set
        sess.run([train_op, loss_metric_update])
    summary, cur_loss = sess.run([merged, loss_metric])
    train_writer.add_summary(summary, epoch)
    return cur_loss

def test_run(epoch):
    sess.run([dataset.test_init_op, init_local])  # test_init_op is the operation that switches to the test data
    for i in range(dataset.num_test_batches):  # num_test_batches is the number of batches that should be run for the test set
        sess.run(loss_metric_update)
    summary, cur_loss = sess.run([merged, loss_metric])
    test_writer.add_summary(summary, epoch)
    return cur_loss

for epoch in range(epochs):
    train_loss = train_run(epoch + 1)
    test_loss = test_run(epoch + 1)
    print("Epoch: {0:3}, loss: (train: {1:10.10f}, test: {2:10.10f})".format(epoch + 1, train_loss, test_loss))
For the summaries I just wrap the tensor I'm interested in into tf.metrics.mean(). For each batch run I call the metric update operation. At the end of every epoch the metric tensor returns the correct mean over all batch results.
Don't forget to initialize the local variables every time you switch between training and test data. Otherwise your train and test metrics will be near identical.
Answer 7 (score: 0)
I ran into the same problem when I realized I would have to iterate over my validation data, because memory was tight and the OOM errors kept flooding in.
As several of these answers say, tf.metrics has this built in, but I'm not using tf.metrics in my project. So, inspired by that, I did this:
import tensorflow as tf
import numpy as np

def batch_persistent_mean(tensor):
    # Make a variable that keeps track of the sum
    accumulator = tf.Variable(initial_value=tf.zeros_like(tensor), dtype=tf.float32)
    # Keep count of batches in accumulator (needed to estimate mean)
    batch_nums = tf.Variable(initial_value=tf.zeros_like(tensor), dtype=tf.float32)
    # Make an operation for accumulating, increasing batch count
    accumulate_op = tf.assign_add(accumulator, tensor)
    step_batch = tf.assign_add(batch_nums, 1)
    update_op = tf.group([step_batch, accumulate_op])
    eps = 1e-5
    output_tensor = accumulator / (tf.nn.relu(batch_nums - eps) + eps)
    # In regards to the tf.nn.relu, it's a hacky zero_guard:
    # if batch_nums are zero then return eps, else it'll be batch_nums
    # Make an operation to reset
    flush_op = tf.group([tf.assign(accumulator, 0), tf.assign(batch_nums, 0)])
    return output_tensor, update_op, flush_op

# Make a variable that we want to accumulate
X = tf.Variable(0., dtype=tf.float32)
# Make our persistent mean operations
Xbar, upd, flush = batch_persistent_mean(X)
Now you would send Xbar into your summary, e.g. tf.scalar_summary("mean_of_x", Xbar), and where you previously did sess.run(X) you now do sess.run(upd). And between epochs you do sess.run(flush).
### INSERT ABOVE CODE CHUNK IN S.O. ANSWER HERE ###
with tf.Session() as sess:
    sess.run([tf.global_variables_initializer(), tf.local_variables_initializer()])
    # Calculate the mean of 0 + 1 + ... + 19
    for i in range(20):
        sess.run(upd, {X: i})
    print(sess.run(Xbar), "=", np.mean(np.arange(20)))
    for i in range(40):
        sess.run(upd, {X: i})
    # Now Xbar is the mean of (0 + ... + 19) together with (0 + ... + 39):
    print(sess.run(Xbar), "=", np.mean(np.concatenate([np.arange(20), np.arange(40)])))
    # Now flush it
    sess.run(flush)
    print("flushed. Xbar=", sess.run(Xbar))
    for i in range(40):
        sess.run(upd, {X: i})
    print(sess.run(Xbar), "=", np.mean(np.arange(40)))