Question

I'm using TensorFlow 1.2; here is the code:

import tensorflow as tf
import tensorflow.contrib.layers as layers
import numpy as np
import tensorflow.contrib.learn as tflearn

tf.logging.set_verbosity(tf.logging.INFO)

# Naturally this is a very simple straight line
# of y = -x + 10
train_x = np.asarray([0., 1., 2., 3., 4., 5.])
train_y = np.asarray([10., 9., 8., 7., 6., 5.])

test_x = np.asarray([10., 11., 12.])
test_y = np.asarray([0., -1., -2.])

input_fn_train = tflearn.io.numpy_input_fn({"x": train_x}, train_y, num_epochs=1000)
input_fn_test = tflearn.io.numpy_input_fn({"x": test_x}, test_y, num_epochs=1000)

validation_monitor = tflearn.monitors.ValidationMonitor(
    input_fn=input_fn_test,
    every_n_steps=10)

fts = [layers.real_valued_column('x')]

estimator = tflearn.LinearRegressor(feature_columns=fts)
estimator.fit(input_fn=input_fn_train,
              steps=1000,
              monitors=[validation_monitor])

print(estimator.evaluate(input_fn=input_fn_test))

This does not run as expected. What happens is that training stops at step 47 with a very high loss:

INFO:tensorflow:Starting evaluation at 2017-06-18-20:52:10
INFO:tensorflow:Finished evaluation at 2017-06-18-20:52:10
INFO:tensorflow:Saving dict for global step 1: global_step = 1, loss = 12.5318
INFO:tensorflow:Validation (step 10): global_step = 1, loss = 12.5318
INFO:tensorflow:Saving checkpoints for 47 into    
INFO:tensorflow:Loss for final step: 19.3527.
INFO:tensorflow:Starting evaluation at 2017-06-18-20:52:11
INFO:tensorflow:Restoring parameters from   
INFO:tensorflow:Finished evaluation at 2017-06-18-20:52:11
INFO:tensorflow:Saving dict for global step 47: global_step = 47, loss = 271.831

{'global_step': 47, 'loss': 271.83133}

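Things I don't understand at all (admittedly I'm a complete noob in TF):

Why is the loss at step 10 smaller than the loss at step 47?
Why does TF decide to stop training after that?
Why don't "INFO:tensorflow:Loss for final step: 19.3527." and the loss at step 47 match each other?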

I implemented this very same algorithm in vanilla TensorFlow and it works as expected, but I really can't figure out what LinearRegressor wants from me here.

Answer 1

Below are partial answers to your questions. They may not resolve everything, but hopefully they give you more insight.

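Why does TF decide to stop training after that? This comes from the fact that you set num_epochs=1000 and that the default batch_size of numpy_input_fn is 128 (see https://github.com/tensorflow/tensorflow/blob/master/tensorflow/contrib/learn/python/learn/learn_io/numpy_io.py). num_epochs=1000 means the fit method may pass over the data at most 1000 times (or run 1000 steps, whichever comes first). With 6 training samples, 1000 epochs is 6000 samples, which fills ceil(1000 * 6 / 128) = 47 batches, so fit stops after 47 steps. Setting batch_size to 6 (the size of the training set) or num_epochs=None will give you more sensible results. I'd suggest a batch_size of at most 6, because cycling through the same training samples several times within a single step probably doesn't make much sense.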
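The step arithmetic above can be checked with a small pure-Python simulation. This is a hypothetical stand-in, not TensorFlow's actual numpy_input_fn (which uses queues and may shuffle); the helper names simulate_numpy_input_batches and expected_steps are made up for illustration. It assumes the final partial batch is still emitted, which is what makes the count come out as a ceiling.

```python
import math


def simulate_numpy_input_batches(n_samples, batch_size, num_epochs, max_steps):
    """Hypothetical stand-in for an epoch-limited input pipeline.

    Streams the sample indices num_epochs times in order (no shuffling)
    and cuts the stream into batches of batch_size; the last batch may be
    smaller. Batching stops when the stream runs out or max_steps is hit.
    """
    stream_len = n_samples * num_epochs
    batches = []
    start = 0
    while start < stream_len and len(batches) < max_steps:
        end = min(start + batch_size, stream_len)
        # Indices wrap around, so one batch can hold the same sample many times.
        batches.append([i % n_samples for i in range(start, end)])
        start = end
    return batches


def expected_steps(n_samples, batch_size, num_epochs, max_steps):
    """Closed form: ceil(samples * epochs / batch_size), capped by max_steps."""
    return min(math.ceil(n_samples * num_epochs / batch_size), max_steps)


batches = simulate_numpy_input_batches(6, 128, 1000, 1000)
print(len(batches))         # 47
print(len(batches[-1]))     # 112 samples left for the final, partial batch
print(batches[0].count(0))  # 22 copies of sample 0 in the first batch
print(expected_steps(6, 6, 1000, 1000))  # 1000: now steps=1000 is the limit
```

The third line shows why a batch_size above 6 is odd here: each 128-sample batch contains every training point about 21 times.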
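Why is the loss at step 10 smaller than the loss at step 47? There are several reasons why the loss might not go down.
a. The loss is not computed on exactly the same data at every step. For example, if your sample size is 100 and batch_size is 32, each step computes the loss on the next batch of 32 samples (and this keeps cycling around the data).
b. Your learning rate is too high, so the loss bounces around. To fix this, try lowering the learning rate, or even try a different optimizer. I believe the optimizer LinearRegressor uses by default is FtrlOptimizer. You can change its default learning rate when constructing the LinearRegressor like this: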
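Point a can be seen without any training at all: with the parameters held fixed, the per-step loss still changes just because each step looks at a different batch. The sketch below is plain NumPy, assumes mean squared error as the loss, and uses hypothetical fixed parameters w = -0.5, b = 10 and a batch size of 4 on the question's training data, chosen only so the numbers are easy to follow.

```python
import numpy as np

train_x = np.asarray([0., 1., 2., 3., 4., 5.])
train_y = np.asarray([10., 9., 8., 7., 6., 5.])


def cycling_batches(n_samples, batch_size, num_batches):
    """Yield index arrays that walk through the data in order and wrap around."""
    start = 0
    for _ in range(num_batches):
        yield np.arange(start, start + batch_size) % n_samples
        start += batch_size


def batch_mse(w, b, idx):
    """Mean squared error of y_hat = w * x + b on the samples in idx."""
    residual = w * train_x[idx] + b - train_y[idx]
    return float(np.mean(residual ** 2))


# Hypothetical, deliberately fixed parameters: no training happens here.
w, b = -0.5, 10.0
losses = [batch_mse(w, b, idx) for idx in cycling_batches(len(train_x), 4, 4)]
print(losses)  # [0.875, 2.625, 3.375, 0.875]
```

The loss rises and falls in a repeating pattern even though the model never changes, so two single-step losses are not directly comparable.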

estimator = tflearn.LinearRegressor(feature_columns=fts, optimizer=tf.train.FtrlOptimizer(learning_rate=...))

Alternatively, you can try a completely different optimizer:

estimator = tflearn.LinearRegressor(feature_columns=fts, optimizer=tf.train.GradientDescentOptimizer(learning_rate=...))
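To illustrate point b, here is a minimal sketch of full-batch gradient descent on the question's line y = -x + 10. It is plain NumPy with a mean-squared-error loss, not FTRL and not what LinearRegressor does internally; the learning rates 0.05 and 0.2 are assumptions picked for this particular data, and the threshold where training starts to diverge depends on the scale of x.

```python
import numpy as np

train_x = np.asarray([0., 1., 2., 3., 4., 5.])
train_y = np.asarray([10., 9., 8., 7., 6., 5.])


def gradient_descent(learning_rate, steps):
    """Fit y = w * x + b with plain full-batch gradient descent on MSE.

    Returns the final (w, b) and the training loss recorded before each update.
    """
    w, b = 0.0, 0.0
    losses = []
    for _ in range(steps):
        residual = w * train_x + b - train_y
        losses.append(float(np.mean(residual ** 2)))
        # Gradients of mean((w*x + b - y)^2) with respect to w and b.
        grad_w = 2.0 * np.mean(residual * train_x)
        grad_b = 2.0 * np.mean(residual)
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return float(w), float(b), losses


w, b, small_lr_losses = gradient_descent(learning_rate=0.05, steps=500)
print(round(w, 3), round(b, 3))  # -1.0 10.0

_, _, large_lr_losses = gradient_descent(learning_rate=0.2, steps=50)
print(large_lr_losses[-1] > large_lr_losses[0])  # True: the loss blows up
```

With the smaller step the fit recovers the true slope and intercept; with the larger one each update overshoots further than the last, which is the "loss bounces around" behaviour taken to the extreme.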

TensorFlow / TFLearn LinearRegressor stops with a very high loss