TensorFlow: loss increases instead of decreasing with reduce_sum

Asked: 2017-03-31 16:48:35

Tags: tensorflow training-data gradient-descent tensorboard

I implemented the linear regression model shown in the getting-started guide on the TensorFlow homepage: https://www.tensorflow.org/get_started/get_started

import numpy as np
import tensorflow as tf

# Model parameters
W = tf.Variable([.3], dtype=tf.float32)
b = tf.Variable([-.3], dtype=tf.float32)
# Model input and output
x = tf.placeholder(tf.float32)
linear_model = W * x + b
y = tf.placeholder(tf.float32)
# loss
loss = tf.reduce_sum(tf.square(linear_model - y)) # sum of the squares
# optimizer
optimizer = tf.train.GradientDescentOptimizer(0.01)
train = optimizer.minimize(loss)
# training data
x_train = [1,2,3,4]
y_train = [0,-1,-2,-3]
# training loop
init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init) # reset values to wrong
for i in range(1000):
  sess.run(train, {x:x_train, y:y_train})

# evaluate training accuracy
curr_W, curr_b, curr_loss  = sess.run([W, b, loss], {x:x_train, y:y_train})
print("W: %s b: %s loss: %s"%(curr_W, curr_b, curr_loss))

However, when I change the training data to x_train = [2,4,6,8] and y_train = [3,4,5,6], the loss starts increasing over time until it reaches 'nan'.

1 Answer:

Answer 0 (score: 0)

As Steven suggested, you should use reduce_mean(), which seems to fix the problem of the increasing loss. Note that I also increased the number of training steps, since reduce_mean() appears to need longer to converge. Be careful about raising the learning rate, as that can reproduce the problem. Conversely, if training time is not a critical factor, you may want to lower the learning rate and increase the number of training iterations further.

The reduce_sum() version also worked fine for me after lowering the learning rate from 0.01 to 0.001. Thanks again to Steven for the suggestion.
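Why the two fixes are equivalent can be seen from the gradient scale: with N training points, the gradient of a reduce_sum() loss is exactly N times the gradient of the corresponding reduce_mean() loss, so the same learning rate takes an N-times-larger step and can overshoot into divergence. A minimal plain-NumPy sketch (my own illustration, not part of the original answer) of that scaling for this problem's data:

```python
import numpy as np

# Training data from the question, and the initial model parameters.
x = np.array([2.0, 4.0, 6.0, 8.0])
y = np.array([3.0, 4.0, 5.0, 6.0])
W, b = 0.3, -0.3

# Residuals of the linear model W*x + b against the targets.
residual = W * x + b - y

# Gradient of the loss with respect to W under each reduction:
# reduce_sum  -> d/dW sum((Wx+b-y)^2)  = sum(2 * residual * x)
# reduce_mean -> d/dW mean((Wx+b-y)^2) = mean(2 * residual * x)
grad_W_sum = np.sum(2 * residual * x)
grad_W_mean = np.mean(2 * residual * x)

# The sum-loss gradient is len(x) times larger, so a step with
# learning rate 0.01 on the sum loss equals a step with 0.04 on
# the mean loss -- large enough here to diverge toward nan.
print(grad_W_sum / grad_W_mean)  # -> 4.0
```

This is also why dividing the learning rate by roughly the dataset size (0.01 to 0.001 for 4 points, with some margin) makes the reduce_sum() version stable again.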

import numpy as np
import tensorflow as tf

# Model parameters
W = tf.Variable([.3], dtype=tf.float32)
b = tf.Variable([-.3], dtype=tf.float32)
# Model input and output
x = tf.placeholder(tf.float32)
linear_model = W * x + b
y = tf.placeholder(tf.float32)
# loss
loss = tf.reduce_mean(tf.square(linear_model - y)) # mean of the squares
# optimizer
optimizer = tf.train.GradientDescentOptimizer(0.01)
train = optimizer.minimize(loss)
# training data
x_train = [2,4,6,8]
y_train = [0,3,4,5]
# training loop
init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init) # reset values to wrong
for i in range(5000):
    sess.run(train, {x:x_train, y:y_train})

# evaluate training accuracy
curr_W, curr_b, curr_loss = sess.run([W, b, loss], {x: x_train, y: y_train})
print("W: %s b: %s loss: %s" % (curr_W, curr_b, curr_loss))