TensorFlow program stuck at wrong variable values

Time: 2017-11-06 17:26:57

Tags: python numpy tensorflow

I am studying TensorFlow and have run into a problem. I want to minimize the loss function while approximating y = 2x + 2z - 3t (that is, recover the values a = 2, b = 2, c = -3), but it does not work. Where is my mistake?

This is my output:

a: [ 0.51013279] b: [ 0.51013279] c: [ 1.00953674] loss: 2.72952e+10 

I need a: 2 b: 2 c: -3 with the loss close to 0.

import tensorflow as tf
import numpy as np

a = tf.Variable([1], dtype=tf.float32)
b = tf.Variable([1], dtype=tf.float32)
c = tf.Variable([0], dtype=tf.float32)

x = tf.placeholder(tf.float32)
z = tf.placeholder(tf.float32)
t = tf.placeholder(tf.float32)
linear_model = a * x + b * z + c * t
y = tf.placeholder(tf.float32)

loss = tf.reduce_sum(tf.square(linear_model - y))  # sum of the squares

optimizer = tf.train.GradientDescentOptimizer(0.01)
train = optimizer.minimize(loss)

x_train = np.arange(0, 5000, 1)
z_train = np.arange(0, 10000, 2)
t_train = np.arange(0, 5000, 1)
y_train = list(map(lambda x, z, t: 2 * x + 2 * z - 3 * t, x_train, z_train, t_train))

init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)
for i in range(10000):
    sess.run(train, {x: x_train, z: z_train, t: t_train, y: y_train})

curr_a, curr_b, curr_c, curr_loss = sess.run([a, b, c, loss], {x: x_train, z: z_train, t: t_train, y: y_train})
print("a: %s b: %s c: %s loss: %s" % (curr_a, curr_b, curr_c, curr_loss))

I changed Maxim's code slightly to watch the values of a, b, c, like this:

_, loss_val, curr_a, curr_b, curr_c, model_val = sess.run(
    [optimizer, loss, a, b, c, linear_model],
    {x: x_train, z: z_train, t: t_train, y: y_train})

So my output is:

10 2.04454e-11 1.83333 0.666667 -0.166667
20 2.04454e-11 1.83333 0.666667 -0.166667
30 2.04454e-11 1.83333 0.666667 -0.166667

I expected a = 2, b = 2, c = -3.

1 Answer:

Answer 0 (score: 1)

First of all, there is no single solution here, so the optimizer can converge to any one of a whole family of minima, and the exact values depend heavily on how the variables are initialized. In your training data z_train = 2 * x_train and t_train = x_train, so the model collapses to (a + 2b + c) * x while the target is y = 2x + 2z - 3t = 3x: every combination with a + 2b + c = 3 drives the loss to zero.
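As a quick sanity check (my addition, not part of the original answer), the converged values from the second output above do lie on that solution line:

# Converged values reported in the modified run above.
a, b, c = 1.83333, 0.666667, -0.166667

# With z = 2x and t = x, the model is (a + 2b + c) * x and the target is 3x,
# so a + 2b + c = 3 means the loss is (numerically) zero.
print(a + 2 * b + c)  # -> 2.999997, i.e. 3 up to float rounding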

The short answer regarding your error: be careful with the learning rate. Take a look at my version of the code:

a = tf.Variable(2, dtype=tf.float32)
b = tf.Variable(1, dtype=tf.float32)
c = tf.Variable(0, dtype=tf.float32)

x = tf.placeholder(shape=[None, 1], dtype=tf.float32)
z = tf.placeholder(shape=[None, 1], dtype=tf.float32)
t = tf.placeholder(shape=[None, 1], dtype=tf.float32)
y = tf.placeholder(shape=[None, 1], dtype=tf.float32)

linear_model = a * x + b * z + c * t
loss = tf.reduce_mean(tf.square(linear_model - y))  # mean of the squared errors
optimizer = tf.train.GradientDescentOptimizer(0.0001).minimize(loss)

n = 50
x_train = np.arange(0, n, 1).reshape([-1, 1])
z_train = np.arange(0, 2*n, 2).reshape([-1, 1])
t_train = np.arange(0, n, 1).reshape([-1, 1])
# list(...) is required on Python 3, where map() returns a lazy iterator
y_train = np.array(list(map(lambda x, z, t: 2 * x + 2 * z - 3 * t, x_train, z_train, t_train))).reshape([-1, 1])

with tf.Session() as sess:
  sess.run(tf.global_variables_initializer())

  for i in range(101):
    _, loss_val = sess.run([optimizer, loss], {x: x_train, z: z_train, t: t_train, y: y_train})
    if i % 10 == 0:
      a_val, b_val, c_val = sess.run([a, b, c])
      print('iteration %2i, loss=%f a=%.5f b=%.5f c=%.5f' % (i, loss_val, a_val, b_val, c_val))

If you run it, you will notice that it converges very quickly, in fewer than 10 iterations. However, if you increase the training size n from 50 to 75, the model diverges. Lowering the learning rate to 0.00001 makes it converge again, though not as quickly as before. The more data you push through the optimizer, the more important a suitable learning rate becomes.
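One way to make the training less sensitive to this tuning (my suggestion, not something the original answer proposes) is to swap the plain gradient descent step for an adaptive optimizer such as Adam:

# Drop-in replacement for the GradientDescentOptimizer line above.
# Adam adapts its effective step size per parameter, so it usually
# tolerates a larger n without manual learning-rate retuning.
optimizer = tf.train.AdamOptimizer(learning_rate=0.1).minimize(loss)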

You have tried a training size of 5000: I cannot even imagine how small the learning rate would have to be to handle that many points correctly.
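If you do want to train on the full 5000-point data with plain gradient descent, one common remedy (a sketch of my own, not part of the original answer) is to rescale the inputs and targets to roughly unit range, so the gradients stay at a magnitude an ordinary learning rate can handle:

import numpy as np

# Rescale the question's original training data to ~[0, 1].
x_train = np.arange(0, 5000, 1, dtype=np.float32).reshape([-1, 1])
z_train = np.arange(0, 10000, 2, dtype=np.float32).reshape([-1, 1])
t_train = np.arange(0, 5000, 1, dtype=np.float32).reshape([-1, 1])
y_train = 2 * x_train + 2 * z_train - 3 * t_train

x_s, z_s, t_s = x_train.max(), z_train.max(), t_train.max()
y_s = np.abs(y_train).max()
x_train, z_train, t_train = x_train / x_s, z_train / z_s, t_train / t_s
y_train = y_train / y_s

# Feed these scaled arrays into the graph above. The learned a, b, c then
# describe the scaled problem; map them back with a_orig = a * y_s / x_s,
# b_orig = b * y_s / z_s, c_orig = c * y_s / t_s.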