import numpy as np
import tensorflow as tf

# Training data: y = 10 * x, so the model should learn k = 10, b = 0.
x_data = np.array([0.35252703, 0.60587817, 0.8906856, 0.4813087, 0.53391305, 0.27751151])
y_data = x_data * 10

# Linear model y = k * x + b with two trainable scalar parameters.
b = tf.Variable(0.)
k = tf.Variable(0.)
y = k * x_data + b

# Mean squared error loss, minimized with plain gradient descent (TF 1.x API).
loss = tf.reduce_mean(tf.square(y_data - y))
optimizer = tf.train.GradientDescentOptimizer(0.2)
train = optimizer.minimize(loss)

init = tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init)
    for step in range(400):
        sess.run(train)
        if step % 40 == 0:
            print(step, sess.run([k, b]))
The output is:
0 [1.2522789,2.0945494]
40 [5.304193,2.5357442]
80 [7.116992,1.5568293]
120 [8.229965,0.9565825]
160 [8.913281,0.58682966]
200 [9.332804,0.36028674]
240 [9.590374,0.22119847]
280 [9.748508,0.13580598]
320 [9.845596,0.083378375]
360 [9.905204,0.0511902]
That works well. Then I changed the data like this:
x_data = np.array([352.52703, 605.87817, 0.8906856, 0.4813087, 0.53391305, 0.27751151])
The output then becomes:
0 [327576.72,640.39246]
40 [nan,nan]
80 [nan,nan]
120 [nan,nan]
160 [nan,nan]
200 [nan,nan]
240 [nan,nan]
280 [nan,nan]
320 [nan,nan]
360 [nan,nan]
Can anyone tell me why the second run produces this output?
Answer 0 (score: 1)
Just set the learning rate smaller. I set it to 1e-5 and training works fine:
(0, [16.378834, 0.032019623])
(40, [9.9999628, 0.019538468])
(80, [9.9999628, 0.019527739])
(120, [9.9999628, 0.01951701])
(160, [9.9999628, 0.019506281])
(200, [9.9999628, 0.019495552])
(240, [9.9999628, 0.019484824])
(280, [9.9999628, 0.019474095])
(320, [9.9999628, 0.019463366])
(360, [9.9999628, 0.019452637])
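For reference, a minimal sketch of that change, assuming the same script as in the question (only the optimizer line differs; 1e-5 is the value reported above):

# Hypothetical one-line change to the question's script: a smaller step size
# keeps the update k -= lr * dL/dk from overshooting with inputs of size ~600.
optimizer = tf.train.GradientDescentOptimizer(1e-5)
train = optimizer.minimize(loss)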
You initialize k and b to 0, so with inputs as large as a few hundred the initial gradient is huge, and with a large learning rate each update overshoots the minimum: the parameters land on the far side of the correct answer at a point where the gradient is even bigger, so the values grow without bound and hit NaN within a few steps.
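A quick numeric check of why 0.2 diverges here while 1e-5 converges (a sketch, assuming the second x_data from the question; the bound used is the standard stability condition for gradient descent on a quadratic loss, lr < 2 / lambda_max of the Hessian):

import numpy as np

x = np.array([352.52703, 605.87817, 0.8906856, 0.4813087, 0.53391305, 0.27751151])

# For loss = mean((y - (k*x + b))**2), the Hessian with respect to (k, b)
# is constant: 2 * [[mean(x**2), mean(x)], [mean(x), 1]].
H = 2 * np.array([[np.mean(x**2), np.mean(x)],
                  [np.mean(x),    1.0]])
lam_max = np.linalg.eigvalsh(H).max()

# Plain gradient descent on a quadratic converges only if lr < 2 / lam_max.
print("largest Hessian eigenvalue:", lam_max)   # roughly 1.6e5
print("critical learning rate:", 2 / lam_max)   # roughly 1.2e-5
# 0.2 is about 16000x over the bound, hence NaN; 1e-5 sits just under it,
# which matches the two runs shown above.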