I am trying to follow the examples from the Stanford TensorFlow course by implementing a quadratic linear regression:
Y = W*X*X + u*X + b
The dataset can be found among the Cengage datasets; the code is as follows:
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
import xlrd
DATA = 'data\\slr05.xls'
# Read data
data = xlrd.open_workbook(DATA, encoding_override='utf-8')
sheet = data.sheet_by_index(0)
dataset = np.asarray([sheet.row_values(i) for i in range(1, sheet.nrows)])
n_samples = sheet.nrows - 1
X = tf.placeholder('float', name = 'X')
Y = tf.placeholder('float', name = 'Y')
W = tf.Variable(0.0, name = 'weights')
b = tf.Variable(0.0, name = 'bias')
u = tf.Variable(0.0, name = 'u_weight')
Y_ = X*X*W + X*u + b
loss = tf.square(Y - Y_, name = 'loss')
optimizer = tf.train.GradientDescentOptimizer(0.0001).minimize(loss)
init = tf.global_variables_initializer()
loss_average = []
# Start the Session
with tf.Session() as sess:
    sess.run(init)
    for i in range(10):
        for x, y in dataset:
            print(sess.run([optimizer, Y_, W, b, u, X, Y], feed_dict = {X:x, Y:y}))
            loss_average.append(sess.run(loss, feed_dict = {X:x, Y:y}))
The final W, b and u values I get are nan. I tried to check step by step why this happens, so in the output below I print [optimizer, Y_, W, b, u, X, Y]. After a few iterations I get:
[None, 3.9304674e+33, -1.0271335e+33, -7.7725354e+29, -2.8294217e+31, 36.2, 41.]
[None, -1.619979e+36, inf, 3.2321854e+32, 1.2834338e+34, 39.7, 147]
Clearly, during optimization, W ends up at inf, which breaks the regression output.
Does anyone know what I am doing wrong?
Answer (score: 2)
You have an exploding gradient problem here. Your X and Y, and consequently your difference values, are of magnitude 10^1, so their squares (your loss) are of magnitude 10^2. When you introduce X^2 into the regression, the difference values become of magnitude 10^2 and their squares of magnitude 10^4. The gradients are therefore much larger, and the network diverges violently.
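As a rough illustration (my own sketch, not part of the original answer; the sample values are taken from the printed output above), the gradient of the squared loss with respect to W scales with x^2, so it dwarfs the gradient on u from the very first step:

```python
# Sketch: for loss = (y_hat - y)^2 with y_hat = W*x^2 + u*x + b,
# d(loss)/dW = 2*(y_hat - y)*x^2 -- it grows with x^2.
x, y = 36.2, 41.0            # one sample from the output above
W, u, b = 0.0, 0.0, 0.0      # the initial parameter values
y_hat = W * x * x + u * x + b
grad_W = 2 * (y_hat - y) * x * x   # gradient on W, scales with x^2
grad_u = 2 * (y_hat - y) * x       # gradient on u, scales with x
print(grad_W, grad_u)              # grad_W is x (~36x) larger than grad_u
```

With the learning rate of 0.0001 from the question, the first update on W alone is already of order 10, which is why the iterates blow up within a few samples.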
To correct this, you can lower the learning rate by a factor of 10^3, which brings the gradient steps roughly back to where they were, as in this code (tested):
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
import xlrd
DATA = 'slr05.xls'
# Read data
data = xlrd.open_workbook(DATA, encoding_override='utf-8')
sheet = data.sheet_by_index(0)
dataset = np.asarray([sheet.row_values(i) for i in range(1, sheet.nrows)])
n_samples = sheet.nrows - 1
X = tf.placeholder('float', name = 'X')
Y = tf.placeholder('float', name = 'Y')
W = tf.Variable(0.0, name = 'weights')
b = tf.Variable(0.0, name = 'bias')
u = tf.Variable(0.0, name = 'u_weight')
Y_ = X*X*W + X*u + b
#Y_ = X * u + b
loss = tf.square(Y - Y_, name = 'loss')
optimizer = tf.train.GradientDescentOptimizer(0.0000001).minimize(loss)
init = tf.global_variables_initializer()
loss_average = []
# Start the Session
with tf.Session() as sess:
    sess.run(init)
    for i in range(10):
        for x, y in dataset:
            print(sess.run([optimizer, loss, Y_, W, b, u, X, Y], feed_dict = {X:x, Y:y}))
            loss_average.append(sess.run(loss, feed_dict = {X:x, Y:y}))
which converges in an obedient, orderly fashion, as a good network should, with this output (last 5 lines only):
[None, 1313.2705, 9.760924, 0.06911032, 0.0014081484, 0.010015297, array(11.9, dtype=float32), array(46., dtype=float32)]
[None, 1174.7083, 7.7259817, 0.06986606, 0.0014150032, 0.010087272, array(10.5, dtype=float32), array(42., dtype=float32)]
[None, 1217.4297, 8.1083145, 0.07066501, 0.0014219815, 0.01016194, array(10.7, dtype=float32), array(43., dtype=float32)]
[None, 657.74097, 8.353538, 0.07126329, 0.0014271108, 0.010217336, array(10.8, dtype=float32), array(34., dtype=float32)]
[None, 299.5538, 1.6923765, 0.07134304, 0.0014305722, 0.010233952, array(4.8, dtype=float32), array(19., dtype=float32)]
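An alternative to shrinking the learning rate (my own suggestion, not part of the original answer) is to standardize X before fitting, which keeps the gradients at a moderate scale so an ordinary learning rate works. A minimal NumPy sketch, using the (x, y) pairs visible in the printed outputs above:

```python
import numpy as np

# (x, y) pairs read off the printed outputs above
x = np.array([36.2, 39.7, 11.9, 10.5, 10.7, 10.8, 4.8])
y = np.array([41.0, 147.0, 46.0, 42.0, 43.0, 34.0, 19.0])

x_std = (x - x.mean()) / x.std()   # zero mean, unit variance

# plain gradient descent on the mean squared error of y = W*x^2 + u*x + b
W = u = b = 0.0
lr = 0.01                          # a normal learning rate now suffices
for _ in range(1000):
    err = W * x_std**2 + u * x_std + b - y
    W -= lr * (2 * err * x_std**2).mean()
    u -= lr * (2 * err * x_std).mean()
    b -= lr * (2 * err).mean()
print(W, u, b)                     # finite values, no nan/inf
```

The trade-off is that the fitted coefficients are expressed in the standardized feature scale, so predictions on new data must apply the same (mean, std) transform first.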