Gradient descent isn't working

Asked: 2017-07-18 21:57:49

Tags: tensorflow linear-regression gradient-descent non-linear-regression

I am following the Stanford course "TensorFlow for Deep Learning Research" and have taken the code from the following address. While exploring tensorflow, I changed

Y_predicted = X * w + b

to

Y_predicted = X * X * w + X * u + b

to check whether a non-linear curve fits better, following the author's suggestion in this note (page 3). But after adding this line and running the otherwise similar code, every loss value comes out as nan. Can anyone point out the problem and suggest a solution?

""" Simple linear regression example in TensorFlow
This program tries to predict the number of thefts from 
the number of fires in the city of Chicago
Author: Chip Huyen
Prepared for the class CS 20SI: "TensorFlow for Deep Learning Research"
cs20si.stanford.edu
"""
import os
os.environ['TF_CPP_MIN_LOG_LEVEL']='2'

import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
import xlrd

#import utils

DATA_FILE = "slr05.xls"

# Step 1: read in data from the .xls file
book = xlrd.open_workbook(DATA_FILE, encoding_override="utf-8")
sheet = book.sheet_by_index(0)
data = np.asarray([sheet.row_values(i) for i in range(1, sheet.nrows)])
n_samples = sheet.nrows - 1

# Step 2: create placeholders for input X (number of fires) and label Y (number of thefts)
X = tf.placeholder(tf.float32, name='X')
Y = tf.placeholder(tf.float32, name='Y')

# Step 3: create weight and bias, initialized to 0
w = tf.Variable(0.0, name='weights')
u = tf.Variable(0.0, name='weights2')
b = tf.Variable(0.0, name='bias')

# Step 4: build model to predict Y
#Y_predicted = X * w + b 
Y_predicted = X * X * w + X * u + b

# Step 5: use the square error as the loss function
loss = tf.square(Y - Y_predicted, name='loss')
# loss = utils.huber_loss(Y, Y_predicted)

# Step 6: using gradient descent with learning rate of 0.001 to minimize loss
optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.001).minimize(loss)

with tf.Session() as sess:
    # Step 7: initialize the necessary variables, in this case, w, u and b
    sess.run(tf.global_variables_initializer()) 

    writer = tf.summary.FileWriter('./graphs/linear_reg', sess.graph)

    # Step 8: train the model
    for i in range(100): # train the model 100 epochs
        total_loss = 0
        for x, y in data:
            # Session runs train_op and fetch values of loss
            _, l = sess.run([optimizer, loss], feed_dict={X: x, Y:y}) 
            total_loss += l
        print('Epoch {0}: {1}'.format(i, total_loss/n_samples))

    # close the writer when you're done using it
    writer.close() 

    # Step 9: output the values of w, u and b
    w, u, b = sess.run([w, u, b])

# plot the results
X, Y = data.T[0], data.T[1]
plt.plot(X, Y, 'bo', label='Real data')
plt.plot(X, X * X * w + X * u + b, 'r', label='Predicted data')
plt.legend()
plt.show()

2 Answers:

Answer 0 (score: 2):

Oops! Your learning rate seems to be too large; try something like learning_rate=0.0000001 and it will converge. This is a common problem, especially when you introduce interaction features, as in your case: keep in mind that the range of x**2 will be much larger (if the original values are in [-100, 100], their squares will be in [0, 10000]), so a learning rate that works well for a linear model can be far too large for a polynomial one. Check out feature scaling. This picture gives a more intuitive explanation:

[image from the original answer, not reproduced here]
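A minimal sketch of the feature scaling being suggested (this is an illustration, not code from the answer; it assumes the data array from the question, and data_scaled is just an illustrative name) would min-max scale the input column before the training loop:

import numpy as np

# Min-max scale the fire counts to [0, 1] so that X and X * X stay in
# comparable ranges and the gradient updates no longer blow up to nan.
x_raw = data[:, 0]
x_scaled = (x_raw - x_raw.min()) / (x_raw.max() - x_raw.min())
data_scaled = np.column_stack((x_scaled, data[:, 1]))

# Then iterate over data_scaled instead of data in Step 8:
# for x, y in data_scaled:
#     _, l = sess.run([optimizer, loss], feed_dict={X: x, Y: y})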

Hope it helps! Andres

Answer 1 (score: 0):

I'm the person behind that course. As @fr_andres said, your lr is probably too large. If that doesn't work, let me know.
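For concreteness, a minimal sketch of the fix both answers point to (using the 0.0000001 value suggested in the first answer, which may still need tuning for this dataset) changes only the optimizer line:

# Shrink the learning rate so the updates driven by the large X * X term
# no longer overshoot and diverge to nan.
optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.0000001).minimize(loss)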