Question

I'm trying to implement the gradient descent algorithm from scratch on a toy problem. My code always returns a vector of NaNs:

from sklearn.linear_model import LinearRegression
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(45)
x = np.linspace(0, 1000, num=1000)
y = 3*x + 2 + np.random.randn(len(x))

# sklearn output - This works (returns intercept = 1.6, coef = 3)
lm = LinearRegression()
lm.fit(x.reshape(-1, 1), y.reshape(-1, 1))
print("Intercept = {:.2f}, Coef = {:.2f}".format(lm.intercept_[0], lm.coef_[0][0]))

# BGD output
theta = np.array((0, 0)).reshape(-1, 1)
X = np.hstack([np.ones_like(x.reshape(-1, 1)), x.reshape(-1, 1)]) # [1, x]
Y = y.reshape(-1, 1) # Column vector
alpha = 0.05
for i in range(100):
    # Update: theta <- theta - alpha * (X.T X theta - X.T Y), i.e. theta - alpha * X.T (X theta - Y)
    h = np.dot(X, theta) # Hypothesis
    loss = h - Y
    theta = theta - alpha*np.dot(X.T, loss)
print(theta)

The sklearn part works fine, so I must be doing something wrong in the for loop. I've tried a range of alpha values, but none of them converge.

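The problem is that theta grows larger and larger over the course of the loop, until it eventually becomes too large for Python to store.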
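The blow-up can be explained from the update rule itself (this analysis is not part of the original post). At the least-squares solution theta_star the normal equations give X.T @ Y = X.T @ X @ theta_star, so the error e = theta - theta_star evolves as e <- (I - alpha * X.T @ X) e. That only shrinks when alpha < 2 / lambda_max(X.T @ X). A small sketch that computes this bound for the question's X (the names lam_max and alpha_limit are just illustrative):

```python
import numpy as np

# Same design matrix as in the question: columns [1, x]
x = np.linspace(0, 1000, num=1000)
X = np.hstack([np.ones((len(x), 1)), x.reshape(-1, 1)])

# For the update theta <- theta - alpha * X.T @ (X @ theta - Y) the error
# evolves as e <- (I - alpha * H) e with H = X.T @ X, so the iteration only
# converges when alpha < 2 / lambda_max(H).
H = X.T @ X
lam_max = np.linalg.eigvalsh(H).max()  # H is symmetric, so eigvalsh applies
alpha_limit = 2.0 / lam_max
print("lambda_max = {:.3e}, alpha must be < {:.3e}".format(lam_max, alpha_limit))

# How much the error is amplified per step along the steepest direction
alpha = 0.05
print("growth factor per step: {:.3e}".format(abs(1 - alpha * lam_max)))
```

For this X the bound is roughly 6e-9, far below any alpha one would normally try. With alpha = 0.05 the error is multiplied by about 1.7e7 per iteration, so float64 overflows to inf within a few dozen of the 100 iterations and subsequent inf - inf arithmetic produces the NaNs.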

Here is a contour plot of the cost function:

J = np.dot((np.dot(X, theta) - y).T, (np.dot(X, theta) - y))
plt.contour(J)

Clearly there is no minimum here. Where am I going wrong?
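The contour plot above probably shows no minimum because of how J is computed rather than because the cost has none. np.dot(X, theta) has shape (1000, 1) while y has shape (1000,), so their difference broadcasts to a (1000, 1000) array; in addition, theta is already full of NaNs after the loop. To see the bowl-shaped least-squares cost, J has to be evaluated over a grid of (theta0, theta1) values. A minimal sketch, where the grid ranges are arbitrary choices centred near the expected solution (intercept near 2, slope near 3):

```python
import numpy as np
import matplotlib.pyplot as plt

# Same toy data as in the question
np.random.seed(45)
x = np.linspace(0, 1000, num=1000)
y = 3 * x + 2 + np.random.randn(len(x))
X = np.hstack([np.ones((len(x), 1)), x.reshape(-1, 1)])  # [1, x]

# Grid of candidate parameters (ranges are illustrative choices)
theta0_vals = np.linspace(-50, 50, 61)   # intercept
theta1_vals = np.linspace(2.9, 3.1, 61)  # slope
T0, T1 = np.meshgrid(theta0_vals, theta1_vals)

# Each grid point becomes one column of `thetas`, so X @ thetas holds the
# predictions for every candidate at once, shape (m, n_points).
thetas = np.vstack([T0.ravel(), T1.ravel()])
residuals = X @ thetas - y[:, None]                  # y as a column so shapes line up
J = (residuals ** 2).sum(axis=0).reshape(T0.shape)   # sum of squared errors per grid point

plt.contour(T0, T1, J, levels=30)
plt.xlabel("theta0 (intercept)")
plt.ylabel("theta1 (slope)")
plt.show()
```

The contours come out as long, thin ellipses because the intercept and slope are strongly correlated when x is not centred, which is also why unscaled gradient descent is so sensitive to alpha here.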

Thanks

Answer 1

In the theta update, the second term should be divided by the size of the training set. There are more details here: gradient descent using python and numpy
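A minimal sketch of this fix, assuming the cost is the mean squared error J(theta) = (1/(2m)) * ||X theta - Y||^2, whose gradient is (1/m) * X.T (X theta - Y). Dividing by m alone is probably not enough for this particular data: with x running up to 1000, the largest eigenvalue of X.T @ X / m is on the order of 3e5, so alpha = 0.05 would still diverge. The sketch therefore also standardizes x (feature scaling, which the answer does not mention) and maps the parameters back to the original scale afterwards. The function name batch_gradient_descent and the iteration count are illustrative choices.

```python
import numpy as np

def batch_gradient_descent(X, Y, alpha=0.05, n_iters=1000):
    """Batch gradient descent for least-squares linear regression.

    Uses the averaged gradient (1/m) * X.T @ (X @ theta - Y), so the step
    size does not grow with the number of training examples.
    """
    m = X.shape[0]
    theta = np.zeros((X.shape[1], 1))  # float zeros
    for _ in range(n_iters):
        loss = X @ theta - Y           # residuals, shape (m, 1)
        grad = X.T @ loss / m          # gradient divided by training-set size
        theta = theta - alpha * grad
    return theta

np.random.seed(45)
x = np.linspace(0, 1000, num=1000)
y = 3 * x + 2 + np.random.randn(len(x))

# Assumption: feature scaling added so that alpha = 0.05 converges.
mu, sigma = x.mean(), x.std()
x_scaled = (x - mu) / sigma

X = np.hstack([np.ones((len(x), 1)), x_scaled.reshape(-1, 1)])  # [1, x_scaled]
Y = y.reshape(-1, 1)

theta_scaled = batch_gradient_descent(X, Y, alpha=0.05, n_iters=1000)

# Undo the scaling: y = t0 + t1 * (x - mu) / sigma
#   => coef = t1 / sigma, intercept = t0 - t1 * mu / sigma
t0, t1 = theta_scaled[0, 0], theta_scaled[1, 0]
coef = t1 / sigma
intercept = t0 - t1 * mu / sigma
print("Intercept = {:.2f}, Coef = {:.2f}".format(intercept, coef))
```

Since both this loop and LinearRegression compute the same least-squares fit, the printed values should agree with lm.intercept_ and lm.coef_ up to floating-point error.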

Gradient Descent & Linear Regression - code not converging
