I tried to implement linear regression with gradient descent, but my error diverges to infinity. I have read through my code and still cannot figure out where I went wrong. I hope someone can help me debug why this linear regression implementation does not work.
With N=100 it runs fine, but with N=1000 it diverges toward infinity.
import numpy as np

class Regression:
    def __init__(self, xs, ys, w, alpha):
        self.w = w
        self.xs = xs
        self.ys = ys
        self.a = alpha
        self.N = float(len(xs))

    def error(self, ys, yhat):
        # mean squared error
        return (1. / self.N) * np.sum((ys - yhat) ** 2)

    def propagate(self):
        # one gradient-descent step on w = [slope, intercept]
        yhat = self.xs * self.w[0] + self.w[1]
        loss = yhat - self.ys
        r1 = (2. / self.N) * np.sum(loss * self.xs)  # d(error)/d(slope)
        r2 = (2. / self.N) * np.sum(loss)            # d(error)/d(intercept)
        self.w[0] -= self.a * r1
        self.w[1] -= self.a * r2

N = 600
xs = np.arange(0, N)
bias = np.random.sample(size=N) * 10
ys = xs * 2. + 2. + bias
ws = np.array([0., 0.])
regressor = Regression(xs, ys, ws, 0.00001)

for i in range(1000):
    regressor.propagate()
Output:
...
2.71623180177e+286
5.27841816362e+286
1.02574818143e+287
1.99332318715e+287
3.87359919362e+287
7.52751526171e+287
1.46281231441e+288
2.84266426942e+288
5.52411274435e+288
1.07349369184e+289
2.0861064206e+289
4.05390365232e+289
7.87789858657e+289
1.5309018532e+290
2.97498179035e+290
5.78124367308e+290
1.12346161297e+291
2.18320843611e+291
4.24260074438e+291
8.2445912074e+291
1.6021607564e+292
3.11345829619e+292
6.05034327761e+292
1.17575539141e+293
2.28483026006e+293
4.4400811218e+293
8.62835227315e+293
Answer 0 (score: 3)
When you increase N, the gradient components r1 and r2 at the starting point w = [0, 0] scale quadratically and linearly with N, respectively. For a large enough N, the initial step in w becomes larger than twice its error (the distance to the optimum), so the correction overshoots and actually increases the error. The positive feedback then makes w oscillate around the correct values with ever-increasing amplitude instead of converging.
If you scale alpha down by a factor of ten, you will find that N=1000 converges.
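A quick way to see this scaling is to evaluate the two gradient components at the starting point for a few values of N. Below is a minimal sketch assuming the question's setup (xs = 0..N-1, ys ≈ 2*xs + 2 + noise) and the starting point w = [0, 0]; it is not part of the original code:

import numpy as np

# Evaluate the initial gradient components for several N, assuming the
# question's data: xs = 0..N-1, ys ~ 2*xs + 2 + uniform noise in [0, 10).
# At w = [0, 0] the prediction yhat is all zeros, so loss = -ys.
for N in (100, 600, 1000):
    xs = np.arange(0, N)
    ys = xs * 2. + 2. + np.random.sample(size=N) * 10
    loss = -ys                           # yhat - ys with yhat = 0
    r1 = (2. / N) * np.sum(loss * xs)    # grows roughly like N**2
    r2 = (2. / N) * np.sum(loss)         # grows roughly like N
    print("N=%d  r1=%.3g  r2=%.3g  first step in w[0]=%.3g"
          % (N, r1, r2, -1e-5 * r1))

With alpha = 1e-5 the first update to w[0] is roughly 0.14 at N = 100, about 5 at N = 600, and about 13 at N = 1000, while the true slope is only 2, so from N = 600 on the very first step already overshoots the optimum.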
Answer 1 (score: 1)
You have gone beyond the method's radius of convergence. I added a print statement at the bottom of propagate to track what happens:

    print self.error(ys, yhat), '\t', r1, '\t', r2, '\t', self.w

As K.A. Buhr points out, r1 scales quadratically with N. Choose the learning rate to suit your input; it is not a constant promised by the SGD algorithm. Here is the output of the first 20 iterations at N = 600, as in the posted code:
486826.997899 -482786.592791 -1211.52883528 [ 4.82786593 0.01211529]
946024.542374 673013.376697 1680.38708612 [-1.90226784 -0.00468858]
1838377.19732 -938192.956012 -2350.99664804 [ 7.47966172 0.01882138]
3572474.5816 1307858.19046 3268.82617841 [-5.59892018 -0.01386688]
6942323.62211 -1823178.2573 -4565.30975898 [ 12.63286239 0.03178622]
13490907.7204 2541543.91414 6355.61930844 [-12.78257675 -0.03176997]
26216686.5837 -3542958.75828 -8868.35584965 [ 22.64701083 0.05691359]
50946528.2176 4938949.44036 12354.1444796 [-26.74248357 -0.06662786]
99003709.9274 -6884985.98436 -17230.4097511 [ 42.10737627 0.10567624]
192392610.191 9597796.6223 24011.0009034 [-53.87058995 -0.13443377]
373874053.385 -13379504.31 -33480.2810842 [ 79.92445315 0.20036904]
726544597.0 18651274.1534 46663.6193386 [-106.58828839 -0.26626715]
1411884707.51 -26000217.8559 -65058.4461128 [ 153.41389017 0.38431731]
2743697288.89 36244780.0586 90684.1600127 [-209.03391041 -0.52252429]
5331791469.79 -50525887.4157 -126423.886221 [ 296.22496374 0.74171457]
10361201450.4 70434012.7562 176228.707876 [-408.11516382 -1.02057251]
20134788880.2 -98186304.1721 -245674.553107 [ 573.7478779 1.43617302]
39127675046.8 136873506.894 342466.322375 [-794.98719104 -1.9884902 ]
76036305324.8 -190804176.229 -477412.833248 [ 1113.05457125 2.78563813]
147760369643.0 265984517.38 665513.730619 [-1546.79060255 -3.86949918]
However, with alpha set to 1e-6 (instead of 1e-5), the first iterations are
14495.6359775 -13788.3126768 -211.542964687 [ 0.01378831 0.00021154]
14306.0982004 -13697.7438847 -210.177498646 [ 0.02748606 0.00042172]
14119.0422005 -13607.7699931 -208.821001646 [ 0.04109383 0.00063054]
13934.4354818 -13518.3870942 -207.473414775 [ 0.05461221 0.00083801]
13752.2459738 -13429.5913063 -206.134679506 [ 0.0680418 0.00104415]
13572.4420258 -13341.3787729 -204.804737697 [ 0.08138318 0.00124895]
13394.9924018 -13253.7456628 -203.483531589 [ 0.09463693 0.00145244]
13219.8662747 -13166.6881702 -202.171003801 [ 0.10780362 0.00165461]
13047.0332208 -13080.202514 -200.867097331 [ 0.12088382 0.00185548]
12876.4632151 -12994.2849383 -199.571755548 [ 0.13387811 0.00205505]
12708.1266257 -12908.9317115 -198.284922195 [ 0.14678704 0.00225333]
... and it keeps converging. By the way, even at N = 600, 1000 iterations are not enough for proper convergence; you probably want to stop on an epsilon tolerance rather than on a fixed iteration count.
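A minimal sketch of such a stopping rule, reusing the Regression class, xs, and ys from the question but with the smaller alpha = 1e-6; the eps and max_iter values are illustrative, not taken from the answer:

# Run gradient descent until the per-iteration change in the error falls
# below a tolerance, instead of for a fixed number of iterations.
regressor = Regression(xs, ys, np.array([0., 0.]), 0.000001)
eps = 1e-6          # tolerance on the change in mean squared error
max_iter = 1000000  # safety cap so the loop always terminates
prev_err = float('inf')
for i in range(max_iter):
    regressor.propagate()
    yhat = regressor.xs * regressor.w[0] + regressor.w[1]
    err = regressor.error(regressor.ys, yhat)
    if abs(prev_err - err) < eps:
        break
    prev_err = err

This decouples the run length from N and alpha: the loop stops once progress stalls, however many iterations that takes.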