tl;dr: Why does sklearn's LinearRegression give different results from gradient descent?
My understanding is that LinearRegression computes the closed-form solution to linear regression (described nicely here: https://stats.stackexchange.com/questions/278755/why-use-gradient-descent-for-linear-regression-when-a-closed-form-math-solution). LinearRegression is a poor choice when the dataset is very large, in which case stochastic gradient descent is needed. My dataset is small, and I wanted to write batch gradient descent myself as an intermediate step, for my own edification.
I get different regression weights from LinearRegression and from batch gradient descent. Shouldn't the solution be unique?
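For context on the closed-form route: when the design matrix (with an intercept column) has linearly independent columns, the least-squares minimizer is unique, and NumPy can recover it directly via the normal equations. A minimal sketch on made-up data (the numbers below are illustrative, not the attached dataset):

```python
import numpy as np

# Made-up data: 50 samples, 3 features, known true weights plus tiny noise
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
true_w = np.array([0.5, 5.0, -0.1])
y = X @ true_w + 2.0 + rng.normal(scale=0.01, size=50)

# Closed-form OLS: prepend a column of ones for the intercept,
# then solve the least-squares problem min ||Xa w - y||^2
Xa = np.column_stack([np.ones(len(X)), X])
w_closed, *_ = np.linalg.lstsq(Xa, y, rcond=None)
print(w_closed)  # intercept first, then the three weights
```

With near-noiseless data this recovers the intercept 2.0 and the true weights to a few decimal places, which is the benchmark any correctly converged gradient descent should also hit.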
Code:
import numpy as np
import pandas as pd
from sklearn import linear_model
import matplotlib.pyplot as plt

data = pd.read_csv(r'')  # Data set attached
X = data[['Size', 'Floor', 'Broadband Rate']]
y = data['Rental Price']

# Sklearn Linear Regression
ols = linear_model.LinearRegression(fit_intercept=True)
LR = ols.fit(X, y)
Res_LR = y.values - LR.predict(X)  # Residuals
print('Intercept', LR.intercept_, 'Weights', LR.coef_)

# Batch Gradient Descent
def error_delta(x, y, p, wn):
    """Partial derivative of the (negated) squared-error cost w.r.t. weight wn."""
    total = 0
    row, column = np.shape(x)
    for i in range(row):
        residual = y[i] - (p[0] + np.dot(p[1:], x[i, :]))
        if wn != 0:
            total += residual * x[i, wn - 1]
        else:
            total += residual  # intercept term: multiply by 1
    return total

def weight_update(x, y, p, alpha):
    old = p
    new = np.zeros(len(p))
    for i in range(len(p)):
        new[i] = old[i] + alpha * error_delta(x, y, old, i)
    return new

weight = [-.146, .185, -.044, .119]  # random starting conditions
alpha = .00000002  # learning rate
for i in range(500):  # Seems to have converged by 100
    weight = weight_update(X.values, y.values, weight, alpha)

Res_BGD = np.zeros(len(X.values))
for i in range(len(X.values)):
    Res_BGD[i] = y.values[i] - (weight[0] + np.dot(weight[1:], X.values[i, :]))
print('Intercept', weight[0], 'Weights', weight[1:])

plt.plot(np.arange(len(X.values)), Res_LR, color='b')
plt.plot(np.arange(len(X.values)), Res_BGD, color='g')
plt.legend(['Res LR', 'Res BGD'])
plt.show()
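An aside on the loops above: for each weight, error_delta computes the dot product of the residual vector with the corresponding feature column (or with a column of ones for the intercept), so the whole gradient can be formed in one matrix expression. A sketch on random made-up numbers, just to check the equivalence:

```python
import numpy as np

# Random stand-in data and weights (not the poster's dataset)
rng = np.random.default_rng(2)
x = rng.normal(size=(10, 3))
y = rng.normal(size=10)
p = np.array([-0.146, 0.185, -0.044, 0.119])

def error_delta(x, y, p, wn):
    # Loop version from the question, lightly condensed
    total = 0.0
    for i in range(len(x)):
        residual = y[i] - (p[0] + np.dot(p[1:], x[i, :]))
        total += residual * (x[i, wn - 1] if wn != 0 else 1.0)
    return total

# Vectorized equivalent: residual vector dotted with each augmented column
Xa = np.column_stack([np.ones(len(x)), x])
grad = Xa.T @ (y - Xa @ p)
print(np.allclose(grad, [error_delta(x, y, p, wn) for wn in range(4)]))  # True
```

The matrix form replaces the double loop with two matrix products, which is both faster and easier to read.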
The data set is below (10 points):
Size,Floor,Broadband Rate,Energy Rating,Rental Price
" 500 "," 4 "," 8 "," C "," 320 "
" 550 "," 7 "," 50 "," A "," 380 "
" 620 "," 9 "," 7 "," A "," 400 "
" 630 "," 5 "," 24 "," B "," 390 "
" 665 "," 8 "," 100 "," C "," 385 "
" 700 "," 4 "," 8 "," B "," 410 "
" 770 "," 10 "," 7 "," B "," 480 "
" 880 "," 12 "," 50 "," A "," 600 "
" 920 "," 14 "," 8 "," C "," 570 "
" 1000 "," 9 "," 24 "," B "," 620 "
When you plot the residuals, the performance is comparable even though the weights are very different:
SklearnLR Intercept 19.5615588974 Weights [ 0.54873985  4.96354677 -0.06209515]
BGD Intercept -0.145402077197 Weights [ 0.62549182 -0.0344091   0.11473203]
Thoughts? Also, any programming feedback is welcome if there is a more efficient way to write this code.
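On the substance of the question: the least-squares objective is convex, so the minimizer is unique whenever the columns (plus the intercept) are linearly independent, and batch gradient descent reaches the same weights as the closed-form solution once it has actually converged. With raw features on very different scales and alpha = 2e-8, that can take far more than 500 iterations. A self-contained sketch on made-up data, standardizing features so a single learning rate suits every column:

```python
import numpy as np

# Made-up, noiseless data: 10 samples, 3 features, intercept 3.0
rng = np.random.default_rng(1)
X = rng.normal(size=(10, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 3.0

# Standardize each feature so one learning rate works for all columns
Xs = (X - X.mean(axis=0)) / X.std(axis=0)
Xa = np.column_stack([np.ones(len(Xs)), Xs])  # prepend intercept column

# Batch gradient descent on the mean-squared-error objective
w = np.zeros(4)
alpha = 0.1
for _ in range(100_000):
    w += alpha * Xa.T @ (y - Xa @ w) / len(y)

# Closed-form least-squares solution for comparison
w_ols, *_ = np.linalg.lstsq(Xa, y, rcond=None)
print(np.allclose(w, w_ols))  # True: both reach the unique minimizer
```

Note the weights here are in standardized coordinates, so they won't match coefficients fit on the raw features; the point is only that, properly converged, gradient descent and the closed form agree.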