I don't know why the last coefficient from sklearn's LinearRegression is incorrect. I am using the Boston housing data.
A is the sample matrix, shape (506, 14); b is the housing data (prices).
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

A_sort = A[sort_ind, :]   # samples reordered by a precomputed sort index
reg = LinearRegression().fit(A_sort, b)
w = reg.coef_

fig = plt.figure()
ax1 = fig.add_subplot(121)
ax1.plot(A_sort @ w, '-o', color='r', linewidth=1.5, markersize=6, label='Regression Native')
ax1.set_xlabel('sklearn')

# using numpy
w2 = np.linalg.lstsq(A_sort, b, rcond=None)
ax2 = fig.add_subplot(122)
ax2.plot(A_sort @ w2[0], '-o', color='r', linewidth=1.5, markersize=6, label='Regression Native')
ax2.set_xlabel('numpy')
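In case it helps reproduce the issue, here is a minimal self-contained sketch of the same comparison. Synthetic data stands in for the Boston set, and the column of ones appended as the last feature is an assumption about how I built A:

import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(506, 13))                       # 13 synthetic features (Boston also has 13)
A_demo = np.hstack([X, np.ones((506, 1))])           # append a column of ones -> shape (506, 14)
b_demo = X @ rng.normal(size=13) + 5.0 + rng.normal(scale=0.1, size=506)

w_sk = LinearRegression().fit(A_demo, b_demo).coef_      # sklearn coefficients
w_np = np.linalg.lstsq(A_demo, b_demo, rcond=None)[0]    # plain least-squares coefficients

print(w_sk[-1], w_np[-1])   # the last coefficient differs, just like the Boston results below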
sklearn coefficients w:
array([-1.08011358e-01, 4.64204584e-02, 2.05586264e-02, 2.68673382e+00,
-1.77666112e+01, 3.80986521e+00, 6.92224640e-04, -1.47556685e+00,
3.06049479e-01, -1.23345939e-02, -9.52747232e-01, 9.31168327e-03,
-5.24758378e-01, 0.00000000e+00])
numpy coefficients w2[0] (this matches the correct answer from the SVD code):
array([-1.08011358e-01, 4.64204584e-02, 2.05586264e-02, 2.68673382e+00,
-1.77666112e+01, 3.80986521e+00, 6.92224640e-04, -1.47556685e+00,
3.06049479e-01, -1.23345939e-02, -9.52747232e-01, 9.31168327e-03,
-5.24758378e-01, 3.64594884e+01])
This example is taken from Steve Brunton's lecture. He also provides code that uses the SVD; I wanted to try some library functions instead.
Thanks!