I don't know why the last coefficient from sklearn's LinearRegression is incorrect. I am using the Boston housing data.
A is the sample matrix, shape (506, 14); b is the housing data (prices).
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

A_sort = A[sort_ind, :]   # samples reordered by a precomputed sort index
reg = LinearRegression().fit(A_sort, b)
w = reg.coef_

fig = plt.figure()
ax1 = fig.add_subplot(121)
ax1.plot(A_sort @ w, '-o', color='r', linewidth=1.5, markersize=6, label='Regression Native')
ax1.set_xlabel('sklearn')

# using numpy
w2 = np.linalg.lstsq(A_sort, b, rcond=None)
ax2 = fig.add_subplot(122)
ax2.plot(A_sort @ w2[0], '-o', color='r', linewidth=1.5, markersize=6, label='Regression Native')
ax2.set_xlabel('numpy')
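In case it helps reproduce the issue, here is a minimal self-contained sketch of the same comparison. Synthetic data stands in for the Boston set, and the column of ones appended as the last feature is an assumption about how I built A:

import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(506, 13))                       # 13 synthetic features (Boston also has 13)
A_demo = np.hstack([X, np.ones((506, 1))])           # append a column of ones -> shape (506, 14)
b_demo = X @ rng.normal(size=13) + 5.0 + rng.normal(scale=0.1, size=506)

w_sk = LinearRegression().fit(A_demo, b_demo).coef_      # sklearn coefficients
w_np = np.linalg.lstsq(A_demo, b_demo, rcond=None)[0]    # plain least-squares coefficients

print(w_sk[-1], w_np[-1])   # the last coefficient differs, just like the Boston results below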
sklearn coefficients w:
array([-1.08011358e-01, 4.64204584e-02, 2.05586264e-02, 2.68673382e+00,
-1.77666112e+01, 3.80986521e+00, 6.92224640e-04, -1.47556685e+00,
3.06049479e-01, -1.23345939e-02, -9.52747232e-01, 9.31168327e-03,
-5.24758378e-01, 0.00000000e+00])
numpy coefficients w2[0] (this matches the correct answer from the SVD code):
array([-1.08011358e-01, 4.64204584e-02, 2.05586264e-02, 2.68673382e+00,
-1.77666112e+01, 3.80986521e+00, 6.92224640e-04, -1.47556685e+00,
3.06049479e-01, -1.23345939e-02, -9.52747232e-01, 9.31168327e-03,
-5.24758378e-01, 3.64594884e+01])
This example is taken from Steve Brunton's lecture. He also provides code that uses the SVD; I wanted to try some library functions instead.
Thanks!