Question

comb1 is a pandas DataFrame that contains the following values.

yearID teamID salary W
408 ANA 51464167 82
409 ARI 81027833 85

When I use np.linalg.lstsq, I can print the dfg DataFrame.

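import numpy as np
import pandas as pd

dfg = pd.DataFrame()

comb1 = combined[combined['yearID'] == 2000]
x1 = comb1['salary'].values / 1000000  # salary in millions
y1 = comb1['W'].values
# Rows of A1 are [x1, ones]; A1.T is the design matrix with an intercept column
A1 = np.array([x1, np.ones(len(x1))])
w1 = np.linalg.lstsq(A1.T, y1)[0]  # w1[0] = slope, w1[1] = intercept
yq = (w1[0] * x1 + w1[1])
dfg['New val'] = y1 - yq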

When I use linear regression from the scikit-learn library and do the same thing, I get a ValueError.

from sklearn.linear_model import LinearRegression
fg = pd.DataFrame()

x2 = comb1['salary'].values / 1000000
y2 = comb1['W'].values

x2_reshape = x2.reshape(-1, 1)
y2_reshape = y2.reshape(-1, 1)

clf1 = LinearRegression()
clf1.fit(x2_reshape, y2_reshape)
predicted_train = clf1.predict(x2_reshape)

x_pre = y2 - predicted_train
fg['New val'] = x_pre

What is the difference between the two? Please help!

Answer 1

They should be the same:

Notes

From the implementation point of view, this is just plain Ordinary Least Squares (scipy.linalg.lstsq) wrapped as a predictor object.
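
A quick way to check this claim is to fit the same data both ways and compare the coefficients. The sketch below uses made-up synthetic data (the variable names and numbers are illustrative assumptions, not the asker's salary table), so treat it as a demonstration rather than a reproduction of the original problem.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic stand-in data (hypothetical, not the asker's table).
rng = np.random.default_rng(0)
x = rng.uniform(20, 100, size=30)               # e.g. payroll in millions
y = 70 + 0.2 * x + rng.normal(0, 5, size=30)    # e.g. wins

# np.linalg.lstsq: add a column of ones so the fit includes an intercept.
A = np.column_stack([x, np.ones_like(x)])
slope, intercept = np.linalg.lstsq(A, y, rcond=None)[0]

# LinearRegression fits the intercept itself; X must be 2-D, y may stay 1-D.
model = LinearRegression().fit(x.reshape(-1, 1), y)

# The two pairs should agree up to floating-point error.
print(slope, model.coef_[0])
print(intercept, model.intercept_)
```

Both approaches solve the same least-squares problem, so any difference seen in practice is more likely to come from array shapes than from the fitting itself.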

If you are getting an error, it is probably because of the way you set up your data.
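
One likely culprit, assuming the error is raised on the `fg['New val'] = x_pre` line, is a shape mismatch: `y2` is 1-D with shape `(n,)`, while `predicted_train` has shape `(n, 1)` because the model was fit on a reshaped target. Subtracting them broadcasts to an `(n, n)` array, which cannot be stored as a single column. The sketch below uses a small made-up sample (hypothetical values) to show the shapes and one possible fix, flattening the predictions with `ravel()`.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Small made-up sample (hypothetical values: salary in millions, wins).
x2 = np.array([51.5, 81.0, 60.2, 95.3])
y2 = np.array([82, 85, 74, 91])

clf1 = LinearRegression()
clf1.fit(x2.reshape(-1, 1), y2.reshape(-1, 1))
predicted_train = clf1.predict(x2.reshape(-1, 1))

print(y2.shape)                       # 1-D target
print(predicted_train.shape)          # 2-D column of predictions
print((y2 - predicted_train).shape)   # broadcast to a square matrix

# Flatten the predictions so the subtraction is element-wise.
residuals = y2 - predicted_train.ravel()

fg = pd.DataFrame()
fg['New val'] = residuals
print(fg)
```

Fitting with the 1-D `y2` directly, instead of `y2.reshape(-1, 1)`, would also make `predict` return a 1-D array and avoid the issue.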
