comb1 is a pandas DataFrame containing the following values:
yearID teamID salary W
408 ANA 51464167 82
409 ARI 81027833 85
When I use np.linalg.lstsq, I can print the dfg DataFrame without any problem:
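import numpy as np
import pandas as pd

dfg = pd.DataFrame()
comb1 = combined[combined['yearID'] == 2000]   # rows for the year 2000
x1 = comb1['salary'].values / 1000000          # salary in millions, shape (n,)
y1 = comb1['W'].values                         # wins, shape (n,)
A1 = np.array([x1, np.ones(len(x1))])          # shape (2, n); transposed below
w1 = np.linalg.lstsq(A1.T, y1, rcond=None)[0]  # [slope, intercept]
yq = w1[0] * x1 + w1[1]                        # fitted values, shape (n,)
dfg['New val'] = y1 - yq                       # residuals, 1-D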
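For reproduction, here is a self-contained sketch of the lstsq approach above. The full `combined` data is not shown in the question, so the salary and win values below are hypothetical placeholders; only the column names `salary` and `W` come from the question. `np.column_stack` builds the same (n, 2) design matrix as `A1.T`.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for comb1 (placeholder values, not real data).
comb1 = pd.DataFrame({
    "salary": [51_000_000, 81_000_000, 60_000_000, 90_000_000],
    "W": [82, 85, 70, 95],
})

x1 = comb1["salary"].to_numpy() / 1_000_000   # salary in millions, shape (n,)
y1 = comb1["W"].to_numpy()                    # wins, shape (n,)

# Design matrix with a column of ones for the intercept: shape (n, 2).
A1 = np.column_stack([x1, np.ones(len(x1))])
slope, intercept = np.linalg.lstsq(A1, y1, rcond=None)[0]

residuals = y1 - (slope * x1 + intercept)     # stays 1-D, shape (n,)
dfg = pd.DataFrame({"New val": residuals})
print(residuals.shape)                        # (4,)
```

Because every array here stays 1-D, the residuals map one-to-one onto the rows of `dfg`.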
When I do the same thing with scikit-learn's LinearRegression, I get a ValueError:
from sklearn.linear_model import LinearRegression

fg = pd.DataFrame()
x2 = comb1['salary'].values / 1000000
y2 = comb1['W'].values
x2_reshape = x2.reshape(-1, 1)
y2_reshape = y2.reshape(-1, 1)
clf1 = LinearRegression()
clf1.fit(x2_reshape, y2_reshape)
predicted_train = clf1.predict(x2_reshape)
x_pre = y2 - predicted_train
fg['New val'] = x_pre
What is the difference between the two? Please help!
Answer 0 (score: 0)
They should be the same. The scikit-learn LinearRegression documentation says:
Notes
From the implementation point of view, this is just plain Ordinary Least Squares (scipy.linalg.lstsq) wrapped as a predictor object.
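As a sketch of that equivalence, assuming hypothetical placeholder values for salary (in millions) and wins, the slope and intercept from np.linalg.lstsq match LinearRegression's `coef_` and `intercept_`. LinearRegression adds the intercept itself (`fit_intercept=True` by default), so no column of ones is passed to it.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical placeholder data (salary in millions, wins).
x = np.array([51.0, 81.0, 60.0, 90.0])
y = np.array([82.0, 85.0, 70.0, 95.0])

# Plain least squares with an explicit intercept column.
A = np.column_stack([x, np.ones_like(x)])
slope, intercept = np.linalg.lstsq(A, y, rcond=None)[0]

# scikit-learn: features must be 2-D, the target can stay 1-D.
reg = LinearRegression().fit(x.reshape(-1, 1), y)

print(np.isclose(slope, reg.coef_[0]), np.isclose(intercept, reg.intercept_))  # True True
```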
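If you get an error, it is probably because of how you set up your data. Because y2 is reshaped to (n, 1) before fitting, predict returns an (n, 1) array, so y2 - predicted_train broadcasts the 1-D y2 against that column and produces an (n, n) matrix instead of n residuals. That 2-D result cannot be assigned to the single column fg['New val'], hence the ValueError. Fit on the 1-D y2 (only X needs to be 2-D), or flatten the prediction with predicted_train.ravel().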
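A minimal reproduction and fix, using hypothetical placeholder values in place of the real comb1 data (which is not shown in full). The first part mirrors the question's 2-D target; the second keeps the target 1-D.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical placeholder data: salary in millions and wins.
x2 = np.array([51.0, 81.0, 60.0, 90.0])
y2 = np.array([82, 85, 70, 95])
X2 = x2.reshape(-1, 1)  # scikit-learn expects 2-D features: (n_samples, n_features)

# As in the question: a 2-D target makes predict() return shape (n, 1).
bad_pred = LinearRegression().fit(X2, y2.reshape(-1, 1)).predict(X2)
print(bad_pred.shape)              # (4, 1)
print((y2 - bad_pred).shape)       # (4, 4): broadcast, not element-wise

fg = pd.DataFrame()
try:
    fg['New val'] = y2 - bad_pred  # a 4x4 matrix does not fit in one column
except ValueError as err:
    print('ValueError:', err)

# Fix: keep the target 1-D so predict() also returns shape (n,).
clf1 = LinearRegression().fit(X2, y2)
predicted_train = clf1.predict(X2)
fg = pd.DataFrame({'New val': y2 - predicted_train})
print(fg.shape)                    # (4, 1)
```

If the target has to stay 2-D for other reasons, `y2 - bad_pred.ravel()` gives the same 1-D residuals.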