import pandas as pd

dataset = pd.read_excel('dfmodel.xlsx')
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, -1].values

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor.fit(X_train, y_train)

# Predictions were missing in the original snippet but are needed below
y_pred = regressor.predict(X_test)

from sklearn.metrics import r2_score
print('The R2 score of Multi-Linear Regression model is: ', r2_score(y_test, y_pred))
Using the code above, I managed to run a linear regression and obtain the R2 score. How do I get the beta coefficient of each predictor?
Answer 0 (score: 0)
On the documentation page for sklearn.linear_model.LinearRegression, you can see that the coefficients (slopes) and the intercept of a fitted model are available as the `coef_` and `intercept_` attributes, respectively.
If you use sklearn.preprocessing.StandardScaler before fitting your model, the regression coefficients should be the beta coefficients you're looking for.
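A minimal sketch of that approach, using synthetic data in place of the spreadsheet (the data shape and coefficients are illustrative assumptions, not taken from the question):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the spreadsheet: two predictors, one target
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
y = 0.4 * X[:, 0] + 0.4 * X[:, 1] + rng.normal(scale=0.5, size=500)

# Standardize the predictors first; the slopes of the fitted model are
# then standardized (beta) coefficients
X_std = StandardScaler().fit_transform(X)
model = LinearRegression().fit(X_std, y)

print(model.coef_)       # one beta per predictor column
print(model.intercept_)
```

Because each standardized predictor has unit variance, the entries of `coef_` are directly comparable across predictors.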
Answer 1 (score: 0)
Use `regressor.coef_`. You can see how these coefficients map to the predictors by comparing against the statsmodels implementation:
from sklearn.linear_model import LinearRegression
regressor = LinearRegression(fit_intercept=False)
regressor.fit(X, y)
regressor.coef_
# array([0.43160901, 0.42441214])
The statsmodels version:
import statsmodels.api as sm

# No constant column is added here, matching fit_intercept=False above;
# sm.add_constant(X) would be needed to estimate an intercept.
mod = sm.OLS(y, X)
res = mod.fit()
print(res.summary())
                                 OLS Regression Results
=======================================================================================
Dep. Variable:                      y   R-squared (uncentered):                   0.624
Model:                            OLS   Adj. R-squared (uncentered):              0.623
Method:                 Least Squares   F-statistic:                              414.0
Date:                Tue, 29 Sep 2020   Prob (F-statistic):                   1.25e-106
Time:                        17:03:27   Log-Likelihood:                         -192.54
No. Observations:                 500   AIC:                                      389.1
Df Residuals:                     498   BIC:                                      397.5
Df Model:                           2
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
x1             0.4316      0.041     10.484      0.000       0.351       0.512
x2             0.4244      0.041     10.407      0.000       0.344       0.505
==============================================================================
Omnibus:                       36.830   Durbin-Watson:                   1.967
Prob(Omnibus):                  0.000   Jarque-Bera (JB):               13.197
Skew:                           0.059   Prob(JB):                      0.00136
Kurtosis:                       2.213   Cond. No.                         2.57
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
You can test direct equivalence with:

import numpy as np
np.array([regressor.coef_.round(8) == res.params.round(8)]).all()  # True
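To see which slope belongs to which predictor, you can also pair `coef_` with the feature names directly. A sketch with synthetic data (the column names `x1`/`x2` are hypothetical, chosen to match the summary above):

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic two-predictor frame; 'x1'/'x2' are illustrative column names
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(500, 2)), columns=['x1', 'x2'])
y = 0.4 * df['x1'] + 0.4 * df['x2'] + rng.normal(scale=0.5, size=500)

model = LinearRegression(fit_intercept=False).fit(df, y)

# One coefficient per column, labelled by predictor name
coefs = pd.Series(model.coef_, index=df.columns)
print(coefs)
```

Indexing the coefficients by column name avoids any ambiguity about which slope maps to which predictor.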
Answer 2 (score: 0)
Personally, I prefer np.polyfit() with degree 1 specified, as a single step.

import numpy as np
np.polyfit(X, y, 1)[0]  # returns beta (plus other coefficients if degree > 1)

So for your question, if I understand it, you want to relate the predicted y values to the original y, which would be something like:

np.polyfit(y_test, y_pred, 1)[0]

Although I would test np.polyfit(X_test, y_pred, 1)[0] as well.
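Note that np.polyfit expects a one-dimensional x, so this approach only applies with a single predictor column, unlike the multi-predictor setup in the question. A minimal sketch with synthetic one-predictor data (all names and values here are illustrative):

```python
import numpy as np

# Synthetic single-predictor data: y ≈ 2.0 * x + 1.0 plus noise
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 2.0 * x + 1.0 + rng.normal(scale=0.1, size=200)

# A degree-1 fit returns [slope, intercept]; index [0] is the beta for x
slope, intercept = np.polyfit(x, y, 1)
print(slope, intercept)
```

For more than one predictor, `LinearRegression.coef_` (or statsmodels OLS) from the earlier answers is the appropriate tool.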