From a dataset like this:
import pandas as pd
import numpy as np
import statsmodels.api as sm
# A dataframe with two variables
np.random.seed(123)
rows = 12
rng = pd.date_range('1/1/2017', periods=rows, freq='D')
df = pd.DataFrame(np.random.randint(100,150,size=(rows, 2)), columns=['y', 'x'])
df = df.set_index(rng)
...and a linear regression model like this:
x = sm.add_constant(df['x'])
model = sm.OLS(df['y'], x).fit()
...you can easily retrieve some of the model coefficients this way:
print(model.params)
But I can't find out how to retrieve all the other parameters from the model summary:
print(str(model.summary()))
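As mentioned in the question, I'm particularly interested in R-squared.
From the post "How to extract a particular value from the OLS-summary in Pandas?" I learned that you can use print(model.r2) to do the same thing there. But that doesn't seem to work with statsmodels.
Any suggestions?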
Answer 0 (score: 6)
You can get R-squared like this:
model.rsquared
import pandas as pd
import numpy as np
import statsmodels.api as sm
# A dataframe with two variables
np.random.seed(123)
rows = 12
rng = pd.date_range('1/1/2017', periods=rows, freq='D')
df = pd.DataFrame(np.random.randint(100,150,size=(rows, 2)), columns=['y', 'x'])
df = df.set_index(rng)
x = sm.add_constant(df['x'])
model = sm.OLS(df['y'], x).fit()
print(model.params)
print(model.rsquared)
print(str(model.summary()))
const 176.636417
x -0.357185
dtype: float64
0.338332793094
OLS Regression Results
==============================================================================
Dep. Variable: y R-squared: 0.338
Model: OLS Adj. R-squared: 0.272
Method: Least Squares F-statistic: 5.113
Date: Tue, 30 Jan 2018 Prob (F-statistic): 0.0473
Time: 05:36:04 Log-Likelihood: -41.442
No. Observations: 12 AIC: 86.88
Df Residuals: 10 BIC: 87.85
Df Model: 1
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
const 176.6364 20.546 8.597 0.000 130.858 222.415
x -0.3572 0.158 -2.261 0.047 -0.709 -0.005
==============================================================================
Omnibus: 1.934 Durbin-Watson: 1.182
Prob(Omnibus): 0.380 Jarque-Bera (JB): 1.010
Skew: -0.331 Prob(JB): 0.603
Kurtosis: 1.742 Cond. No. 1.10e+03
==============================================================================
Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 1.1e+03. This might indicate that there are
strong multicollinearity or other numerical problems.
With just a small piece of code:
# List every public (non-underscore) attribute of the results object
for attr in dir(model):
    if not attr.startswith('_'):
        print(attr)
you can see all the attributes of the object:
HC0_se
HC1_se
HC2_se
HC3_se
aic
bic
bse
centered_tss
compare_f_test
compare_lm_test
compare_lr_test
condition_number
conf_int
conf_int_el
cov_HC0
cov_HC1
cov_HC2
cov_HC3
cov_kwds
cov_params
cov_type
df_model
df_resid
eigenvals
el_test
ess
f_pvalue
f_test
fittedvalues
fvalue
get_influence
get_prediction
get_robustcov_results
initialize
k_constant
llf
load
model
mse_model
mse_resid
mse_total
nobs
normalized_cov_params
outlier_test
params
predict
pvalues
remove_data
resid
resid_pearson
rsquared
rsquared_adj
save
scale
ssr
summary
summary2
t_test
tvalues
uncentered_tss
use_t
wald_test
wald_test_terms
wresid
Answer 1 (score: 0)
You can use the attributes, for example