I am using Python to fit a linear regression model. The JSON data is as follows:
{"Y":[1,2,3,4,5],"X":[[1,43,23],[2,3,43],[3,23,334],[4,43,23],[232,234,24]]}
I used sm.OLS().fit() (from statsmodels.api) and statsmodels.formula.api.ols().fit(). I thought they were the same model, but the results differ.
This is the first function:
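import json
import numpy as np
import statsmodels.api as sm

def analyze1():
    print('using sm.OLS().fit')
    # FNAME_DATA is the path to the JSON file shown above
    with open(FNAME_DATA) as f:
        data = json.load(f)
    X = np.asarray(data['X'])
    Y = np.log(np.asarray(data['Y']) + 1)  # log-transform the target
    X2 = sm.add_constant(X)  # prepend an intercept column
    results = sm.OLS(Y, X2).fit()
    print(results.summary())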
This is the second function:
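import json
from statsmodels.formula.api import ols

def analyze2():
    print('using ols().fit')
    with open(FNAME_DATA) as f:
        data = json.load(f)
    # formula interface; Y is used as-is, with an explicit intercept term
    results = ols('Y~X+1', data=data).fit()
    print(results.summary())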
Output of the first function:
using sm.OLS().fit
/home/aaron/anaconda2/lib/python2.7/site-packages/statsmodels/stats/stattools.py:72: ValueWarning: omni_normtest is not valid with less than 8 observations; 5 samples were given.
"samples were given." % int(n), ValueWarning)
OLS Regression Results
==============================================================================
Dep. Variable: y R-squared: 0.449
Model: OLS Adj. R-squared: -1.204
Method: Least Squares F-statistic: 0.2717
Date: Wed, 07 Aug 2019 Prob (F-statistic): 0.849
Time: 07:17:00 Log-Likelihood: -0.87006
No. Observations: 5 AIC: 9.740
Df Residuals: 1 BIC: 8.178
Df Model: 3
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
const 1.0859 0.720 1.509 0.373 -8.057 10.228
x1 0.0024 0.018 0.134 0.915 -0.229 0.234
x2 0.0005 0.020 0.027 0.983 -0.256 0.257
x3 0.0008 0.003 0.332 0.796 -0.031 0.033
==============================================================================
Omnibus: nan Durbin-Watson: 1.485
Prob(Omnibus): nan Jarque-Bera (JB): 0.077
Skew: 0.175 Prob(JB): 0.962
Kurtosis: 2.503 Cond. No. 402.
==============================================================================
Output of the second function:
using ols().fit
OLS Regression Results
==============================================================================
Dep. Variable: Y R-squared: 0.551
Model: OLS Adj. R-squared: -0.796
Method: Least Squares F-statistic: 0.4092
Date: Wed, 07 Aug 2019 Prob (F-statistic): 0.784
Time: 07:17:00 Log-Likelihood: -6.8251
No. Observations: 5 AIC: 21.65
Df Residuals: 1 BIC: 20.09
Df Model: 3
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
Intercept 1.9591 2.368 0.827 0.560 -28.124 32.042
X[0] 0.0030 0.060 0.051 0.968 -0.757 0.764
X[1] 0.0098 0.066 0.148 0.906 -0.834 0.854
X[2] 0.0024 0.008 0.289 0.821 -0.103 0.108
==============================================================================
Omnibus: nan Durbin-Watson: 1.485
Prob(Omnibus): nan Jarque-Bera (JB): 0.077
Skew: 0.175 Prob(JB): 0.962
Kurtosis: 2.503 Cond. No. 402.
==============================================================================
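I thought they were similar models, but with the same data the results (coef) and the log-likelihood are different. I don't know whether there is any difference between these two models.
Answer 0 (score: 0)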
The former (OLS) is a class. The latter (ols) is a method of the OLS class, inherited from statsmodels.base.model.Model.
In [11]: from statsmodels.api import OLS
In [12]: from statsmodels.formula.api import ols
In [13]: OLS
Out[13]: statsmodels.regression.linear_model.OLS
In [14]: ols
Out[14]: <bound method Model.from_formula of <class 'statsmodels.regression.linear_model.OLS'>>
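Based on my own testing, I believe the two models should produce the same results. In your example, however, you apply a log transform to Y in the first model (np.log(Y + 1)) but not in the second. The fields that match are computed from X alone, which is the same in both models. The fields that differ do so because of the difference in Y.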
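Since I don't have access to your data, feel free to use this self-contained example as a sanity check. After fitting, the two models (which look like garbage as models) produce the same summary.
Example: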
import pandas as pd
import statsmodels.api as sm
import numpy as np
from sklearn.datasets import load_diabetes
from statsmodels.formula.api import ols
X = pd.DataFrame(data=load_diabetes()['data'],
columns=load_diabetes()['feature_names'])
X.drop(['age', 'bp', 's1', 's2', 's3', 's4', 's5', 's6'], axis=1, inplace=True)
X = sm.add_constant(X)
y = pd.DataFrame(data=load_diabetes()['target'], columns=['y'])
mod1 = sm.OLS(np.log(y), X)
results1 = mod1.fit()
print(results1.summary())
mod2 = ols('np.log(y) ~ sex + bmi + const', data=pd.concat([X, y], axis=1))
results2 = mod2.fit()
print(results2.summary())
Output (OLS):
OLS Regression Results
==============================================================================
Dep. Variable: y R-squared: 0.297
Model: OLS Adj. R-squared: 0.294
Method: Least Squares F-statistic: 92.90
Date: Tue, 06 Aug 2019 Prob (F-statistic): 2.27e-34
Time: 21:06:21 Log-Likelihood: -291.29
No. Observations: 442 AIC: 588.6
Df Residuals: 439 BIC: 600.9
Df Model: 2
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
const 4.8813 0.022 218.671 0.000 4.837 4.925
sex -0.0868 0.471 -0.184 0.854 -1.013 0.839
bmi 6.4042 0.471 13.593 0.000 5.478 7.330
==============================================================================
Omnibus: 14.733 Durbin-Watson: 1.892
Prob(Omnibus): 0.001 Jarque-Bera (JB): 15.547
Skew: -0.446 Prob(JB): 0.000421
Kurtosis: 2.776 Cond. No. 22.0
==============================================================================
Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
Output (ols):
OLS Regression Results
==============================================================================
Dep. Variable: np.log(y) R-squared: 0.297
Model: OLS Adj. R-squared: 0.294
Method: Least Squares F-statistic: 92.90
Date: Tue, 06 Aug 2019 Prob (F-statistic): 2.27e-34
Time: 21:06:22 Log-Likelihood: -291.29
No. Observations: 442 AIC: 588.6
Df Residuals: 439 BIC: 600.9
Df Model: 2
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
Intercept 2.4407 0.011 218.671 0.000 2.419 2.463
sex -0.0868 0.471 -0.184 0.854 -1.013 0.839
bmi 6.4042 0.471 13.593 0.000 5.478 7.330
const 2.4407 0.011 218.671 0.000 2.419 2.463
==============================================================================
Omnibus: 14.733 Durbin-Watson: 1.892
Prob(Omnibus): 0.001 Jarque-Bera (JB): 15.547
Skew: -0.446 Prob(JB): 0.000421
Kurtosis: 2.776 Cond. No. 7.63e+15
==============================================================================
Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The smallest eigenvalue is 1.52e-29. This might indicate that there are
strong multicollinearity problems or that the design matrix is singular.