statsmodels gives strange results: the R-squared value is wrong

Time: 2017-10-30 15:42:25

Tags: python regression statsmodels

I am using the statsmodels package to run a simple OLS regression on a large dataset (2M+ observations). My regression command is:

import statsmodels.formula.api as smf
import pandas as pd

patent = pd.read_csv('input.csv')

# input.csv contains the columns used in the regression formula
result = smf.ols(formula="generality ~ C(id) + C(year) + C(appyear) + familiarity + je", data=patent).fit()
print(result.summary())
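A quick way to rule out the formula itself is to run the same formula on a small synthetic dataset. This is only a sketch under assumptions: the data below is randomly generated (it is not the asker's input.csv), and the row count and category counts are arbitrary. If R-squared lands in [0, 1] here, the formula syntax is probably not what breaks on the real data.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical synthetic data using the same column names as input.csv;
# sizes and category counts are assumptions chosen for a fast run.
rng = np.random.default_rng(0)
n = 2000
patent = pd.DataFrame({
    "id": rng.integers(0, 5, n),
    "year": rng.integers(1990, 2000, n),
    "appyear": rng.integers(1985, 1995, n),
    "familiarity": rng.normal(size=n),
    "je": rng.normal(size=n),
})
patent["generality"] = (
    0.3 * patent["familiarity"] - 0.2 * patent["je"] + rng.normal(size=n)
)

# Same formula as in the question
result = smf.ols(
    formula="generality ~ C(id) + C(year) + C(appyear) + familiarity + je",
    data=patent,
).fit()
print(result.rsquared)
```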

当我尝试打印上述回归的摘要时,我得到了这个:

 /usr/local/lib/python3.6/site-packages/statsmodels/base/model.py:1036: RuntimeWarning: invalid value encountered in true_divide
   return self.params / self.bse
 /usr/local/lib/python3.6/site-packages/scipy/stats/_distn_infrastructure.py:879: RuntimeWarning: invalid value encountered in greater
   return (self.a < x) & (x < self.b)
 /usr/local/lib/python3.6/site-packages/scipy/stats/_distn_infrastructure.py:879: RuntimeWarning: invalid value encountered in less
   return (self.a < x) & (x < self.b)
 /usr/local/lib/python3.6/site-packages/scipy/stats/_distn_infrastructure.py:1818: RuntimeWarning: invalid value encountered in less_equal
   cond2 = cond0 & (x <= self.a)
                             OLS Regression Results                            
 ==============================================================================
 Dep. Variable:             generality   R-squared:                  -23041.069
 Model:                            OLS   Adj. R-squared:             -23041.886
 Method:                 Least Squares   F-statistic:                -2.820e+04
 Date:                Mon, 30 Oct 2017   Prob (F-statistic):               1.00
 Time:                        20:39:45   Log-Likelihood:            -8.9192e+06
 No. Observations:             1720455   AIC:                         1.784e+07
 Df Residuals:                 1720393   BIC:                         1.784e+07
 Df Model:                          61                                         
 Covariance Type:            nonrobust               
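For a model that includes an intercept, statsmodels computes R-squared as 1 - SSR / centered TSS. In exact arithmetic, OLS with an intercept cannot produce an SSR larger than the centered TSS, so a value like -23041 suggests that the fitted solution is numerically broken rather than that the formula is misspecified. The `invalid value encountered in true_divide` warning on `self.params / self.bse` also points to invalid (NaN or zero) standard errors. The sketch below, on hypothetical synthetic data, recomputes R-squared from the `ssr` and `centered_tss` attributes and from an independent NumPy least-squares fit on the same design matrix; on the real data one would run the same lines against the fitted `result`.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical synthetic data (not the asker's dataset)
rng = np.random.default_rng(1)
n = 1000
df = pd.DataFrame({"x": rng.normal(size=n), "g": rng.integers(0, 4, n)})
df["y"] = 1.0 + 2.0 * df["x"] + rng.normal(size=n)

res = smf.ols("y ~ C(g) + x", data=df).fit()

# R-squared for a model with an intercept: 1 - SSR / centered TSS
manual_r2 = 1.0 - res.ssr / res.centered_tss

# Independent check: refit with NumPy least squares on the same design matrix
X = res.model.exog
y = res.model.endog
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
lstsq_r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)

print(res.rsquared, manual_r2, lstsq_r2)
```

If the statsmodels value and the `lstsq` value disagree badly on the real data, that would point to a numerical problem in the fit rather than in the formula.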

Clearly, the R-squared value is wrong. Am I doing something wrong in my regression command?
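One possible cause worth ruling out is a design matrix that is rank-deficient or badly conditioned, for example when two sets of dummies overlap. A quick check compares the number of design-matrix columns with its numerical rank and looks at the condition number. This sketch assumes a hypothetical setup in which `appyear` is exactly `year - 2`, which makes the two sets of dummies perfectly collinear; it only shows how to detect the problem and does not reproduce the negative R-squared.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
n = 500
year = rng.integers(2000, 2010, n)
df = pd.DataFrame({
    "year": year,
    "appyear": year - 2,  # hypothetical: appyear fully determined by year
    "familiarity": rng.normal(size=n),
})
df["generality"] = 0.5 * df["familiarity"] + rng.normal(size=n)

res = smf.ols("generality ~ C(year) + C(appyear) + familiarity", data=df).fit()

X = res.model.exog
n_cols = X.shape[1]
rank = np.linalg.matrix_rank(X)
print("columns:", n_cols, "rank:", rank)
print("condition number:", res.condition_number)
# statsmodels reports Df Model as rank minus the constant term
print("Df Model:", res.df_model)
```

On the real data, a rank well below the column count, or a very large condition number, would be a reason to drop or merge overlapping dummy sets before trusting the summary.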

Note: The data is too large to paste here. `year` and `appyear` each have about 40 categories.
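With the default treatment coding, a factor with k levels contributes k - 1 dummy columns, so about 40 levels each for `year` and `appyear` alone add roughly 78 columns, before the intercept, `familiarity`, `je` and the `C(id)` dummies. Since statsmodels reports Df Model as the design-matrix rank minus the constant, the 61 in the summary is smaller than that column count; if the category counts in the note are accurate, this would be consistent with a rank-deficient design, though it cannot be confirmed without the data. The sketch below counts the columns patsy builds for hypothetical synthetic data with 5 `id` levels and 40 levels each for `year` and `appyear`.

```python
import numpy as np
import pandas as pd
from patsy import dmatrices

rng = np.random.default_rng(3)
n = 5000
k_id, k_year, k_appyear = 5, 40, 40  # hypothetical category counts
df = pd.DataFrame({
    "id": rng.integers(0, k_id, n),
    "year": rng.integers(0, k_year, n),
    "appyear": rng.integers(0, k_appyear, n),
    "familiarity": rng.normal(size=n),
    "je": rng.normal(size=n),
    "generality": rng.normal(size=n),
})

y, X = dmatrices(
    "generality ~ C(id) + C(year) + C(appyear) + familiarity + je",
    data=df,
    return_type="dataframe",
)

# intercept + (k - 1) dummies per factor + two numeric regressors
expected = 1 + (k_id - 1) + (k_year - 1) + (k_appyear - 1) + 2
print(X.shape[1], expected)
```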
