I am running a simple OLS regression on a large dataset (2m+ observations) using the statsmodels package. My regression command is:
import statsmodels.formula.api as sm
import pandas as pd
patent = pd.read_csv('input.csv')
#Input.csv contains the columns mentioned in the regression equation
result = sm.ols(formula="generality~C(id)+C(year)+C(appyear)+familiarity + je",data = patent).fit()
When I try to print the summary of the above regression, I get this:
/usr/local/lib/python3.6/site-packages/statsmodels/base/model.py:1036: RuntimeWarning: invalid value encountered in true_divide
return self.params / self.bse
/usr/local/lib/python3.6/site-packages/scipy/stats/_distn_infrastructure.py:879: RuntimeWarning: invalid value encountered in greater
return (self.a < x) & (x < self.b)
/usr/local/lib/python3.6/site-packages/scipy/stats/_distn_infrastructure.py:879: RuntimeWarning: invalid value encountered in less
return (self.a < x) & (x < self.b)
/usr/local/lib/python3.6/site-packages/scipy/stats/_distn_infrastructure.py:1818: RuntimeWarning: invalid value encountered in less_equal
cond2 = cond0 & (x <= self.a)
                            OLS Regression Results
==============================================================================
Dep. Variable:             generality   R-squared:                  -23041.069
Model:                            OLS   Adj. R-squared:             -23041.886
Method:                 Least Squares   F-statistic:                -2.820e+04
Date:                Mon, 30 Oct 2017   Prob (F-statistic):               1.00
Time:                        20:39:45   Log-Likelihood:            -8.9192e+06
No. Observations:             1720455   AIC:                         1.784e+07
Df Residuals:                 1720393   BIC:                         1.784e+07
Df Model:                          61
Covariance Type:            nonrobust
Clearly, the R-squared value is wrong. Am I doing something wrong in my regression command?
Note: The data is too large for me to paste here. year and appyear each have about 40 categories.