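I am trying to fit a random effects model in pandas, but the coefficients from my regression do not match my Stata output. I am using a data set of airline routes and ticket prices. Here is my Python code: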
import pandas as pd
import pandas.stats.plm as plm
airline = pd.read_csv("C:...\Airline.csv")
airline['constant'] = 1.0
airline = airline.set_index(['route', 'time'])
airlinePanel = airline.to_panel()
airlineRE = plm.PanelOLS(y = airlinePanel['lnMktfare'], x=airlinePanel[['constant', 'mktdistance', 'passengers', 'percentAA', 'percentAS',
'percentDL', 'percentHA', 'percentNK', 'percentUA', 'percentUS', 'percentWN']],
intercept= True, time_effects=True, dropped_dummies=True, verbose=True)
print airlineRE
And the output:
-------------------------Summary of Regression Analysis-------------------------
Formula: Y ~ <mktdistance> + <passengers> + <percentAA>
+ <percentAS> + <percentDL> + <percentHA> + <percentNK> + <percentUA>
+ <percentUS> + <percentWN>
Number of Observations: 88000
Number of Degrees of Freedom: 1010
R-squared: 0.2357
Adj R-squared: 0.2268
Rmse: 0.3762
F-stat (10, 86990): 26.5805, p-value: 0.0000
Degrees of Freedom: model 1009, resid 86990
-----------------------Summary of Estimated Coefficients------------------------
Variable Coef Std Err t-stat p-value CI 2.5% CI 97.5%
--------------------------------------------------------------------------------
mktdistance 0.0002 0.0000 125.73 0.0000 0.0002 0.0002
passengers -0.0000 0.0000 -33.44 0.0000 -0.0000 -0.0000
percentAA 0.1290 0.0045 28.85 0.0000 0.1202 0.1378
percentAS 0.1079 0.0067 16.06 0.0000 0.0947 0.1211
--------------------------------------------------------------------------------
percentDL 0.2682 0.0033 81.44 0.0000 0.2617 0.2746
percentHA -0.1609 0.1439 -1.12 0.2635 -0.4430 0.1211
percentNK -0.4412 0.0144 -30.73 0.0000 -0.4693 -0.4131
percentUA 0.2156 0.0041 52.70 0.0000 0.2076 0.2236
percentUS 0.1839 0.0034 54.19 0.0000 0.1772 0.1905
--------------------------------------------------------------------------------
percentWN -0.0658 0.0033 -19.93 0.0000 -0.0722 -0.0593
---------------------------------End of Summary---------------------------------
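First, before I turn to the Stata output, does anyone know why I do not get an intercept term even though I set intercept=True? Even when I add a constant to the regression equation manually, Python estimates it as follows: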
-----------------------Summary of Estimated Coefficients------------------------
Variable Coef Std Err t-stat p-value CI 2.5% CI 97.5%
--------------------------------------------------------------------------------
constant 0.0000 nan nan nan nan nan
None of the other estimates change. Now for the Stata code:
import delimited "C:...\Airline.csv", clear
xtset route time
xtreg lnmktfare mktdistance passengers percent*
Stata output:
Random-effects GLS regression Number of obs = 88,000
Group variable: route Number of groups = 1,000
R-sq: Obs per group:
within = 0.2983 min = 88
between = 0.6943 avg = 88.0
overall = 0.3154 max = 88
Wald chi2(97) = 39530.19
corr(u_i, X) = 0 (assumed) Prob > chi2 = 0.0000
------------------------------------------------------------------------------
lnmktfare | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
mktdistance | .0002374 1.78e-06 133.40 0.000 .0002339 .0002409
passengers | -.0000382 8.90e-07 -42.91 0.000 -.0000399 -.0000364
percentAA | .1340237 .0058275 23.00 0.000 .122602 .1454454
percentAS | .1159311 .006403 18.11 0.000 .1033815 .1284807
percentDL | .2689447 .0039186 68.63 0.000 .2612644 .276625
percentHA | -.0637648 .1378896 -0.46 0.644 -.3340235 .2064939
percentNK | -.4974099 .0131605 -37.80 0.000 -.523204 -.4716158
percentUA | .1653212 .0055116 30.00 0.000 .1545187 .1761236
percentUS | .1784333 .0046914 38.03 0.000 .1692383 .1876283
percentWN | -.1531444 .0041407 -36.98 0.000 -.1612601 -.1450286
_cons | 4.893488 .011821 413.97 0.000 4.870319 4.916657
-------------+----------------------------------------------------------------
sigma_u | .02593863
sigma_e | .36056598
rho | .00514853 (fraction of variance due to u_i)
------------------------------------------------------------------------------
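I do not know why the coefficients differ slightly between the two programs, but the gap is large enough to make me worry about the accuracy of pandas. My main questions are: (1) why can't I get an intercept term from pandas, and (2) why don't the coefficients match between the two packages? Note that I have compared OLS, logit, and IV2SLS models between Python and Stata and the results matched exactly, which makes me think there may be a problem with the random effects implementation in pandas. I am running Python 2.7.9 in IPython 3.0.0, and Stata 14.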
Answer 0 (score: 1)
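Your Python code is estimating fixed effects. You can see this from the degrees of freedom, which are over 1,000 in the Python output and under 100 in the Stata output. Unlike fixed effects, random effects are not treated as parameters to be estimated; they are assumed to be uncorrelated with X but to have a particular error structure, which makes RE more efficient than pooled OLS.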
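To make the difference concrete, here is a minimal sketch of the quasi-demeaning weight behind random-effects GLS on a balanced panel. It assumes the textbook formulas theta = 1 - sqrt(sigma_e^2 / (T * sigma_u^2 + sigma_e^2)) and rho = sigma_u^2 / (sigma_u^2 + sigma_e^2); the function names `re_theta` and `re_rho` are hypothetical helpers written for this illustration and are not part of pandas or Stata. The inputs are the sigma_u, sigma_e, and observations-per-group values copied from the Stata output in the question.

```python
import math


def re_theta(sigma_u, sigma_e, periods):
    """Quasi-demeaning weight for random-effects GLS on a balanced panel.

    theta = 0 reproduces pooled OLS (no demeaning);
    theta = 1 reproduces the fixed-effects (within) estimator (full demeaning).
    """
    return 1.0 - math.sqrt(sigma_e ** 2 / (periods * sigma_u ** 2 + sigma_e ** 2))


def re_rho(sigma_u, sigma_e):
    """Share of the total error variance attributed to the group effect u_i."""
    return sigma_u ** 2 / (sigma_u ** 2 + sigma_e ** 2)


# Values copied from the Stata output in the question.
sigma_u = 0.02593863
sigma_e = 0.36056598
periods = 88  # observations per route (min = avg = max = 88)

theta = re_theta(sigma_u, sigma_e, periods)
rho = re_rho(sigma_u, sigma_e)
print("theta = %.4f, rho = %.6f" % (theta, rho))
```

Under these assumptions rho agrees with the value Stata reports (.00514853), and theta comes out at roughly 0.17. Random effects therefore subtracts only a fraction of each route's mean from the data, whereas a fixed-effects fit subtracts all of it (theta = 1). That partial demeaning is one reason the two sets of coefficients should not be expected to coincide.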