Why don't random effects in Python match Stata?

Time: 2015-06-26 17:47:12

Tags: python panel stata

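I'm trying to fit a random effects model in pandas, but the coefficients from my regression don't match my Stata output. I'm using a dataset of airline routes and ticket prices. Here is my Python code: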

import pandas as pd
import pandas.stats.plm as plm

airline = pd.read_csv("C:...\Airline.csv")
airline['constant'] = 1.0
airline = airline.set_index(['route', 'time'])
airlinePanel = airline.to_panel()


airlineRE = plm.PanelOLS(y = airlinePanel['lnMktfare'], x=airlinePanel[['constant', 'mktdistance', 'passengers', 'percentAA', 'percentAS',
            'percentDL', 'percentHA', 'percentNK', 'percentUA', 'percentUS', 'percentWN']],
            intercept= True, time_effects=True, dropped_dummies=True, verbose=True)
print airlineRE

And the output:

 -------------------------Summary of Regression Analysis-------------------------

Formula: Y ~  <mktdistance> + <passengers> + <percentAA>
         + <percentAS> + <percentDL> + <percentHA> + <percentNK> + <percentUA>
         + <percentUS> + <percentWN>

Number of Observations:         88000
Number of Degrees of Freedom:   1010

R-squared:         0.2357
Adj R-squared:     0.2268

Rmse:              0.3762

F-stat (10, 86990):    26.5805, p-value:     0.0000

Degrees of Freedom: model 1009, resid 86990

-----------------------Summary of Estimated Coefficients------------------------
Variable       Coef    Std Err     t-stat    p-value    CI 2.5%   CI 97.5%
--------------------------------------------------------------------------------
mktdistance   0.0002     0.0000     125.73     0.0000     0.0002     0.0002
passengers   -0.0000     0.0000     -33.44     0.0000    -0.0000    -0.0000
percentAA     0.1290     0.0045      28.85     0.0000     0.1202     0.1378
percentAS     0.1079     0.0067      16.06     0.0000     0.0947     0.1211
--------------------------------------------------------------------------------
percentDL     0.2682     0.0033      81.44     0.0000     0.2617     0.2746
percentHA    -0.1609     0.1439      -1.12     0.2635    -0.4430     0.1211
percentNK    -0.4412     0.0144     -30.73     0.0000    -0.4693    -0.4131
percentUA     0.2156     0.0041      52.70     0.0000     0.2076     0.2236
percentUS     0.1839     0.0034      54.19     0.0000     0.1772     0.1905
--------------------------------------------------------------------------------
percentWN    -0.0658     0.0033     -19.93     0.0000    -0.0722    -0.0593
---------------------------------End of Summary---------------------------------

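First, before I get to the Stata output: does anyone know why I don't get an intercept term even though I set intercept=True? Even when I add it to the regression equation manually, Python estimates the constant as follows: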

-----------------------Summary of Estimated Coefficients------------------------
Variable       Coef    Std Err     t-stat    p-value    CI 2.5%   CI 97.5%
--------------------------------------------------------------------------------
constant     0.0000        nan        nan        nan        nan        nan

None of the other estimates change. Now the Stata code:

import delimited "C:...\Airline.csv", clear
xtset route time
xtreg lnmktfare mktdistance passengers percent*

Stata output:

Random-effects GLS regression                   Number of obs     =     88,000
Group variable: route                          Number of groups  =      1,000

R-sq:                                           Obs per group:
     within  = 0.2983                                         min =         88
     between = 0.6943                                         avg =       88.0
     overall = 0.3154                                         max =         88

                                                Wald chi2(97)     =   39530.19
corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0000

------------------------------------------------------------------------------
 lnmktfare   |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
 mktdistance |   .0002374   1.78e-06   133.40   0.000     .0002339    .0002409
 passengers  |  -.0000382   8.90e-07   -42.91   0.000    -.0000399   -.0000364

 percentAA   |   .1340237   .0058275    23.00   0.000      .122602    .1454454
 percentAS   |   .1159311    .006403    18.11   0.000     .1033815    .1284807
 percentDL   |   .2689447   .0039186    68.63   0.000     .2612644     .276625
 percentHA   |  -.0637648   .1378896    -0.46   0.644    -.3340235    .2064939
 percentNK   |  -.4974099   .0131605   -37.80   0.000     -.523204   -.4716158
 percentUA   |   .1653212   .0055116    30.00   0.000     .1545187    .1761236
 percentUS   |   .1784333   .0046914    38.03   0.000     .1692383    .1876283
 percentWN   |  -.1531444   .0041407   -36.98   0.000    -.1612601   -.1450286
     _cons   |   4.893488    .011821   413.97   0.000     4.870319    4.916657
-------------+----------------------------------------------------------------
   sigma_u   |  .02593863
   sigma_e   |  .36056598
       rho   |  .00514853   (fraction of variance due to u_i)
------------------------------------------------------------------------------

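I don't know why the coefficients differ slightly between the two programs, but the gap is large enough to make me worry about the accuracy of pandas. My main questions are: (1) why can't I get an intercept term from pandas, and (2) why don't the coefficients match between the two packages? Note that I have compared OLS, Logit, and IV2SLS models between Python and Stata and the results match exactly, which makes me think something may be wrong with the random effects implementation in pandas. I'm running Python 2.7.9 in IPython 3.0.0, and Stata 14.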

1 Answer:

Answer 0: (score: 1)

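Your Python code is estimating fixed effects. You can see this from the degrees of freedom: over 1,000 in the Python output versus under 100 in the Stata output. Unlike fixed effects, random effects are not treated as parameters to be estimated; they are assumed to be uncorrelated with X but to have a specific error structure, which makes RE more efficient than pooled OLS.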
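To make the distinction concrete, here is a minimal NumPy sketch that fits both estimators on simulated data. It is not the asker's airline dataset and makes no claim to reproduce Stata's `xtreg` internals: the helper names (`simulate_panel`, `fixed_effects`, `random_effects`) are hypothetical, the panel is assumed to be balanced, and the variance components use a simple Swamy-Arora-style estimate, which is one common choice among several.

```python
import numpy as np

def simulate_panel(n_groups=200, n_periods=8, seed=0):
    """Hypothetical balanced panel: y = 2.0 + 1.5*x + u_i + e_it."""
    rng = np.random.default_rng(seed)
    groups = np.repeat(np.arange(n_groups), n_periods)
    x = rng.normal(size=groups.size)
    u = rng.normal(scale=0.5, size=n_groups)[groups]  # group effect, uncorrelated with x
    e = rng.normal(scale=1.0, size=groups.size)
    return groups, x, 2.0 + 1.5 * x + u + e

def group_means(v, groups):
    """Per-group mean of v, broadcast back to one value per observation."""
    return (np.bincount(groups, weights=v) / np.bincount(groups))[groups]

def ols(X, y):
    return np.linalg.lstsq(X, y, rcond=None)[0]

def fixed_effects(groups, x, y):
    """Within estimator: full demeaning by group, which absorbs the intercept."""
    xd = x - group_means(x, groups)
    yd = y - group_means(y, groups)
    return ols(xd[:, None], yd)

def random_effects(groups, x, y):
    """Random effects GLS via quasi-demeaning (assumes a balanced panel)."""
    counts = np.bincount(groups)
    n, T, k = counts.size, counts[0], 1
    # sigma_e^2 from the residuals of the within regression
    xd = x - group_means(x, groups)
    yd = y - group_means(y, groups)
    resid_w = yd - xd * ols(xd[:, None], yd)[0]
    sigma_e2 = resid_w @ resid_w / (n * T - n - k)
    # sigma_u^2 from the between regression on group means
    xb = np.bincount(groups, weights=x) / T
    yb = np.bincount(groups, weights=y) / T
    Xb = np.column_stack([np.ones(n), xb])
    resid_b = yb - Xb @ ols(Xb, yb)
    sigma_u2 = max(resid_b @ resid_b / (n - k - 1) - sigma_e2 / T, 0.0)
    # Subtract only a fraction theta of each group mean, then run OLS
    theta = 1.0 - np.sqrt(sigma_e2 / (T * sigma_u2 + sigma_e2))
    ys = y - theta * group_means(y, groups)
    xs = x - theta * group_means(x, groups)
    const = np.full_like(x, 1.0 - theta)  # the constant survives, scaled by (1 - theta)
    return ols(np.column_stack([const, xs]), ys), theta

groups, x, y = simulate_panel()
b_fe = fixed_effects(groups, x, y)
b_re, theta = random_effects(groups, x, y)
print("FE slope:", b_fe[0])
print("RE intercept and slope:", b_re, "theta:", theta)
```

In this sketch the fixed effects fit has no intercept to report because full demeaning removes any constant, while the random effects fit keeps one because only a fraction `theta` of each group mean is subtracted. With `theta` equal to 0 the random effects fit reduces to pooled OLS, and as `theta` approaches 1 it approaches the within estimator, so the two sets of coefficients are generally close but not identical.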