Question

I ran a regression in Stata:

reg y I.ind1990#I.year, nocons r

I then exported the coefficient vector from Stata using

matrix x = e(b)
esttab matrix(x) using "xx.csv", replace plain

and loaded it into Python with pandas using

import pandas as pd
df = pd.read_csv('xx.csv', skiprows=1, index_col=[0]).T.dropna()
df.index.name = 'interaction'
df = df.reset_index()

ind1990 and year are numeric. However, my csv contains some odd values (year and ind were extracted manually from interaction):

            interaction        y1 ind   year
0  0b.ind1990#2001b.year  0.000000  0b  2001b
1   0b.ind1990#2002.year  0.320578  0b   2002
2   0b.ind1990#2003.year  0.304471  0b   2003
3   0b.ind1990#2004.year  0.271429  0b   2004
4   0b.ind1990#2005.year  0.295347  0b   2005
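
The manual extraction of ind and year is not shown in the question. A minimal sketch of one way to do it, assuming every label has the form `<level>.<variable>#<level>.<variable>` as in the table above (the helper name `split_interaction` is hypothetical, not part of the original code):

```python
def split_interaction(label: str) -> list[str]:
    """Return the level prefix of each term in a label like '0b.ind1990#2002.year'."""
    # Terms are joined by '#'; within a term, the level precedes the first '.'.
    # Assumption: variable names themselves contain no '#' characters.
    return [term.split(".", 1)[0] for term in label.split("#")]


# Two labels copied from the table above.
labels = ["0b.ind1990#2001b.year", "0b.ind1990#2002.year"]
for label in labels:
    ind, year = split_interaction(label)
    print(label, ind, year)
```

With pandas, the same helper could be applied to the whole column via `df['interaction'].map(split_interaction)`. The suffixes are deliberately left attached here, which is how values such as `0b` and `2001b` end up in the ind and year columns.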

I believe 0b is how Stata represents the missing value, i.e. NIU. But I cannot make sense of the other non-numeric values.

Here is what I get for the years (both the b and o suffixes are unexpected):

array(['2001b', '2002', '2003', '2004', '2005', '2006', '2007', '2008',
       '2009', '2010', '2011', '2012', '2013', '2014', '2015', '2004o',
       '2008o', '2012o', '2003o', '2005o', '2006o', '2007o', '2009o',
       '2010o', '2011o', '2013o', '2014o', '2015o', '2002o'], dtype=object)

And for ind1990 (where 0b is clearly NIU, but there are also o suffixes that I cannot make sense of):

array(['0b', '10', '11', '12', '20', '31', '32', '40', '41', '42', '50',
       '60', '100', '101', '102', '110', '111', '112', '120', '121', '122',
       '122o', '130', '130o', '132', '140', '141', '142', '150', '151',
       '152', '152o', '160', '161', '162', '171', '172', '180', '181',
       '182', '190', '191', '192', '200', '201', '201o', '210', '211',
       '220', '220o', '221', '221o', '222', '222o', '230', '231', '232',
       '241', '242', '250', '251', '252', '261', '262', '270', '271',
       '272o', '272'], dtype=object)

What do the b and o suffixes at the end of the values in the interaction column mean?

Answer 1

This is not an answer, but it will not fit in a comment, and it may clarify the question.

Without @FooBar's data, the example here is not reproducible. Here is another one that (a) Stata users can reproduce and (b) Python users can, I think, import:

. sysuse auto, clear 
(1978 Automobile Data)

. regress mpg i.foreign#i.rep78, nocons r 
note: 1.foreign#1b.rep78 identifies no observations in the sample
note: 1.foreign#2.rep78 identifies no observations in the sample

Linear regression                               Number of obs     =         69
                                                F(7, 62)          =     364.28
                                                Prob > F          =     0.0000
                                                R-squared         =     0.9291
                                                Root MSE          =     6.1992

-------------------------------------------------------------------------------
              |               Robust
          mpg |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
--------------+----------------------------------------------------------------
foreign#rep78 |
  Domestic#2  |     19.125   1.311239    14.59   0.000     16.50387    21.74613
  Domestic#3  |         19   .8139726    23.34   0.000     17.37289    20.62711
  Domestic#4  |   18.44444   1.520295    12.13   0.000     15.40542    21.48347
  Domestic#5  |         32   1.491914    21.45   0.000     29.01771    34.98229
   Foreign#1  |          0  (empty)
   Foreign#2  |          0  (empty)
   Foreign#3  |   23.33333   1.251522    18.64   0.000     20.83158    25.83509
   Foreign#4  |   24.88889   .8995035    27.67   0.000     23.09081    26.68697
   Foreign#5  |   26.33333   3.105666     8.48   0.000      20.1252    32.54147
-------------------------------------------------------------------------------

. matrix b = e(b) 

. esttab matrix(b) using b.csv, plain 
(output written to b.csv)

The b.csv file looks like this:

"","b","","","","","","","","",""
"","0b.foreign#1b.rep78","0b.foreign#2.rep78","0b.foreign#3.rep78","0b.foreign#4.rep78","0b.foreign#5.rep78","1o.foreign#1b.rep78","1o.foreign#2o.rep78","1.foreign#3.rep78","1.foreign#4.rep78","1.foreign#5.rep78"
"y1","0","19.125","19","18.44444","32","0","0","23.33333","24.88889","26.33333"
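
For readers following along in Python, here is a minimal sketch of separating the numeric level from its trailing letters in labels like those in the header row above. It assumes, following Stata's factor-variable notation, that `b` marks a base level and `o` marks an omitted level (for example an empty cell); the names `TERM` and `parse_label` are hypothetical.

```python
import re

# Assumption: each term is "<integer level><optional flag letters>.<variable>".
# Flag letters are read as Stata factor-variable markers: b = base, o = omitted.
TERM = re.compile(r"^(\d+)([a-z]*)\.(\w+)$")


def parse_label(label):
    """Split e.g. '1o.foreign#2o.rep78' into one dict per interaction term."""
    terms = []
    for term in label.split("#"):
        m = TERM.match(term)
        if m is None:
            # Labels outside the assumed pattern (e.g. continuous 'c.x') are rejected.
            raise ValueError(f"unrecognised term: {term!r}")
        level, flags, var = m.groups()
        terms.append({
            "var": var,
            "level": int(level),
            "base": "b" in flags,
            "omitted": "o" in flags,
        })
    return terms


# Labels copied from the b.csv header row.
for label in ["0b.foreign#1b.rep78", "1o.foreign#2o.rep78", "1.foreign#3.rep78"]:
    print(label, parse_label(label))
```

Keeping the flags as separate booleans, rather than discarding them, makes it possible to filter out base and omitted cells (whose coefficients are reported as 0) before analysing the remaining estimates.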

Stata's notation is documented, so non-Stata users can look it up.

I don't use esttab (a user-written Stata program) or Python (that is ignorance, not prejudice), so I cannot comment beyond this.

Exporting Stata's coefficient vector: meaning of the suffixes in the interaction column
