How do I generate a table of statistics by a categorical variable?

Time: 2017-02-01 16:20:21

Tags: statistics output stata

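I have many variables for which I need descriptive statistics (means). However, I want the values of a categorical variable (AlcCons1) to form the columns.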

I use the following code to do this:

tabstat Age25_29 Age30_34 ... SmokeY religAtndY, statistics( mean ) by(AlcCons1)

I get a result like this:

AlcCons1 |  Age25_29  Age30_34  Age35_39  Age40_44  Age45_49  Age50_54  Age55_59
---------+----------------------------------------------------------------------
       1 |  .0987326  .0936242  .1243994  .1668614  .1579665  .1481626  .1258278
       2 |  .1037879    .11853  .1451863  .1415631  .1317288  .1231884  .1387164
       3 |  .0905679  .1151016  .1405161  .1624963  .1506231   .137278   .123246
       4 |  .0649853  .0716117  .1094201  .1606857  .1786286  .1630888  .1401794
---------+----------------------------------------------------------------------
   Total |   .091001  .0986022  .1286311  .1617972   .156643   .144962  .1289952
--------------------------------------------------------------------------------

How can I swap the columns and rows, that is, transpose the table?

2 answers:

Answer 0: (score: 1)

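In principle, the answer is c(statistics), the abbreviated form of tabstat's columns(statistics) option. For this kind of example that is legal and it does produce a kind of transpose, but the result is not an exact transpose. Here is one way to do better.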

There is no reproducible example in the question, so we need to find one.

The use of means is incidental. The same issue arises with any other statistic.

Here is the kind of table we might want to transpose.

. sysuse census, clear
(1980 Census data by state)

. tabstat poplt5-pop65p , s(p50) by(region)

Summary statistics: p50
  by categories of: region (Census region)

 region |    poplt5   pop5_17    pop18p    pop65p
--------+----------------------------------------
     NE |    185188    637731   2284657    364864
N Cntrl |  327094.5    936449   3126055  521880.5
  South |  289571.5    880546   2803536  407053.5
   West |    114731    303176    884987    109220
--------+----------------------------------------
  Total |  227467.5    629654   2175130    370495
-------------------------------------------------
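
For comparison, the columns(statistics) route mentioned at the top can be tried on the same data. This is only a sketch of that option: it moves the statistic into the columns and lists the variables as rows within each region, so the regions still do not become columns, which is why it is not an exact transpose.

```stata
* Same table as above, but with the statistic as the column
* and the variables stacked as rows within each by() group
sysuse census, clear
tabstat poplt5-pop65p, statistics(p50) by(region) columns(statistics)
```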

Trick 1: Simplify the problem by reducing the dataset to just the results we want to tabulate.

. collapse (p50) poplt5-pop65p, by(region)

. l

     +---------------------------------------------------------+
     | region       poplt5   pop5_17        pop18p      pop65p |
     |---------------------------------------------------------|
  1. | NE          185,188   637,731     2,284,657     364,864 |
  2. | N Cntrl   327,094.5   936,449   3,126,054.5   521,880.5 |
  3. | South     289,571.5   880,546     2,803,536   407,053.5 |
  4. | West        114,731   303,176       884,987     109,220 |
     +---------------------------------------------------------+

Trick 2: Use reshape to map the separate variables, one per category, onto a single categorical variable.

. reshape long pop, i(region) j(age) string
(note: j = 18p 5_17 65p lt5)

Data                               wide   ->   long
-----------------------------------------------------------------------------
Number of obs.                        4   ->      16
Number of variables                   5   ->       3
j variable (4 values)                     ->   age
xij variables:
              pop18p pop5_17 ... poplt5   ->   pop
-----------------------------------------------------------------------------

. l, sepby(region)

     +------------------------------+
     | region     age           pop |
     |------------------------------|
  1. | NE         18p     2,284,657 |
  2. | NE        5_17       637,731 |
  3. | NE         65p       364,864 |
  4. | NE         lt5       185,188 |
     |------------------------------|
  5. | N Cntrl    18p   3,126,054.5 |
  6. | N Cntrl   5_17       936,449 |
  7. | N Cntrl    65p     521,880.5 |
  8. | N Cntrl    lt5     327,094.5 |
     |------------------------------|
  9. | South      18p     2,803,536 |
 10. | South     5_17       880,546 |
 11. | South      65p     407,053.5 |
 12. | South      lt5     289,571.5 |
     |------------------------------|
 13. | West       18p       884,987 |
 14. | West      5_17       303,176 |
 15. | West       65p       109,220 |
 16. | West       lt5       114,731 |
     +------------------------------+

Trick 3: Use tabdisp directly.

. tabdisp age region, c(pop)

--------------------------------------------------------------
          |                   Census region                   
      age |          NE      N Cntrl        South         West
----------+---------------------------------------------------
      18p |   2,284,657  3,126,054.5    2,803,536      884,987
     5_17 |     637,731      936,449      880,546      303,176
      65p |     364,864    521,880.5    407,053.5      109,220
      lt5 |     185,188    327,094.5    289,571.5      114,731
--------------------------------------------------------------

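Trick 4: Some clean-up may be needed. The string values of age sort alphabetically, so we encode them with a value label that puts the age groups in their natural order and then tidy the label text.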

. label def age 1 lt5 2 5_17 3 18p 4 65p

. encode age , gen(ageclass) label(age)

. tab ageclass

   ageclass |      Freq.     Percent        Cum.
------------+-----------------------------------
        lt5 |          4       25.00       25.00
       5_17 |          4       25.00       50.00
        18p |          4       25.00       75.00
        65p |          4       25.00      100.00
------------+-----------------------------------
      Total |         16      100.00

. label def age 1 "<5" 2 "5-17" 3 "18+" 4 "65+", modify

. tabdisp ageclass region, c(pop)

--------------------------------------------------------------
          |                   Census region                   
 ageclass |          NE      N Cntrl        South         West
----------+---------------------------------------------------
       <5 |     185,188    327,094.5    289,571.5      114,731
     5-17 |     637,731      936,449      880,546      303,176
      18+ |   2,284,657  3,126,054.5    2,803,536      884,987
      65+ |     364,864    521,880.5    407,053.5      109,220
--------------------------------------------------------------
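
The four tricks can be gathered into a single do-file. The following is a minimal sketch rather than a definitive recipe: it repeats the commands shown above, and the one addition is a preserve/restore pair (not part of the steps above), on the assumption that the original dataset is still wanted afterwards, since collapse and reshape both replace the data in memory.

```stata
* Sketch: the four tricks in one do-file, leaving the original data intact
sysuse census, clear
preserve                        // collapse and reshape overwrite the data in memory

* Trick 1: keep only the medians to be tabulated
collapse (p50) poplt5-pop65p, by(region)

* Trick 2: stack the four pop* variables; the name suffix becomes the string variable age
reshape long pop, i(region) j(age) string

* Trick 4: encode against a predefined label so the age groups sort in their natural order
label define age 1 "lt5" 2 "5_17" 3 "18p" 4 "65p"
encode age, generate(ageclass) label(age)

* Tidy the label text; pop18p counts everyone aged 18 and older, hence "18+"
label define age 1 "<5" 2 "5-17" 3 "18+" 4 "65+", modify

* Trick 3: age groups as rows, regions as columns
tabdisp ageclass region, cellvar(pop)

restore                         // back to the original census data
```

The same pattern should carry over to other statistics by changing (p50) in the collapse call, provided the variables to be stacked share a common name stub that reshape long can use.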

Answer 1: (score: 0)

I found the answer at the following link: https://www.stata.com/statalist/archive/2005-09/msg00561.html

I was trying to transpose the table, so I installed the command:

ssc install tabstatmat, replace

tabstat Age25_29 Age30_34 CurntSmokeY religAtndY, by(AlcCons1) stat(mean) col(stat) long format(%9.2f) save

qui tabstatmat B

matrix B = B'

matrix list B, f(%9.2f)

I got what I needed:

B[41,5]
               1:       2:       3:       4:   Total:
             mean     mean     mean     mean     mean
Age25_29     0.10     0.10     0.09     0.06     0.09
Age30_34     0.09     0.12     0.12     0.07     0.10
Age35_39     0.12     0.15     0.14     0.11     0.13
Age40_44     0.17     0.14     0.16     0.16     0.16

The question now is how to make it look better (remove "mean" and replace 1, 2, 3, 4 with words) and then export it with the putexcel command.
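
One possible way to approach this clean-up is sketched below, under several assumptions. The matrix B here is a small stand-in built from the first two rows shown above. The category names Never, Rarely, Monthly and Weekly and the file name alc_means.xlsx are hypothetical placeholders, because the thread does not say what the values of AlcCons1 mean. The putexcel lines assume Stata 14 or later, where the syntax differs from Stata 13.

```stata
* Stand-in for the transposed matrix B, using the first two rows shown above
matrix B = (0.10, 0.10, 0.09, 0.06, 0.09 \ 0.09, 0.12, 0.12, 0.07, 0.10)
matrix rownames B = Age25_29 Age30_34
matrix colnames B = 1:mean 2:mean 3:mean 4:mean Total:mean

* Copy the values into a fresh matrix C, which starts with
* default row and column names and no equation names
mata: st_matrix("C", st_matrix("B"))

* Carry the variable names over as row names
local rnames : rownames B
matrix rownames C = `rnames'

* Hypothetical wording for AlcCons1 = 1, 2, 3, 4; replace with the real category names
matrix colnames C = Never Rarely Monthly Weekly Total

matrix list C, format(%9.2f)

* Write C with its row and column names to a workbook (Stata 14+ syntax assumed)
putexcel set "alc_means.xlsx", replace
putexcel A1 = matrix(C), names
```

Copying through Mata is used here only because it gives a matrix with no "mean" header to begin with; matrix colnames on its own would change the 1, 2, 3, 4 part but leave the second header line in place.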