Group by, showing both percentages and raw numbers

Date: 2018-08-21 18:23:57

Tags: stata

I have a dataset like this:

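Six columns: area (categorical), amount ($), population (categorical), purpose (categorical), purpose2 (categorical), county (categorical)

I want to create a table grouped by area that shows the total amount for each area (both as a raw number and as a percentage of the overall total), together with the number of records/observations in each area (both as a raw count and as a percentage of all records).

The code below produces the table of raw numbers, but it does not show percentages of the total: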

tabstat amount, by(county) stat(sum count) 

2 Answers:

Answer 0: (score: 1)

There is no canned command that does what you want. You will have to program the table yourself.

Here is a simple example using auto.dta:

. sysuse auto, clear
(1978 Automobile Data)

. tabstat price, by(foreign) stat(sum count)

Summary for variables: price
     by categories of: foreign (Car type)

 foreign |       sum         N
---------+--------------------
Domestic |    315766        52
 Foreign |    140463        22
---------+--------------------
   Total |    456229        74
------------------------------

You can do the calculations and store the raw numbers in variables as follows:

. generate total_obs = _N

. display total_obs
74

. count if foreign == 0
  52

. generate total_domestic_obs = r(N)

. count if foreign == 1
  22

. generate total_foreign_obs = r(N)

. egen total_domestic_price = total(price) if foreign == 0 

. sort total_domestic_price
. local tdp = total_domestic_price

. display total_domestic_price
315766

. egen total_foreign_price = total(price)  if foreign == 1

. sort total_foreign_price
. local tfp = total_foreign_price

. display total_foreign_price
140463

. generate total_price = `tdp' + `tfp' 

. display total_price
456229

For the percentages:

. generate pct_domestic_price = (`tdp' / total_price) * 100

. display pct_domestic_price
69.212173

. generate pct_foreign_price = (`tfp' / total_price) * 100 

. display pct_foreign_price 
30.787828
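
The variables generated above hold the same constant in every observation. As a lighter alternative, here is a minimal sketch that keeps the same quantities in scalars, taken from the r(sum) and r(N) results that summarize leaves behind. The scalar names are illustrative only and are not part of the original answer.

```stata
* Sketch: totals and percentages held in scalars rather than variables
sysuse auto, clear

* Grand total of price and number of nonmissing observations
quietly summarize price
scalar total_price = r(sum)
scalar total_obs   = r(N)

* Domestic cars (foreign == 0)
quietly summarize price if foreign == 0
scalar pct_domestic_price = 100 * r(sum) / total_price
scalar pct_domestic_obs   = 100 * r(N) / total_obs

* Foreign cars (foreign == 1)
quietly summarize price if foreign == 1
scalar pct_foreign_price = 100 * r(sum) / total_price
scalar pct_foreign_obs   = 100 * r(N) / total_obs

display %5.1f pct_domestic_price " " %5.1f pct_foreign_price
display %5.1f pct_domestic_obs " " %5.1f pct_foreign_obs
```

The price percentages should agree with the pct_domestic_price and pct_foreign_price values displayed above, up to rounding. Note that r(N) counts only observations with nonmissing price, which matters in datasets that have gaps.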

Edit:

Here is an automated way of doing the above, without having to specify the individual values:

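program define foo

* One variable to summarize, plus the name of a grouping variable (assumed numeric)
syntax varlist(min=1 max=1), by(string)

* Total number of observations
generate total_obs = _N
display total_obs

* Distinct values of the grouping variable
quietly levelsof `by', local(nlevels)

foreach x of local nlevels {
    * Observation count for this group
    count if `by' == `x'
    quietly generate total_`by'`x'_obs = r(N)

    * Group total, nonmissing only within the group; sorting brings it to observation 1
    quietly egen total_`by'`x'_`varlist' = total(`varlist') if `by' == `x' 
    sort total_`by'`x'_`varlist'
    local tvar`x' = total_`by'`x'_`varlist'
    * Accumulate the group totals into one addition expression
    local tvarall `tvarall' `tvar`x'' +
    display total_`by'`x'_`varlist'
}

* The trailing 0 completes the accumulated addition expression
quietly generate total_`varlist' = `tvarall' 0 
display total_`varlist'

* Each group's share of the grand total, in percent
foreach x of local nlevels {
    quietly generate pct_`by'`x'_`varlist' = (`tvar`x'' / total_`varlist') * 100
    display pct_`by'`x'_`varlist'
}

end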

The results are the same:

. foo price, by(foreign)
74
  52
315766
  22
140463
456229
69.212173
30.787828

Obviously, you will need to format the table of results to your liking.
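
As one possible way of doing that formatting, the following sketch prints one labelled row per group with display, using numeric formats and _col() for column positions. The local macro names and the column positions are arbitrary choices made for illustration, and the loop assumes a numeric grouping variable with a value label attached.

```stata
* Sketch: one formatted row per group, built with display
sysuse auto, clear

* Grand total and overall count, kept in local macros
quietly summarize price
local grand = r(sum)
local n = r(N)

* Header row
display "Group" _col(12) "N" _col(20) "% N" _col(30) "Sum" _col(42) "% Sum"

quietly levelsof foreign, local(levels)
foreach x of local levels {
    quietly summarize price if foreign == `x'
    local gn = r(N)
    local gsum = r(sum)
    * Look up the value label text for this level
    local glab : label (foreign) `x'
    display "`glab'" _col(12) %6.0f `gn' _col(20) %6.1f (100 * `gn' / `n') _col(30) %9.0fc `gsum' _col(42) %6.1f (100 * `gsum' / `grand')
}
```

Because nothing is generated here, the sketch can be rerun on the same data without first dropping variables.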

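Answer 1: (score: 1)

Here is another approach. I have borrowed @Pearly Spencer's example. It could be generalized into a command. The main message I want to convey is that list is useful for tabulations and other reports; usually the only obligation is to calculate what you want to show beforehand.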

. sysuse auto, clear
(1978 Automobile Data)

. preserve 

. collapse (sum) total=price (count) obs=price, by(foreign)

. egen pc2 = pc(total)

. egen pc1 = pc(obs)

. char pc2[varname]  "%"

. char pc1[varname]  "%"

. format pc* %2.1f 

. list foreign obs pc1 total pc2 , subvarname noobs sum(obs pc1 total pc2) 

      +-----------------------------------------+
      |  foreign   obs       %    total       % |
      |-----------------------------------------|
      | Domestic    52    70.3   315766    69.2 |
      |  Foreign    22    29.7   140463    30.8 |
      |-----------------------------------------|
  Sum |             74   100.0   456229   100.0 |
      +-----------------------------------------+


. restore 
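
As a rough idea of what generalizing this into a command might look like, here is a minimal sketch that wraps the same steps in a program. The name pctable and the variable names pc_total and pc_obs are hypothetical, chosen for illustration. It assumes the grouping variable is not itself called total, obs, pc_total or pc_obs.

```stata
* Sketch of a wrapper; pctable is a hypothetical name, not an existing command
capture program drop pctable
program define pctable
    version 13
    syntax varname(numeric), by(varname)

    * Work on a collapsed copy; the original data return on restore
    preserve
    collapse (sum) total=`varlist' (count) obs=`varlist', by(`by')

    * Percentages of the column totals
    egen pc_total = pc(total)
    egen pc_obs = pc(obs)

    * Show "%" as the column header in place of the variable names
    char pc_total[varname] "%"
    char pc_obs[varname] "%"
    format pc_* %5.1f

    list `by' obs pc_obs total pc_total, subvarname noobs sum(obs pc_obs total pc_total)
    restore
end

sysuse auto, clear
pctable price, by(foreign)
```

With the auto data this should reproduce the table above, apart from possible differences in column width.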

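EDIT: Here is a variation in a similar style using egen. It keeps the original data in memory, so the new variables are also available for export or for graphs.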

. sysuse auto, clear
(1978 Automobile Data)

. egen total = sum(price), by(foreign) 

. egen obs = count(price), by(foreign) 

. egen tag = tag(foreign) 

. egen pc2 = pc(total) if tag
(72 missing values generated)

. egen pc1 = pc(obs) if tag 
(72 missing values generated)

. char pc2[varname]  "%"

. char pc1[varname]  "%"

. format pc* %2.1f 

. list foreign obs pc1 total pc2 if tag, subvarname noobs sum(obs pc1 total pc2) 

      +-----------------------------------------+
      |  foreign   obs       %    total       % |
      |-----------------------------------------|
      | Domestic    52    70.3   315766    69.2 |
      |  Foreign    22    29.7   140463    30.8 |
      |-----------------------------------------|
  Sum |             74   100.0   456229   100.0 |
      +-----------------------------------------+
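
To relate this back to the question, here is a minimal sketch of the same steps applied to variables named amount and county. The three observations are made-up toy values, entered only so that the code runs on its own; with real data, skip the input block and start from preserve.

```stata
* Toy data standing in for the question's dataset (values are made up)
clear
input str10 county amount
"Adams" 100
"Adams" 300
"Baker" 600
end

* One row per county: sum of amount and count of nonmissing amount
preserve
collapse (sum) total=amount (count) obs=amount, by(county)

* Percentages of the overall total and of the overall count
egen pc_total = pc(total)
egen pc_obs = pc(obs)
format pc_* %5.1f

list county obs pc_obs total pc_total, noobs sum(obs pc_obs total pc_total)
restore
```

One caveat: (count) counts only records with a nonmissing amount. If records with a missing amount should still count as observations, the count would need to be computed from a variable that is never missing.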