我有一个像这样的数据集:
我想创建一个按区域分组的表,并显示该区域的总量(占总量和原始数量的百分比),以及每个区域记录/观测值总数的百分比记录/观察的总数作为原始数。
下面的代码用于生成原始数字表,但不显示总数的百分比:
tabstat amount, by(county) stat(sum count)
答案 0 :(得分:1)
没有固定的命令可以执行您想要的操作。您将必须自己对表进行编程。
这是一个使用auto.dta
的简单示例:
. sysuse auto, clear
(1978 Automobile Data)
. tabstat price, by(foreign) stat(sum count)
Summary for variables: price
by categories of: foreign (Car type)
foreign | sum N
---------+--------------------
Domestic | 315766 52
Foreign | 140463 22
---------+--------------------
Total | 456229 74
------------------------------
您可以进行计算,并将原始数字保存在变量中,如下所示:
. generate total_obs = _N
. display total_obs
74
. count if foreign == 0
52
. generate total_domestic_obs = r(N)
. count if foreign == 1
22
. generate total_foreign_obs = r(N)
. egen total_domestic_price = total(price) if foreign == 0
. sort total_domestic_price
. local tdp = total_domestic_price
. display total_domestic_price
315766
. egen total_foreign_price = total(price) if foreign == 1
. sort total_foreign_price
. local tfp = total_foreign_price
. display total_foreign_price
140463
. generate total_price = `tdp' + `tfp'
. display total_price
456229
对于百分比:
. generate pct_domestic_price = (`tdp' / total_price) * 100
. display pct_domestic_price
69.212173
. generate pct_foreign_price = (`tfp' / total_price) * 100
. display pct_foreign_price
30.787828
编辑:
这是一种执行上述操作的自动化方法,而无需指定单个值:
program define foo
syntax varlist(min=1 max=1), by(string)
generate total_obs = _N
display total_obs
quietly levelsof `by', local(nlevels)
foreach x of local nlevels {
count if `by' == `x'
quietly generate total_`by'`x'_obs = r(N)
quietly egen total_`by'`x'_`varlist' = total(`varlist') if `by' == `x'
sort total_`by'`x'_`varlist'
local tvar`x' = total_`by'`x'_`varlist'
local tvarall `tvarall' `tvar`x'' +
display total_`by'`x'_`varlist'
}
quietly generate total_`varlist' = `tvarall' 0
display total_`varlist'
foreach x of local nlevels {
quietly generate pct_`by'`x'_`varlist' = (`tvar`x'' / total_`varlist') * 100
display pct_`by'`x'_`varlist'
}
end
结果相同:
. foo price, by(foreign)
74
52
315766
22
140463
456229
69.212173
30.787828
显然,您将需要按照自己的喜好格式化结果表。
答案 1 :(得分:1)
这是另一种方法。我偷了@Pearly Spencer的例子。可以将其概括为命令。我要传达的主要信息是list
对于制表和其他报告很有用,通常只是有义务预先计算要显示的内容。
. sysuse auto, clear
(1978 Automobile Data)
. preserve
. collapse (sum) total=price (count) obs=price, by(foreign)
. egen pc2 = pc(total)
. egen pc1 = pc(obs)
. char pc2[varname] "%"
. char pc1[varname] "%"
. format pc* %2.1f
. list foreign obs pc1 total pc2 , subvarname noobs sum(obs pc1 total pc2)
+-----------------------------------------+
| foreign obs % total % |
|-----------------------------------------|
| Domestic 52 70.3 315766 69.2 |
| Foreign 22 29.7 140463 30.8 |
|-----------------------------------------|
Sum | 74 100.0 456229 100.0 |
+-----------------------------------------+
. restore
EDIT这是egen
中的一篇文章,具有类似的风格,但保留了原始数据,并且新变量也可用于导出或图形。
. sysuse auto, clear
(1978 Automobile Data)
. egen total = sum(price), by(foreign)
. egen obs = count(price), by(total)
. egen tag = tag(foreign)
. egen pc2 = pc(total) if tag
(72 missing values generated)
. egen pc1 = pc(obs) if tag
(72 missing values generated)
. char pc2[varname] "%"
. char pc1[varname] "%"
. format pc* %2.1f
. list foreign obs pc1 total pc2 if tag, subvarname noobs sum(obs pc1 total pc2)
+-----------------------------------------+
| foreign obs % total % |
|-----------------------------------------|
| Domestic 52 70.3 315766 69.2 |
| Foreign 22 29.7 140463 30.8 |
|-----------------------------------------|
Sum | 74 100.0 456229 100.0 |
+-----------------------------------------+