Consider the following toy dataset:
clear
input group str10 name n
1 "Jenny" 1
1 "Jenny" 1
1 "Ben" 1
1 "Tiffany" 1
1 "Sun" 1
2 "Jenny" 1
2 "Sun" 1
2 "Tiffany" 1
2 "S" 1
2 "T" 1
2 "R" 1
2 "Y" 1
2 "U" 1
2 "I" 1
2 "E" 1
2 "A" 1
2 "B" 1
3 "U" 1
3 "I" 1
3 "E" 1
3 "A" 1
3 "B" 1
end
My code is as follows:
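* n is already created by the input command above, so "gen n=1" is not needed here
* (it would stop with "variable n already defined")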
graph hbar (count) n, over(name, sort(1)) over(group)
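With the data above, this displays all the names jumbled together.
How can I create a bar chart that shows only the 10 most frequent categories, determined separately within each distinct value of group?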
Answer 0 (score: 3)
Here is a slightly modified example:
clear
input group str50 name n
1 "Jenny" 1
1 "Jenny" 1
1 "Ben" 1
1 "Tiffany" 1
1 "Jenny" 1
1 "Sun" 1
2 "Jenny" 1
2 "Sun" 1
2 "Sun" 1
2 "Tiffany" 1
2 "Tiffany" 1
2 "Tiffany" 1
2 "Tiffany" 1
2 "Tiffany" 1
2 "S" 1
2 "T" 1
2 "R" 1
2 "Y" 1
2 "U" 1
2 "I" 1
2 "E" 1
2 "A" 1
2 "B" 1
3 "U" 1
3 "Ramon" 1
3 "Ramon" 1
3 "Ramon" 1
3 "Ramon" 1
3 "I" 1
3 "I" 1
3 "I" 1
3 "E" 1
3 "A" 1
3 "B" 1
end
You can first collapse your dataset:
collapse (count) n, by(group name)
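You can then control how many names are plotted by adjusting the cutoff on the within-group rank, as follows (here _n < 3 keeps the two most frequent names in each group):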
gsort group -n
bysort group: generate tag = _n < 3
graph hbar (asis) n if tag, over(name) over(group) nofill
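The collapse-and-tag steps above can be sketched with the cutoff held in a local macro, so that changing one number changes how many names are drawn. This is only a minimal sketch: the macro name top and the small dataset are my own assumptions for illustration, and name is added as a last sort key so that ties are broken alphabetically rather than arbitrarily.

```stata
* Sketch only: the macro name `top' and this small dataset are illustrative assumptions
clear
input group str10 name
1 "Jenny"
1 "Jenny"
1 "Jenny"
1 "Ben"
1 "Ben"
1 "Sun"
2 "Sun"
2 "Sun"
2 "Sun"
2 "Tiffany"
2 "Tiffany"
2 "Jenny"
end

* one row per group-name pair, with n holding the frequency
generate n = 1
collapse (count) n, by(group name)

* number of names to keep in each group
local top 2

* highest counts first within each group; name breaks ties alphabetically
gsort group -n name
by group: generate byte tag = _n <= `top'

list group name n if tag, sepby(group)
graph hbar (asis) n if tag, over(name) over(group) nofill
```

With these data, Jenny and Ben are kept in group 1 and Sun and Tiffany in group 2.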
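Answer 1 (score: 2)
To illustrate a method for selecting the 10 most frequent categories, we construct here a dataset with 2 groups, each with 11 categories. We then show a general method for selecting the 10 most frequent.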
* create sandbox dataset
clear
set obs 22
tokenize "`c(ALPHA)'"
generate name = ""
generate freq = _n
generate group = cond(_n <= 11, 1, 2)
forval j = 1/11 {
    replace name = "``j''" if inlist(_n, `j', 23 - `j')
}
tabulate name group [fw=freq]
expand freq
drop freq
This is what the dataset looks like (the result of the tabulate command above):
           |         group
      name |         1          2 |     Total
-----------+----------------------+----------
         A |         1         22 |        23
         B |         2         21 |        23
         C |         3         20 |        23
         D |         4         19 |        23
         E |         5         18 |        23
         F |         6         17 |        23
         G |         7         16 |        23
         H |         8         15 |        23
         I |         9         14 |        23
         J |        10         13 |        23
         K |        11         12 |        23
-----------+----------------------+----------
     Total |        66        187 |       253
The ten most frequent categories are K, J, ..., C, B in group 1 and A, ..., J in group 2.
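As a quick check on that statement, one can flag the single least frequent category in each group, which is the one that a top-10 selection should leave out. This is a sketch under the assumption that the sandbox is rebuilt before the expand step, so that each group-name pair is still one row and freq is its count. The variable name dropped is my own.

```stata
* Sketch only: rebuilds the sandbox up to (not including) the expand step
clear
set obs 22
tokenize "`c(ALPHA)'"
generate name = ""
generate freq = _n
generate group = cond(_n <= 11, 1, 2)
forval j = 1/11 {
    replace name = "``j''" if inlist(_n, `j', 23 - `j')
}

* within each group, the first row after sorting on freq is the least frequent name
* (dropped is a hypothetical variable name)
bysort group (freq name): generate byte dropped = _n == 1
list group name freq if dropped
```

The flagged rows are A in group 1 and K in group 2, which agrees with the lists given above.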
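Here is a method for getting and plotting the 10 most frequent categories (determined separately for each group). User code starts here, and 10 can be replaced by any other number the user prefers. Nothing depends on there being two groups, as there are in the example.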
bysort group name : generate freq = _N
egen tag = tag(group name)
gsort group -tag -freq name
by group: generate selected = _n <= 10
bysort group name (selected) : replace selected = selected[_N]
graph hbar (count) if selected, over(name, sort(1) descending) by(group) nofill scheme(s1color)
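The selection steps above can also be run with the number of categories held in a local macro, followed by a check that each group ends up with exactly that many distinct names. This is a minimal sketch, assuming the sandbox dataset from this answer. The macro name k and the variable nsel are my own, and the check would fail for any group that has fewer than k distinct names.

```stata
* Sketch only: rebuild the sandbox dataset from this answer
clear
set obs 22
tokenize "`c(ALPHA)'"
generate name = ""
generate freq = _n
generate group = cond(_n <= 11, 1, 2)
forval j = 1/11 {
    replace name = "``j''" if inlist(_n, `j', 23 - `j')
}
expand freq
drop freq

* number of categories to keep per group (k is a hypothetical macro name)
local k 10

* frequency of each name within its group, and one tagged row per group-name pair
bysort group name : generate freq = _N
egen tag = tag(group name)

* tagged rows first within each group, highest frequency first
gsort group -tag -freq name
by group : generate selected = _n <= `k'

* spread the selection from the tagged row to every row of the same name
bysort group name (selected) : replace selected = selected[_N]

* check: exactly k distinct names are selected in every group
egen nsel = total(tag & selected), by(group)
assert nsel == `k'

tabulate name group if selected
```

The assert line stops with an error if any group does not end up with exactly k selected names, which makes it a cheap safeguard before drawing the graph.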