Plot only the most frequent names in a bar chart

Date: 2019-03-11 19:16:05

Tags: graph stata

Consider the following toy dataset:

clear

input group str10 name n
1     "Jenny"   1
1     "Jenny"   1
1     "Ben"     1
1     "Tiffany" 1
1     "Sun"     1
2     "Jenny"   1
2     "Sun"     1
2     "Tiffany" 1
2     "S"       1
2     "T"       1
2     "R"       1
2     "Y"       1
2     "U"       1
2     "I"       1
2     "E"       1
2     "A"       1
2     "B"       1
3     "U"       1
3     "I"       1
3     "E"       1
3     "A"       1
3     "B"       1
end

My code is as follows:

* n (a constant 1) is already defined by the input block above
graph hbar (count) n, over(name, sort(1)) over(group)

With the data above, this shows every name, which makes the graph cluttered.

How can I create a bar chart that shows only the 10 most frequent categories, determined separately within each distinct value of group?

2 answers:

Answer 0 (score: 3)

Here is a slightly modified example:

clear
input group str50 name n
1     "Jenny"   1
1     "Jenny"   1
1     "Ben"     1
1     "Tiffany" 1
1     "Jenny"   1
1     "Sun"     1
2     "Jenny"   1
2     "Sun"     1
2     "Sun"     1
2     "Tiffany" 1
2     "Tiffany" 1
2     "Tiffany" 1
2     "Tiffany" 1
2     "Tiffany" 1
2     "S"       1
2     "T"       1
2     "R"       1
2     "Y"       1
2     "U"       1
2     "I"       1
2     "E"       1
2     "A"       1
2     "B"       1
3     "U"       1
3     "Ramon"   1
3     "Ramon"   1
3     "Ramon"   1
3     "Ramon"   1
3     "I"       1
3     "I"       1
3     "I"       1
3     "E"       1
3     "A"       1
3     "B"       1
end

You can first collapse your dataset:

collapse (count) n, by(group name)
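As a possible alternative, the same one-row-per-pair counts can be obtained with contract, which does not need a counting variable to exist beforehand. This is only a sketch on a tiny made-up dataset; the variable name n passed to freq() is chosen here to match the rest of the answer.

```stata
* Sketch only: contract as an alternative way to get one row per
* (group, name) pair with its frequency.
clear
input group str10 name
1 "Jenny"
1 "Jenny"
1 "Ben"
2 "Sun"
end

* freq(n) stores the count in n instead of the default _freq
contract group name, freq(n)
list, sepby(group)
```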

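Then you can control how many names are plotted by adjusting the rank cutoff. The code below keeps the two most frequent names in each group (_n < 3); use _n <= 10 for the top 10: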

gsort group -n
bysort group: generate tag = _n < 3

graph hbar (asis) n if tag, over(name) over(group) nofill
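The collapse-then-rank steps can also be sketched with the cutoff held in a local macro and with ties broken alphabetically, so that the selection is reproducible. The macro name top, the helper variable negn, and the small dataset are assumptions made for illustration; they are not part of the original answer.

```stata
* Sketch only: top-k selection after collapse, ties broken by name.
clear
input group str10 name n
1 "Jenny"   1
1 "Jenny"   1
1 "Jenny"   1
1 "Ben"     1
1 "Ben"     1
1 "Sun"     1
2 "Sun"     1
2 "Sun"     1
2 "Tiffany" 1
2 "Ben"     1
end

collapse (count) n, by(group name)

* top is a hypothetical macro name; the question asks for 10
local top 2

* negn makes an ascending sort equivalent to descending n
generate negn = -n
bysort group (negn name): generate byte tag = _n <= `top'
drop negn

list group name n tag, sepby(group)
graph hbar (asis) n if tag, over(name) over(group) nofill
```

Sorting on a negated copy inside bysort keeps the ordering and the by-group operation in one command, at the cost of a temporary variable.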


Answer 1 (score: 2)

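To illustrate a method for selecting the 10 most frequent categories, we first construct a dataset with 2 groups, each containing 11 categories. We then show a general method for selecting the 10 most frequent.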

* create sandbox dataset 
clear 
set obs 22 
tokenize "`c(ALPHA)'" 
generate name = "" 
generate freq = _n 
generate group = cond(_n <= 11, 1, 2) 
forval j = 1/11 { 
      replace name = "``j''" if inlist(_n, `j', 23 - `j') 
}  

tabulate name group [fw=freq] 
expand freq 
drop freq 

This is what the dataset looks like (the result of the tabulate command above):

           |         group
      name |         1          2 |     Total
-----------+----------------------+----------
         A |         1         22 |        23 
         B |         2         21 |        23 
         C |         3         20 |        23 
         D |         4         19 |        23 
         E |         5         18 |        23 
         F |         6         17 |        23 
         G |         7         16 |        23 
         H |         8         15 |        23 
         I |         9         14 |        23 
         J |        10         13 |        23 
         K |        11         12 |        23 
-----------+----------------------+----------
     Total |        66        187 |       253 

The ten most frequent categories are K, J, ..., C, B for group 1 and A, ..., J for group 2.

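Here is a method for finding and plotting the 10 most frequent categories (determined separately for each group). User code starts here, and 10 can be replaced by a different number if desired. Nothing in the code depends on there being two groups, as in the example.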

bysort group name : generate freq = _N
egen tag = tag(group name)
gsort group -tag -freq name
by group: generate selected = _n <= 10
bysort group name (selected) : replace selected = selected[_N]

graph hbar (count) if selected, over(name, sort(1) descending) by(group) nofill scheme(s1color)
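One way to check the selection before plotting is to tabulate the selected names by group. The sketch below rebuilds the sandbox data, holds the cutoff in a local macro (k is a hypothetical name), and reuses the selection steps from this answer; it is meant as a sanity check, not as a different method.

```stata
* Sketch only: rebuild the sandbox data, select the top k names per
* group, and inspect the result.
clear
set obs 22
tokenize "`c(ALPHA)'"
generate name = ""
generate freq = _n
generate group = cond(_n <= 11, 1, 2)
forvalues j = 1/11 {
    replace name = "``j''" if inlist(_n, `j', 23 - `j')
}
expand freq
drop freq

* k is a hypothetical macro name for the number of categories to keep
local k 10

bysort group name: generate freq = _N
egen tag = tag(group name)
gsort group -tag -freq name
by group: generate byte selected = _n <= `k'
bysort group name (selected): replace selected = selected[_N]

* Only the selected names appear; the least frequent name of each
* group is left out
tabulate name group if selected
```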
