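Frequency table including the names of observations

Time: 2014-09-02 21:43:00

Tags: r frequency

I am wondering whether the following task can be done in a more efficient and elegant way.

My data look like the example below, which contains country names (location) and the number of agreements each country has entered into (no.agreements):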

d <- structure(list(location = c("Afghanistan", "Angola", "Bangladesh",
    "Bosnia-Herzegovina", "Burundi", "C\x99te d'Ivoire", "Cambodia",
    "Chad", "Colombia", "Comoros", "Congo", "Croatia", "Democratic Republic of Congo (Zaire)",
    "Djibouti", "El Salvador", "Georgia", "Guatemala", "Guinea Bissau",
    "Haiti", "India", "Indonesia", "Liberia", "Macedonia", "Mali",
    "Mexico", "Moldova", "Mozambique", "Nepal", "Niger", "Papua-New Guinea",
    "Philippines", "Rwanda", "Senegal", "Serbia (Yugoslavia)", "Sierra Leone",
    "Somalia", "South Africa", "Sudan", "Tajikistan", "Uganda", "United Kingdom"
), no.agreements = c(3L, 5L, 1L, 2L, 3L, 4L, 1L, 10L, 1L, 1L,
    1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 3L, 1L, 1L, 1L, 1L,
    1L, 1L, 2L, 3L, 3L, 1L, 1L, 2L, 3L, 2L, 1L, 3L, 1L, 1L, 1L)), .Names = c("location",
    "no.agreements"), row.names = c(1L, 4L, 9L, 10L, 12L, 15L, 19L,
    20L, 30L, 31L, 32L, 33L, 34L, 36L, 38L, 39L, 40L, 41L, 42L, 43L,
    45L, 47L, 50L, 51L, 52L, 53L, 54L, 55L, 56L, 58L, 61L, 64L, 65L,
    66L, 68L, 71L, 73L, 74L, 77L, 78L, 79L), class = "data.frame")

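I am interested in the frequency of how many countries (variable "location") have 1, 2, 3, etc. agreements. ftable(d$no.agreements) produces the required result: 23 countries have 1 agreement, 8 countries have 2 agreements, and so on.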

  1  2  3  4  5 10

 23  8  7  1  1  1

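I am now wondering whether there is a straightforward way to add another row (!) containing the names of the countries in each category, e.g. the 1 country with 10 agreements is Chad, the 1 country with 5 agreements is Angola, etc. The corresponding cell in the additional row would contain all relevant country names (as a string).

Of course, I could identify the country names with e.g. d[d$no.agreements == 10, c("location")], repeat this for every frequency, and build the table by hand, e.g. in Excel. But I am wondering whether there is a more direct way to insert the location names as a list (?) into the cells of an additional row.

That would make things much more efficient. Many thanks.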

2 Answers:

Answer 0: (score: 1)

You can use aggregate() to summarise the table.

aggregate(location~no.agreements,data=d,FUN="unique")
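
If the names should end up as a single string per cell, as the question asks, one possible variation is to collapse each group with paste() and add the count as a separate column. This is only a sketch: d_small is a hypothetical six-row subset of the question's data, and the column name n is chosen here for illustration.

```r
# Hypothetical small subset of the question's data, for illustration only
d_small <- data.frame(
  location = c("Afghanistan", "Angola", "Bangladesh", "Burundi", "Cambodia", "Chad"),
  no.agreements = c(3L, 5L, 1L, 3L, 1L, 10L),
  stringsAsFactors = FALSE
)

# Collapse the country names within each group into one comma-separated string
res <- aggregate(location ~ no.agreements, data = d_small,
                 FUN = function(x) paste(x, collapse = ", "))

# Look up the frequency of each group by name, so the counts line up with the rows
counts <- table(d_small$no.agreements)
res$n <- as.vector(counts[as.character(res$no.agreements)])

res
```

The result has one row per number of agreements, with the names and the count side by side, which can be exported directly instead of being assembled by hand.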

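Answer 1: (score: 1)

Those are not really rows, but rather a vector of column names and a single vector of counts. If you want a list of names for each named column, you can use tapply: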

> tapply(d$location, d$no.agreements, c)
$`1`
 [1] "Bangladesh"     "Cambodia"       "Colombia"      
 [4] "Comoros"        "Congo"          "Croatia"       
 [7] "El Salvador"    "Georgia"        "Guatemala"     
[10] "Guinea Bissau"  "Haiti"          "Macedonia"     
[13] "Mali"           "Mexico"         "Moldova"       
[16] "Mozambique"     "Nepal"          "Rwanda"        
[19] "Senegal"        "South Africa"   "Tajikistan"    
[22] "Uganda"         "United Kingdom"

$`2`
[1] "Bosnia-Herzegovina"                  
[2] "Democratic Republic of Congo (Zaire)"
[3] "Djibouti"                            
[4] "India"                               
[5] "Indonesia"                           
[6] "Niger"                               
[7] "Serbia (Yugoslavia)"                 
[8] "Somalia"                             

$`3`
[1] "Afghanistan"      "Burundi"          "Liberia"         
[4] "Papua-New Guinea" "Philippines"      "Sierra Leone"    
[7] "Sudan"           

$`4`
[1] "C\x99te d'Ivoire"

$`5`
[1] "Angola"

$`10`
[1] "Chad"
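
To get closer to the layout described in the question (one row of counts, one row of names), the grouped names can be collapsed into strings and stacked under the counts. The following is a minimal sketch rather than the only way to do it; d_small is a hypothetical subset of the question's data, and the row labels count and countries are made up for this example.

```r
# Hypothetical small subset of the question's data, for illustration only
d_small <- data.frame(
  location = c("Afghanistan", "Angola", "Bangladesh", "Burundi", "Cambodia", "Chad"),
  no.agreements = c(3L, 5L, 1L, 3L, 1L, 10L),
  stringsAsFactors = FALSE
)

# One character vector of country names per number of agreements
groups <- split(d_small$location, d_small$no.agreements)

# Stack the group sizes and the collapsed names; rbind() coerces everything
# to character, so the counts become strings in the resulting matrix
out <- rbind(
  count     = sapply(groups, length),
  countries = sapply(groups, paste, collapse = ", ")
)

out
```

Because the result is a character matrix, the counts are no longer numeric; if they are needed for further computation, keeping them in a separate vector or a data frame column is probably the safer choice.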

There are several ways to bundle the names and counts together:

> as.data.frame(tapply(d$location, d$no.agreements, function(x) list(x, length(x))))
                                                                                                                                                                          tapply(d$location, d$no.agreements, function(x) list(x, length(x)))
1  Bangladesh, Cambodia, Colombia, Comoros, Congo, Croatia, El Salvador, Georgia, Guatemala, Guinea Bissau, Haiti, Macedonia, Mali, Mexico, Moldova, Mozambique, Nepal, Rwanda, Senegal, South Africa, Tajikistan, Uganda, United Kingdom, 23
2                                                                                                                Bosnia-Herzegovina, Democratic Republic of Congo (Zaire), Djibouti, India, Indonesia, Niger, Serbia (Yugoslavia), Somalia, 8
3                                                                                                                                                        Afghanistan, Burundi, Liberia, Papua-New Guinea, Philippines, Sierra Leone, Sudan, 7
4                                                                                                                                                                                                                         C\x99te d'Ivoire, 1
5                                                                                                                                                                                                                                   Angola, 1
10                                                                                                                                                                                                                                    Chad, 1
> do.call(rbind, tapply(d$location, d$no.agreements, function(x) list(x, length(x))))
   [,1]               [,2]
1  Character,23       23  
2  Character,8        8   
3  Character,7        7   
4  "C\x99te d'Ivoire" 1   
5  "Angola"           1   
10 "Chad"             1
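
Entries such as Character,23 in the last output are how R prints a list-matrix cell that holds a character vector of length 23; the names are still there and can be pulled out with [[ ]]. A short sketch of this, using a hypothetical subset d_small of the question's data:

```r
# Hypothetical small subset of the question's data, for illustration only
d_small <- data.frame(
  location = c("Afghanistan", "Angola", "Bangladesh", "Burundi", "Cambodia", "Chad"),
  no.agreements = c(3L, 5L, 1L, 3L, 1L, 10L),
  stringsAsFactors = FALSE
)

# Same construction as above: each row is a list holding (names, count)
m <- do.call(rbind, tapply(d_small$location, d_small$no.agreements,
                           function(x) list(x, length(x))))

# Extract the contents of individual cells from the list matrix
first_names <- m[[1, 1]]  # character vector of countries with 1 agreement
first_count <- m[[1, 2]]  # how many countries that is

first_names
first_count
```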