Aggregate and find the maximum

Posted: 2017-05-06 10:25:40

Tags: r

I have the following data frame:

df1 <- data.frame(city =c("c1","c2","c3","c2","c1","c2"),people =c(1000,234,678,45,11,100))

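I am trying to find the total number of people in each of c1, c2 and c3, and then pick the city with the largest population. I wrote the following code: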

aggregate(city~people, df1, FUN = function(x) length(unique(x)))

How can I complete this code to achieve my goal?

Note: my result should look like

c1: 1011
c2: 379

(c1 is the biggest city.)
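Working the per-city totals out by hand from df1 confirms these figures, and gives 678 for c3:

```latex
\begin{aligned}
c1 &= 1000 + 11 = 1011\\
c2 &= 234 + 45 + 100 = 379\\
c3 &= 678
\end{aligned}
```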

1 answer:

Answer 0 (score: 1)

If you don't mind the output being a named array rather than a data frame, tapply is considerably faster than aggregate:

microbenchmark::microbenchmark(tapply(df1$people, df1$city, sum), aggregate(people~city, df1, sum))
Unit: microseconds
                               expr     min       lq      mean   median       uq      max neval
  tapply(df1$people, df1$city, sum)  48.283  60.2675   86.4515  68.0010  107.416  258.671   100
 aggregate(people ~ city, df1, sum) 690.907 715.2445 1012.9741 770.7325 1268.336 3853.902   100
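If you would rather keep a data frame, the aggregate call from the question can be completed instead: the formula needs the column being summarised on the left and the grouping column on the right (people ~ city rather than city ~ people), with sum as the function. A minimal sketch, assuming the same df1 as in the question:

```r
df1 <- data.frame(city = c("c1", "c2", "c3", "c2", "c1", "c2"),
                  people = c(1000, 234, 678, 45, 11, 100))

# people ~ city: sum the people column within each city group
totals <- aggregate(people ~ city, data = df1, FUN = sum)
totals
#   city people
# 1   c1   1011
# 2   c2    379
# 3   c3    678

# Keep the row of the city with the largest total population
biggest <- totals[which.max(totals$people), ]
biggest
```

This is slower than tapply, as the benchmark shows, but the result stays a data frame with both the city name and its total.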

This code gives you the names of the cities with the largest and the smallest total, respectively:

sum_by_city <- tapply(df1$people, df1$city, sum)
names(which.max(sum_by_city))
names(which.min(sum_by_city))
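which.max returns the position of the maximum, named after the city, so indexing sum_by_city with it keeps the total alongside the name. A small sketch using the same df1:

```r
df1 <- data.frame(city = c("c1", "c2", "c3", "c2", "c1", "c2"),
                  people = c(1000, 234, 678, 45, 11, 100))

sum_by_city <- tapply(df1$people, df1$city, sum)

# Position of the largest total; its name is the city
idx <- which.max(sum_by_city)

# Subset with that position to get both the city name and its total
sum_by_city[idx]
```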

Or, if you want the top 2:

names(sort(sum_by_city, decreasing = TRUE)[1:2])
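To print the top 2 in the "city: total" style shown in the question, the sorted totals can be kept and pasted together with their names. A sketch, again assuming the question's df1; head() here is just an alternative to [1:2]:

```r
df1 <- data.frame(city = c("c1", "c2", "c3", "c2", "c1", "c2"),
                  people = c(1000, 234, 678, 45, 11, 100))

sum_by_city <- tapply(df1$people, df1$city, sum)

# Sort totals in decreasing order and keep the first two (names are preserved)
top2 <- head(sort(sum_by_city, decreasing = TRUE), 2)

# Build "city: total" lines and print one per line
lines <- paste0(names(top2), ": ", top2)
cat(lines, sep = "\n")
# c1: 1011
# c3: 678
```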