I have the following data frame:
df1 <- data.frame(city =c("c1","c2","c3","c2","c1","c2"),people =c(1000,234,678,45,11,100))
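I am trying to find the total number of people in each of c1, c2, and c3, and then select the city with the largest population. I wrote the code below: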
aggregate(city~people, df1, FUN = function(x) length(unique(x)))
How can I complete this code to achieve my goal?
(Note: my result would look like
c1: 1011
c2: 379
as the largest cities.)
Answer 0: (score: 1)
tapply is faster than aggregate for this task, as the following benchmark on df1 shows:
microbenchmark::microbenchmark(tapply(df1$people, df1$city, sum), aggregate(people~city, df1, sum))
Unit: microseconds
                               expr     min       lq      mean   median       uq      max neval
  tapply(df1$people, df1$city, sum)  48.283  60.2675   86.4515  68.0010  107.416  258.671   100
 aggregate(people ~ city, df1, sum) 690.907 715.2445 1012.9741 770.7325 1268.336 3853.902   100
This code gives you the names of the cities with the largest and smallest totals, respectively:
sum_by_city <- tapply(df1$people, df1$city, sum)
names(which.max(sum_by_city))
names(which.min(sum_by_city))
Or, if you want the top 2:
names(sort(sum_by_city, decreasing = TRUE)[1:2])
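If you also want the totals next to the city names, as in the output sketched in the question, something along these lines may help. It is only a minimal sketch built on the same df1; the variable names sorted_totals, top_city, and totals_df are made up here for illustration. It also shows the formula written as people ~ city (the value being summed goes on the left), which is probably what the aggregate call in the question was aiming for.

```r
# Data frame from the question
df1 <- data.frame(city = c("c1", "c2", "c3", "c2", "c1", "c2"),
                  people = c(1000, 234, 678, 45, 11, 100))

# Total people per city; the result is named by city
sum_by_city <- tapply(df1$people, df1$city, sum)

# Sort the totals, largest first; each value keeps its city name
sorted_totals <- sort(sum_by_city, decreasing = TRUE)

# The largest city together with its total (illustrative name)
top_city <- sorted_totals[1]

# The same totals as a data frame via aggregate, ordered by descending total
totals_df <- aggregate(people ~ city, data = df1, FUN = sum)
totals_df <- totals_df[order(-totals_df$people), ]
```

Printing sorted_totals shows every city with its total, and head(totals_df, 2) gives the two largest cities as rows of a data frame.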