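This is an extension of the question asked here: Aggregate / summarize multiple variables per group (e.g. sum, mean).
Specifically, if I have multiple variables to aggregate, is there a way to change the FUN each variable is aggregated by?
dat <- data.frame(ID = rep(letters[1:3], each =3), Plot = rep(1:3,3),Val1 = (1:9)*10, Val2 = (1:9)*20)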
> dat
ID Plot Val1 Val2
1 a 1 10 20
2 a 2 20 40
3 a 3 30 60
4 b 1 40 80
5 b 2 50 100
6 b 3 60 120
7 c 1 70 140
8 c 2 80 160
9 c 3 90 180
#Aggregate 2 variables using the *SAME* FUN
aggregate(cbind(Val1, Val2) ~ ID, dat, sum)
ID Val1 Val2
1 a 60 120
2 b 150 300
3 c 240 480
What if I want to take the sum of Val1 and the mean of Val2?
The best solution I have is:
merge(
aggregate(Val1 ~ ID, dat, sum),
aggregate(Val2 ~ ID, dat, mean),
by = c('ID')
)
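Is there a way to do this all within aggregate?
(I didn't see anything in the aggregate code that looked like it would work, but I've been wrong before...)
A second example, using mtcars: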
Reduce(function(df1, df2) merge(df1, df2, by = c('cyl','am'), all = TRUE),
       list(
         aggregate(hp ~ cyl + am, mtcars, sum, na.rm = TRUE),
         aggregate(wt ~ cyl + am, mtcars, min),
         aggregate(qsec ~ cyl + am, mtcars, mean, na.rm = TRUE),
         aggregate(mpg ~ cyl + am, mtcars, mean, na.rm = TRUE)
       )
)
#I'd want a straightforward alternative like:
aggregate(cbind(hp,wt,qsec,mpg) ~ cyl + am, mtcars, list(sum, min, mean, mean), na.rm = T)
# ^(I know this doesn't work)
Note: I'd prefer a base R approach, but I realize that dplyr or some other package probably does this "better".
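Answer 0 (score: 3)
Consider a pairwise mapping of columns and functions, then run Map to build a list of aggregated data frames, since aggregate accepts function names given as strings. Then run Reduce to merge all of the data frames in the list together.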
cols <- names(dat)[grep("Val", names(dat))]
fcts <- c("mean", "sum")
df_list <- Map(function(c, f) aggregate(.~ID, dat[c("ID", c)], FUN=f), cols, fcts)
final_df <- Reduce(function(x,y) merge(x, y, by="ID"), df_list)
final_df
# ID Val1 Val2
# 1 a 20 120
# 2 b 50 300
# 3 c 80 480
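Make sure the column and function vectors are the same length; you may need to repeat a function name when several columns share it.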
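As a small sketch of that length check (the explicit stopifnot call and the variable name res are additions for illustration, not part of the answer above), the pairing asked for in the question, sum of Val1 and mean of Val2, could be written like this. Map recycles shorter arguments, so checking the lengths up front helps catch a mismatched pairing early.

```r
dat <- data.frame(ID = rep(letters[1:3], each = 3), Plot = rep(1:3, 3),
                  Val1 = (1:9) * 10, Val2 = (1:9) * 20)

cols <- c("Val1", "Val2")
# One function name per column, in the same order as cols.
# If every column used the same function, rep("mean", length(cols)) would do.
fcts <- c("sum", "mean")

# Assumption for illustration: fail early rather than let Map recycle fcts
stopifnot(length(cols) == length(fcts))

# One aggregate() call per (column, function) pair
df_list <- Map(function(col, f) aggregate(. ~ ID, dat[c("ID", col)], FUN = f),
               cols, fcts)

# Merge the per-column results back together on the grouping column
res <- Reduce(function(x, y) merge(x, y, by = "ID"), df_list)
res
```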
And to demonstrate with mtcars:
cols <- c("hp", "wt", "qsec", "mpg")
fcts <- c("sum", "min", "mean", "mean")
df_list <- Map(function(c, f) aggregate(.~cyl+am, mtcars[c("cyl", "am", c)], FUN=f), cols, fcts)
Reduce(function(x,y) merge(x,y, by=c("cyl", "am")), df_list)
# cyl am hp wt qsec mpg
# 1 4 0 254 2.465 20.97000 22.90000
# 2 4 1 655 1.513 18.45000 28.07500
# 3 6 0 461 3.215 19.21500 19.12500
# 4 6 1 395 2.620 16.32667 20.56667
# 5 8 0 2330 3.435 17.14250 15.05000
# 6 8 1 599 3.170 14.55000 15.40000
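If this pattern is needed often, it could be wrapped in a small helper. The sketch below is only one possible shape: aggregate_each is a hypothetical name (not a base R function), and it uses the data-frame interface of aggregate (x, by, FUN) instead of the formula interface, so missing values are not dropped automatically the way the formula method's default na.action would.

```r
# Hypothetical helper, not part of base R: runs one aggregate() call per
# column and merges the pieces on the grouping columns.
# funs is a named list (or named character vector): names are columns,
# values are the functions to apply to them.
aggregate_each <- function(data, by, funs) {
  stopifnot(!is.null(names(funs)),
            all(c(by, names(funs)) %in% names(data)))
  pieces <- Map(function(col, f) {
    aggregate(data[col], by = data[by], FUN = f)
  }, names(funs), funs)
  Reduce(function(x, y) merge(x, y, by = by), pieces)
}

res <- aggregate_each(mtcars, by = c("cyl", "am"),
                      funs = list(hp = sum, wt = min, qsec = mean, mpg = mean))
res
```

Naming the functions by column keeps each pairing explicit, which avoids the need to keep two separate vectors in the same order.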
Answer 1 (score: 2)
You can use summarise from the dplyr package:
library(dplyr)
dat <- data.frame(ID = rep(letters[1:3], each =3), Plot = rep(1:3,3),Val1 = (1:9)*10, Val2 = (1:9)*20)
dat
#> ID Plot Val1 Val2
#> 1 a 1 10 20
#> 2 a 2 20 40
#> 3 a 3 30 60
#> 4 b 1 40 80
#> 5 b 2 50 100
#> 6 b 3 60 120
#> 7 c 1 70 140
#> 8 c 2 80 160
#> 9 c 3 90 180
dat %>%
group_by(ID) %>%
summarise(sum_val1 = sum(Val1, na.rm = TRUE),
mean_val2 = mean(Val2, na.rm = TRUE)) %>%
ungroup()
#> # A tibble: 3 x 3
#> ID sum_val1 mean_val2
#> <fct> <dbl> <dbl>
#> 1 a 60 40
#> 2 b 150 100
#> 3 c 240 160
Created on 2018-04-30 by the reprex package (v0.2.0).
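The same summarise pattern should carry over to the mtcars example from the question, with one expression per variable. This is a sketch that assumes dplyr is installed; the variable name res is only for illustration, and recent dplyr versions may print a message about the grouping of the result.

```r
library(dplyr)

res <- mtcars %>%
  group_by(cyl, am) %>%
  # A different summary function for each variable
  summarise(hp = sum(hp, na.rm = TRUE),
            wt = min(wt),
            qsec = mean(qsec, na.rm = TRUE),
            mpg = mean(mpg, na.rm = TRUE)) %>%
  ungroup()
res
```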