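I want to group a data frame by a column and then apply a function that returns multiple columns to the grouped data. As an example, consider the following: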
Names <- append(rep('Mark', 10), rep('Joe', 10))
Spend <- rnorm(length(Names), 50, 0.5)
df <- data.frame(
  Names,
  Spend
)
get.mm <- function(data){
  return(list(median(data), mean(data)))
}
Here, get.mm returns a list of two numbers. I want to apply get.mm to df %>% group_by(Names) and get a result with two columns, one for each output of the function.
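A quick base-R check shows what the question's get.mm actually returns. The input vector c(1, 2, 3, 10) is made up purely for illustration. The result is an unnamed two-element list, so there are no column names for a grouped summary to pick up.

```r
# get.mm as defined in the question: returns an unnamed list of two numbers
get.mm <- function(data){
  return(list(median(data), mean(data)))
}

res <- get.mm(c(1, 2, 3, 10))  # hypothetical test vector, for illustration only
str(res)
# List of 2
#  $ : num 2.5
#  $ : num 4
names(res)  # NULL: the list elements carry no names to become columns
```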
The desired result is:
Names median mean
<fctr> <dbl> <dbl>
1 Joe 49.89284 49.9504
2 Mark 50.17244 50.0735
I have simplified the function for this demonstration. I know that in this particular case I could simply write something like
df %>% group_by(Names) %>% summarise(median = median(Spend), mean = mean(Spend))
Answer (score: 1)
If you rewrite get.mm so that it returns a data frame, you can use group_by %>% do:
get.mm <- function(data){
  # return a one-row data frame so each statistic becomes a named column
  data.frame(median = median(data), mean = mean(data))
}
df %>% group_by(Names) %>% do(get.mm(.$Spend))
# inside do(), . is the sub-data frame for one value of Names;
# .$Spend passes that group's Spend column to get.mm
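Conceptually, group_by %>% do splits the data by group, calls the function on each piece, and stacks the returned data frames. The following is a minimal base-R sketch of that split-apply-combine pattern, using split, lapply and rbind in place of dplyr. The intermediate names pieces, rows and result are illustrative only, and this is not meant to describe how dplyr implements do internally.

```r
# get.mm from the answer: returns a one-row data frame
get.mm <- function(data){
  data.frame(median = median(data), mean = mean(data))
}

set.seed(1)
Names <- append(rep('Mark', 10), rep('Joe', 10))
Spend <- rnorm(length(Names), 50, 0.5)
df <- data.frame(Names, Spend)

# Split: one Spend vector per group (groups come out in sorted order: Joe, Mark)
pieces <- split(df$Spend, df$Names)

# Apply: get.mm on each group's vector gives one one-row data frame per group
rows <- lapply(pieces, get.mm)

# Combine: stack the rows and put the group key in front as a Names column
result <- data.frame(Names = names(rows), do.call(rbind, rows), row.names = NULL)
result
```

With the same seed, the numbers should agree with the dplyr output in the reproducible example, with Joe listed before Mark.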
Reproducible example:
library(dplyr)
set.seed(1)
Names <- append(rep('Mark', 10), rep('Joe', 10))
Spend <- rnorm(length(Names), 50, 0.5)
df <- data.frame(Names, Spend)
df %>% group_by(Names) %>% do(get.mm(.$Spend))
# A tibble: 2 x 3
# Groups: Names [2]
# Names median mean
# <fctr> <dbl> <dbl>
#1 Joe 50.24594 50.12442
#2 Mark 50.12829 50.06610
# for comparison, calling summarise() directly gives the same result:
df %>% group_by(Names) %>% summarise(median = median(Spend), mean = mean(Spend))
# A tibble: 2 x 3
# Names median mean
# <fctr> <dbl> <dbl>
#1 Joe 50.24594 50.12442
#2 Mark 50.12829 50.06610