Multiple functions on multiple variables with summarise_each()

Posted: 2015-02-19 03:43:31

Tags: r dplyr tidyr

The following works, but I'm sure there is a better solution.

library(dplyr)
library(tidyr)

iris %>%
  group_by(Species) %>%
  summarise_each(funs(mean, median)) %>%
  gather(var, val, -Species) %>%
  separate(var, c("variable", "summary"), sep = "_") %>%
  spread(summary, val)
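For readers on newer package versions, here is a minimal sketch of the same wide-first approach. It assumes dplyr >= 1.0.0 and tidyr >= 1.0.0: summarise_each() and funs() are deprecated there, across() replaces them, and pivot_longer()/pivot_wider() supersede gather()/spread(). The object name res_wide_first is only illustrative.

```r
library(dplyr)  # assumes dplyr >= 1.0.0 (provides across())
library(tidyr)  # assumes tidyr >= 1.0.0 (provides pivot_longer()/pivot_wider())

res_wide_first <- iris %>%
  group_by(Species) %>%
  # one column per variable/function pair, named e.g. "Sepal.Length_mean"
  summarise(across(everything(), list(mean = mean, median = median))) %>%
  # split "Sepal.Length_mean" into variable = "Sepal.Length", summary = "mean"
  pivot_longer(-Species, names_to = c("variable", "summary"), names_sep = "_") %>%
  # spread the summary names back out into mean and median columns
  pivot_wider(names_from = summary, values_from = value)

res_wide_first
```

Splitting on "_" works here only because the iris column names use "." rather than "_".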

1 answer:

Answer (score: 3)

gather() the relevant variables into long format first, then do the summary calculations on the grouped long data.

For example:

iris %>% 
  gather(var, val, -Species) %>% 
  group_by(Species, var) %>% 
  summarise_each(funs(mean, median))
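The same reshape-first idea carries over to newer APIs. The following is a sketch, assuming dplyr >= 1.0.0 and tidyr >= 1.0.0, with pivot_longer() in place of gather() and a plain summarise() in place of summarise_each(funs(...)). Because the long data has a single value column, each summary becomes one named column. The object name res_long_first is only illustrative.

```r
library(dplyr)  # assumes dplyr >= 1.0.0 (for the .groups argument)
library(tidyr)  # assumes tidyr >= 1.0.0 (provides pivot_longer())

res_long_first <- iris %>%
  # reshape once: one row per (flower, variable) with its value in `val`
  pivot_longer(-Species, names_to = "var", values_to = "val") %>%
  group_by(Species, var) %>%
  # each function yields one column; .groups = "drop" returns an ungrouped tibble
  summarise(mean = mean(val), median = median(val), .groups = "drop")

res_long_first
```

No name splitting or re-widening is needed, which is where the savings over the wide-first version come from.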

Not only is the code more concise, it is also faster, because it does less work (there is no separate() or spread() step):

fun1 <- function() {
  iris %>%
    group_by(Species) %>%
    summarise_each(funs(mean, median)) %>%
    gather(var, val, -Species) %>%
    separate(var, c("variable", "summary"), sep = "_") %>%
    spread(summary, val)
}

fun2 <- function() {
  iris %>% 
    gather(var, val, -Species) %>% 
    group_by(Species, var) %>% 
    summarise_each(funs(mean, median))
}

library(microbenchmark)
library(compare)

microbenchmark(fun1(), fun2())
# Unit: milliseconds
#    expr      min       lq     mean   median       uq       max neval
#  fun1() 6.725408 6.950540 7.572307 7.202001 7.648250 12.326271   100
#  fun2() 3.346863 3.475828 3.784302 3.535849 3.824349  6.580824   100

compare(as.data.frame(fun1()), as.data.frame(fun2()), allowAll = TRUE)
# TRUE
#   [variable] coerced from <factor> to <character>
#   sorted
#   renamed
#   renamed rows
#   dropped names
#   dropped row names