Converting a summary function to non-standard evaluation (NSE) in dplyr

Time: 2016-03-23 16:04:21

Tags: r dplyr lazy-evaluation

Consider the following interactive example, which produces a summary table:

library(dplyr)

tg <- ToothGrowth
ci_int <- 0.95

tg %>%
  group_by(supp, dose) %>%
  summarise(N = n(),
            mean = mean(len, na.rm = TRUE),
            sd = sd(len, na.rm = TRUE),
            se = sd / sqrt(N),
            ci = se * qt(ci_int / 2 + 0.50, N - 1))

#     supp  dose     N  mean       sd        se       ci
#   (fctr) (dbl) (int) (dbl)    (dbl)     (dbl)    (dbl)
# 1     OJ   0.5    10 13.23 4.459709 1.4102837 3.190283
# 2     OJ   1.0    10 22.70 3.910953 1.2367520 2.797727
# 3     OJ   2.0    10 26.06 2.655058 0.8396031 1.899314
# 4     VC   0.5    10  7.98 2.746634 0.8685620 1.964824
# 5     VC   1.0    10 16.77 2.515309 0.7954104 1.799343
# 6     VC   2.0    10 26.14 4.797731 1.5171757 3.432090

I would like to convert this into a function and abstract away the data.frame, the measure variable, the grouping variables (groupvars), and the constant confidence level (conf.int). Here is a start:

library(lazyeval)

summarySE <- function(df, measure, groupvars, conf.int = 0.95) {
  # Build one formula per summary column; interp() substitutes the
  # column named by the string `measure` for the placeholder `var`
  summary_dots <- list(
    ~ n(),
    interp(~ mean(var, na.rm = TRUE), var = as.name(measure)),
    interp(~ sd(var, na.rm = TRUE), var = as.name(measure))
  )

  df %>%
    group_by_(.dots = groupvars) %>%
    summarise_(.dots = setNames(summary_dots, c("N", "mean", "sd")))
}

summarySE(tg, "len", c("supp", "dose"))

Which yields:

#     supp  dose     N  mean       sd
#   (fctr) (dbl) (int) (dbl)    (dbl)
# 1     OJ   0.5    10 13.23 4.459709
# 2     OJ   1.0    10 22.70 3.910953
# 3     OJ   2.0    10 26.06 2.655058
# 4     VC   0.5    10  7.98 2.746634
# 5     VC   1.0    10 16.77 2.515309
# 6     VC   2.0    10 26.14 4.797731

However, this doesn't feel very DRY. Also, I'm not sure how to implement se and ci without it becoming overly complicated or verbose. Perhaps there is a better approach altogether, or should this be split into several functions?

How can I convert the summary table above into a function, so that I can pass it arbitrary combinations of data.frames, measures, and groupvars, in the "spirit" of dplyr?

2 Answers:

Answer 0 (score: 4)

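I don't really see why computing the SE and CI would be any more complicated than what you have already done: both can refer to the N and sd columns created earlier in the same summarise call.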
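To make that concrete, here is a small base-R check of the two formulas against the first row of the question's table (OJ, dose 0.5). The variable names are illustrative only, and the inputs are simply the N and sd values printed in the question. Note that conf.int / 2 + 0.50 is the same as (1 + conf.int) / 2, the upper quantile of a two-sided interval.

```r
# Inputs copied from the first row of the question's summary table
N <- 10
s <- 4.459709
conf_level <- 0.95   # illustrative stand-in for conf.int

se <- s / sqrt(N)                             # standard error of the mean
ci <- se * qt(conf_level / 2 + 0.50, N - 1)   # half-width of the t-based interval

round(se, 4)   # 1.4103
round(ci, 4)   # 3.1903
```

Both values agree with the se and ci columns shown for that row.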

I used the ... argument to capture your grouping variables, since that seems easier to use.

Overall, I ended up with the following function:

summarySE <- function(.data, measure, ..., conf.int = 0.95) {
  # Capture the unevaluated grouping columns and the measure column
  dots <- lazyeval::lazy_dots(...)
  measure <- lazyeval::lazy(measure)

  # Named list of formulas; later entries may use earlier columns (N, sd, se)
  summary_dots <- list(
    N = ~ n(),
    mean = lazyeval::interp(~ mean(var, na.rm = TRUE), var = measure),
    sd = lazyeval::interp(~ sd(var, na.rm = TRUE), var = measure),
    se = ~ sd / sqrt(N),
    ci = ~ se * qt(conf.int / 2 + 0.50, N - 1))

  .data <- dplyr::group_by_(.data, .dots = dots)
  dplyr::summarise_(.data, .dots = summary_dots)
}

If you like, you can split this into SE and NSE versions (as Hadley does).

Usage:

summarySE(tg, len, supp, dose)

Source: local data frame [6 x 7]
Groups: supp [?]

    supp  dose     N  mean       sd        se       ci
  (fctr) (dbl) (int) (dbl)    (dbl)     (dbl)    (dbl)
1     OJ   0.5    10 13.23 4.459709 1.4102837 3.190283
2     OJ   1.0    10 22.70 3.910953 1.2367520 2.797727
3     OJ   2.0    10 26.06 2.655058 0.8396031 1.899314
4     VC   0.5    10  7.98 2.746634 0.8685620 1.964824
5     VC   1.0    10 16.77 2.515309 0.7954104 1.799343
6     VC   2.0    10 26.14 4.797731 1.5171757 3.432090
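
For readers on dplyr 1.0.0 or later, where group_by_() and summarise_() are deprecated, roughly the same interface can be sketched with tidy evaluation instead of lazyeval. This is not part of the original answer: the name summarySE_tidy is hypothetical, and the sketch assumes the embracing operator {{ }} and the .groups argument of summarise() are available.

```r
library(dplyr)

# Hypothetical tidy-eval counterpart of summarySE(); assumes dplyr >= 1.0.0
summarySE_tidy <- function(df, measure, ..., conf.int = 0.95) {
  df %>%
    group_by(...) %>%                             # bare grouping columns pass straight through
    summarise(
      N    = n(),
      mean = mean({{ measure }}, na.rm = TRUE),   # {{ }} injects the caller's column
      sd   = sd({{ measure }}, na.rm = TRUE),
      se   = sd / sqrt(N),                        # uses the sd column just created
      ci   = se * qt(conf.int / 2 + 0.50, N - 1),
      .groups = "drop"                            # return an ungrouped result
    )
}

res <- summarySE_tidy(ToothGrowth, len, supp, dose)
```

Unlike the output above, which is still grouped by supp, this sketch drops all grouping from the result; changing .groups would be one way to keep it.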

Answer 1 (score: 1)

I'm not sure this is more in the "spirit" of dplyr, but you could also try using strings to compute the mean, sd, etc.:

summarySE <- function(df, measure, groupvars, conf.int = 0.95) {
  # Each summary is passed as a string that summarise_() parses and evaluates
  df %>%
    group_by_(.dots = groupvars) %>%
    summarise_(N = "n()",
               mean = paste0("mean(", measure, ", na.rm = TRUE)"),
               sd = paste0("sd(", measure, ", na.rm = TRUE)"),
               se = "sd / sqrt(N)",
               ci = paste0("se * stats::qt(", conf.int, " / 2 + 0.50, N - 1)"))
}

summarySE(tg, "len", c("supp", "dose"))

#    supp  dose     N  mean       sd        se       ci
#  (fctr) (dbl) (int) (dbl)    (dbl)     (dbl)    (dbl)
#1     OJ   0.5    10 13.23 4.459709 1.4102837 3.190283
#2     OJ   1.0    10 22.70 3.910953 1.2367520 2.797727
#3     OJ   2.0    10 26.06 2.655058 0.8396031 1.899314
#4     VC   0.5    10  7.98 2.746634 0.8685620 1.964824
#5     VC   1.0    10 16.77 2.515309 0.7954104 1.799343
#6     VC   2.0    10 26.14 4.797731 1.5171757 3.432090
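
If the string-based interface is the goal, a similar effect can be sketched on dplyr 1.0.0 or later without pasting code together. This is an assumption-laden sketch and not part of the original answer: the name summarySE_chr is hypothetical, and it relies on across(all_of()) for the grouping columns and the .data pronoun for the measure column.

```r
library(dplyr)

# Hypothetical string-based variant; assumes dplyr >= 1.0.0
summarySE_chr <- function(df, measure, groupvars, conf.int = 0.95) {
  df %>%
    group_by(across(all_of(groupvars))) %>%          # character vector of grouping columns
    summarise(
      N    = n(),
      mean = mean(.data[[measure]], na.rm = TRUE),   # look the column up by its name
      sd   = sd(.data[[measure]], na.rm = TRUE),
      se   = sd / sqrt(N),
      ci   = se * qt(conf.int / 2 + 0.50, N - 1),
      .groups = "drop"                               # return an ungrouped result
    )
}

res_chr <- summarySE_chr(ToothGrowth, "len", c("supp", "dose"))
```

The call mirrors summarySE(tg, "len", c("supp", "dose")) above, except that the result comes back ungrouped.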