How do I pass more complex functions to summarise_if or mutate_if?

Asked: 2019-12-11 14:05:07

Tags: r dplyr conditional-statements

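I have a template for summarising data from a source to get the mean and 95% confidence interval, so that I can plot them in ggplot (originally adapted from a Stack Overflow post years ago; sorry, I no longer know the original source). It looks like this: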

library(dplyr)

data %>%
  group_by(var1, var2) %>%
  summarise(count = n(),
            mean.outcome_variable = mean(outcome_variable, na.rm = TRUE),
            sd.outcome_variable = sd(outcome_variable, na.rm = TRUE),
            # count only non-missing values so n matches the na.rm = TRUE mean and sd
            n.outcome_variable = sum(!is.na(outcome_variable)),
            total.outcome_variable = sum(outcome_variable, na.rm = TRUE)) %>%
  mutate(se.outcome_variable = sd.outcome_variable / sqrt(n.outcome_variable),
         # t-based 95% confidence interval around the mean
         lower.ci.outcome_variable = mean.outcome_variable - qt(1 - (0.05 / 2), n.outcome_variable - 1) * se.outcome_variable,
         upper.ci.outcome_variable = mean.outcome_variable + qt(1 - (0.05 / 2), n.outcome_variable - 1) * se.outcome_variable)
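
As a sanity check on the interval formula used in the template, here is a minimal base R sketch that computes the bounds by hand for a single vector and compares them with the interval reported by t.test(). The vector x is made-up illustration data, not part of the original question.

```r
# Made-up sample values, for illustration only
x <- c(4.1, 5.3, 6.0, 5.5, 4.8, 6.2)

n      <- length(x)
se     <- sd(x) / sqrt(n)                  # standard error of the mean
t_crit <- qt(1 - (0.05 / 2), df = n - 1)   # two-sided 95% critical value

# Same arithmetic as the mutate() step in the template
manual_ci <- c(lower = mean(x) - t_crit * se,
               upper = mean(x) + t_crit * se)

# Interval computed by base R's one-sample t-test
builtin_ci <- as.numeric(t.test(x, conf.level = 0.95)$conf.int)

# TRUE if the hand-computed bounds agree with t.test()
same_interval <- isTRUE(all.equal(unname(manual_ci), builtin_ci))
print(same_interval)
```

If the two agree, the template is computing an ordinary t-based confidence interval for the mean, which is also what a helper such as gmodels::ci() is expected to report for a numeric vector.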

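This works well for one or two outcome variables, but copying and pasting becomes impractical with a large number of them, so I would like to use summarise_if instead, since I have many outcome variables that are all numeric. However, I don't know how to specify anything more complex than a simple function (such as "mean" or "sd") in the "funs" argument. I tried gmodels::ci() as follows: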

dataset_aggregated <- data %>%
  group_by(var1, var2) %>%
  summarise_if(is.numeric, funs(mean, lowCI = ci()[2], hiCI = ci()[3])) # does not work without brackets either

But this results in

Error in summarise_impl(.data, dots) : 
  Evaluation error: no applicable method for 'ci' applied to an object of class "NULL".

How can I make this work?

1 Answer:

Answer 0: (score: 0)

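Just as I was about to post the question I worked out a solution, but I wanted to share it in case anyone else runs into the same problem, because the answer is simple and I can't believe it took me so long to think of it. Basically, I wrote custom lci() and uci() functions that each pull a single bound out of the result of gmodels::ci(), and called those instead, e.g.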

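# gmodels::ci() on a numeric vector returns the estimate, lower bound,
# upper bound and standard error, in that order
lci <- function(data) {
  as.numeric(gmodels::ci(data)[2])  # lower confidence bound
}

uci <- function(data) {
  as.numeric(gmodels::ci(data)[3])  # upper confidence bound
}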

dataset_aggregated <- dataset %>%
  group_by(var1, var2) %>%  # group by as many variables as you like; just list them in select() below as well
  summarise_if(is.numeric, funs(mean, lci, uci)) %>%
  select(var1, var2, sort(current_vars()))  # orders the columns alphabetically, so each outcome variable's lci, mean and uci sit together
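
For readers on newer dplyr releases, where funs() and current_vars() are deprecated, the same idea can be sketched with across() and a named list of purrr-style lambdas. This is only a sketch under assumptions: it requires dplyr 1.0.0 or later, the helper ci_bound() and the data frame toy are hypothetical names invented for this example, and the bounds are computed with the t-based formula from the question rather than with gmodels::ci().

```r
library(dplyr)

# Hypothetical helper (not from the original answer): one t-based
# confidence bound for the mean, ignoring missing values.
ci_bound <- function(x, side = c("lower", "upper"), level = 0.95) {
  side <- match.arg(side)
  x <- x[!is.na(x)]
  n <- length(x)
  half_width <- qt(1 - (1 - level) / 2, df = n - 1) * sd(x) / sqrt(n)
  if (side == "lower") mean(x) - half_width else mean(x) + half_width
}

# Made-up data: two grouping columns and two numeric outcome variables
set.seed(1)
toy <- tibble(
  var1   = rep(c("a", "b"), each = 10),
  var2   = rep(c("x", "y"), times = 10),
  height = rnorm(20, mean = 170, sd = 10),
  weight = rnorm(20, mean = 70, sd = 8)
)

toy_aggregated <- toy %>%
  group_by(var1, var2) %>%
  summarise(
    across(
      where(is.numeric),
      list(
        mean = ~ mean(.x, na.rm = TRUE),
        lci  = ~ ci_bound(.x, "lower"),
        uci  = ~ ci_bound(.x, "upper")
      )
    ),
    .groups = "drop"
  )

print(toy_aggregated)
```

By default across() names the output columns as the input column followed by the function name (height_mean, height_lci, height_uci, and so on), so each outcome variable's columns already sit together and no separate sorting step is needed. On dplyr 0.8.x, a similar named list of functions and lambdas can be passed to summarise_if() in place of funs().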