Question

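I have a template for summarising data from a source to get means and 95% confidence intervals so that I can plot them in ggplot (originally adapted from a Stack Overflow answer years ago; sorry, I don't know the original source). It looks like this: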

data %>%
  group_by(var1, var2) %>%
  summarise(count=n(),
            mean.outcome_variable = mean(outcome_variable, na.rm = TRUE),
            sd.outcome_variable = sd(outcome_variable, na.rm = TRUE),
            n.outcome_variable = n(),
            total.outcome_variable = sum(outcome_variable)) %>%
  mutate(se.outcome_variable = sd.outcome_variable / sqrt(n.outcome_variable),
         lower.ci.outcome_variable = mean.outcome_variable - qt(1 - (0.05 / 2), n.outcome_variable - 1) * se.outcome_variable,
         upper.ci.outcome_variable = mean.outcome_variable + qt(1 - (0.05 / 2), n.outcome_variable - 1) * se.outcome_variable)

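This works well for one or two outcome variables, but copying and pasting it for a large number of outcome variables becomes impractical, so I would like to use summarise_if instead, since I have many outcome variables that are all numeric. However, I can't work out how to specify anything more complex than a simple function (such as "mean" or "sd") in the "funs" argument. I tried gmodels::ci() as follows: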

dataset_aggregated <- data %>%
  group_by(var1, var2) %>%
  summarise_if(is.numeric, funs(mean, lowCI = ci()[2], hiCI = ci()[3])) # does not work without brackets either

But this results in

Error in summarise_impl(.data, dots) : 
  Evaluation error: no applicable method for 'ci' applied to an object of class "NULL".

How can I make this work?

Answer 1

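I worked out the solution just as I was about to post the question, but I want to share it in case anyone else runs into the same problem, because the answer is very simple and I can't believe it took me so long to think of. Basically, I just made custom lci() and uci() functions that pull the individual results out of gmodels::ci() and called those instead, e.g.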

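# gmodels::ci() on a numeric vector returns: Estimate, CI lower, CI upper, Std. Error
lci <- function(data) {
  as.numeric(gmodels::ci(data)[2])  # lower bound of the 95% CI
}

uci <- function(data) {
  as.numeric(gmodels::ci(data)[3])  # upper bound of the 95% CI
}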

dataset_aggregated <- dataset %>%
  group_by(var1, var2) %>% #you can group by however many you want here, just put them in the select statement below
  summarise_if(is.numeric, funs(mean, lci, uci)) %>% 
  select(var1, var2, sort(current_vars())) #sorts columns into lci, mean, uci for each outcome variable alphabetically
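
For readers on newer dplyr versions, where funs() is deprecated (since 0.8.0) and the scoped verbs such as summarise_if() are superseded by across() (since 1.0.0), the same idea can be sketched as below. This is only an illustration under assumptions: ci_bound() is a hypothetical helper that reuses the t-based formula from the question instead of gmodels::ci(), and mtcars with its cyl and am columns stands in for the real data and grouping variables.

```r
library(dplyr)  # assumes dplyr >= 1.0.0 for across()

# Hypothetical helper (not from gmodels): one bound of a t-based confidence
# interval for the mean, using the same formula as the template in the question.
ci_bound <- function(x, side = c("lower", "upper"), level = 0.95) {
  side <- match.arg(side)
  x <- x[!is.na(x)]                      # drop NAs so n matches mean() and sd()
  n <- length(x)
  half_width <- qt(1 - (1 - level) / 2, df = n - 1) * sd(x) / sqrt(n)
  if (side == "lower") mean(x) - half_width else mean(x) + half_width
}

# mtcars, cyl and am are stand-ins for the real data and grouping variables.
dataset_aggregated <- mtcars %>%
  group_by(cyl, am) %>%
  summarise(
    across(
      where(is.numeric),
      list(
        mean = ~ mean(.x, na.rm = TRUE),
        lci  = ~ ci_bound(.x, "lower"),
        uci  = ~ ci_bound(.x, "upper")
      ),
      .names = "{.col}_{.fn}"            # e.g. mpg_mean, mpg_lci, mpg_uci
    ),
    .groups = "drop"
  )
```

Because across() emits all summaries for one column before moving on to the next, the mean, lci and uci columns for each outcome variable already sit next to each other, so the sorting step with select() should not be needed in this variant.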

How do I pass more complex functions to summarise_if or mutate_if?
