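I have a template for summarising data to get means and 95% confidence intervals so that I can plot them in ggplot. I originally adapted it from Stack Overflow many years ago; sorry, I don't know the original source. It looks like this: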
library(dplyr)

data %>%
  group_by(var1, var2) %>%
  summarise(count = n(),
            mean.outcome_variable = mean(outcome_variable, na.rm = TRUE),
            sd.outcome_variable = sd(outcome_variable, na.rm = TRUE),
            # count only non-missing values so n is consistent with na.rm = TRUE above
            n.outcome_variable = sum(!is.na(outcome_variable)),
            total.outcome_variable = sum(outcome_variable, na.rm = TRUE)) %>%
  # standard error, then two-sided 95% t-based confidence limits
  mutate(se.outcome_variable = sd.outcome_variable / sqrt(n.outcome_variable),
         lower.ci.outcome_variable = mean.outcome_variable - qt(1 - (0.05 / 2), n.outcome_variable - 1) * se.outcome_variable,
         upper.ci.outcome_variable = mean.outcome_variable + qt(1 - (0.05 / 2), n.outcome_variable - 1) * se.outcome_variable)
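As a quick sanity check on the formula in the template (mean plus or minus the t critical value times the standard error), the sketch below computes the interval by hand for one group and compares it with the interval reported by base R's t.test(). The vector x is made-up example data, not anything from the question.

```r
# Made-up example data standing in for one group of one outcome variable
x <- c(4.1, 5.3, 6.0, 5.5, 4.8, 5.9, 6.2)

n      <- length(x)
se     <- sd(x) / sqrt(n)              # standard error of the mean
t_crit <- qt(1 - 0.05 / 2, df = n - 1) # two-sided 95% critical value

lower <- mean(x) - t_crit * se
upper <- mean(x) + t_crit * se

# t.test() reports the same t-based interval for a one-sample mean
tt <- as.numeric(t.test(x, conf.level = 0.95)$conf.int)

print(c(lower = lower, upper = upper))
print(tt)
```

The two printed intervals should agree, which confirms the template is computing an ordinary one-sample t interval per group.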
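This works well for one or two outcome variables, but copying and pasting it for many outcome variables becomes impractical. Since I have a large number of outcome variables that are all numeric, I wanted to use summarise_if instead. However, I can't work out how to specify anything more complex than a simple function (such as mean or sd) in the funs argument. I tried gmodels::ci() like this: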
dataset_aggregated <- data %>%
group_by(var1, var2) %>%
summarise_if(is.numeric, funs(mean, lowCI = ci()[2], hiCI = ci()[3])) # does not work without brackets either
But this results in:
Error in summarise_impl(.data, dots) :
Evaluation error: no applicable method for 'ci' applied to an object of class "NULL".
How can I get this to work?
Answer
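Just as I was about to post the question I figured out a solution, but I wanted to share it in case anyone else runs into the same problem, because the answer is simple and I can't believe it took me so long. (The attempt above fails because ci() is called with no argument, so it receives NULL instead of the column.) Basically, I wrote custom lci() and uci() functions that extract the lower and upper limits from the gmodels::ci() result, and called those instead, e.g.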
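library(dplyr)
library(gmodels)

# gmodels::ci() returns c(Estimate, CI lower, CI upper, Std. Error);
# these wrappers return just the lower or upper limit as a plain number
lci <- function(data) {
  as.numeric(ci(data)[2])
}
uci <- function(data) {
  as.numeric(ci(data)[3])
}

# note: funs() and current_vars() come from older dplyr and are deprecated in newer releases
dataset_aggregated <- data %>%
  group_by(var1, var2) %>% # you can group by as many variables as you like here; just list them in select() below too
  summarise_if(is.numeric, funs(mean, lci, uci)) %>%
  # sort columns alphabetically so each outcome's _lci, _mean and _uci columns sit together
  select(var1, var2, sort(current_vars()))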
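In newer dplyr (this sketch assumes version 1.0.0 or later), summarise_if() and funs() are superseded by across(), whose lambdas receive each column as .x; passing the column explicitly is exactly what the original ci() call was missing. The sketch below avoids gmodels and uses a hypothetical helper t_ci() that implements the same t-based formula as the template in the question; the data frame is made-up example data.

```r
library(dplyr)

# Hypothetical helper: t-based confidence limits for a numeric vector,
# using the same formula as the template (mean +/- t * sd / sqrt(n))
t_ci <- function(x, conf = 0.95) {
  x <- x[!is.na(x)]
  n <- length(x)
  half <- qt(1 - (1 - conf) / 2, df = n - 1) * sd(x) / sqrt(n)
  c(lower = mean(x) - half, upper = mean(x) + half)
}

# Made-up example data: two grouping variables, two numeric outcomes
data <- data.frame(
  var1 = rep(c("a", "b"), each = 6),
  var2 = rep(c("x", "y"), times = 6),
  outcome1 = c(5.1, 4.8, 6.2, 5.5, 4.9, 6.0, 7.1, 6.8, 7.4, 6.9, 7.0, 7.3),
  outcome2 = c(1.2, 1.5, 1.1, 1.4, 1.3, 1.6, 2.2, 2.0, 2.4, 2.1, 2.3, 2.5)
)

dataset_aggregated <- data %>%
  group_by(var1, var2) %>%
  summarise(
    across(
      where(is.numeric),
      list(
        mean = ~ mean(.x, na.rm = TRUE),
        lci  = ~ t_ci(.x)[["lower"]],
        uci  = ~ t_ci(.x)[["upper"]]
      ),
      .names = "{.col}_{.fn}"  # e.g. outcome1_mean, outcome1_lci, outcome1_uci
    ),
    .groups = "drop"
  )

print(dataset_aggregated)
```

Because across() emits all functions for one column before moving to the next, each outcome's mean and limits already sit next to each other, so no sorting step is needed.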