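I have a dataset with four levels: observations (i.e., time periods during which a teacher was observed), teachers, schools, and school divisions. Observations are nested within teachers, teachers within schools, and so on.
Each row in the data corresponds to one instance of observing a teacher.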
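At each level of the hierarchy, I want to calculate the mean, sd, min, and max of each of several variables (x1, x2, and x3 here, but ~12 in the actual data). I'd like all of these summaries in a single data frame.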
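The code below does this, but it feels clunky to me. More specifically, some things that bother me are:
- I couldn't figure out how to do the rename inside the function written around the group_var values, so I had to do it manually outside the function.
- I use left_join to combine the results at the end (again, manually).
- I suspect there is something in purrr for "peeling off" the levels of the hierarchy and aggregating, but it eludes me.
Any suggestions on how to streamline this, especially on how to pass the group_var values to rename_at, would be much appreciated!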
library(tidyverse)
library(treemap)  # provides random.hierarchical.data()
# Simulated 4-level hierarchy: div > sch > teacher > obs
df <- random.hierarchical.data(n = 200, depth = 4) %>%
  rename(div = index1,
         sch = index2,
         teacher = index3,
         obs = index4,
         x1 = x) %>%
  mutate(x2 = rlnorm(200),
         x3 = rlnorm(200))
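# Summarise each of sum_vars within the groups given in `...`
sum_func <- function(data, sum_vars, ...) {
  group_vars <- enquos(...)
  data %>%
    group_by(!!!group_vars) %>%
    # Named functions give readable column names such as x1_mean
    summarize_at(vars(sum_vars),
                 list(
                   mean = ~mean(., na.rm = TRUE),
                   sd   = ~sd(., na.rm = TRUE),
                   min  = ~min(., na.rm = TRUE),
                   max  = ~max(., na.rm = TRUE)
                 )) %>%
    ungroup()
}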
use_vars <- c("x1", "x2", "x3")
teacher_sum <- sum_func(data = df, sum_vars = use_vars, div, sch, teacher) %>%
  rename_at(vars(-c("teacher", "sch", "div")), ~str_replace_all(., "^", "teacher_"))
sch_sum <- sum_func(df, sum_vars = use_vars, div, sch) %>%
  rename_at(vars(-c("sch", "div")), ~str_replace_all(., "^", "sch_"))
div_sum <- sum_func(df, sum_vars = use_vars, div) %>%
  rename_at(vars(-c("div")), ~str_replace_all(., "^", "div_"))
# Join the three levels back onto the teacher-level table
full <- teacher_sum %>%
  left_join(sch_sum, by = c("sch", "div")) %>%
  left_join(div_sum, by = "div")
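A minimal sketch of one way to move the renaming inside the function, assuming dplyr 1.0 or later (for across() and rename_with()): turn the quosures captured by enquos(...) into plain column names with rlang::as_label(), then prefix every column except those keys. The function name sum_func_prefixed and the small toy data frame are hypothetical, made up for illustration.

```r
library(dplyr)
library(purrr)

# Hypothetical variant of sum_func: the prefix is applied inside the function
sum_func_prefixed <- function(data, sum_vars, prefix, ...) {
  group_vars <- enquos(...)
  # Captured grouping columns as plain strings, e.g. c("div", "sch")
  key_names <- map_chr(group_vars, rlang::as_label)
  data %>%
    group_by(!!!group_vars) %>%
    summarise(
      across(all_of(sum_vars),
             list(mean = ~mean(.x, na.rm = TRUE),
                  sd   = ~sd(.x, na.rm = TRUE),
                  min  = ~min(.x, na.rm = TRUE),
                  max  = ~max(.x, na.rm = TRUE))),
      .groups = "drop"
    ) %>%
    # Prefix every column that is not a grouping key
    rename_with(~paste0(prefix, .x), -all_of(key_names))
}

# Small made-up data set for illustration
toy <- tibble(
  div = c("a", "a", "b"),
  sch = c("a1", "a2", "b1"),
  x1  = c(1, 3, 5)
)

sch_sum <- sum_func_prefixed(toy, "x1", "sch_", div, sch)
names(sch_sum)
# -> div, sch, sch_x1_mean, sch_x1_sd, sch_x1_min, sch_x1_max
```

Because key_names is an ordinary character vector, the same idea works with rename_at(vars(-all_of(key_names)), ...) on older dplyr versions.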
Answer (score: 2)
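You were very close. The code below works, but I'm not sure how to fully automate the joins, because the join logic isn't clearly defined.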
sum_func <- function(data, sum_vars, replacement, ...) {
  group_vars <- enquos(...)
  data %>%
    group_by(!!!group_vars) %>%
    summarize_at(vars(sum_vars),
                 list(
                   mean = ~mean(., na.rm = TRUE),
                   sd   = ~sd(., na.rm = TRUE),
                   min  = ~min(., na.rm = TRUE),
                   max  = ~max(., na.rm = TRUE)
                 )) %>%
    ungroup() %>%
    # Splice the grouping columns back in so that every other column gets the prefix
    rename_at(vars(-c(!!!group_vars)),
              ~str_replace_all(., "^", replacement))
}
use_vars <- c("x1", "x2", "x3")
teacher_sum <- sum_func(data = df,
                        sum_vars = use_vars,
                        replacement = "teacher_",
                        div, sch, teacher)
sch_sum <- sum_func(data = df,
                    sum_vars = use_vars,
                    replacement = "sch_",
                    div, sch)
div_sum <- sum_func(df,
                    sum_vars = use_vars,
                    replacement = "div_",
                    div)
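On automating the joins, one possible sketch, assuming dplyr 1.0 or later and purrr: describe each level of the hierarchy by the columns that identify it, build one prefixed summary per level with imap(), and then reduce() the list with left_join(), joining each pair on whatever columns they share. Since the summary columns carry level-specific prefixes, only the key columns overlap. The hierarchy list, the summarise_level() helper and the toy df are hypothetical stand-ins, not part of the original answer.

```r
library(dplyr)
library(purrr)

# Made-up stand-in for the question's df: one row per observation
df <- tibble(
  div     = c("a", "a", "a", "a", "b", "b"),
  sch     = c("a1", "a1", "a2", "a2", "b1", "b1"),
  teacher = c("t1", "t1", "t2", "t3", "t4", "t4"),
  x1      = c(1, 3, 5, 7, 2, 4),
  x2      = c(10, 20, 30, 40, 50, 60)
)
use_vars <- c("x1", "x2")

# Each level = the columns that identify it; list names double as prefixes
hierarchy <- list(
  teacher = c("div", "sch", "teacher"),
  sch     = c("div", "sch"),
  div     = "div"
)

# Hypothetical helper: summarise sum_vars at one level, prefixing the results
summarise_level <- function(keys, prefix, data, sum_vars) {
  data %>%
    group_by(!!!rlang::syms(keys)) %>%
    summarise(
      across(all_of(sum_vars),
             list(mean = ~mean(.x, na.rm = TRUE),
                  sd   = ~sd(.x, na.rm = TRUE),
                  min  = ~min(.x, na.rm = TRUE),
                  max  = ~max(.x, na.rm = TRUE)),
             .names = paste0(prefix, "_{.col}_{.fn}")),
      .groups = "drop"
    )
}

# imap() passes each element (keys) and its name (prefix) to the helper;
# reduce() then joins the tables pairwise on their shared key columns
full <- hierarchy %>%
  imap(summarise_level, data = df, sum_vars = use_vars) %>%
  reduce(function(x, y) left_join(x, y, by = intersect(names(x), names(y))))
```

Ordering the list from the finest level (teacher) to the coarsest (div) makes left_join keep one row per teacher, like the full table built by hand in the question.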