我有一个包含220个变量(列)的数据集。我的数据看起来像这样
test var1 var2 var3
1 Pretest No Sometimes No
2 Postest No Sometimes No
3 Pretest No Yes No
4 Postest No Yes No
5 Pretest No Sometimes No
6 Postest No Yes No
7 Pretest No Yes No
8 Postest No No No
9 Pretest No Yes No
10 Postest No Sometimes No
11 Pretest No Sometimes No
12 Postest No No No
13 Pretest Yes Yes No
14 Postest No Sometimes No
15 Pretest No No No
16 Postest No Sometimes No
17 Pretest No Sometimes No
18 Postest No No No
19 Pretest No Yes No
20 Postest No Yes No
对于每个变量,我想按组生成相对频率表。为此,我正在使用dplyr
propvar1 <- dataprepost %>%
group_by(test,var1) %>%
summarise(n = n()) %>%
mutate(rel.freq = paste0(round(100 * n/sum(n), 0), "%"))
例如,这给了我想要的Var1表:
test var1 n rel.freq
1 Postest Sometimes 120 14%
2 Postest No 667 79%
3 Postest Yes 44 5%
4 Postest NA 10 1%
5 Pretest Sometimes 155 18%
6 Pretest No 623 74%
7 Pretest Yes 49 6%
8 Pretest NA 14 2%
我该如何创建一个函数来对数据集中的每个变量自动执行此操作,所以最终为每个变量创建一个相似的表?
答案 0 :(得分:2)
这不是最优雅的代码...但是可以!
d <- structure(list(test = structure(c(2L, 1L, 2L, 1L, 2L, 1L, 2L,
1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L), .Label = c("Postest",
"Pretest"), class = "factor"), var1 = structure(c(1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L,
1L), .Label = c("No", "Yes"), class = "factor"), var2 = structure(c(2L,
2L, 3L, 3L, 2L, 3L, 3L, 1L, 3L, 2L, 2L, 1L, 3L, 2L, 1L, 2L, 2L,
1L, 3L, 3L), .Label = c("No", "Sometimes", "Yes"), class = "factor"),
var3 = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "No", class = "factor")), .Names = c("test",
"var1", "var2", "var3"), class = "data.frame", row.names = c("1",
"2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13",
"14", "15", "16", "17", "18", "19", "20"))`
output <- NULL
for (t_cat in unique(d$test)) {
x <- d[d$test == t_cat, ]
x$test <- NULL
l <- lapply(x, function(x) {
output <- data.frame(table(x))
colnames(output) <- c('val', 'n')
output$rel.freq <- output$n/length(x)
output$category <- t_cat
return(output)
})
r <- do.call("rbind", lapply(names(l), function(x) { cbind(l[[x]], 'var' = x)}))
output <- rbind(output, r)
}
print(output)
# val n rel.freq category var
# 1 No 9 0.9 Pretest var1
# 2 Yes 1 0.1 Pretest var1
# 3 No 1 0.1 Pretest var2
# 4 Sometimes 4 0.4 Pretest var2
# 5 Yes 5 0.5 Pretest var2
# 6 No 10 1.0 Pretest var3
# 7 No 10 1.0 Postest var1
# 8 Yes 0 0.0 Postest var1
# 9 No 3 0.3 Postest var2
# 10 Sometimes 4 0.4 Postest var2
# 11 Yes 3 0.3 Postest var2
# 12 No 10 1.0 Postest var3