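I am iterating over the vector cols = c("x", "y", "z") with for (i in cols), but the loop does not do what I expect.
Can someone help me understand the problem with, and the behavior of, "i" inside the for loop? Many thanks!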
library(data.table)

set.seed(10)
dummy = data.table(id = c("11", "11", "11", "22", "22", "22", "33", "33", "33", "33"),
                   x = sample(c("a", "b", "c"), 10, replace = TRUE),
                   y = sample(c("a", "b", "c"), 10, replace = TRUE),
                   z = sample(c("a", "b", "c"), 10, replace = TRUE),
                   i = sample(3, 10, replace = TRUE),
                   j = sample(3, 10, replace = TRUE),
                   k = sample(3, 10, replace = TRUE))

# Return the most frequent non-NA value of x (ties go to the value seen first)
mode_func <- function(x) {
  uniqx <- unique(na.omit(x))
  uniqx[which.max(tabulate(match(x, uniqx)))]
}
(1) Most frequent value (mode)
cols = c("x", "y", "z")
for (i in cols){
dummy[, as.character(i) := mode_func(i), by = "id"]
}
# The following works but it's too much coding!
dummy[, x := mode_func(x), by = "id"]
dummy[, y := mode_func(y), by = "id"]
dummy[, z := mode_func(z), by = "id"]
The expected result is as follows:
id x y z
1: 11 b b c
2: 11 b b c
3: 11 b b c
4: 22 a b b
5: 22 a b b
6: 22 a b b
7: 33 a a c
8: 33 a a c
9: 33 a a c
10: 33 a a c
(2) I also tried the mean, which does not work for me:
cols = c("i", "j", "k")
dummy[, (cols) := lapply(.SD, function(x) round(mean(x, na.rm = T))), .SDcols = cols, by = "id"]
Answer 0 (score: 2)
You can apply mode_func to the columns in cols directly with lapply:
library(data.table)
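cols <- c("x", "y", "z")  # the question later reassigns cols to c("i", "j", "k")
# .SDcols restricts .SD to the columns in cols, so both sides of := have the same length
dummy[, (cols) := lapply(.SD, mode_func), by = "id", .SDcols = cols]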
dummy
# id x y z
# 1: 11 b b c
# 2: 11 b b c
# 3: 11 b b c
# 4: 22 a b b
# 5: 22 a b b
# 6: 22 a b b
# 7: 33 a a c
# 8: 33 a a c
# 9: 33 a a c
#10: 33 a a c
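As for running the for loop: to call mode_func on each column separately, you need to use .SDcols to subset that particular column and pass the .SD value as the function input in each iteration. (Thanks to @David Arenburg's comment.)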
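The original code for this loop is not preserved here, so the following is only a minimal sketch of what such a loop could look like. It assumes a small hypothetical table dt (not the question's sampled data) and a loop variable named col rather than i. The rename matters: the right-hand side of := is evaluated inside the data.table, so in the question mode_func(i) sees the column named i, not the column whose name the loop variable holds.

```r
library(data.table)

# Hypothetical toy data; the column `i` reproduces the name clash from the question.
dt <- data.table(
  id = c("11", "11", "11", "22", "22", "22"),
  x  = c("a", "b", "b", "c", "c", "a"),
  y  = c("b", "b", "a", "a", "c", "a"),
  i  = c(1L, 2L, 2L, 3L, 3L, 1L)
)

# Same helper as in the question: most frequent non-NA value.
mode_func <- function(x) {
  uniqx <- unique(na.omit(x))
  uniqx[which.max(tabulate(match(x, uniqx)))]
}

cols <- c("x", "y")

# `col` is deliberately not the name of any column in dt.
for (col in cols) {
  # .SDcols = col makes .SD a one-column table per group; .SD[[1]] is that column.
  # (col) on the left-hand side is evaluated to the column name stored in col.
  dt[, (col) := mode_func(.SD[[1]]), by = "id", .SDcols = col]
}

dt
```

Each pass overwrites one column with its per-group mode, which is the same result the three hand-written lines in the question produce, without repeating the call for every column.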
Answer 1 (score: 1)
We can use mutate_at from dplyr
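The code for this answer is missing, so here is a hedged sketch of how mutate_at could be used for the same task. The data frame df is a hypothetical stand-in for the question's data, and the pipeline is an assumption about what the answer intended, not a reconstruction of it.

```r
library(dplyr)

# Hypothetical toy data, not the question's sampled values.
df <- data.frame(
  id = c("11", "11", "11", "22", "22", "22"),
  x  = c("a", "b", "b", "c", "c", "a"),
  y  = c("b", "b", "a", "a", "c", "a"),
  stringsAsFactors = FALSE
)

# Same helper as in the question: most frequent non-NA value.
mode_func <- function(x) {
  uniqx <- unique(na.omit(x))
  uniqx[which.max(tabulate(match(x, uniqx)))]
}

result <- df %>%
  group_by(id) %>%                      # compute the mode within each id
  mutate_at(vars(x, y), mode_func) %>%  # apply mode_func to the chosen columns
  ungroup()

result
```

In dplyr 1.0.0 and later, mutate_at is superseded by across(), so mutate(across(c(x, y), mode_func)) is the currently recommended spelling of the same step.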