Using "i" in a for loop to apply a function to certain columns

Posted: 2019-06-02 12:55:27

Tags: r for-loop data.table

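I'm using for (i in cols) to loop over the vector cols = c("x", "y", "z"), but:

  1. When creating a new column with ":=", I can't get "i" to be used as the column name.
  2. I wrote mode_func to get the most frequent string in a vector, but when I use it with lapply, "i" doesn't seem to act as a column.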

Can someone help me understand what goes wrong with "i" in the for loop and how it behaves? Thanks a lot!

library(data.table)

set.seed(10)
dummy = data.table(id = c("11", "11", "11", "22", "22", "22", "33", "33", "33", "33"),
                   x = sample(c("a", "b", "c"), 10, replace = TRUE),
                   y = sample(c("a", "b", "c"), 10, replace = TRUE),
                   z = sample(c("a", "b", "c"), 10, replace = TRUE),
                   i = sample(3, 10, replace = TRUE),
                   j = sample(3, 10, replace = TRUE),
                   k = sample(3, 10, replace = TRUE))
mode_func <- function(x) {
  uniqx <- unique(na.omit(x))
  uniqx[which.max(tabulate(match(x, uniqx)))]
}
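To make the behaviour of mode_func concrete, here is a small base-R check on hand-written vectors (illustrative inputs, not from the question). It also shows how ties are handled: which.max() returns the first maximum, so on a tie the value that appears first in the vector wins.

```r
mode_func <- function(x) {
  uniqx <- unique(na.omit(x))                  # distinct non-missing values, in order of first appearance
  uniqx[which.max(tabulate(match(x, uniqx)))]  # count each value, return the most frequent one
}

mode_func(c("b", "a", "b", "c"))  # "b" occurs twice, so "b"
mode_func(c("c", "b"))            # tie: which.max() picks the first maximum, so "c"
```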

(1) Most frequent value (mode)

cols = c("x", "y", "z")
for (i in cols){
  dummy[, as.character(i) := mode_func(i), by = "id"]
}

# The following works but it's too much coding!
dummy[, x := mode_func(x), by = "id"]
dummy[, y := mode_func(y), by = "id"]
dummy[, z := mode_func(z), by = "id"]

The expected result is:

    id x y z
 1: 11 b b c
 2: 11 b b c
 3: 11 b b c
 4: 22 a b b
 5: 22 a b b
 6: 22 a b b
 7: 33 a a c
 8: 33 a a c
 9: 33 a a c
10: 33 a a c

(2) I also tried the mean, which didn't work for me either:

cols = c("i", "j", "k")
dummy[, (cols) := lapply(.SD, function(x) round(mean(x, na.rm = T))), .SDcols = cols, by = "id"]
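The question doesn't show the error from this attempt. One plausible cause (an assumption, and version-dependent) is a type mismatch: i, j and k are integer columns because sample(3, ...) returns integers, but round(mean(...)) returns a double, and depending on the data.table version, a grouped := of a double into an integer column may raise an error or coerce with a warning. A minimal sketch on a small hypothetical table that wraps the result in as.integer() so the types match:

```r
library(data.table)

# Small hypothetical table (not the question's dummy); i and j are integer columns
dt <- data.table(id = c("11", "11", "11", "22", "22", "22"),
                 i  = c(1L, 2L, 2L, 3L, 3L, 1L),
                 j  = c(1L, 1L, 1L, 3L, 3L, 3L))
cols <- c("i", "j")

# round() returns a double; as.integer() converts it back so the column type is unchanged
dt[, (cols) := lapply(.SD, function(x) as.integer(round(mean(x, na.rm = TRUE)))),
   .SDcols = cols, by = "id"]
dt
```

Group means for i are 5/3 and 7/3, which both round to 2; j is constant within each group.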

2 answers:

Answer 0 (score: 2)

You can apply mode_func to cols directly with lapply:

library(data.table)
dummy[, (cols) := lapply(.SD, mode_func), .SDcols = cols, by = "id"]
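Because dummy also contains the integer columns i, j and k, .SD by default holds every non-grouping column, so .SDcols is what restricts lapply to the columns listed in cols. A self-contained sketch of the same call on a small hypothetical table (the data and the extra column k are illustrative assumptions), so the result can be checked by hand:

```r
library(data.table)

mode_func <- function(x) {
  uniqx <- unique(na.omit(x))
  uniqx[which.max(tabulate(match(x, uniqx)))]
}

# Hypothetical data; k is an extra column that must be left untouched
dt <- data.table(id = c("11", "11", "11", "22", "22"),
                 x  = c("a", "b", "b", "c", "c"),
                 y  = c("a", "a", "b", "b", "c"),
                 k  = c(1L, 2L, 3L, 1L, 2L))
cols <- c("x", "y")

# .SDcols limits .SD to cols, so lapply returns exactly one result per target column
dt[, (cols) := lapply(.SD, mode_func), .SDcols = cols, by = "id"]
dt
```

In group "22", y is a tie between "b" and "c", and mode_func returns the first one, "b".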

dummy
#    id x y z
# 1: 11 b b c
# 2: 11 b b c
# 3: 11 b b c
# 4: 22 a b b
# 5: 22 a b b
# 6: 22 a b b
# 7: 33 a a c
# 8: 33 a a c
# 9: 33 a a c
#10: 33 a a c

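As for running a for loop: to call mode_func on each column separately, you need to subset that particular column with .SDcols and pass .SD as the function input on each iteration. (Thanks to @David Arenburg for the comment.)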
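A minimal sketch of such a loop, on a small hypothetical table rather than the question's dummy. The loop variable is named col (an illustrative choice) because dummy has a column called i, and inside j data.table will likely resolve i to that column rather than to the loop variable. Wrapping (col) in parentheses on the left of := makes data.table use the value of col as the column name, and .SD[[1]] extracts the single selected column as a plain vector:

```r
library(data.table)

mode_func <- function(x) {
  uniqx <- unique(na.omit(x))
  uniqx[which.max(tabulate(match(x, uniqx)))]
}

# Hypothetical data for illustration
dt <- data.table(id = c("11", "11", "11", "22", "22"),
                 x  = c("a", "b", "b", "c", "c"),
                 y  = c("a", "a", "b", "b", "c"))
cols <- c("x", "y")

for (col in cols) {
  # (col): use the value of col ("x", then "y") as the target column name
  # .SDcols = col: .SD holds only that column; .SD[[1]] turns it into a vector
  dt[, (col) := mode_func(.SD[[1]]), .SDcols = col, by = "id"]
}
dt
```

A common alternative inside the loop is dt[, (col) := mode_func(get(col)), by = "id"], which looks the column up by name instead of going through .SDcols.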

Answer 1 (score: 1)

We can use mutate_at from dplyr.
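The code for this answer did not survive; what follows is a minimal sketch of what a mutate_at approach could look like, not the answerer's exact code. It uses a small hypothetical data frame and assumes dplyr is installed: group by id, then apply mode_func to each listed column within every group.

```r
library(dplyr)

mode_func <- function(x) {
  uniqx <- unique(na.omit(x))
  uniqx[which.max(tabulate(match(x, uniqx)))]
}

# Hypothetical data for illustration
df <- data.frame(id = c("11", "11", "11", "22", "22"),
                 x  = c("a", "b", "b", "c", "c"),
                 y  = c("a", "a", "b", "b", "c"),
                 stringsAsFactors = FALSE)

res <- df %>%
  group_by(id) %>%                    # compute the mode within each id
  mutate_at(c("x", "y"), mode_func) %>%  # length-1 result is recycled over the group
  ungroup()
res
```

In dplyr 1.0 and later, mutate_at() is superseded; the equivalent is mutate(across(c(x, y), mode_func)).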