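How do I use variable column names on the RHS of := operations? For example, given this data.table "dt", I'd like to create two new columns, "first_y" and "first_z", that contain the first observation of the given column for each value of "x".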
dt <- data.table(x = c("one","one","two","two","three"),
y = c("a", "b", "c", "d", "e"),
z = c(1, 2, 3, 4, 5))
dt
       x y z
1:   one a 1
2:   one b 2
3:   two c 3
4:   two d 4
5: three e 5
Here is how to do it without variable column names.
dt[, c("first_y", "first_z") := .(first(y), first(z)), by = x]
dt
       x y z first_y first_z
1:   one a 1       a       1
2:   one b 2       a       1
3:   two c 3       c       3
4:   two d 4       c       3
5: three e 5       e       5
But how would I do this if the "y" and "z" column names are stored dynamically in a variable?
cols <- c("y", "z")
# This doesn't work
dt[, (paste0("first_", cols)) := .(first(cols)), by = x]
# Nor does this
q <- quote(first(as.name(cols[1])))
p <- quote(first(as.name(cols[2])))
dt[, (paste0("first_", cols)) := .(eval(q), eval(p)), by = x]
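I've tried numerous other combinations of quote(), eval(), and as.name() without success. The LHS of the operation appears to work as intended and is documented in many places, but I can't find anything about using a variable column name on the RHS. Thanks in advance.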
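Answer 0 (score: 5)
I'm not familiar with the first function (although it looks like something Hadley would define), so this uses head with .SDcols instead.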
dt[, paste0("first_", cols) := lapply(.SD, head, n = 1L),
by = x, .SDcols = cols]
# x y z first_y first_z
#1: one a 1 a 1
#2: one b 2 a 1
#3: two c 3 c 3
#4: two d 4 c 3
#5: three e 5 e 5
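For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same .SDcols idea. The helper variable new_cols and the anonymous function are illustrative choices rather than part of the original answer; the parentheses around new_cols are what tell := to use the variable's contents as the new column names.

```r
library(data.table)

dt <- data.table(x = c("one", "one", "two", "two", "three"),
                 y = c("a", "b", "c", "d", "e"),
                 z = c(1, 2, 3, 4, 5))

cols <- c("y", "z")
new_cols <- paste0("first_", cols)  # illustrative name, not from the question

# Within each group of x, .SD contains only the columns listed in .SDcols.
# lapply() returns one length-1 value per column, which := recycles
# across every row of the group.
dt[, (new_cols) := lapply(.SD, function(col) col[1L]), by = x, .SDcols = cols]

dt
```

Any function that returns a single value per column (for example max or min) could be swapped in for the anonymous function.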
Answer 1 (score: 4)
The .SDcols answer is fine for this case, but you can also use get:
dt[, paste0("first_", cols) := lapply(cols, function(x) get(x)[1]), by = x]
dt
# x y z first_y first_z
#1: one a 1 a 1
#2: one b 2 a 1
#3: two c 3 c 3
#4: two d 4 c 3
#5: three e 5 e 5
Another alternative is the vectorized version, mget:
dt[, paste0("first_", cols) := setDT(mget(cols))[1], by = x]
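As a quick sanity check, the sketch below runs the .SDcols and get approaches on separate copies of the same table and compares the results. The names base, via_sd, and via_get are made up for this illustration; copy() is used because := modifies a data.table by reference, so reusing one table would make the second call overwrite the first.

```r
library(data.table)

base <- data.table(x = c("one", "one", "two", "two", "three"),
                   y = c("a", "b", "c", "d", "e"),
                   z = c(1, 2, 3, 4, 5))

cols <- c("y", "z")
new_cols <- paste0("first_", cols)

# Approach 1: restrict .SD to the columns of interest.
via_sd <- copy(base)[, (new_cols) := lapply(.SD, head, n = 1L),
                     by = x, .SDcols = cols]

# Approach 2: look each column up by name with get().
via_get <- copy(base)[, (new_cols) := lapply(cols, function(nm) get(nm)[1L]),
                      by = x]

# Both tables should hold the same values in the new columns.
all.equal(via_sd, via_get)
```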