Suppose I have these two data.tables:
A <- data.table(date = c("2003-05-24", "2003-06-05", "2003-06-24", "2003-06-25", "2003-06-27"),
"id" = c(1,2,1,1,2))
B <- data.table(idd = c(1,1,1,1,1),
datee = c("2003-05-25", "2003-06-06", "2003-06-25", "2003-06-26", "2003-06-28"),
value = c(1,2,3,4,5))
> A
date id
1: 2003-05-24 1
2: 2003-06-05 2
3: 2003-06-24 1
4: 2003-06-25 1
5: 2003-06-27 2
> B
idd datee value
1: 1 2003-05-25 1
2: 1 2003-06-06 2
3: 1 2003-06-25 3
4: 1 2003-06-26 4
5: 1 2003-06-28 5
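For each id in A, I want to join the most recent value from B, by date, that falls on or before the date in A. This gives the desired result: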
A[B, value := i.value, on = c("id" = "idd", "date" = "datee"), roll=-Inf]
> A
date id value
1: 2003-05-24 1 NA
2: 2003-06-05 2 NA
3: 2003-06-24 1 2
4: 2003-06-25 1 3
5: 2003-06-27 2 NA
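The problem is that B has not just one column but several hundred. I really don't want to type out every column name as valueXXX = i.valueXXX and so on, especially since the number and names of the columns in B may change.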
So I tried a rolling join like this:
C <- A[B, , on = c("id" = "idd", "date" = "datee"), roll=-Inf]
> C
date id value
1: 2003-05-25 1 1
2: 2003-06-06 1 2
3: 2003-06-25 1 3
4: 2003-06-26 1 4
5: 2003-06-28 1 5
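As you can see, the result is not at all what I want: it has one row per row of B instead of one row per row of A. Can someone explain why data.table behaves this way? Also, what is the right way to get the desired result without hard-coding all the column names?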
Edit: The link Frank provided did solve my problem. Basically, define a vector of the variables to add, then use ":=" with mget:
vars <- c("value") # in my case hundreds of variables, but in this toy example just one
A[B, (vars) := mget(paste0("i.", vars)), on = c("id" = "idd", "date" = "datee"), roll=-Inf]
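Since the column names in B may change, the vars vector could probably be built from B itself rather than typed by hand. The following is only a sketch under that assumption: key_cols is a name I made up for illustration, and it assumes every column of B other than the two join columns should be copied into A.

```r
library(data.table)

A <- data.table(date = c("2003-05-24", "2003-06-05", "2003-06-24", "2003-06-25", "2003-06-27"),
                id = c(1, 2, 1, 1, 2))
B <- data.table(idd = c(1, 1, 1, 1, 1),
                datee = c("2003-05-25", "2003-06-06", "2003-06-25", "2003-06-26", "2003-06-28"),
                value = c(1, 2, 3, 4, 5))

# Hypothetical helper name: the columns of B used only for joining
key_cols <- c("idd", "datee")

# Every remaining column of B is treated as a variable to add to A
vars <- setdiff(names(B), key_cols)

# Same update join as above, with vars derived from B instead of hard-coded
A[B, (vars) := mget(paste0("i.", vars)), on = c("id" = "idd", "date" = "datee"), roll = -Inf]
```

On the toy data this should give the same value column as the first join (NA, NA, 2, 3, NA). If B shares a non-key column name with A, that column in A would be overwritten, so it may be worth checking names(A) first.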