R data.table rolling join

Date: 2018-04-23 16:21:58

Tags: r join data.table rolling-computation

Suppose I have these two data.tables:

  A <- data.table(date = c("2003-05-24", "2003-06-05", "2003-06-24", "2003-06-25", "2003-06-27"),
                  "id" = c(1,2,1,1,2))

  B <- data.table(idd = c(1,1,1,1,1),
                  datee =  c("2003-05-25", "2003-06-06", "2003-06-25", "2003-06-26", "2003-06-28"),
                  value = c(1,2,3,4,5))
> A
         date id
1: 2003-05-24  1
2: 2003-06-05  2
3: 2003-06-24  1
4: 2003-06-25  1
5: 2003-06-27  2

> B
   idd      datee value
1:   1 2003-05-25     1
2:   1 2003-06-06     2
3:   1 2003-06-25     3
4:   1 2003-06-26     4
5:   1 2003-06-28     5

For each id in A, I want to join the closest earlier value from B (based on date). This gives the desired result:

A[B, value := i.value, on = c("id" = "idd", "date" = "datee"), roll=-Inf]

> A
         date id value
1: 2003-05-24  1    NA
2: 2003-06-05  2    NA
3: 2003-06-24  1     2
4: 2003-06-25  1     3
5: 2003-06-27  2    NA

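The problem is that B has not just one column but several hundred. I really don't want to type out every column name as valueXXX = i.valueXXX and so on, especially since the number and names of the columns in B may change.

So I tried a rolling join like this: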

C <- A[B, , on = c("id" = "idd", "date" = "datee"), roll=-Inf]

> C
         date id value
1: 2003-05-25  1     1
2: 2003-06-06  1     2
3: 2003-06-25  1     3
4: 2003-06-26  1     4
5: 2003-06-28  1     5

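As you can see, the result is not at all what I want. Can someone explain why data.table behaves this way? Also, what is the correct way to get the desired result without hard-coding all the column names?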
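One possible explanation, offered as a sketch rather than a definitive answer: X[Y, on = ...] returns one row for each row of Y, with the join columns holding Y's values, so A[B, ...] comes back with B's five rows. Putting A in the i position and flipping the roll direction from -Inf to Inf returns one row per row of A along with every non-key column of B, without naming any of them. The code assumes the dates are stored as IDate (the original uses character strings), and the name res is only illustrative.

```r
library(data.table)

# Same toy data as above, but with dates as IDate (an assumption of this sketch).
A <- data.table(date = as.IDate(c("2003-05-24", "2003-06-05", "2003-06-24",
                                  "2003-06-25", "2003-06-27")),
                id   = c(1, 2, 1, 1, 2))
B <- data.table(idd   = c(1, 1, 1, 1, 1),
                datee = as.IDate(c("2003-05-25", "2003-06-06", "2003-06-25",
                                   "2003-06-26", "2003-06-28")),
                value = c(1, 2, 3, 4, 5))

# B[A]: one output row per row of A. roll = Inf carries the last B
# observation (same id, date on or before A's date) forward to A's date.
res <- B[A, on = c(idd = "id", datee = "date"), roll = Inf]

# The join columns keep B's names but hold A's values; restore A's names.
setnames(res, c("idd", "datee"), c("id", "date"))
print(res)
```

On this toy data the value column matches the desired result (NA, NA, 2, 3, NA). The two forms are not guaranteed to agree in general, for example when B is not sorted by date within id.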

Edit: The link Frank provided did solve my problem. Basically, define a vector of the variables to add, then use ":=" with mget:

vars <- c("value")  # in my case hundreds of variables, but in this toy example just one

A[B, (vars) := mget(paste0("i.", vars)), on = c("id" = "idd", "date" = "datee"), roll=-Inf]
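Since the number and names of B's columns may change, the vars vector could be derived from B instead of typed by hand. The following is a minimal sketch under a few assumptions: the dates are stored as IDate, key_cols is an illustrative name, and value2 is a hypothetical second column added only to show that several columns are carried over at once.

```r
library(data.table)

A <- data.table(date = as.IDate(c("2003-05-24", "2003-06-05", "2003-06-24",
                                  "2003-06-25", "2003-06-27")),
                id   = c(1, 2, 1, 1, 2))
B <- data.table(idd    = c(1, 1, 1, 1, 1),
                datee  = as.IDate(c("2003-05-25", "2003-06-06", "2003-06-25",
                                    "2003-06-26", "2003-06-28")),
                value  = c(1, 2, 3, 4, 5),
                value2 = c(10, 20, 30, 40, 50))  # hypothetical extra column

# Every column of B that is not a join key gets added to A.
key_cols <- c("idd", "datee")
vars <- setdiff(names(B), key_cols)

# Same update join as in the edit above, with vars computed from B.
A[B, (vars) := mget(paste0("i.", vars)),
  on = c(id = "idd", date = "datee"), roll = -Inf]
print(A)
```

When several rows of B roll onto the same row of A, the last one in B's row order is the one that is kept, so this relies on B being sorted by date within each id.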

0 answers:

No answers