How do I perform a rolling join to the nearest timestamp, grouped by ID?

Time: 2018-03-23 01:26:59

Tags: r dplyr data.table

I have two data tables, as shown below.

DT1

library(data.table)
library(lubridate)

    DT1<-data.frame(
                    id=c(7,7,7,3,3,3),
                    start_time=c("2017-11-01 08:37:35","2017-11-01 09:07:44","2017-11-01 09:46:16","2017-11-01 10:32:29","2017-11-01 10:59:25","2017-11-01 13:24:12"),
                    cube=c(628,625,469,711,376,628)
                    )
DT1=data.table(DT1)


             id        start_time           cube
1:           7 2017-11-01 08:37:35          628
2:           7 2017-11-01 09:07:44          625
3:           7 2017-11-01 09:46:16          469
4:           3 2017-11-01 10:32:29          711
5:           3 2017-11-01 10:59:25          376
6:           3 2017-11-01 13:24:12          628

DT2

DT2<-data.frame(
  id=c(7,7,7,3,3,3),
  res_time=c("2017-11-01 08:35:30","2017-11-01 09:07:48","2017-11-01 09:46:32","2017-11-01 10:31:29","2017-11-01 10:57:25","2017-11-01 13:22:10"),
  res_cube=c(309,625,469,712,375,630)
)
DT2=data.table(DT2)

             id        res_time           res_cube
1:           7 2017-11-01 08:35:30          309
2:           7 2017-11-01 09:07:48          625
3:           7 2017-11-01 09:46:32          469
4:           3 2017-11-01 10:31:29          712
5:           3 2017-11-01 10:57:25          375
6:           3 2017-11-01 13:22:10          630

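From these two tables I need to keep every row of DT1 and, within each id group, attach the DT2 row whose res_time is closest to that row's start_time, so that both res_time and res_cube are appended. So I tried: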

DT1 = DT1[,start_time := as.character(start_time)]
DT2 = DT2[,res_time := as.character(res_time)]


DT1 = DT1[,start_time := parse_date_time2(start_time,orders="YmdHMS",tz="NA")]
DT2 = DT2[,res_time := parse_date_time2(res_time,orders="YmdHMS",tz="NA")]


setkeyv(DT1, c("id","start_time"))    
setkeyv(DT2, c("id","res_time"))    
ans = DT1[DT2, roll=Inf]

But this gives something like this:

              id    start_time              cube         res_cube
1:           7 2017-11-01 08:35:30           NA             309
2:           7 2017-11-01 09:07:48           NA             625
3:           7 2017-11-01 09:46:32           NA             469
4:           3 2017-11-01 10:31:29           NA             712
5:           3 2017-11-01 10:57:25           NA             375
6:           3 2017-11-01 13:22:10           NA             630

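I don't get res_time in the result, so I think I am doing something wrong in the rolling join.

I also noticed that the start_time column of the join result contains the res_time values.

Any help is appreciated.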

1 Answer:

Answer 0: (score: 1)

I managed to get the result by rolling forward instead of rolling backward.

setDT(DT1)
setDT(DT2)
DT1[, start_time := parse_date_time2(start_time, orders = "YmdHMS", tz = "UTC")]
DT2[, res_time := parse_date_time2(res_time, orders = "YmdHMS", tz = "UTC")]

# Copy each timestamp into a common join column so the original columns survive the join
DT1[, time := start_time]
DT2[, time := res_time]

setkey(DT1, id, time)
setkey(DT2, id, time)

# roll = TRUE rolls forward: each start_time in DT1 is matched to the most recent
# res_time in DT2 at or before it, within the same id
ans = DT2[DT1, roll = TRUE]

ans

   id            res_time res_cube                time          start_time cube
1:  3 2017-11-01 10:31:29      712 2017-11-01 10:32:29 2017-11-01 10:32:29  711
2:  3 2017-11-01 10:57:25      375 2017-11-01 10:59:25 2017-11-01 10:59:25  376
3:  3 2017-11-01 13:22:10      630 2017-11-01 13:24:12 2017-11-01 13:24:12  628
4:  7 2017-11-01 08:35:30      309 2017-11-01 08:37:35 2017-11-01 08:37:35  628
5:  7 2017-11-01 08:35:30      309 2017-11-01 09:07:44 2017-11-01 09:07:44  625
6:  7 2017-11-01 09:07:48      625 2017-11-01 09:46:16 2017-11-01 09:46:16  469
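
One caveat: a forward roll takes the last res_time at or before each start_time, which is not always the closest one. The start_time 09:07:44 for id 7 ends up paired with 08:35:30, although 09:07:48 is only four seconds away. If the truly nearest timestamp is wanted, data.table's roll = "nearest" may be a better fit. Below is a minimal sketch of that variant, assuming the same data as in the question; it uses as.POSIXct and the on= argument in place of parse_date_time2 and setkey, and the result name `nearest` is just an illustrative choice.

```r
library(data.table)

# Same data as in the question, with timestamps parsed up front (UTC is an assumption)
DT1 <- data.table(
  id = c(7, 7, 7, 3, 3, 3),
  start_time = as.POSIXct(c("2017-11-01 08:37:35", "2017-11-01 09:07:44",
                            "2017-11-01 09:46:16", "2017-11-01 10:32:29",
                            "2017-11-01 10:59:25", "2017-11-01 13:24:12"), tz = "UTC"),
  cube = c(628, 625, 469, 711, 376, 628)
)
DT2 <- data.table(
  id = c(7, 7, 7, 3, 3, 3),
  res_time = as.POSIXct(c("2017-11-01 08:35:30", "2017-11-01 09:07:48",
                          "2017-11-01 09:46:32", "2017-11-01 10:31:29",
                          "2017-11-01 10:57:25", "2017-11-01 13:22:10"), tz = "UTC"),
  res_cube = c(309, 625, 469, 712, 375, 630)
)

# Copy each timestamp into a shared join column so both originals are kept
DT1[, time := start_time]
DT2[, time := res_time]

# For every DT1 row, pick the DT2 row with the same id whose time is nearest
# (earlier or later); the roll applies to the last column listed in on=
nearest <- DT2[DT1, on = .(id, time), roll = "nearest"]

# Drop the helper column and put the DT1 columns first
nearest[, time := NULL]
setcolorder(nearest, c("id", "start_time", "cube", "res_time", "res_cube"))
print(nearest)
```

With this variant the 09:07:44 row is paired with res_time 09:07:48 (res_cube 625) rather than 08:35:30, and every DT1 row is still kept.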

Hope that helps.