R data.table outer join constrained with roll = "nearest"

Time: 2019-06-05 02:50:58

Tags: r data.table

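Update: I think this is mostly what I want, but now I get multiple matches. I suppose that is a separate question. It would be nice to be able to combine a rolling join with a non-equi join.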

df2[ copy(df1)[, `:=`(TargDate2 = TargDate + hours(4) , TargDate1 = TargDate -hours(4) )], 
     `:=`( Value = i.Value, TargDate.df1 = TargDate ), 
     on = .(ID == ID, TargDate >= TargDate1, TargDate <= TargDate2) ]
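One possible way to deal with the multiple matches, sketched under assumptions: run the non-equi join as a regular join (not an update), so every candidate pair is returned, then keep the closest candidate per row. The tables x and y, the numeric time column tm, the tolerance tol, and the helper columns lo, hi, tm_x, tm_y, and gap are all hypothetical and use plain numbers instead of date-times to keep the example short.

```r
library(data.table)

# Hypothetical toy data: x plays the role of df2, y the role of df1
x <- data.table(ID = 1L, tm = c(100, 200), ValueMatch = c(50, 60))
y <- data.table(ID = 1L, tm = c(95, 120, 400), Value = c(1, 2, 3))

tol <- 30  # window half-width, standing in for the 4 hours

# Build the window around each y time and keep a copy of the y time itself
y_win <- copy(y)[, `:=`(lo = tm - tol, hi = tm + tol, tm_y = tm)]

# Non-equi join returning every (x, y) pair inside the window;
# x.tm is x's own time, the i. prefix picks columns from y_win
cand <- x[y_win, on = .(ID, tm >= lo, tm <= hi), nomatch = 0L,
          .(ID, tm_x = x.tm, ValueMatch, tm_y = i.tm_y, Value = i.Value)]

# Distance of each candidate pair, then keep the closest y row per x row
cand[, gap := abs(tm_x - tm_y)]
best <- cand[order(gap), .SD[1L], by = .(ID, tm_x)]

print(best)
```

Ties are broken by row order here, which may or may not be what is wanted.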

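Is there a way to use a rolling join from the data.table package to match two data frames on the nearest date-time value within some constraint (for example, 4 hours), while keeping all values from both tables (as with merge(..., all = T))?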

library(data.table)
library(lubridate)

set.seed(1)   
df1 <- data.frame(ID=sample(1:3,10, replace=T),TargDate=ymd_hms(Sys.time() + sort(sample(1e2:1e5, 10))), 
                  Value=rnorm(10,10,0.5) )

set.seed(21)   
df2 <- data.frame(ID=sample(1:3,20, replace=T), TargDate=ymd_hms(Sys.time() + sort(sample(1e2:1e5, 20))),
                  ValueMatch=rnorm(20,50,15) )

setDT(df1)
setDT(df2)

setkey(df2, ID, TargDate)[, dateMatch:=TargDate]
# This is an inner match to df1 with TargDate and Value from df1
# and ValueMatch and dateMatch from df2
df2[df1, roll="nearest"]

# 60 seconds * 60 minutes * 4 hours
four_hours <- 60*60*4
df2[df1, roll=-four_hours]
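A possible way to combine the two attempts above, offered as a sketch rather than a definitive answer: take the single nearest match with roll = "nearest", then blank out any match that lies more than four hours away. The toy tables a and b and the column gap_secs are hypothetical stand-ins, not the random data generated above.

```r
library(data.table)

# Hypothetical toy data standing in for df1 (a) and df2 (b)
a <- data.table(
  ID = c(1L, 1L, 2L),
  TargDate = as.POSIXct(c("2019-06-05 10:00:00", "2019-06-05 20:00:00",
                          "2019-06-05 12:00:00"), tz = "UTC"),
  Value = c(10, 11, 12)
)
b <- data.table(
  ID = c(1L, 1L, 2L),
  TargDate = as.POSIXct(c("2019-06-05 09:00:00", "2019-06-05 11:30:00",
                          "2019-06-05 18:00:00"), tz = "UTC"),
  ValueMatch = c(50, 60, 70)
)

# Keep b's own timestamp, because the join column takes a's values in the result
b[, dateMatch := TargDate]

four_hours <- 60 * 60 * 4

# One row per row of a: the nearest row of b with the same ID, however far away
res <- b[a, on = .(ID, TargDate), roll = "nearest"]

# Distance between each a timestamp and its matched b timestamp, in seconds
res[, gap_secs := abs(as.numeric(TargDate) - as.numeric(dateMatch))]

# Discard matches outside the four-hour tolerance
res[gap_secs > four_hours,
    `:=`(ValueMatch = NA_real_, dateMatch = as.POSIXct(NA))]

print(res)
```

Rows of b that were never selected are still absent from res, so this covers only one side of the outer join.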

The desired output is a data frame that contains all rows from both df1 and df2, with the matching rows merged.

1 Answer:

Answer 0: (score: 0)

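Here is a data.table approach that joins each row of df1 to a row of df2 with the same ID whose TargDate falls in the 4 hours up to and including the df1 TargDate. It uses a non-equi join on a copy of df2, in which a new column TargDate2 (= TargDate + 4 hours) is created to serve as the upper bound of the join.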

df1[ copy(df2)[, TargDate2 := TargDate + hours(4)], 
     `:=`( ValueMatch = i.ValueMatch, TargDate.df2 = TargDate ), 
     on = .(ID == ID, TargDate >= TargDate, TargDate <= TargDate2) ]


#    ID            TargDate     Value ValueMatch        TargDate.df2
# 1:  1 2019-06-05 13:32:48 10.755891         NA                <NA>
# 2:  2 2019-06-05 14:21:47 10.194922         NA                <NA>
# 3:  2 2019-06-05 19:11:32  9.689380         NA                <NA>
# 4:  3 2019-06-05 19:18:21  8.892650   46.59552 2019-06-05 17:56:47
# 5:  1 2019-06-05 22:27:28 10.562465         NA                <NA>
# 6:  3 2019-06-06 03:42:42  9.977533   22.48528 2019-06-06 03:12:42
# 7:  3 2019-06-06 04:33:36  9.991905   43.88468 2019-06-06 04:26:16
# 8:  2 2019-06-06 06:00:34 10.471918         NA                <NA>
# 9:  2 2019-06-06 06:13:10 10.410611   63.67443 2019-06-06 06:10:11
#10:  1 2019-06-06 12:10:15 10.296951   51.20187 2019-06-06 08:45:39
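The result above keeps every row of df1 but drops the df2 rows that found no partner. A minimal sketch of one way to get closer to merge(..., all = T), assuming it is acceptable to append the unused df2 rows with an anti-join: the toy tables below reuse the names df1 and df2 with made-up values, and win, win_start, win_end, left, used, rest, and full are hypothetical names introduced for the illustration.

```r
library(data.table)

# Hypothetical toy stand-ins for df1 and df2
df1 <- data.table(
  ID = c(1L, 2L),
  TargDate = as.POSIXct(c("2019-06-05 10:00:00", "2019-06-05 12:00:00"),
                        tz = "UTC"),
  Value = c(10, 12)
)
df2 <- data.table(
  ID = c(1L, 2L, 3L),
  TargDate = as.POSIXct(c("2019-06-05 09:00:00", "2019-06-05 02:00:00",
                          "2019-06-05 15:00:00"), tz = "UTC"),
  ValueMatch = c(50, 60, 70)
)

four_hours <- 60 * 60 * 4

# Same window idea as the answer, using plain seconds instead of lubridate
win <- copy(df2)[, `:=`(win_start = TargDate, win_end = TargDate + four_hours)]

# Update join: df1 rows that fall inside a df2 window pick up the df2 values
left <- copy(df1)
left[win, `:=`(ValueMatch = i.ValueMatch, TargDate.df2 = i.win_start),
     on = .(ID, TargDate >= win_start, TargDate <= win_end)]

# Anti-join: df2 rows whose (ID, TargDate) was not used by any df1 row
used <- left[!is.na(TargDate.df2), .(ID, TargDate = TargDate.df2)]
rest <- df2[!used, on = .(ID, TargDate)]
setnames(rest, "TargDate", "TargDate.df2")

# Stack matched/unmatched df1 rows and the leftover df2 rows
full <- rbindlist(list(left, rest), use.names = TRUE, fill = TRUE)

print(full)
```

If several df2 rows fall in the window of one df1 row, the update join keeps only the last of them, and the others end up in rest as if they were unmatched.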