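How to perform a join over date ranges using data.table?

Time: 2014-12-15 15:53:26

Tags: r time-series data.table

How can I do the following with data.table (it is straightforward with sqldf) and get exactly the same result: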

library(data.table)

whatWasMeasured <- data.table(start=as.POSIXct(seq(1, 1000, 100),
    origin="1970-01-01 00:00:00"),
    end=as.POSIXct(seq(10, 1000, 100), origin="1970-01-01 00:00:00"),
    x=1:10,
    y=letters[1:10])

measurments <- data.table(time=as.POSIXct(seq(1, 2000, 1),
    origin="1970-01-01 00:00:00"),
    temp=runif(2000, 10, 100))

## Alternative short names for data.tables
dt1 <- whatWasMeasured
dt2 <- measurments

## Straightforward with sqldf    
library(sqldf)

sqldf("select * from measurments m, whatWasMeasured wwm
where m.time between wwm.start and wwm.end")

1 Answer:

Answer 0: (score: 20)

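You can use the foverlaps() function, which implements joins over intervals efficiently. foverlaps() expects an interval (two columns) on both sides, and measurments has only a single time column, so in your case we just need a dummy column for measurments that turns each timestamp into a zero-length interval.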

  

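Note 1: You should install the development version of data.table, v1.9.5, because a bug in foverlaps() has been fixed there. Installation instructions were linked from the original answer.

Note 2: For convenience, I'll call whatWasMeasured = dt1 and measurments = dt2 here.

require(data.table) ## 1.9.5+
dt2[, dummy := time]
setkey(dt1, start, end)
ans = foverlaps(dt2, dt1, by.x=c("time", "dummy"), nomatch=0L)[, dummy := NULL]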

See ?foverlaps for more details; a performance comparison was linked from the original answer.
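To make the dummy-column trick easier to check by hand, here is a minimal sketch on toy data. The table names (intervals, readings), the label column, and all values are hypothetical and not from the question; the sketch assumes a data.table version in which foverlaps() is available and handles POSIXct keys.

```r
library(data.table)

# Hypothetical toy data: three intervals and six point-in-time readings.
intervals <- data.table(
  start = as.POSIXct(c(1, 101, 201), origin = "1970-01-01", tz = "UTC"),
  end   = as.POSIXct(c(10, 110, 210), origin = "1970-01-01", tz = "UTC"),
  label = c("a", "b", "c")
)
readings <- data.table(
  time = as.POSIXct(c(1, 5, 10, 11, 105, 500), origin = "1970-01-01", tz = "UTC"),
  temp = c(20.5, 21.0, 21.5, 22.0, 22.5, 23.0)
)

# foverlaps() joins intervals to intervals, so each reading becomes the
# zero-length interval [time, time] by duplicating its timestamp.
# Note that := adds the column to readings by reference.
readings[, dummy := time]

# The second table (y) must be keyed on its interval columns: start, then end.
setkey(intervals, start, end)

# nomatch = 0L drops readings that fall in no interval (an inner join);
# the helper column is removed from the result afterwards.
matched <- foverlaps(readings, intervals,
                     by.x = c("time", "dummy"),
                     nomatch = 0L)[, dummy := NULL]

print(matched)
```

In this toy data the readings at seconds 1, 5 and 10 fall inside the first interval (endpoints count as overlapping), the reading at 105 falls inside the second, and the readings at 11 and 500 match nothing and are dropped, which mirrors the inclusive behaviour of the SQL between clause in the question.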