How to apply foverlaps when the small interval is in the wrong table

Time: 2019-04-29 00:40:38

Tags: r join data.table overlap

I am trying to do a merge with foverlaps as part of my setup for running rolling regressions over various subsets.

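Setup

The setup is as follows: DT1 contains two unique identifiers (gvkey, iid), a set of portfolio rebalancing dates (RDt), and a bunch of subset dummy variables based on earlier calculations/rankings.

DT2 contains the full list of daily stock returns for each gvkey and iid combination.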

library(data.table)
library(lubridate)  # ymd(), years()

DT1 <- data.table(gvkey = rep(LETTERS[1:20], 20),
                  RDt = rep(ymd("1998-06-30") + years(0:19), each = 20),
                  indicator_1 = sample(0:1, 400, replace = TRUE),
                  indicator_k = sample(0:1, 400, replace = TRUE))
DT2 <- data.table(gvkey = rep(LETTERS[1:20], 20, length.out = 7275),
                  datadate = seq(ymd("1998-01-30"), ymd("2017-12-30"), by = "days"),
                  log_ret = rlnorm(7275))
# Note: out of laziness I have left weekends in and excluded the second identifier (iid)

I want to copy all of the dummy variables in DT1 onto DT2 based on a 6-month window defined by RDt, so, given:

DT1[gvkey=="A" & RDt =="1998-06-30"]

   gvkey        RDt indicator_1 indicator_k
1:     A 1998-06-30           1           0

a subset of the merged table (DT3):

DT3[gvkey=="A" & between(datadate,"1998-01-01", "1998-06-30")]

would look something like:

DT3 <- data.table(gvkey = rep("A", 181),
                  datadate = seq(ymd("1998-01-01"), ymd("1998-06-30"), by = "days"),
                  log_ret = rlnorm(181),
                  indicator_1 = 1,
                  indicator_k = 0)

To achieve this I have tried the following:

Attempt one:

#(only pre-event window is included here)
DT1[, ":=" ('reg_pre_start' = Last.Day(RDt - month(6)),
            'reg_pre_end' = Last.Day(RDt - month(1)),
            'RDt_Dummy' = RDt)]
DT2[, 'Datadate_Dummy' := datadate] #Daily dates

setkey(DT1, gvkey, iid, reg_pre_start, reg_pre_end) #6 month interval
setkey(DT2, gvkey, iid, datadate, Datadate_Dummy) #Zero day interval


DT3 <- foverlaps(DT1, DT2, type='within', nomatch=0L)

Attempt two:

I have also tried the following join:

DT3 <- DT1[DT2, on = c("gvkey", "iid", "reg_pre_start<=datadate", "reg_pre_end>=datadate"), .SD, by = .EACHI, nomatch = 0, allow.cartesian = TRUE]
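For comparison, here is a minimal sketch of a non-equi join written with the daily table on the outside. The table names `windows` and `daily`, the columns `win_start` and `win_end`, and the window definition (six calendar months before RDt up to RDt) are assumptions made for illustration, not the Last.Day bounds used above. In a data.table non-equi join the join column in the result carries the bound from the inner table, so the `x.` prefix is used to keep the actual daily date.

```r
library(data.table)
library(lubridate)

# Hypothetical toy tables, much smaller than DT1/DT2 in the question
windows <- data.table(
  gvkey       = c("A", "B"),
  RDt         = ymd(c("1998-06-30", "1998-06-30")),
  indicator_1 = c(1L, 0L)
)
# Assumed window: six calendar months before RDt, up to and including RDt
windows[, win_start := RDt %m-% months(6)]
windows[, win_end := RDt]

dates <- seq(ymd("1997-10-01"), ymd("1998-09-30"), by = "days")
daily <- data.table(
  gvkey    = rep(c("A", "B"), each = length(dates)),
  datadate = rep(dates, times = 2),
  log_ret  = 0  # placeholder returns
)

# Keep every daily row whose date falls inside a window for the same gvkey.
# x.datadate is the original daily date; plain datadate would hold a window bound.
res <- daily[windows,
             on = .(gvkey, datadate >= win_start, datadate <= win_end),
             nomatch = 0L,
             .(gvkey, datadate = x.datadate, log_ret, RDt, indicator_1)]
```

Each gvkey ends up with one row per calendar day inside its window, carrying that window's RDt and indicator.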

Both of these only produce rows in the merged table for the days where RDt = datadate.
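One possible contributor to this symptom, independent of the join itself, is the date arithmetic in attempt one. Assuming `month` there resolves to lubridate's accessor (data.table also exports a `month`, so this depends on load order), `month(6)` simply returns the number 6, and subtracting it from a Date moves back six days rather than six months. A short sketch of the difference:

```r
library(lubridate)

RDt <- ymd("1998-06-30")

# month() is an accessor: given a number in 1:12 it hands that number back,
# so this subtracts 6 days from the date
six_days_back <- RDt - lubridate::month(6)

# months() builds a period; %m-% subtracts it without rolling past month ends
six_months_back <- RDt %m-% months(6)

print(six_days_back)    # "1998-06-24"
print(six_months_back)  # "1997-12-30"
```

If `Last.Day` rounds a date to the last day of its month (an assumption, since its source is not shown), then both `reg_pre_start` and `reg_pre_end` would collapse onto RDt itself, which would be consistent with matches appearing only where RDt = datadate.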

I suspect the problem is related to the interval being set in the non-conventional table.
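On the table order: with `type = "within"`, foverlaps looks for intervals of its first argument that lie inside intervals of its second, keyed argument, so a six-month interval can never sit "within" a zero-length daily one. A minimal sketch with the daily table passed first is below. The names `windows`, `daily`, `win_start`, `win_end` and `datadate_end`, and the window definition (six calendar months before RDt up to RDt), are assumptions for illustration only.

```r
library(data.table)
library(lubridate)

# Hypothetical toy tables, much smaller than DT1/DT2 in the question
windows <- data.table(
  gvkey       = c("A", "B"),
  RDt         = ymd(c("1998-06-30", "1998-06-30")),
  indicator_1 = c(1L, 0L)
)
windows[, win_start := RDt %m-% months(6)]  # assumed window definition
windows[, win_end := RDt]

dates <- seq(ymd("1997-10-01"), ymd("1998-09-30"), by = "days")
daily <- data.table(
  gvkey    = rep(c("A", "B"), each = length(dates)),
  datadate = rep(dates, times = 2),
  log_ret  = 0  # placeholder returns
)
daily[, datadate_end := datadate]  # zero-length interval for each day

# The second table must be keyed, with its interval columns last in the key
setkey(windows, gvkey, win_start, win_end)

# Daily points go first, the wide windows second
merged <- foverlaps(daily, windows,
                    by.x = c("gvkey", "datadate", "datadate_end"),
                    type = "within", nomatch = 0L)
```

The result keeps one row per daily observation that falls inside a window, with the window's RDt and indicator columns attached.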

0 answers:

No answers yet