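I am trying to merge with foverlaps as part of a setup for running rolling regressions on various subsets.
The setup is as follows: DT1 contains two unique identifiers (gvkey, iid), the portfolio rebalancing dates (RDt), and a set of subset dummy variables based on earlier calculations/rankings.
DT2 contains the complete list of daily stock returns for every gvkey and iid combination.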
library(data.table)
library(lubridate)
DT1 <- data.table(gvkey = rep(LETTERS[1:20],20),
RDt = rep(ymd("1998-06-30") + years(0:19),each=20),
indicator_1 = sample(0:1,400,replace = TRUE),
indicator_k = sample(0:1,400,replace = TRUE))
DT2 <- data.table(gvkey = rep(LETTERS[1:20],20,length.out=7275),
datadate = seq(ymd("1998-01-30"), ymd("2017-12-30"), by="days"),
log_ret = rlnorm(7275))
# Note: out of laziness I have left weekends in and excluded the second identifier (iid)
I want to copy all the dummy variables from DT1 into DT2 based on a 6-month window relative to RDt. So, given:
DT1[gvkey=="A" & RDt =="1998-06-30"]
gvkey RDt indicator_1 indicator_k
1: A 1998-06-30 1 0
the corresponding subset of the merged table (DT3):
DT3[gvkey=="A" & between(datadate,"1998-01-01", "1998-06-30")]
should look something like this:
DT3 <- data.table(gvkey = rep("A",181), "datadate" = seq(ymd("1998-01-01"), ymd("1998-06-30"), by="days"), log_ret = rlnorm(181),indicator_1=1, indicator_k=0 )
# (only the pre-event window is included here)
DT1[, ":=" ('reg_pre_start' = Last.Day(RDt - month(6)),
'reg_pre_end' = Last.Day(RDt - month(1)),
'RDt_Dummy' = RDt)]
DT2[, 'Datadate_Dummy' := datadate] #Daily dates
setkey(DT1, gvkey, iid, reg_pre_start, reg_pre_end) #6 month interval
setkey(DT2, gvkey, iid, datadate, Datadate_Dummy) #Zero day interval
DT3 <- foverlaps(DT1, DT2, type='within', nomatch=0L)
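For reference, `foverlaps(x, y, type = "within")` keeps the rows of `x` whose interval `[a, b]` lies inside an interval `[c, d]` of `y` (`c <= a` and `b <= d`), and `y` is the table that must be keyed. Below is a minimal sketch on small hypothetical data that passes the zero-length daily intervals as `x` and the 6-month windows as `y`. The names `windows`, `daily`, `datadate_end` and `to_day_num` are made up for illustration, and the dates are stored as integer day counts so the interval columns are plainly integer.

```r
library(data.table)

# Hypothetical helper: turn "YYYY-MM-DD" strings into integer day counts
to_day_num <- function(s) as.integer(as.Date(s))

# y: one 6-month window per gvkey/rebalancing date, carrying the dummies
windows <- data.table(
  gvkey         = c("A", "A"),
  reg_pre_start = to_day_num(c("1998-01-01", "1998-07-01")),
  reg_pre_end   = to_day_num(c("1998-06-30", "1998-12-31")),
  indicator_1   = c(1L, 0L)
)

# x: daily returns, each day expressed as a zero-length interval
daily <- data.table(
  gvkey    = "A",
  datadate = to_day_num(c("1998-03-15", "1998-06-30", "1998-08-01")),
  log_ret  = c(0.01, -0.02, 0.03)
)
daily[, datadate_end := datadate]

# y must be keyed; its last two key columns define the interval
setkey(windows, gvkey, reg_pre_start, reg_pre_end)

# x = daily, y = windows: keep each day that falls within a window
res <- foverlaps(daily, windows,
                 by.x = c("gvkey", "datadate", "datadate_end"),
                 type = "within", nomatch = 0L)

# Convert the day counts back to Date for readability, then sort by day
res[, datadate := as.Date(datadate, origin = "1970-01-01")]
setorder(res, datadate)
print(res)
```

Each daily row picks up the dummies of the window it falls into; the day on the window boundary (1998-06-30) is kept because `within` uses inclusive bounds.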
I also tried the following join:
DT3 <- DT1[DT2, on = c("gvkey", "iid", "reg_pre_start<=datadate", "reg_pre_end>=datadate"), .SD, by = .EACHI, nomatch = 0, allow.cartesian = TRUE]
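For comparison, here is a minimal sketch of the same idea as a non-equi join, on small hypothetical data (the table names `windows` and `daily` are illustrative). One detail that is easy to miss: in a non-equi join, the join columns in the result hold the values from `i`, so the sketch selects `x.datadate` explicitly to keep the actual trading date.

```r
library(data.table)

# Hypothetical toy data mirroring DT1 (windows) and DT2 (daily returns)
windows <- data.table(
  gvkey         = c("A", "A"),
  reg_pre_start = as.IDate(c("1998-01-01", "1998-07-01")),
  reg_pre_end   = as.IDate(c("1998-06-30", "1998-12-31")),
  indicator_1   = c(1L, 0L)
)
daily <- data.table(
  gvkey    = "A",
  datadate = as.IDate(c("1998-03-15", "1998-06-30", "1998-08-01")),
  log_ret  = c(0.01, -0.02, 0.03)
)

# x = daily, i = windows: for every window, select the days inside it.
# log_ret comes from x, indicator_1 from i; x.datadate is the real date.
res <- daily[windows,
             .(gvkey, datadate = x.datadate, log_ret, indicator_1),
             on = .(gvkey, datadate >= reg_pre_start, datadate <= reg_pre_end),
             nomatch = 0L, allow.cartesian = TRUE]
setorder(res, datadate)
print(res)
```

`allow.cartesian = TRUE` is not needed for this toy data, but on the full tables each window matches roughly six months of daily rows, so the result can exceed `nrow(x) + nrow(i)`.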
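Both approaches only bring rows into the merged table on the days where RDt = datadate.
I suspect the problem has something to do with how I set up the intervals in the non-standard tables.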
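Since the suspicion is about how the intervals are built, one thing worth checking is the window arithmetic itself. In lubridate, `month()` extracts the month component of a date, whereas `months()` constructs a period, so `RDt - month(6)` does not subtract six months. A minimal sketch, assuming lubridate; `month_end()` is a hypothetical stand-in for `Last.Day()`, whose definition isn't shown here.

```r
library(lubridate)

RDt <- ymd("1998-06-30")

# month() is an accessor: it returns the month number of a date
month(RDt)  # 6

# months() builds a period; %m-% subtracts it and rolls invalid dates
# (e.g. Aug 31 minus 6 months) back to the last valid day of the month
six_back <- RDt %m-% months(6)  # "1997-12-30"
one_back <- RDt %m-% months(1)  # "1998-05-30"

# Hypothetical stand-in for Last.Day(): last calendar day of x's month
month_end <- function(x) floor_date(x, "month") + months(1) - days(1)

reg_pre_start <- month_end(six_back)  # "1997-12-31"
reg_pre_end   <- month_end(one_back)  # "1998-05-31"
```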