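Using R data.table to find cumulative stock returns over a 15-month window

Time: 2016-06-24 04:12:23

Tags: r data.table stock

I am trying to compute 15-month stock returns for a set of US companies. I have always used SAS, but my SAS license has expired.

The data looks like this; I have already added 1 to each monthly return, so ret is a gross return.

Return data (crsp.msf), the source data: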

permno date      ret
10002 1994-01-31 1.039
10002 1994-02-28 0.991
10002 1994-03-31 1.005
10002 1994-04-29 0.943
10002 1994-05-31 1.060
10002 1994-06-30 1.061
10002 1994-07-29 0.946
10002 1994-08-31 1.009
10002 1994-09-30 0.977
10002 1994-10-31 1.000
10002 1994-11-30 0.962
10002 1994-12-30 1.056
10002 1995-01-31 1.000
10002 1995-02-28 1.000
10002 1995-03-31 0.978
10002 1995-04-28 1.020
10002 1995-05-31 1.038
10002 1995-06-30 0.969
10002 1995-07-31 1.000
10002 1995-08-31 1.000
10002 1995-09-29 1.122
10002 1995-10-31 0.862
10002 1995-11-30 1.070
10002 1995-12-29 1.053

For this company (10002), I want the return over each 15-month window listed below (elist):

permno,begdat,enddat
10002,1994-03-31,1995-06-30
10002,1994-06-30,1995-09-30
10002,1994-09-30,1995-12-31
10002,1994-12-31,1996-03-31
10002,1995-03-31,1996-06-30
10002,1995-06-30,1996-09-30
10002,1995-09-30,1996-12-31
10002,1995-12-31,1997-03-31

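I have many companies, so 'elist' has 40,000 rows.

Any help would be great.

3 answers:

Answer 0: (score: 2)

Assuming your data is already in a data.table, you can use the foverlaps function: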

# create a begin date ('bdat') and end date ('edat') from the 'date' column
crsp.msf[, `:=` (bdat = as.Date(date), edat = as.Date(date))][, date := NULL]
# convert the date columns in 'elist' to Date format (only if they aren't already)
elist[, `:=` (begdat = as.Date(begdat), enddat = as.Date(enddat))]

# set the keys
setkey(crsp.msf, permno, bdat, edat)
setkey(elist, permno, begdat, enddat)

# see which dates fall in the specified date windows from 'elist' and calculate the sum for each window
foverlaps(crsp.msf, elist, type = "within", nomatch=0L)[, .(sum.ret = sum(ret)), by = .(permno, begdat, enddat)]

which gives:

   permno     begdat     enddat sum.ret
1:  10002 1994-03-31 1995-06-30  16.024
2:  10002 1994-06-30 1995-09-30  16.138
3:  10002 1994-09-30 1995-12-31  16.107
4:  10002 1994-12-31 1996-03-31  12.112
5:  10002 1995-03-31 1996-06-30  10.112
6:  10002 1995-06-30 1996-09-30   7.076
7:  10002 1995-09-30 1996-12-31   2.985
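
Note that sum.ret above adds up gross returns (each already 1 + r) and that type = "within" also counts the begdat month, which is why the first window sums 16 rows. A minimal sketch of compounding instead, on made-up toy data, assuming the intended window is begdat < date <= enddat as in the SQL query posted in a later answer. The names msf_toy, win_toy and res are illustrative only:

```r
library(data.table)

# Toy data, made up for illustration: gross monthly returns (1 + r)
msf_toy <- data.table(
  permno = 10002L,
  date   = as.Date(c("1994-03-31", "1994-04-29", "1994-05-31", "1994-06-30")),
  ret    = c(1.005, 0.943, 1.060, 1.061)
)
win_toy <- data.table(
  permno = 10002L,
  begdat = as.Date("1994-03-31"),
  enddat = as.Date("1994-06-30")
)

# foverlaps needs an interval on both sides, so each date becomes [date, date]
msf_toy[, `:=`(bdat = date, edat = date)]
setkey(win_toy, permno, begdat, enddat)

ov <- foverlaps(msf_toy, win_toy,
                by.x = c("permno", "bdat", "edat"),
                type = "within", nomatch = 0L)

# Drop the begdat month itself, then compound the gross returns
res <- ov[date > begdat,
          .(cumret = prod(ret) - 1),
          by = .(permno, begdat, enddat)]
```

Here the 1994-03-31 row is excluded, so cumret compounds only the April, May and June returns.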

Answer 1: (score: 0)

If you want to use data.table:

dt[date %between% c("1994-03-31","1995-06-30")]

Result:

    permno       date   ret
 1:  10002 1994-03-31 1.005
 2:  10002 1994-04-29 0.943
 3:  10002 1994-05-31 1.060
 4:  10002 1994-06-30 1.061
 5:  10002 1994-07-29 0.946
 6:  10002 1994-08-31 1.009
 7:  10002 1994-09-30 0.977
 8:  10002 1994-10-31 1.000
 9:  10002 1994-11-30 0.962
10:  10002 1994-12-30 1.056
11:  10002 1995-01-31 1.000
12:  10002 1995-02-28 1.000
13:  10002 1995-03-31 0.978
14:  10002 1995-04-28 1.020
15:  10002 1995-05-31 1.038
16:  10002 1995-06-30 0.969

To do this for the whole elist, you can do the following. First, read your data with read.table.

elist <- read.table(text="
permno,begdat,enddat
10002,1994-03-31,1995-06-30
10002,1994-06-30,1995-09-30
10002,1994-09-30,1995-12-31
10002,1994-12-31,1996-03-31
10002,1995-03-31,1996-06-30
10002,1995-06-30,1996-09-30
10002,1995-09-30,1996-12-31
10002,1995-12-31,1997-03-31", header=TRUE, sep = ",", fill=TRUE, stringsAsFactors=FALSE)

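Then use a simple for loop:

res <- NULL
for (i in seq_len(nrow(elist))) {
  # keep rows for this window's company whose date falls inside the window
  res <- rbind(res, dt[permno == elist[i, 1] & date %between% c(elist[i, 2], elist[i, 3])])
}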
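
Growing res with rbind inside the loop copies the accumulated table on every iteration, and the stacked result no longer shows which window each row belongs to. A minimal sketch of one alternative, assuming toy data made up for illustration (dt_toy, elist_toy, pieces and res_toy are hypothetical names): collect the pieces in a list, tag each with its window, and bind once with rbindlist.

```r
library(data.table)

# Toy data, made up for illustration
dt_toy <- data.table(
  permno = c(10002L, 10002L, 10002L, 10003L),
  date   = as.Date(c("1994-03-31", "1994-04-29", "1994-05-31", "1994-04-29")),
  ret    = c(1.005, 0.943, 1.060, 1.020)
)
elist_toy <- data.table(
  permno = c(10002L, 10003L),
  begdat = as.Date(c("1994-03-31", "1994-03-31")),
  enddat = as.Date(c("1994-04-30", "1994-06-30"))
)

pieces <- lapply(seq_len(nrow(elist_toy)), function(i) {
  w <- elist_toy[i]
  # rows for this company whose date lies in [begdat, enddat]
  out <- dt_toy[permno == w$permno & date >= w$begdat & date <= w$enddat]
  # remember which window produced these rows
  out[, `:=`(begdat = w$begdat, enddat = w$enddat)]
  out
})

# bind all pieces in one step instead of growing a table inside the loop
res_toy <- rbindlist(pieces)
```

With 40,000 windows this is still one subset per window, so a join-based approach is likely to scale better; the sketch only removes the repeated copying.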

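Answer 2: (score: 0)

Thank you, helpers! The MySQL solution to the problem is:

create table return1 as
select a.*, b.ret, b.date
from elist as a, crsp.msf as b
where a.permno = b.permno
  and (b.date > a.begdat and b.date <= a.enddat)

However, this took 2 hours (!) to produce the desired result.

Using data.table

First, define a function where ri is crsp.msf and row is a row number in elist: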

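cumret <- function(ri, row) {
    # step 1: keep only this window's company
    r <- ri[permno == elist[row, permno], ]
    # step 2: keep the months with begdat < date <= enddat
    r <- r[date > elist[row, begdat] & date <= elist[row, enddat], .(ret)]
    # compound the gross returns and subtract 1 (NA if any month is missing)
    r <- r[, .(prod(ret, na.rm = FALSE) - 1)]
    return(r)
}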

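Note that I retrieve the relevant observations from ri in two steps. I could do it in one step, but that takes too much time. The prod() line of the function computes the cumulative return.

Second, add a return column to elist: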

elist[, return := NA_real_]

Finally, loop over elist:

for (row in seq_len(nrow(elist))) {
    elist[row, return := cumret(ri, row)]
}

For 40k observations, this took about 2 minutes.
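
The row-by-row loop can also be expressed as a single join. A minimal sketch, assuming a data.table version that supports non-equi joins (1.9.8 or later) and using made-up toy data; msf_toy, elist_toy and cr are illustrative names, not objects from the question. Timing on the real 40k-row elist has not been measured here.

```r
library(data.table)

# Toy data, made up for illustration: gross monthly returns (1 + r)
msf_toy <- data.table(
  permno = c(rep(10002L, 4), rep(10003L, 2)),
  date   = as.Date(c("1994-03-31", "1994-04-29", "1994-05-31", "1994-06-30",
                     "1994-04-29", "1994-05-31")),
  ret    = c(1.005, 0.943, 1.060, 1.061, 1.020, 0.980)
)
elist_toy <- data.table(
  permno = c(10002L, 10003L, 10003L),
  begdat = as.Date(c("1994-03-31", "1994-03-31", "1995-03-31")),
  enddat = as.Date(c("1994-06-30", "1994-06-30", "1995-06-30"))
)

# For each row of elist_toy, match rows of msf_toy with the same permno and
# begdat < date <= enddat, then compound the gross returns for that row.
cr <- msf_toy[elist_toy,
              on = .(permno, date > begdat, date <= enddat),
              .(cumret = prod(ret) - 1),
              by = .EACHI]

# by = .EACHI returns one row per row of elist_toy, in the same order
elist_toy[, cumret := cr$cumret]
```

Windows with no matching months (the third row here) come back as NA rather than being dropped, because the join keeps unmatched rows by default.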