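I am trying to compute 15-month stock returns for a set of US companies. I have always used SAS, but my SAS license has expired.
The data looks like this; I have added 1 to the monthly returns, so ret is a gross return.
Return data (crsp.msf), source data: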
permno date ret
10002 1994-01-31 1.039
10002 1994-02-28 0.991
10002 1994-03-31 1.005
10002 1994-04-29 0.943
10002 1994-05-31 1.060
10002 1994-06-30 1.061
10002 1994-07-29 0.946
10002 1994-08-31 1.009
10002 1994-09-30 0.977
10002 1994-10-31 1.000
10002 1994-11-30 0.962
10002 1994-12-30 1.056
10002 1995-01-31 1.000
10002 1995-02-28 1.000
10002 1995-03-31 0.978
10002 1995-04-28 1.020
10002 1995-05-31 1.038
10002 1995-06-30 0.969
10002 1995-07-31 1.000
10002 1995-08-31 1.000
10002 1995-09-29 1.122
10002 1995-10-31 0.862
10002 1995-11-30 1.070
10002 1995-12-29 1.053
For this company (10002), I want to find the return over each 15-month window in the list below (elist):
permno,begdat,enddat
10002,1994-03-31,1995-06-30
10002,1994-06-30,1995-09-30
10002,1994-09-30,1995-12-31
10002,1994-12-31,1996-03-31
10002,1995-03-31,1996-06-30
10002,1995-06-30,1996-09-30
10002,1995-09-30,1996-12-31
10002,1995-12-31,1997-03-31
I have many companies, so 'elist' has 40,000 rows.
Any help would be great.
Answer 0 (score: 2)
Assuming your data is already in data.tables, you can use the foverlaps function:
library(data.table)
# create a begin date ('bdat') and end date ('edat') from the 'date' column
crsp.msf[, `:=` (bdat = as.Date(date), edat = as.Date(date))][, date := NULL]
# convert the date columns in 'elist' to Date format (only if they aren't already)
elist[, `:=` (begdat = as.Date(begdat), enddat = as.Date(enddat))]
# set the keys
setkey(crsp.msf, permno, bdat, edat)
setkey(elist, permno, begdat, enddat)
# see which dates fall in the specified date windows from 'elist' and calculate the sum for each window
foverlaps(crsp.msf, elist, type = "within", nomatch=0L)[, .(sum.ret = sum(ret)), by = .(permno, begdat, enddat)]
which gives:
permno begdat enddat sum.ret
1: 10002 1994-03-31 1995-06-30 16.024
2: 10002 1994-06-30 1995-09-30 16.138
3: 10002 1994-09-30 1995-12-31 16.107
4: 10002 1994-12-31 1996-03-31 12.112
5: 10002 1995-03-31 1996-06-30 10.112
6: 10002 1995-06-30 1996-09-30 7.076
7: 10002 1995-09-30 1996-12-31 2.985
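Since ret holds gross returns (1 + monthly return), a compounded window return would be the product of the values minus 1 rather than their sum. The following is only a minimal sketch of that variation on a toy table; the names crsp, windows and cum.ret are made up for illustration and are not from the question.

```r
library(data.table)

# toy gross returns (1 + monthly return) for one firm; values taken from the question
crsp <- data.table(
  permno = 10002L,
  date   = as.Date(c("1994-03-31", "1994-04-29", "1994-05-31", "1994-06-30")),
  ret    = c(1.005, 0.943, 1.060, 1.061)
)
# foverlaps needs an interval on both sides, so each date becomes a zero-length interval
crsp[, `:=`(bdat = date, edat = date)]

# one hypothetical window (assumption: both endpoints are included)
windows <- data.table(
  permno = 10002L,
  begdat = as.Date("1994-03-31"),
  enddat = as.Date("1994-05-31")
)

setkey(crsp, permno, bdat, edat)
setkey(windows, permno, begdat, enddat)

# compound the gross returns inside each window and subtract 1
cum <- foverlaps(crsp, windows, type = "within", nomatch = 0L)[
  , .(cum.ret = prod(ret) - 1), by = .(permno, begdat, enddat)]
print(cum)
```

Note that type = "within" keeps observations dated exactly on begdat, whereas a filter such as date > begdat would drop them, so the window definition should be checked against what is intended.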
Answer 1 (score: 0)
If you want to use data.table, you can select one window like this:
dt[date %between% c("1994-03-31","1995-06-30")]
Result:
permno date ret
1: 10002 1994-03-31 1.005
2: 10002 1994-04-29 0.943
3: 10002 1994-05-31 1.060
4: 10002 1994-06-30 1.061
5: 10002 1994-07-29 0.946
6: 10002 1994-08-31 1.009
7: 10002 1994-09-30 0.977
8: 10002 1994-10-31 1.000
9: 10002 1994-11-30 0.962
10: 10002 1994-12-30 1.056
11: 10002 1995-01-31 1.000
12: 10002 1995-02-28 1.000
13: 10002 1995-03-31 0.978
14: 10002 1995-04-28 1.020
15: 10002 1995-05-31 1.038
16: 10002 1995-06-30 0.969
If you want to do this for the whole elist, you can do the following. First, read your data with read.table:
elist <- read.table(text="
permno,begdat,enddat
10002,1994-03-31,1995-06-30
10002,1994-06-30,1995-09-30
10002,1994-09-30,1995-12-31
10002,1994-12-31,1996-03-31
10002,1995-03-31,1996-06-30
10002,1995-06-30,1996-09-30
10002,1995-09-30,1996-12-31
10002,1995-12-31,1997-03-31", header=T, sep = ",", fill=TRUE,stringsAsFactors=FALSE)
Then use a simple for loop:
res <- NULL
for (i in 1:NROW(elist)){
res <- rbind(res, dt[date %between% c(elist[i,2],elist[i,3])])
}
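The loop above grows res with rbind on every pass and does not record which window (or which permno) each row came from. One possible variation, sketched here on toy data, collects the pieces in a list and binds them once with rbindlist; the window id column and the permno filter are additions for illustration, not part of the loop above.

```r
library(data.table)

# toy data; values taken from the question
dt <- data.table(
  permno = 10002L,
  date   = as.Date(c("1994-03-31", "1994-04-29", "1994-05-31", "1994-06-30")),
  ret    = c(1.005, 0.943, 1.060, 1.061)
)
# two hypothetical windows, with real Date columns
elist <- data.table(
  permno = 10002L,
  begdat = as.Date(c("1994-03-31", "1994-05-31")),
  enddat = as.Date(c("1994-04-29", "1994-06-30"))
)

# one subset per window, matching on permno as well as on the date range
pieces <- lapply(seq_len(nrow(elist)), function(i) {
  dt[permno == elist$permno[i] &
       date >= elist$begdat[i] &
       date <= elist$enddat[i]]
})

# bind once at the end; 'window' holds the row number of elist each piece came from
res <- rbindlist(pieces, idcol = "window")
print(res)
```

With the window id in place, a per-window summary can then be computed with by = window.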
Answer 2 (score: 0)
Thanks for the help! The MySQL solution to the problem is:
create table return1 as
select a.*, b.ret, b.date
from elist as a, crsp.msf as b
where a.permno = b.permno and (b.date > a.begdat and b.date <= a.enddat)
However, this takes 2 hours (!) to produce the desired result.
Using data.table
First, define a function, where ri is crsp.msf and row is the row number in elist:
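cumret <- function(ri, row) {
  # step 1: keep only the rows for this window's company
  r <- ri[permno == elist[row, permno], ]
  # step 2: keep the dates in (begdat, enddat]
  r <- r[date > elist[row, begdat] & date <= elist[row, enddat], .(ret)]
  # compound the gross returns and subtract 1 to get the cumulative return
  r <- r[, .(prod(ret, na.rm = FALSE) - 1)]
  return(r)
}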
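Note that I use two steps to retrieve the relevant observations from ri. I could do it in one step, but that takes too much time. The second-to-last statement of the function computes the cumulative return.
Second, add a return column to elist: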
elist[, return := NA_real_]
Finally, loop through elist:
for (row in 1:elist[,.N]){
elist[row,return:=cumret(ri,row)]
}
For 40k observations, this took about 2 minutes.
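As a possible alternative to the row-by-row loop, the same (begdat, enddat] filter can be expressed as a data.table non-equi join with by = .EACHI, which evaluates the product once per row of elist. This is only a sketch on toy data, with no claim about how it would compare in speed on the real 40k-row elist; the result name res and the column name cumret are made up here.

```r
library(data.table)

# toy gross returns; values taken from the question
ri <- data.table(
  permno = 10002L,
  date   = as.Date(c("1994-03-31", "1994-04-29", "1994-05-31", "1994-06-30")),
  ret    = c(1.005, 0.943, 1.060, 1.061)
)
# two hypothetical windows; the second one has no observations in ri
elist <- data.table(
  permno = 10002L,
  begdat = as.Date(c("1994-03-31", "1996-01-31")),
  enddat = as.Date(c("1994-06-30", "1996-06-30"))
)

# for each row of elist, compound the returns with begdat < date <= enddat
res <- ri[elist,
          on = .(permno, date > begdat, date <= enddat),
          .(cumret = prod(ret) - 1),
          by = .EACHI]

# the two join columns come back named 'date' but hold begdat and enddat, so rename by position
setnames(res, c("permno", "begdat", "enddat", "cumret"))
print(res)
```

One difference from the loop: a window with no matching rows comes back as NA here, whereas prod() over an empty selection in cumret returns 1 and therefore a return of 0.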