I have a data.frame DF with the four columns given below.
DF <- structure(list(Ticker = c("ABC", "ABC", "ABC", "ABC","ABC","ABC","ABC", "ABC", "ABC", "ABC", "ABC", "ABC", "ABC", "ABC","ABC","XYZ", "XYZ", "XYZ", "XYZ", "XYZ", "XYZ", "XYZ", "XYZ","XYZ", "XYZ", "XYZ", "XYZ"), `Enter Date` = c("2005-02-08", "2005-02-23","2005-06-07", "2005-06-08", "2005-08-16", "2005-09-07", "2005-11-15","2005-11-17", "2005-12-06", "2005-12-23", "2006-02-09", "2006-02-10","2006-02-15", "2006-02-22", "2006-05-01", "2005-02-22", "2005-02-28","2005-03-01", "2005-03-03", "2005-03-04", "2005-03-11", "2005-03-15","2005-04-04", "2005-04-05", "2005-04-15", "2005-04-22", "2005-04-28"), `Exit Date` = c("2005-03-09", "2005-03-23", "2005-07-06","2005-07-07", "2005-09-14", "2005-10-05", "2005-12-14", "2005-12-16","2006-01-05", "2006-01-25", "2006-03-10", "2006-03-13", "2006-03-16","2006-03-22", "2006-05-30", "2005-03-22", "2005-03-29", "2005-03-30","2005-04-01", "2005-04-04", "2005-04-11", "2005-04-13", "2005-05-02","2005-05-03", "2005-05-13", "2005-05-20","2005-05-26"), Return = c(4.669,4.034, 3.796, -4.059, -11.168, -0.496,-3.597, 3.45, -4.428,1.914, 3.577, 4, 8.451, 5.521, 10.324, 3.104, 0.787,-3.407,-1.441, -4.157, 4.343, 2.827, 0.425, -1.37, -3.175, -11.027,8.144)), .Names = c("Ticker", "Enter Date", "Exit Date", "Return"), row.names = c(NA, 27L), class = "data.frame")
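I want to calculate the cumulative average of the Return column over the rows where Enter Date > Exit Date, for each unique Enter Date and for each Ticker. In other words, for a given row, the average is taken over the rows of the same Ticker whose Exit Date is earlier than that row's Enter Date. I can do this in two steps the data.frame way. The code I use is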
calCumAve <- function(data, yvar, nSkip)
{
  nrs <- seq_len(nrow(data))
  CumAve <- c(rep(NA, nSkip), sapply(nrs[nrs > nSkip],
    FUN = function(t) {mean(data[data$"Enter Date"[t] > data$"Exit Date", yvar])}))
  return(CumAve)
}
DFOut <- do.call(rbind, lapply(sort(unique(DF$Ticker)), FUN = function(s) {
  sd <- DF[DF$Ticker == s, ]
  sd$AvgRet <- calCumAve(data = sd, yvar = "Return", nSkip = 4)
  return(sd)}))
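The required output is DFOut.
I would like to do this the data.table way. The main problem I face when applying it in data.table is using the two date columns to subset the Return column. A few things to consider: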
(1) In practice there will be 1000 tickers (only 2 in this example, ABC and XYZ) and more than 10 years of daily data.
(2) The operation should work without specifying nSkip. Where Enter Date <= Exit Date, so that no rows qualify, it should give NA (not NaN, as in rows 20:22 of DFOut).
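(3) If possible, use column names when subsetting the data.table. The given example has four columns, but the working data.table will have more than 25 columns, and I need to apply the same calculation to multiple columns by changing yvar.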
Any help is highly appreciated. Thanks in advance.
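To illustrate point (2): in base R, mean() of an empty selection returns NaN, which is where the NaN values in DFOut come from. The following is only a minimal sketch of the desired behaviour without nSkip. It assumes a hypothetical helper mean_or_na (not part of the code above) and uses a four-row excerpt of the XYZ rows rather than the full DF.

```r
# mean() of an empty vector is NaN, so the empty case is guarded explicitly.
# mean_or_na is a hypothetical helper introduced for this illustration.
mean_or_na <- function(x) if (length(x) == 0L) NA_real_ else mean(x)

# Four XYZ rows excerpted from DF
enter <- as.Date(c("2005-03-03", "2005-03-04", "2005-04-04", "2005-04-05"))
exit  <- as.Date(c("2005-04-01", "2005-04-04", "2005-05-02", "2005-05-03"))
ret   <- c(-1.441, -4.157, 0.425, -1.370)

# For each row t, average the returns of rows whose exit date is strictly
# earlier than the enter date of row t; no nSkip is needed.
avg_ret <- vapply(seq_along(enter),
                  function(t) mean_or_na(ret[exit < enter[t]]),
                  numeric(1))
```

The first two rows have no earlier exit date and come out as NA; the third row averages only the first return, because the second row's exit date equals its enter date and is excluded by the strict comparison.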
Answer 0: (score: 2)
library(data.table)
dt = as.data.table(DF) # or setDT to convert in place
# cumulative mean, but without the date restriction
dt[, rawAvgRets := cumsum(Return) / (1:.N), by = Ticker]
# find the latest matching date using a rolling merge (assumes sorted dates)
# if you run into > vs >= issues, adjust enter or exit date by a day
dt[, avgRets := dt[dt, rawAvgRets, roll = TRUE,
on = c('Ticker' = 'Ticker', 'Exit Date' = 'Enter Date')]]
# Ticker Enter Date Exit Date Return rawAvgRets avgRets
# 1: ABC 2005-02-08 2005-03-09 4.669 4.6690000 NA
# 2: ABC 2005-02-23 2005-03-23 4.034 4.3515000 NA
# 3: ABC 2005-06-07 2005-07-06 3.796 4.1663333 4.3515000
# 4: ABC 2005-06-08 2005-07-07 -4.059 2.1100000 4.3515000
# 5: ABC 2005-08-16 2005-09-14 -11.168 -0.5456000 2.1100000
# 6: ABC 2005-09-07 2005-10-05 -0.496 -0.5373333 2.1100000
# 7: ABC 2005-11-15 2005-12-14 -3.597 -0.9744286 -0.5373333
# 8: ABC 2005-11-17 2005-12-16 3.450 -0.4213750 -0.5373333
# 9: ABC 2005-12-06 2006-01-05 -4.428 -0.8665556 -0.5373333
#10: ABC 2005-12-23 2006-01-25 1.914 -0.5885000 -0.4213750
#11: ABC 2006-02-09 2006-03-10 3.577 -0.2098182 -0.5885000
#12: ABC 2006-02-10 2006-03-13 4.000 0.1410000 -0.5885000
#13: ABC 2006-02-15 2006-03-16 8.451 0.7802308 -0.5885000
#14: ABC 2006-02-22 2006-03-22 5.521 1.1188571 -0.5885000
#15: ABC 2006-05-01 2006-05-30 10.324 1.7325333 1.1188571
#16: XYZ 2005-02-22 2005-03-22 3.104 3.1040000 NA
#17: XYZ 2005-02-28 2005-03-29 0.787 1.9455000 NA
#18: XYZ 2005-03-01 2005-03-30 -3.407 0.1613333 NA
#19: XYZ 2005-03-03 2005-04-01 -1.441 -0.2392500 NA
#20: XYZ 2005-03-04 2005-04-04 -4.157 -1.0228000 NA
#21: XYZ 2005-03-11 2005-04-11 4.343 -0.1285000 NA
#22: XYZ 2005-03-15 2005-04-13 2.827 0.2937143 NA
#23: XYZ 2005-04-04 2005-05-02 0.425 0.3101250 -1.0228000
#24: XYZ 2005-04-05 2005-05-03 -1.370 0.1234444 -1.0228000
#25: XYZ 2005-04-15 2005-05-13 -3.175 -0.2064000 0.2937143
#26: XYZ 2005-04-22 2005-05-20 -11.027 -1.1900909 0.2937143
#27: XYZ 2005-04-28 2005-05-26 8.144 -0.4122500 0.2937143
# Ticker Enter Date Exit Date Return rawAvgRets avgRets
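The rolling merge above also matches an Exit Date that is equal to the Enter Date (row 23 picks up the value from row 20), and it relies on the dates being sorted. As an alternative, here is a hedged sketch using a non-equi join, which applies the strict condition directly and takes the column name as a parameter. The helper name cum_avg_before, the renamed columns Enter and Exit, and the six-row excerpt of DF are assumptions made for this illustration, and the approach has not been timed on 1000 tickers.

```r
library(data.table)

# Six rows excerpted from DF. The date columns are renamed Enter/Exit
# (an assumption, to avoid backticks) and stored as IDate.
dt <- data.table(
  Ticker = c("ABC", "ABC", "XYZ", "XYZ", "XYZ", "XYZ"),
  Enter  = as.IDate(c("2005-02-08", "2005-06-07", "2005-03-03",
                      "2005-03-04", "2005-04-04", "2005-04-05")),
  Exit   = as.IDate(c("2005-03-09", "2005-07-06", "2005-04-01",
                      "2005-04-04", "2005-05-02", "2005-05-03")),
  Return = c(4.669, 3.796, -1.441, -4.157, 0.425, -1.370)
)

# Hypothetical helper: for every row, the mean of column `yvar` over rows of
# the same Ticker whose Exit is strictly before that row's Enter.
cum_avg_before <- function(dt, yvar) {
  x <- data.table(Ticker = dt$Ticker, Exit = dt$Exit, y = dt[[yvar]])
  i <- data.table(Ticker = dt$Ticker, Enter = dt$Enter)
  # by = .EACHI evaluates j once per row of i, in the order of i. Rows with
  # no match see y as NA, so mean(y) is NA rather than NaN.
  x[i, on = .(Ticker, Exit < Enter), .(avg = mean(y)), by = .EACHI]$avg
}

# Apply the same calculation to several columns by listing their names here.
for (v in c("Return")) {
  set(dt, j = paste0("Avg", v), value = cum_avg_before(dt, v))
}
```

Because the averaged column is passed by name, the same helper can be reused for other columns by extending the vector in the loop. No nSkip is needed, since rows without an earlier Exit simply get NA.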