Using plyr (or *apply) to calculate cumulative returns

Date: 2013-10-01 00:01:41

Tags: r plyr

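I've been struggling with this problem for hours, and it seems like exactly the kind of job plyr or *apply is meant for. Can someone point me to a less clunky R solution than the one I list below?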


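The crux of the problem is that I want to use plyr to "loop" over a list of securities within a subset of dates. Some securities disappear during the date range. (I'm using forward returns from the data, so there is no survivorship bias.) For each date range, I want the output to be a data frame of the cumulative returns of the selected securities. I can then combine it (along with the initial weights) with other date ranges to compute various portfolio metrics.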
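The quantity being asked for can be written compactly. As a sketch (the symbols are introduced here, not taken from the question): $r_{i,s}$ is the return of security $i$ in month $s$, $T_i$ is the last month in which security $i$ still has a return, and a security that has disappeared keeps its last cumulative return, as if the position were held in cash. In R this is `cumprod(1 + r) - 1` per security; the worked numbers use the toy data shown next.

```latex
\begin{aligned}
R_{i,T} &= \prod_{s=1}^{\min(T,\,T_i)} \left(1 + r_{i,s}\right) - 1 \\
R_{\text{ibm},3} &= (1.01)(1.04)(1.07) - 1 = 0.123928 \\
R_{\text{loser},3} &= R_{\text{loser},2} = (1.03)(1.06) - 1 = 0.0918
\end{aligned}
```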


           d     t    r
1 2013-03-31   ibm 0.01
2 2013-03-31  appl 0.02
3 2013-03-31 loser 0.03
4 2013-04-30   ibm 0.04
5 2013-04-30  appl 0.05
6 2013-04-30 loser 0.06
7 2013-05-31   ibm 0.07
8 2013-05-31  appl 0.08

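Note that the security "loser" does not exist in the last month of the date range. (The security never reappears.) Here is some code that creates the toy data frame, along with the clunky solution that seems to work.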

library(plyr)

#Create data frame for the example code
dt <- as.Date("20130331","%Y%m%d")
mydf <- data.frame(d=dt,t="ibm",r=0.01)
mydf <- rbind(mydf,data.frame(d=dt,t="appl",r=0.02))
mydf <- rbind(mydf,data.frame(d=dt,t="loser",r=0.03))
dt <- as.Date("20130430","%Y%m%d")
mydf <- rbind(mydf,data.frame(d=dt,t="ibm",r=0.04))
mydf <- rbind(mydf,data.frame(d=dt,t="appl",r=0.05))
mydf <- rbind(mydf,data.frame(d=dt,t="loser",r=0.06))
dt <- as.Date("20130531","%Y%m%d")
mydf <- rbind(mydf,data.frame(d=dt,t="ibm",r=0.07))
mydf <- rbind(mydf,data.frame(d=dt,t="appl",r=0.08))
#Note that there is no row for "loser" for 2013-05-31

#This plyr call crashes because "loser" doesn't have the same 
#   num of rtns as the others
#newdf <- ddply(mydf,.(t),function(x) cumprod(x[,"r"]+1)-1)

Error in list_to_dataframe(res, attr(.data, "split_labels")) : Results do not have equal lengths
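The error occurs because `ddply` stacks each group's result into one row of a data frame, so every group has to return a vector of the same length. A quick base-R check is sketched below: it rebuilds the same toy data in a single `data.frame()` call (an alternative to the repeated `rbind()` calls, using character rather than factor tickers) and counts the returns per ticker. The name `n.per.ticker` is illustrative only.

```r
# Same toy data as above, built in one call instead of repeated rbind()
mydf <- data.frame(
  d = as.Date(rep(c("2013-03-31", "2013-04-30", "2013-05-31"), times = c(3, 3, 2))),
  t = c("ibm", "appl", "loser", "ibm", "appl", "loser", "ibm", "appl"),
  r = c(0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08),
  stringsAsFactors = FALSE
)

# Number of monthly returns per ticker: "loser" has one fewer,
# so its cumulative-return vector is one element shorter
n.per.ticker <- table(mydf$t)
print(n.per.ticker)
```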

#I work with intermediate lists as a workaround
tmp.list <- dlply(mydf,.(t),function(x) cumprod(x[,"r"]+1)-1)

#Get the length of the longest of the resulting lists (tmp = 3 in this example)
tmp <- max(as.numeric(lapply(tmp.list,length))) 

#Define function to extend cumulative rtn for missing values
#   By holding cumulative rtn constant, it's as if
#   I hold cash when a security disappears
extendit <- function(x) if(length(x)<tmp){ 
  c(x, rep(x[length(x)], tmp-length(x)))
} else {x}

#Use plyr to make all lists the same length
tmp2.list <- llply(tmp.list, extendit)

#Use plyr to create the data table I wanted
cusipcumrtns.df <- ldply(tmp2.list)          

#Must name key column since it got lost in the process
colnames(cusipcumrtns.df)[1] <- "t"


      t   V1     V2       V3
1   ibm 0.01 0.0504 0.123928
2  appl 0.02 0.0710 0.156680
3 loser 0.03 0.0918 0.091800
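Since the question also allows *apply, here is a minimal base-R sketch of the same pad-with-last-value workaround without plyr: `split()` plus `lapply()` stand in for `dlply()`, and `do.call(rbind, ...)` stands in for `ldply()`. It assumes the rows are already in date order within each ticker, as in the toy data; `cum.list`, `padded`, and `cum.mat` are illustrative names.

```r
# Toy data: same values as mydf in the question (character tickers)
mydf <- data.frame(
  d = as.Date(rep(c("2013-03-31", "2013-04-30", "2013-05-31"), times = c(3, 3, 2))),
  t = c("ibm", "appl", "loser", "ibm", "appl", "loser", "ibm", "appl"),
  r = c(0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08),
  stringsAsFactors = FALSE
)

# Cumulative return per ticker; a list tolerates unequal lengths
cum.list <- lapply(split(mydf$r, mydf$t), function(r) cumprod(1 + r) - 1)

# Pad shorter series with their last value (holding cash after the security disappears)
n.max <- max(sapply(cum.list, length))
padded <- lapply(cum.list, function(x) c(x, rep(x[length(x)], n.max - length(x))))

# Stack into a matrix with one row per ticker
cum.mat <- do.call(rbind, padded)
print(cum.mat)
```

Tickers come out in alphabetical order because `split()` sorts character keys, and they are carried as row names rather than a `t` column; `data.frame(t = rownames(cum.mat), cum.mat, row.names = NULL)` would restore the column layout.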


1 Answer:

Answer 0 (score: 2)


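# Every (date, ticker) combination, so each ticker has a row for every month
keys.df <- expand.grid(d = unique(mydf$d),
                       t = unique(mydf$t))
# Left join: months where a ticker has no data get r = NA
full.df <- merge(keys.df, mydf, all.x = TRUE)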


# Treat a missing return as 0% (hold cash), then compound within each ticker
ddply(full.df, .(t), function(x) cumprod(ifelse(is.na(x$r), 0, x$r) + 1) - 1)
      t   V1     V2       V3
1   ibm 0.01 0.0504 0.123928
2  appl 0.02 0.0710 0.156680
3 loser 0.03 0.0918 0.091800
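A base-R counterpart of the same idea (complete the date-by-ticker grid, treat a missing month as a 0% return, then compound) can use `split()` and `sapply()` in place of `ddply()`. This is a sketch assuming character tickers; the explicit `order()` is a precaution so each ticker's returns are compounded in date order, and `r0` and `cum.wide` are illustrative names.

```r
# Toy data: same values as mydf in the question (character tickers)
mydf <- data.frame(
  d = as.Date(rep(c("2013-03-31", "2013-04-30", "2013-05-31"), times = c(3, 3, 2))),
  t = c("ibm", "appl", "loser", "ibm", "appl", "loser", "ibm", "appl"),
  r = c(0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08),
  stringsAsFactors = FALSE
)

# Complete the (date, ticker) grid; missing returns become NA
keys.df <- expand.grid(d = unique(mydf$d), t = unique(mydf$t), stringsAsFactors = FALSE)
full.df <- merge(keys.df, mydf, all.x = TRUE)
full.df <- full.df[order(full.df$t, full.df$d), ]

# Treat a missing month as a 0% return (holding cash), then compound per ticker
r0 <- ifelse(is.na(full.df$r), 0, full.df$r)
cum.wide <- t(sapply(split(r0, full.df$t), function(r) cumprod(1 + r) - 1))
print(cum.wide)
```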


# Long format instead: keep one row per (d, t) and add a cum.r column
ddply(full.df,.(t), transform, cum.r = cumprod(ifelse(is.na(r), 0, r) + 1) - 1)
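The long-format variant can likewise be sketched in base R with `ave()`, which applies a function within groups and returns a vector aligned with the original rows. This assumes rows are in date order within each ticker (the `order()` call enforces it); the toy data is rebuilt so the snippet runs on its own.

```r
# Toy data: same values as mydf in the question (character tickers)
mydf <- data.frame(
  d = as.Date(rep(c("2013-03-31", "2013-04-30", "2013-05-31"), times = c(3, 3, 2))),
  t = c("ibm", "appl", "loser", "ibm", "appl", "loser", "ibm", "appl"),
  r = c(0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08),
  stringsAsFactors = FALSE
)

# Complete the grid so "loser" gets an NA row for the final month
keys.df <- expand.grid(d = unique(mydf$d), t = unique(mydf$t), stringsAsFactors = FALSE)
full.df <- merge(keys.df, mydf, all.x = TRUE)
full.df <- full.df[order(full.df$t, full.df$d), ]

# Grouped cumulative return, written back as a new column
full.df$cum.r <- ave(ifelse(is.na(full.df$r), 0, full.df$r), full.df$t,
                     FUN = function(r) cumprod(1 + r) - 1)
print(full.df)
```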