The following data.table contains the returns (RET) of 5 portfolios on two dates (the portfolio number is a factor, not an integer).
library(data.table)
set.seed(123)
DT <- data.table(date = rep(as.Date(c("2005-05-02", "2005-05-03")), each = 5), portfolio = factor(rep(1:5, 2), levels = c(1:5, "diff", "avg")), RET = rnorm(n = 10))
          date portfolio         RET
 1: 2005-05-02         1 -0.56047565
 2: 2005-05-02         2 -0.23017749
 3: 2005-05-02         3  1.55870831
 4: 2005-05-02         4  0.07050839
 5: 2005-05-02         5  0.12928774
 6: 2005-05-03         1  1.71506499
 7: 2005-05-03         2  0.46091621
 8: 2005-05-03         3 -1.26506123
 9: 2005-05-03         4 -0.68685285
10: 2005-05-03         5 -0.44566197
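For each date, I want to add two rows to the data.table: the return of the difference portfolio, i.e. the return of the 5th portfolio minus the return of the 1st portfolio, and the return of the average portfolio, i.e. the mean return of the five portfolios. Specifically, I want to create the following data.table: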
          date portfolio         RET
 1: 2005-05-02         1 -0.56047565
 2: 2005-05-02         2 -0.23017749
 3: 2005-05-02         3  1.55870831
 4: 2005-05-02         4  0.07050839
 5: 2005-05-02         5  0.12928774
 6: 2005-05-02       avg  0.19357026
 7: 2005-05-02      diff  0.68976338
 8: 2005-05-03         1  1.71506499
 9: 2005-05-03         2  0.46091621
10: 2005-05-03         3 -1.26506123
11: 2005-05-03         4 -0.68685285
12: 2005-05-03         5 -0.44566197
13: 2005-05-03       avg -0.04431897
14: 2005-05-03      diff -2.16072696
One way to do this (based on this post) is
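# Append one NA row per date, then fill it in as the "avg" portfolio
DT = DT[, .SD[1:(.N+1)], date][, .(portfolio = replace(portfolio, is.na(portfolio), "avg"), RET = replace(RET, is.na(portfolio), mean(RET[!is.na(RET)]) ) ), date]
# Same again for "diff"; which() skips the NA that the appended row produces in the comparison
DT = DT[, .SD[1:(.N+1)], date][, .(portfolio = replace(portfolio, is.na(portfolio), "diff"), RET = replace(RET, is.na(portfolio), RET[which(portfolio == "5")] - RET[which(portfolio == "1")]) ), date]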
Another way is to create new data.tables for the difference and average portfolios and then rbindlist them all together.
DT = rbindlist(
l = list(DT,
DT[, .(portfolio = "diff", RET = RET[portfolio == "5"] - RET[portfolio == "1"]), by = date],
DT[, .(portfolio = "avg", RET = mean(RET)), by = date]
))
DT[order(date, portfolio)]
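For comparison, here is a minimal sketch of a variant of the rbindlist approach that builds both summary rows in one grouped pass instead of two. It is only an illustration under assumptions: the names `extra` and `out` are hypothetical, and the factor levels are reordered to `avg` before `diff` purely so that the sort matches the desired output shown earlier.

```r
library(data.table)

# Rebuild the example data so the sketch is self-contained
set.seed(123)
DT <- data.table(
  date      = rep(as.Date(c("2005-05-02", "2005-05-03")), each = 5),
  portfolio = factor(rep(1:5, 2), levels = c(1:5, "diff", "avg")),
  RET       = rnorm(n = 10)
)

# One grouped pass: two summary rows ("avg", "diff") per date.
# Inside j, `portfolio` on the right-hand side still refers to the column.
extra <- DT[, .(portfolio = c("avg", "diff"),
                RET = c(mean(RET),
                        RET[portfolio == "5"] - RET[portfolio == "1"])),
            by = date]

# Bind once, then set the level order (assumed: avg before diff) and sort in place
out <- rbindlist(list(DT, extra))
out[, portfolio := factor(portfolio, levels = c(1:5, "avg", "diff"))]
setorder(out, date, portfolio)
print(out)
```

Whether this counts as better is a judgment call; it mainly avoids grouping by date twice and keeps the original DT untouched.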
Is there a better way?