A better way to conditionally insert new data entries (rows) into a data.table using R

Time: 2016-07-31 15:12:38

Tags: r performance insert data.table readability

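The following data.table contains the returns (RET) of 5 portfolios on two dates (the portfolio numbers are stored as a factor, not as integers).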

library(data.table)
set.seed(123)
DT <- data.table(date = rep(as.Date(c("2005-05-02", "2005-05-03")), each = 5), portfolio = factor(rep(1:5, 2), levels = c(1:5, "diff", "avg")), RET = rnorm(n = 10))

           date portfolio         RET
  1: 2005-05-02         1 -0.56047565
  2: 2005-05-02         2 -0.23017749
  3: 2005-05-02         3  1.55870831
  4: 2005-05-02         4  0.07050839
  5: 2005-05-02         5  0.12928774
  6: 2005-05-03         1  1.71506499
  7: 2005-05-03         2  0.46091621
  8: 2005-05-03         3 -1.26506123
  9: 2005-05-03         4 -0.68685285
 10: 2005-05-03         5 -0.44566197

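For each date, I would like to add two rows to the data.table: the return of the difference portfolio, i.e. the return of the 5th portfolio minus the return of the 1st portfolio, and the return of the average portfolio, i.e. the mean return of the five portfolios. Specifically, I want to create the following data.table: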

          date portfolio         RET
 1: 2005-05-02         1 -0.56047565
 2: 2005-05-02         2 -0.23017749
 3: 2005-05-02         3  1.55870831
 4: 2005-05-02         4  0.07050839
 5: 2005-05-02         5  0.12928774
 6: 2005-05-02       avg  0.19357026
 7: 2005-05-02      diff  0.68976338
 8: 2005-05-03         1  1.71506499
 9: 2005-05-03         2  0.46091621
10: 2005-05-03         3 -1.26506123
11: 2005-05-03         4 -0.68685285
12: 2005-05-03         5 -0.44566197
13: 2005-05-03       avg -0.04431897
14: 2005-05-03      diff -2.16072696

One way to do this (based on this post):

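# Append one NA row per date, then fill it with the "avg" portfolio (mean of the non-missing returns)
DT = DT[, .SD[1:(.N+1)], date][, .(portfolio = replace(portfolio, is.na(portfolio), "avg"), RET = replace(RET, is.na(portfolio), mean(RET[!is.na(RET)]) ) ), date]
# Repeat for the "diff" portfolio; which() drops the NA that the padded row produces in the comparison
DT = DT[, .SD[1:(.N+1)], date][, .(portfolio = replace(portfolio, is.na(portfolio), "diff"), RET = replace(RET, is.na(portfolio), RET[which(portfolio == "5")] - RET[which(portfolio == "1")]) ), date]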

Another way is to create new data.tables for the difference and average portfolios and then combine everything with rbindlist.

DT = rbindlist(
  l = list(DT,
           DT[, .(portfolio = "diff", RET = RET[portfolio == "5"] - RET[portfolio == "1"]), by = date],
           DT[, .(portfolio = "avg", RET = mean(RET)), by = date])
)
# Sort by date, then by the factor-level order of portfolio
DT[order(date, portfolio)]
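A variant of the rbindlist approach, given only as a sketch: both summary rows are built in a single grouped call, and the factor is re-leveled so that "avg" sorts before "diff" as in the desired output. The names `extra` and `res` and the re-leveling step are my own assumptions, not something taken from the post referenced above.

```r
library(data.table)

# Rebuild the example data from the question
set.seed(123)
DT <- data.table(
  date      = rep(as.Date(c("2005-05-02", "2005-05-03")), each = 5),
  portfolio = factor(rep(1:5, 2), levels = c(1:5, "diff", "avg")),
  RET       = rnorm(n = 10)
)

# One grouped call yields both extra rows per date (hypothetical helper table)
extra <- DT[, .(portfolio = c("diff", "avg"),
                RET = c(RET[portfolio == "5"] - RET[portfolio == "1"], mean(RET))),
            by = date]

# Stack the original and the extra rows, matching columns by name
res <- rbindlist(list(DT, extra), use.names = TRUE)

# Assumption: reorder the levels so "avg" precedes "diff", as in the target table
res[, portfolio := factor(portfolio, levels = c(1:5, "avg", "diff"))]

# Sort in place by date, then by factor-level order
setorder(res, date, portfolio)
print(res)
```

This still makes an intermediate table, so it mainly improves readability; whether it is faster than the two versions above on large data would need to be measured.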

Is there a better way?

0 answers:

No answers