Repeated rolling joins without a loop

Time: 2016-01-18 22:38:34

Tags: r join data.table

What is the most efficient way to find the most recent "value" before each day in 2015, grouped by (loc.x, loc.y) pair?

library(data.table)

dt <- data.table(
  loc.x = as.integer(c(1, 1, 3, 1, 3, 1)),
  loc.y = as.integer(c(1, 2, 1, 2, 1, 2)),
  time = as.IDate(c("2015-03-11", "2015-05-10", "2015-09-27",
                    "2015-12-31", "2014-09-13", "2015-08-19")), 
  value = letters[1:6]
)
setkey(dt, loc.x, loc.y, time)

Required output:

   loc.x loc.y 2015-01-01  ...  2015-12-31
1:     1     1         NA                a
2:     1     2         NA                f
3:     3     1          e                c

1 answer:

Answer 0: (score: 2)

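You could create a lookup table containing all the dates in 2015 and the unique values of loc.x and loc.y using CJ, and then run a rolling join combined with dcast. With roll = TRUE, each lookup date picks up the last observation at or before it.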

Lookup <- do.call(CJ, c(unique = TRUE,
                        as.list(dt[, .(loc.x, loc.y)]),
                        list(time = seq(as.IDate("2015-01-01"),
                                        as.IDate("2015-12-31"),
                                        by = "day"))))

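To see what a lookup built with CJ contains, here is a minimal sketch on a three-day window. The vectors below are copied from the question's data, and the name `grid` is only illustrative. Because CJ crosses the unique values of each argument, the grid also includes the pair (3, 2), which never occurs in dt; that is presumably why the join below uses nomatch = 0L.

```r
library(data.table)

# Values copied from the question's dt; `grid` is an illustrative name
loc.x <- c(1L, 1L, 3L, 1L, 3L, 1L)
loc.y <- c(1L, 2L, 1L, 2L, 1L, 2L)
days  <- seq(as.IDate("2015-01-01"), as.IDate("2015-01-03"), by = "day")

# unique = TRUE de-duplicates each vector before crossing them
grid <- CJ(loc.x = loc.x, loc.y = loc.y, time = days, unique = TRUE)

nrow(grid)                       # 2 * 2 * 3 = 12
unique(grid[, .(loc.x, loc.y)])  # four pairs, including (3, 2)
```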

dcast(dt[Lookup, roll = TRUE, nomatch = 0L],
      loc.x + loc.y ~ time, value.var = "value")

#    loc.x loc.y 2015-01-01 2015-01-02 2015-01-03 
# 1:     1     1         NA         NA         NA
# 2:     1     2         NA         NA         NA 
# 3:     3     1          e          e          e ... (truncated)

#    2015-12-26 2015-12-27 2015-12-28 2015-12-29 2015-12-30 2015-12-31
# 1:          a          a          a          a          a          a
# 2:          f          f          f          f          f          d
# 3:          c          c          c          c          c          c
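Note that the result above shows d for (1, 2) on 2015-12-31, because roll = TRUE also matches an observation dated on the lookup day itself, whereas the required output in the question shows f. If "before" is meant strictly, one possible variation (a sketch, not part of the original answer) is to join on the previous day. The names `days`, `day`, `strict` and `wide` are assumptions introduced here, and the lookup is restricted to the pairs that actually occur in dt.

```r
library(data.table)

dt <- data.table(
  loc.x = as.integer(c(1, 1, 3, 1, 3, 1)),
  loc.y = as.integer(c(1, 2, 1, 2, 1, 2)),
  time  = as.IDate(c("2015-03-11", "2015-05-10", "2015-09-27",
                     "2015-12-31", "2014-09-13", "2015-08-19")),
  value = letters[1:6]
)
setkey(dt, loc.x, loc.y, time)

# One row per observed (loc.x, loc.y) pair and per day of 2015
days   <- seq(as.IDate("2015-01-01"), as.IDate("2015-12-31"), by = "day")
Lookup <- unique(dt[, .(loc.x, loc.y)])[, .(day = days), by = .(loc.x, loc.y)]

# Join on the previous day so an observation dated `day` itself is excluded
Lookup[, time := day - 1L]

# Rolling join, then reshape to one column per day
strict <- dt[Lookup, on = .(loc.x, loc.y, time), roll = TRUE]
wide   <- dcast(strict, loc.x + loc.y ~ day, value.var = "value")

wide[, .(loc.x, loc.y, `2015-01-01`, `2015-12-31`)]
```

Under these assumptions the last column should read a, f, c, in line with the required output in the question.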