Efficiently inserting default missing rows in a data.table

Time: 2013-05-13 09:14:53

Tags: r data.table

Suppose I have the following data.table:

library(data.table)
dt <- data.table(id=c(1,1,1,1,1,1,2,2,2,2),
                 wday=c("mon","tue","wed","thu","fri","sat","mon","tue","thu","fri"),
                 val=c(2,3,5,8,6,2,3,4,2,6))

    id wday val
 1:  1  mon   2
 2:  1  tue   3
 3:  1  wed   5
 4:  1  thu   8
 5:  1  fri   6
 6:  1  sat   2
 7:  2  mon   3
 8:  2  tue   4
 9:  2  thu   2
10:  2  fri   6

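This is the result of aggregating another data.table. It holds the count of a variable (val) per weekday (wday) for different individuals (id). The problem is that, during my manipulation, I lost the weekdays whose count is 0.

So the question is: how can I efficiently update my data.table object by inserting, for each id, one row with val=0 for every missing weekday?

The result would be as follows: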

    id wday val
 1:  1  mon   2
 2:  1  tue   3
 3:  1  wed   5
 4:  1  thu   8
 5:  1  fri   6
 6:  1  sat   2
 7:  1  sun   0
 8:  2  mon   3
 9:  2  tue   4
10:  2  wed   0
11:  2  thu   2
12:  2  fri   6
13:  2  sat   0
14:  2  sun   0

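Thank you very much for your help.

2 answers:

Answer 0: (score: 2)

One simple way I can think of right now is to use expand.grid to get all combinations, and then use that to subset with allow.cartesian = TRUE: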

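# Key dt by the join columns so that dt[J(...)] joins on id and wday.
setkey(dt, "id", "wday")
vals <- c("mon", "tue", "wed", "thu", "fri", "sat", "sun")
# All weekday/id combinations, with the columns reordered to (id, wday) to match the key.
idx <- expand.grid(vals, unique(dt$id))[, 2:1]
# Combinations that are missing from dt come back with val = NA.
dt[J(idx), allow.cartesian=TRUE]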

#     id wday val
#  1:  1  mon   2
#  2:  1  tue   3
#  3:  1  wed   5
#  4:  1  thu   8
#  5:  1  fri   6
#  6:  1  sat   2
#  7:  1  sun  NA
#  8:  2  mon   3
#  9:  2  tue   4
# 10:  2  wed  NA
# 11:  2  thu   2
# 12:  2  fri   6
# 13:  2  sat  NA
# 14:  2  sun  NA

Alternatively, the idx data.table can be built directly using CJ:

dt[CJ(unique(dt$id),vals), allow.cartesian=TRUE]
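
Both variants above leave NA in val for the inserted rows, whereas the question asks for 0. Also, CJ sorts its columns by default, so the weekdays would come back in alphabetical order. The following is a minimal sketch of one way to handle both points. It assumes a data.table version that supports the `on` argument (1.9.6 or later, as far as I know); the names `grid` and `res` are only illustrative.

```r
library(data.table)

dt <- data.table(id   = c(1,1,1,1,1,1,2,2,2,2),
                 wday = c("mon","tue","wed","thu","fri","sat","mon","tue","thu","fri"),
                 val  = c(2,3,5,8,6,2,3,4,2,6))
vals <- c("mon", "tue", "wed", "thu", "fri", "sat", "sun")

# All id/weekday combinations; sorted = FALSE keeps the weekday order of
# `vals` instead of sorting it alphabetically.
grid <- CJ(id = unique(dt$id), wday = vals, sorted = FALSE)

# Join on both columns; combinations absent from dt get val = NA.
res <- dt[grid, on = c("id", "wday")]

# Replace the NA counts introduced by the join with 0, by reference.
res[is.na(val), val := 0]
res
```

Because the join is driven by `grid`, the result has one row per id and weekday regardless of how dt is keyed or ordered.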

Answer 1: (score: 1)

Another possibility, using match and ddply:

library(plyr)

# For one id's rows, look up the count of each weekday and fill missing weekdays with 0.
FUN <- function(x) {
  days <- c("mon", "tue", "wed", "thu", "fri", "sat", "sun")
  y <- x$val[match(days, x$wday, nomatch=NA)]
  y[is.na(y)] <- 0
  data.frame(wday=days, val=y)
}
ddply(dt, .(id), FUN)
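
The same match idea can probably be expressed without plyr, using data.table's own grouping. This is only a sketch under that assumption, not part of the original answer; `vals` and `res` are illustrative names.

```r
library(data.table)

dt <- data.table(id   = c(1,1,1,1,1,1,2,2,2,2),
                 wday = c("mon","tue","wed","thu","fri","sat","mon","tue","thu","fri"),
                 val  = c(2,3,5,8,6,2,3,4,2,6))
vals <- c("mon", "tue", "wed", "thu", "fri", "sat", "sun")

# For each id, look up the count of every weekday; match() returns NA for
# weekdays that are absent in that group, and those are set to 0.
res <- dt[, {
  y <- val[match(vals, wday)]
  y[is.na(y)] <- 0
  list(wday = vals, val = y)
}, by = id]
res
```

This returns a data.table rather than a data.frame, with seven rows per id in the order given by `vals`.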