Suppose I have the following data.table:
library(data.table)
dt <- data.table(id=c(1,1,1,1,1,1,2,2,2,2),
                 wday=c("mon","tue","wed","thu","fri","sat","mon","tue","thu","fri"),
                 val=c(2,3,5,8,6,2,3,4,2,6))
id wday val
1: 1 mon 2
2: 1 tue 3
3: 1 wed 5
4: 1 thu 8
5: 1 fri 6
6: 1 sat 2
7: 2 mon 3
8: 2 tue 4
9: 2 thu 2
10: 2 fri 6
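This is the result of aggregating another data.table. It gives the count (val) of a variable per weekday (wday) for each individual (id). The problem is that, somewhere along the way, I lost the weekdays whose count is 0.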
So the question is: how can I efficiently update my data.table object by inserting, for each id, a row with val=0 for every missing weekday?
The result should look like this:
id wday val
1: 1 mon 2
2: 1 tue 3
3: 1 wed 5
4: 1 thu 8
5: 1 fri 6
6: 1 sat 2
7: 1 sun 0
8: 2 mon 3
9: 2 tue 4
10: 2 wed 0
11: 2 thu 2
12: 2 fri 6
13: 2 sat 0
14: 2 sun 0
Thank you very much for your help.
Answer 0 (score: 2)
A simple approach I can think of right now is to use expand.grid to get all combinations, then subset with it using allow.cartesian = TRUE:
setkey(dt, "id", "wday")
vals <- c("mon", "tue", "wed", "thu", "fri", "sat", "sun")
idx <- expand.grid(vals, unique(dt$id))[, 2:1]
dt[J(idx), allow.cartesian=TRUE]
# id wday val
# 1: 1 mon 2
# 2: 1 tue 3
# 3: 1 wed 5
# 4: 1 thu 8
# 5: 1 fri 6
# 6: 1 sat 2
# 7: 1 sun NA
# 8: 2 mon 3
# 9: 2 tue 4
# 10: 2 wed NA
# 11: 2 thu 2
# 12: 2 fri 6
# 13: 2 sat NA
# 14: 2 sun NA
Alternatively, the idx data table can be built directly with CJ:
dt[CJ(unique(dt$id),vals), allow.cartesian=TRUE]
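Both joins above leave NA, not 0, in val for the weekdays that were missing. The following is a minimal sketch of one way to get the zeros the question asks for, assuming a data.table version that supports the on= argument (1.9.6 or later). The names grid and res are introduced here for illustration and are not part of the original answer.

```r
library(data.table)

dt <- data.table(id   = c(1,1,1,1,1,1,2,2,2,2),
                 wday = c("mon","tue","wed","thu","fri","sat","mon","tue","thu","fri"),
                 val  = c(2,3,5,8,6,2,3,4,2,6))
vals <- c("mon", "tue", "wed", "thu", "fri", "sat", "sun")

# All id/weekday combinations (illustrative name `grid`);
# sorted = FALSE keeps the weekday order given in `vals`.
grid <- CJ(id = unique(dt$id), wday = vals, sorted = FALSE)

# Join dt onto the full grid: combinations absent from dt get val = NA.
res <- dt[grid, on = c("id", "wday")]

# Replace the NAs introduced by the join with 0, by reference.
res[is.na(val), val := 0]
res
```

Passing sorted = FALSE to CJ is a deliberate choice here: by default CJ sorts its inputs, which would put the weekdays in alphabetical rather than calendar order.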
Answer 1 (score: 1)
Another possibility, using match and ddply:
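library(plyr)  # provides ddply() and .()

FUN <- function(x) {
  # Look up each weekday in this id's rows; weekdays not present give NA
  y <- x$val[match(c("mon", "tue", "wed", "thu", "fri", "sat", "sun"), x$wday, nomatch=NA)]
  y[is.na(y)] <- 0  # missing weekdays get a count of 0
  y <- data.frame(wday=c("mon", "tue", "wed", "thu", "fri", "sat", "sun"), val=y)
  y
}
ddply(dt, .(id), FUN)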