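I have a data.table with a grouping column, and I want to pad it with rows so that every group has as many rows as the most populous group. Consider this data, where column a holds a variable and column b is the group: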
## dummy data
library(data.table)
dt <- data.table(a = c(1, 3, 5, 1, 2, 3, 4, 5, 1, 5),
                 b = c("gpA", "gpA", "gpA", "gpB", "gpB", "gpB", "gpB", "gpB", "gpC", "gpC"),
                 c = c(2346, 2314, 345, 234, 578, 36, 234, 547, 3425, 234))
a b c
1: 1 gpA 2346
2: 3 gpA 2314
3: 5 gpA 345
4: 1 gpB 234
5: 2 gpB 578
6: 3 gpB 36
7: 4 gpB 234
8: 5 gpB 547
9: 1 gpC 3425
10: 5 gpC 234
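Group B has the most rows, with column 'a' running from 1 to 5. I would like to 'pad' the other groups so that they have the same column 'a' values; column 'c' can be NA or 0 in the added rows.
I did this with CJ, following this great answer, but after the initial execution the rows with NA in column c are gone again. Why is that?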
# setkey and pad out rows with CJ
setkey(dt, b, a)
> dt[CJ(b, a, unique=TRUE)]
a b c
1: 1 gpA 2346
2: 2 gpA NA
3: 3 gpA 2314
4: 4 gpA NA
5: 5 gpA 345
6: 1 gpB 234
7: 2 gpB 578
8: 3 gpB 36
9: 4 gpB 234
10: 5 gpB 547
11: 1 gpC 3425
12: 2 gpC NA
13: 3 gpC NA
14: 4 gpC NA
15: 5 gpC 234
> dt
a b c
1: 1 gpA 2346
2: 3 gpA 2314
3: 5 gpA 345
4: 1 gpB 234
5: 2 gpB 578
6: 3 gpB 36
7: 4 gpB 234
8: 5 gpB 547
9: 1 gpC 3425
10: 5 gpC 234
Answer 0 (score: 1)
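You have to assign the result back to the original variable to save it. The join returns a new data.table and only prints it; dt itself is not modified, which is why the padded rows are missing when you print dt afterwards.
dt <- dt[CJ(b, a, unique=TRUE)]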
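To make the difference visible, here is a minimal sketch built on the question's dummy data. The name padded is only an illustrative choice (not from the question), used so the original dt stays available for comparison; the step that replaces NA with 0 is optional and assumes you prefer 0 over NA in column c.

```r
library(data.table)

dt <- data.table(a = c(1, 3, 5, 1, 2, 3, 4, 5, 1, 5),
                 b = c("gpA", "gpA", "gpA", "gpB", "gpB", "gpB", "gpB", "gpB", "gpC", "gpC"),
                 c = c(2346, 2314, 345, 234, 578, 36, 234, 547, 3425, 234))
setkey(dt, b, a)

# The join returns a new data.table; nothing is saved unless it is assigned.
# 'padded' is a hypothetical name, chosen here to keep dt untouched.
padded <- dt[CJ(b, a, unique = TRUE)]

# Optional: turn the NA padding in column c into 0 (updates padded by reference).
padded[is.na(c), c := 0]

# Row counts per group, before and after padding.
dt[, .N, by = b]
padded[, .N, by = b]
```

With this data, dt still has its original 10 rows, while padded has 15 rows, 5 per group. Writing dt <- dt[CJ(b, a, unique=TRUE)] instead simply overwrites dt with the padded result.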