Question:
What is the data.table equivalent of tidyr's complete combined with group_by?
What is the relationship between by and on in data.table?
Example:
library(data.table)
dt=data.table(a = c(1,1,2,2,3,3,4,4) , b = c(4,5,6,7,8,9,10,11) , c = c("x","x","x","x","y","y","y","y"))
show(dt)
a b c
1: 1 4 x
2: 1 5 x
3: 2 6 x
4: 2 7 x
5: 3 8 y
6: 3 9 y
7: 4 10 y
8: 4 11 y
The goal is to get the following:
a b c
1 4 x
1 5 x
1 6 x
1 7 x
2 4 x
2 5 x
2 6 x
2 7 x
3 8 y
3 9 y
3 10 y
3 11 y
4 8 y
4 9 y
4 10 y
4 11 y
It is probably something like this:
setDT(dt)[CJ(a=a,b=b,unique=TRUE), on=.(a,b) , by = .(c)]
But it does not work, and the documentation is thin on this aspect of the syntax.
Insufficient solutions:
Existing SO posts address similar problems, but they do not provide a sufficient solution for this case.
Answer 0 (score: 5)
Try this:
dt[, CJ(a = a, b = b, unique = TRUE), by = "c"]
giving:
c a b
1: x 1 4
2: x 1 5
3: x 1 6
4: x 1 7
5: x 2 4
6: x 2 5
7: x 2 6
8: x 2 7
9: y 3 8
10: y 3 9
11: y 3 10
12: y 3 11
13: y 4 8
14: y 4 9
15: y 4 10
16: y 4 11
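The difference between building the cross join over the whole table and building it per group can be sketched as follows. This is only an illustration, assuming the example data from the question; the names all_combos and per_group are made up here. It suggests one likely reason the attempt in the question falls short: CJ() has to be evaluated inside each by group (in the j argument) rather than once over all rows.

```r
library(data.table)

dt <- data.table(
  a = c(1, 1, 2, 2, 3, 3, 4, 4),
  b = c(4, 5, 6, 7, 8, 9, 10, 11),
  c = c("x", "x", "x", "x", "y", "y", "y", "y")
)

# Without by, CJ() sees the full columns:
# 4 unique values of a crossed with 8 unique values of b
all_combos <- dt[, CJ(a = a, b = b, unique = TRUE)]
nrow(all_combos)

# With by = c, CJ() sees one group at a time:
# 2 unique values of a crossed with 4 unique values of b, per group
per_group <- dt[, CJ(a = a, b = b, unique = TRUE), by = c]
nrow(per_group)
```

Here by controls which rows j sees at a time, while on only names the columns used to match rows in a join.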
Answer 1 (score: 5)
complete keeps other, unrelated columns, so I'll add one...
library(data.table)
dt = data.table(
a = c(1,1,2,2,3,3,4,4) ,
b = c(4,5,6,7,8,9,10,11) ,
c = c("x","x","x","x","y","y","y","y"),
d = LETTERS[10 + 1:8])
a b c d
1: 1 4 x K
2: 1 5 x L
3: 2 6 x M
4: 2 7 x N
5: 3 8 y O
6: 3 9 y P
7: 4 10 y Q
8: 4 11 y R
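To complete the a x b combinations within each c, I'd build a new table with those combinations (exactly as in @G.Grothendieck's answer) and do an update join to bring over d and any other non-combination columns: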
mDT = dt[, CJ(a = a, b = b, unique=TRUE), by=c]
cvars = copy(names(mDT))
ovars = setdiff(names(dt), cvars)
mDT[, (ovars) := dt[.SD, on=cvars, mget(sprintf("x.%s", ovars))]]
setcolorder(mDT, names(dt))
a b c d
1: 1 4 x K
2: 1 5 x L
3: 1 6 x <NA>
4: 1 7 x <NA>
5: 2 4 x <NA>
6: 2 5 x <NA>
7: 2 6 x M
8: 2 7 x N
9: 3 8 y O
10: 3 9 y P
11: 3 10 y <NA>
12: 3 11 y <NA>
13: 4 8 y <NA>
14: 4 9 y <NA>
15: 4 10 y Q
16: 4 11 y R
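When the extra column is known in advance, the same update join can be written more directly. The following is a minimal sketch, assuming the example data above and a single extra column d; it is a simplified variant of the general code, not a replacement for it.

```r
library(data.table)

dt <- data.table(
  a = c(1, 1, 2, 2, 3, 3, 4, 4),
  b = c(4, 5, 6, 7, 8, 9, 10, 11),
  c = c("x", "x", "x", "x", "y", "y", "y", "y"),
  d = LETTERS[10 + 1:8]
)

# All a x b combinations within each c
mDT <- dt[, CJ(a = a, b = b, unique = TRUE), by = c]

# Update join: add d to mDT by reference, taking values from dt (the i table).
# Combinations that do not occur in dt keep NA.
mDT[dt, on = .(a, b, c), d := i.d]

# Restore the column order of the original table
setcolorder(mDT, names(dt))
mDT[]
```

The i. prefix refers to columns of the table passed in the i argument (dt here), so only matched rows of mDT receive a value.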
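Alternatively, you could do a join (a right join, since every row of mDT is kept), though this is less efficient because it creates two new tables: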
dt[mDT, on=cvars]
# or more concisely....
dt[dt[, CJ(a = a, b = b, unique=TRUE), by=c], on=.(a,b,c)]
Or do the join once per by= group (from @eddi):
dt[, .SD[CJ(a = a, b = b, unique = TRUE), on = .(a, b)], by = c]
For comparison, in the tidyverse:
library(dplyr); library(tidyr)
data.frame(dt) %>% group_by(c) %>% complete(a, b)
# A tibble: 16 x 4
# Groups: c [2]
c a b d
<chr> <dbl> <dbl> <chr>
1 x 1 4 K
2 x 1 5 L
3 x 1 6 <NA>
4 x 1 7 <NA>
5 x 2 4 <NA>
6 x 2 5 <NA>
7 x 2 6 M
8 x 2 7 N
9 y 3 8 O
10 y 3 9 P
11 y 3 10 <NA>
12 y 3 11 <NA>
13 y 4 8 <NA>
14 y 4 9 <NA>
15 y 4 10 Q
16 y 4 11 R
Answer 2 (score: 1)
The following returns the desired result.
library(data.table)
dt=data.table(a = c(1,1,2,2,3,3,4,4) ,
b = c(4,5,6,7,8,9,10,11) ,
c = c("x","x","x","x","y","y","y","y"))
dttrue <- fread('a b c
1 4 x
1 5 x
1 6 x
1 7 x
2 4 x
2 5 x
2 6 x
2 7 x
3 8 y
3 9 y
3 10 y
3 11 y
4 8 y
4 9 y
4 10 y
4 11 y')
dt2 <- dt[,CJ(a=a,b=b,unique = TRUE),by = c]
all.equal(dt2[,.(a,b,c)], dttrue) #true