Merge two data.tables so that every row of dt2 is combined with each row of dt1

Asked: 2017-02-11 17:58:03

Tags: r data.table

My data is similar to the example below, except that dt1 has 29 million rows and dt2 has only 15 rows (not 15 million).

dt1 <- data.table(ID=1:4,City=c("Charlotte","DC","Salem","Boston"))
dt2 <- data.table(Birds=c("Saker","Peregrine","Barbary","Prarie","Golden","Coopers","Canary","Finch"),BirdType=c("Falcon","Falcon","Falcon","Falcon","Eagle","Hawk","Breakfast","Breakfast"))

These print as follows:

> dt1
   ID      City
1:  1 Charlotte
2:  2        DC
3:  3     Salem
4:  4    Boston

> dt2
       Birds  BirdType
1:     Saker    Falcon
2: Peregrine    Falcon
3:   Barbary    Falcon
4:    Prarie    Falcon
5:    Golden     Eagle
6:   Coopers      Hawk
7:    Canary Breakfast
8:     Finch Breakfast

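I want to merge the two data.tables so that each row of dt1 is combined with every row of dt2, giving a single data.table with 32 rows (4 x 8) that looks like this: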

> dtMerged
    ID      City     Birds  BirdType
 1:  1 Charlotte     Saker    Falcon
 2:  1 Charlotte Peregrine    Falcon
 3:  1 Charlotte   Barbary    Falcon
 4:  1 Charlotte    Prarie    Falcon
 5:  1 Charlotte    Golden     Eagle
 6:  1 Charlotte   Coopers      Hawk
 7:  1 Charlotte    Canary Breakfast
 8:  1 Charlotte     Finch Breakfast
 9:  2        DC     Saker    Falcon
10:  2        DC Peregrine    Falcon
11:  2        DC   Barbary    Falcon
12:  2        DC    Prarie    Falcon
13:  2        DC    Golden     Eagle
14:  2        DC   Coopers      Hawk
15:  2        DC    Canary Breakfast
16:  2        DC     Finch Breakfast
17:  3     Salem     Saker    Falcon
18:  3     Salem Peregrine    Falcon
etc...

Any ideas on how best to achieve this would be greatly appreciated. I am using data.table version 1.10.4 on a Windows 7 PC. Thanks.

1 Answer:

Answer 0 (score: 1)

As @akrun commented, a cross join looks like one way to solve this. To implement it, I borrowed the very concise CJ.dt function by @jangorecki from this Stack Overflow post to get the desired result:

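library(data.table)

# Cross join of two data.tables via a temporary constant key column k
CJ.dt = function(X,Y) {
  stopifnot(is.data.table(X),is.data.table(Y))
  k = NULL                  # conventional guard against an R CMD check NOTE for the undefined global k
  X = X[, c(k=1, .SD)]      # copy of X with a constant column k in front
  setkey(X, k)
  Y = Y[, c(k=1, .SD)]      # same for Y
  setkey(Y, NULL)           # Y stays unkeyed; its first column k is matched against X's key
  # every row has k == 1, so each row of Y matches all rows of X; k is dropped afterwards
  X[Y, allow.cartesian=TRUE][, k := NULL][]
}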

new_df <- CJ.dt(dt1, dt2)
setorder(new_df, ID)
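
The idea behind CJ.dt can also be written out by hand, which may make the trick easier to follow. The block below is only a sketch under a few assumptions: it rebuilds the small example tables from the question, the helper column `k` and the names `a`, `b` and `merged_k` are made up for illustration, and it uses `merge()` with `allow.cartesian = TRUE` instead of the keyed `X[Y]` join used above.

```r
library(data.table)

# Example tables from the question
dt1 <- data.table(ID = 1:4, City = c("Charlotte", "DC", "Salem", "Boston"))
dt2 <- data.table(
  Birds    = c("Saker", "Peregrine", "Barbary", "Prarie",
               "Golden", "Coopers", "Canary", "Finch"),
  BirdType = c("Falcon", "Falcon", "Falcon", "Falcon",
               "Eagle", "Hawk", "Breakfast", "Breakfast")
)

# Work on copies so dt1 and dt2 are not modified by reference.
# k is a hypothetical constant helper column shared by both tables.
a <- copy(dt1)[, k := 1L]
b <- copy(dt2)[, k := 1L]

# Every row of a matches every row of b on k, which yields the cross join.
# allow.cartesian = TRUE is needed because the result has more rows
# than the two inputs combined.
merged_k <- merge(a, b, by = "k", allow.cartesian = TRUE)

# Drop the helper column again
merged_k[, k := NULL]

# One row per (dt1 row, dt2 row) pair
stopifnot(nrow(merged_k) == nrow(dt1) * nrow(dt2))
```

Copying both tables first costs memory, so for a very large dt1 the CJ.dt version above, or an approach that avoids the helper column altogether, may be preferable.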

Here is the full output of new_df after reordering:

> new_df
    ID      City     Birds  BirdType
 1:  1 Charlotte     Saker    Falcon
 2:  1 Charlotte Peregrine    Falcon
 3:  1 Charlotte   Barbary    Falcon
 4:  1 Charlotte    Prarie    Falcon
 5:  1 Charlotte    Golden     Eagle
 6:  1 Charlotte   Coopers      Hawk
 7:  1 Charlotte    Canary Breakfast
 8:  1 Charlotte     Finch Breakfast
 9:  2        DC     Saker    Falcon
10:  2        DC Peregrine    Falcon
11:  2        DC   Barbary    Falcon
12:  2        DC    Prarie    Falcon
13:  2        DC    Golden     Eagle
14:  2        DC   Coopers      Hawk
15:  2        DC    Canary Breakfast
16:  2        DC     Finch Breakfast
17:  3     Salem     Saker    Falcon
18:  3     Salem Peregrine    Falcon
19:  3     Salem   Barbary    Falcon
20:  3     Salem    Prarie    Falcon
21:  3     Salem    Golden     Eagle
22:  3     Salem   Coopers      Hawk
23:  3     Salem    Canary Breakfast
24:  3     Salem     Finch Breakfast
25:  4    Boston     Saker    Falcon
26:  4    Boston Peregrine    Falcon
27:  4    Boston   Barbary    Falcon
28:  4    Boston    Prarie    Falcon
29:  4    Boston    Golden     Eagle
30:  4    Boston   Coopers      Hawk
31:  4    Boston    Canary Breakfast
32:  4    Boston     Finch Breakfast
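
For comparison, the same 32 rows can be produced without a join by repeating row indices. This is a hedged sketch, not part of the original answer: the names `i`, `j` and `merged_idx` are illustrative, and the example tables are rebuilt from the question so that the block runs on its own. The last line estimates the size of the result for the real data described in the question (29 million rows times 15 rows).

```r
library(data.table)

# Example tables from the question
dt1 <- data.table(ID = 1:4, City = c("Charlotte", "DC", "Salem", "Boston"))
dt2 <- data.table(
  Birds    = c("Saker", "Peregrine", "Barbary", "Prarie",
               "Golden", "Coopers", "Canary", "Finch"),
  BirdType = c("Falcon", "Falcon", "Falcon", "Falcon",
               "Eagle", "Hawk", "Breakfast", "Breakfast")
)

# Each dt1 row index is repeated nrow(dt2) times in a row,
# while the dt2 row indices are cycled once per dt1 row.
i <- rep(seq_len(nrow(dt1)), each  = nrow(dt2))
j <- rep(seq_len(nrow(dt2)), times = nrow(dt1))

# Put the expanded tables side by side; rows are already grouped by dt1 row,
# so no reordering step is needed.
merged_idx <- data.table(dt1[i], dt2[j])

# The cross join has one row per pair of input rows
stopifnot(nrow(merged_idx) == nrow(dt1) * nrow(dt2))

# Expected size for the real data in the question: 29 million x 15 rows
expected_rows <- 29e6 * 15   # 435 million rows
```

At 435 million rows the result, and the index vectors themselves, will take a substantial amount of memory, so it is worth checking available RAM before running any of these approaches on the full data.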