Match two data frames by column and find every possible matching combination

Time: 2018-03-16 14:39:51

Tags: r join match

Consider two data.frames:

A<-data.frame(a=c("b","a", "a", "e", "e","a"),Za=c(11,22,33,44,55,66))
B<-data.frame(b=c("a","a", "b", "e", "f","f"),Zb=c(11,22,33,44,55,66))

Now I want to match them on columns a and b while keeping every possible combination. In the end I want:

Anew<-data.frame(a=c("a","a","a","a","a","a","b","e","e","f","f"),Za=c(11,11,11,22,22,22,33,44,44,55,66))

Bnew<-data.frame(b=c("a","a","a","a","a","a","b","e","e",NA,NA),Zb=c(22,33,66,22,33,66,11,44,55,NA,NA))


Anew
   a Za
1  a 11
2  a 11
3  a 11
4  a 22
5  a 22
6  a 22
7  b 33
8  e 44
9  e 44
10 f 55
11 f 66

Bnew
      b Zb
1     a 22
2     a 33
3     a 66
4     a 22
5     a 33
6     a 66
7     b 11
8     e 44
9     e 55
10 <NA> NA
11 <NA> NA

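If possible, I would rather not use ncomb, because my vectors are really huge and that would exhaust my memory. A fast-running solution would be perfect!

Thank you very much for your help!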

1 Answer:

Answer 0 (score: 1)

If you are working with large datasets, use data.table rather than data.frame. Here is a solution:

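library(data.table)

# Same data as in the question, built as data.tables
A <- data.table(a=c("b","a", "a", "e", "e","a"),Za=c(11,22,33,44,55,66))
B <- data.table(b=c("a","a", "b", "e", "f","f"),Zb=c(11,22,33,44,55,66))

# Full outer join on A$a == B$b; every matching pair of rows is kept
df <- merge(A, B, by.x="a",by.y="b", all = TRUE)

# Match is 1 when the row has a value from A, 0 when the key exists only in B
df[,Match := ifelse(!is.na(Za),1,0)]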

    a Za Zb Match
 1: a 22 11     1
 2: a 22 22     1
 3: a 33 11     1
 4: a 33 22     1
 5: a 66 11     1
 6: a 66 22     1
 7: b 11 33     1
 8: e 44 44     1
 9: e 55 44     1
10: f NA 55     0
11: f NA 66     0
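
For much larger inputs, a many-to-many merge can produce more rows than nrow(A) + nrow(B), and data.table stops with an error in that case unless allow.cartesian = TRUE is passed. The following is only a sketch under that assumption, not a tuned solution. It also changes Match so that it is 1 only when the key appears in both tables, which assumes that Za and Zb contain no NA values of their own. The name res is chosen here for illustration.

```r
library(data.table)

A <- data.table(a = c("b", "a", "a", "e", "e", "a"), Za = c(11, 22, 33, 44, 55, 66))
B <- data.table(b = c("a", "a", "b", "e", "f", "f"), Zb = c(11, 22, 33, 44, 55, 66))

# Full outer join; allow.cartesian = TRUE permits a result with more
# rows than nrow(A) + nrow(B), which large many-to-many matches can create
res <- merge(A, B, by.x = "a", by.y = "b", all = TRUE, allow.cartesian = TRUE)

# Assumption: Za and Zb have no NAs in the input, so an NA after the
# merge means the key was missing from that table
res[, Match := as.integer(!is.na(Za) & !is.na(Zb))]

print(res)
```

Keys found in only one table keep their row with NA on the other side and Match equal to 0, so the unmatched rows can be filtered afterwards with res[Match == 0].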