Check pairs of items between two data frames

Time: 2017-01-22 10:10:16

Tags: r dataframe group-by

I have two data frames (A and B) with the following structure:

A:

projectID    offerID
   20          12
   20          17 
   32          12
   32          25

B:

 projectID    offerID
   20          12
   20          17 
   32          12

I want to find the pairs that are in A but not in B. In my example, I would like to get a new df containing only those pairs:

projectID    offerID
   32           25

I tried a few options; for example:

APairs <- A %>% group_by(projectID, offerID)
BPairs <- B %>% group_by(projectID, offerID)

!(APairs %in% BPairs)

But I get TRUE/FALSE results that I cannot interpret or verify against my data.

Any help would be much appreciated!

3 answers:

Answer 0 (score: 4)

base R:

# Define the key columns, in case A and B have different structures
cols <- c("projectID", "offerID")
A[!do.call(paste, A[cols]) %in% do.call(paste, B[cols]), ]
#  projectID offerID
#4        32      25
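
To make the idea above easier to follow, here is a minimal self-contained sketch that rebuilds the example data from the question and shows the per-row string keys being compared. The explicit `sep = "|"` and the names `keyA`, `keyB` and `only_in_A` are assumptions added for illustration; a separator that cannot occur in the data helps avoid accidental key collisions when the columns hold strings.

```r
# Rebuild the example data from the question
A <- data.frame(projectID = c(20, 20, 32, 32), offerID = c(12, 17, 12, 25))
B <- data.frame(projectID = c(20, 20, 32), offerID = c(12, 17, 12))

# Build one string key per row, e.g. "20|12" (separator chosen for illustration)
keyA <- paste(A$projectID, A$offerID, sep = "|")
keyB <- paste(B$projectID, B$offerID, sep = "|")

# Keep the rows of A whose key does not appear among the keys of B
only_in_A <- A[!keyA %in% keyB, ]
print(only_in_A)
```

The comparison happens on character vectors with one element per row, which is why `%in%` returns one TRUE/FALSE per pair here rather than one per column.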

Answer 1 (score: 3)

library(data.table)
setkey(setDT(A))
setkey(setDT(B))
A[!B]                # A[B] is similar to merge(), so negate with ! to get the opposite (an anti-join)
#   projectID offerID
#1:        32      25

# In case either table has extra columns, specify the common columns in a vector
common.col <- c("projectID", "offerID")
setkeyv(setDT(A), cols = common.col)
setkeyv(setDT(B), cols = common.col)
A[!B]
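
As a possible variation on the keyed approach above, data.table also accepts the join columns through the `on` argument, which avoids setting keys (and thus reordering the tables). This is a sketch assuming the example data from the question; the name `res` is introduced here for illustration only.

```r
library(data.table)

# Rebuild the example data from the question as data.tables
A <- data.table(projectID = c(20, 20, 32, 32), offerID = c(12, 17, 12, 25))
B <- data.table(projectID = c(20, 20, 32), offerID = c(12, 17, 12))

common.col <- c("projectID", "offerID")

# Anti-join on the named columns: rows of A that have no match in B
res <- A[!B, on = common.col]
print(res)
```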

Answer 2 (score: 2)

We can use anti_join from dplyr:

 library(dplyr)
 anti_join(A, B)
 #    projectID offerID
 #1        32      25

If the data frames have more columns, specify the by option:

 anti_join(A, B, by = c("projectID", "offerID"))
 #    projectID offerID
 #1        32      25