Create a new data frame from two existing data frames based on the values in two columns

Time: 2020-07-23 22:59:21

Tags: r dataframe dplyr data-cleaning

Input data frames

DF 1 (e.g., nrow = 10)

Col A | Col B | Col C
  a       1       2    
  a       3       4    
  b       5       6    
  c       9      10    

DF 2 (e.g., nrow = 20)

Col A | Col B | Col E
  a       1       22    
  a       31      41    
  a       3       63    
  b       5       6
  b       11      13   
  c       9       20 

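I want to create a third data set containing every additional row found in DF 2, that is, every row whose combination of Col A and Col B does not appear in DF 1.

Output (nrow = 20 - 10 = 10)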

Col A | Col B | Col E
  a       31      41    
  b       11      13 

2 answers:

Answer 0: (score: 5)

library(dplyr)
# anti_join() keeps the rows of df2 whose (ColA, ColB) pair has no match in df1
anti_join(df2, df1, by = c("ColA", "ColB"))
#   ColA ColB ColE
# 1    a   31   41
# 2    b   11   13

Data:

df1 <- structure(list(ColA = c("a", "a", "b", "c"), ColB = c(1L, 3L, 
5L, 9L), ColC = c(2L, 4L, 6L, 10L)), class = "data.frame", row.names = c(NA, 
-4L))
df2 <- structure(list(ColA = c("a", "a", "a", "b", "b", "c"), ColB = c(1L, 
31L, 3L, 5L, 11L, 9L), ColE = c(22L, 41L, 63L, 6L, 13L, 20L)), class = "data.frame", row.names = c(NA, 
-6L))
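
For readers who cannot load dplyr, the same idea can be sketched in base R. This is only an illustration and not part of the original answer: the names `key1`, `key2`, and `extra` and the `"\r"` separator are assumptions chosen for the example, and pasted keys could collide if the separator ever appeared in the data.

```r
# Same sample data as in the answer above
df1 <- data.frame(ColA = c("a", "a", "b", "c"),
                  ColB = c(1L, 3L, 5L, 9L),
                  ColC = c(2L, 4L, 6L, 10L))
df2 <- data.frame(ColA = c("a", "a", "a", "b", "b", "c"),
                  ColB = c(1L, 31L, 3L, 5L, 11L, 9L),
                  ColE = c(22L, 41L, 63L, 6L, 13L, 20L))

# Build one composite key per row from the two matching columns
key1 <- paste(df1$ColA, df1$ColB, sep = "\r")
key2 <- paste(df2$ColA, df2$ColB, sep = "\r")

# Keep only the rows of df2 whose key does not occur in df1
extra <- df2[!(key2 %in% key1), ]
extra  # the rows (a, 31, 41) and (b, 11, 13)
```

`anti_join()` is likely the safer choice when the package is available, because it compares the columns directly instead of relying on a string key.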

Answer 1: (score: 1)

We can use a not-join from data.table:

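library(data.table)
# setDT() converts df2 to a data.table by reference;
# !df1 together with on = keeps the rows of df2 that have no match in df1
setDT(df2)[!df1, on = .(ColA, ColB)]
#  ColA ColB ColE
#1:    a   31   41
#2:    b   11   13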

Data

df1 <- structure(list(ColA = c("a", "a", "b", "c"), ColB = c(1L, 3L, 
5L, 9L), ColC = c(2L, 4L, 6L, 10L)), class = "data.frame", row.names = c(NA, 
-4L))
df2 <- structure(list(ColA = c("a", "a", "a", "b", "b", "c"), ColB = c(1L, 
31L, 3L, 5L, 11L, 9L), ColE = c(22L, 41L, 63L, 6L, 13L, 20L)), class = "data.frame", row.names = c(NA, 
-6L))
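
One thing to keep in mind is that `setDT()` changes `df2` in place. If `df2` should remain a plain data frame, a possible variation (an assumption, not part of the original answer) is to join on copies made with `as.data.table()`. The names `dt1`, `dt2`, and `res` are illustrative.

```r
library(data.table)

# Same sample data as in the answer above
df1 <- data.frame(ColA = c("a", "a", "b", "c"),
                  ColB = c(1L, 3L, 5L, 9L),
                  ColC = c(2L, 4L, 6L, 10L))
df2 <- data.frame(ColA = c("a", "a", "a", "b", "b", "c"),
                  ColB = c(1L, 31L, 3L, 5L, 11L, 9L),
                  ColE = c(22L, 41L, 63L, 6L, 13L, 20L))

# as.data.table() returns copies, so df1 and df2 are left untouched
dt1 <- as.data.table(df1)
dt2 <- as.data.table(df2)

# Not-join: rows of dt2 with no (ColA, ColB) match in dt1
res <- dt2[!dt1, on = .(ColA, ColB)]
res  # the rows (a, 31, 41) and (b, 11, 13)
```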