Input data frames
DF 1 (e.g. nrow = 10)
Col A | Col B | Col C
a | 1 | 2
a | 3 | 4
b | 5 | 6
c | 9 | 10
DF 2 (e.g. nrow = 20)
Col A | Col B | Col E
a | 1 | 22
a | 31 | 41
a | 3 | 63
b | 5 | 6
b | 11 | 13
c | 9 | 20
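I want to create a third data frame containing every additional row of DF 2, that is, every row whose Col A and Col B combination does not appear in DF 1.
Output (nrow = 20 - 10 = 10)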
Col A | Col B | Col E
a | 31 | 41
b | 11 | 13
Answer 0 (score: 5)
library(dplyr)
anti_join(df2, df1, by = c("ColA", "ColB"))
# ColA ColB ColE
# 1 a 31 41
# 2 b 11 13
Data:
df1 <- structure(list(ColA = c("a", "a", "b", "c"), ColB = c(1L, 3L,
5L, 9L), ColC = c(2L, 4L, 6L, 10L)), class = "data.frame", row.names = c(NA,
-4L))
df2 <- structure(list(ColA = c("a", "a", "a", "b", "b", "c"), ColB = c(1L,
31L, 3L, 5L, 11L, 9L), ColE = c(22L, 41L, 63L, 6L, 13L, 20L)), class = "data.frame", row.names = c(NA,
-6L))
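For readers who cannot load dplyr, the same result can be approximated in base R. This is only a minimal sketch, not the answerer's method: the names `key1`, `key2` and `extra` are illustrative, and it assumes the `|` separator never occurs in `ColA`.

```r
# Same example data as above, written with data.frame() for brevity.
df1 <- data.frame(ColA = c("a", "a", "b", "c"),
                  ColB = c(1L, 3L, 5L, 9L),
                  ColC = c(2L, 4L, 6L, 10L))
df2 <- data.frame(ColA = c("a", "a", "a", "b", "b", "c"),
                  ColB = c(1L, 31L, 3L, 5L, 11L, 9L),
                  ColE = c(22L, 41L, 63L, 6L, 13L, 20L))

# Build one key per row from the two join columns.
# Assumption: "|" does not appear in ColA, so keys are unambiguous.
key1 <- paste(df1$ColA, df1$ColB, sep = "|")
key2 <- paste(df2$ColA, df2$ColB, sep = "|")

# Keep the rows of df2 whose key never occurs in df1.
extra <- df2[!key2 %in% key1, ]
extra  # the (a, 31, 41) and (b, 11, 13) rows of df2
```

Unlike `anti_join()`, this keeps the original row names of `df2`; call `rownames(extra) <- NULL` if they should be reset.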
Answer 1 (score: 1)
We can use a data.table anti-join:
library(data.table)
setDT(df2)[!df1, on = .(ColA, ColB)]
# ColA ColB ColE
#1: a 31 41
#2: b 11 13
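One thing to keep in mind with the data.table approach: `setDT()` converts `df2` to a data.table by reference, so `df2` itself changes class. If that side effect is unwanted, a possible variant (a sketch, with `dt2` and `extra` as illustrative names) is to work on a copy made with `as.data.table()`:

```r
library(data.table)

# Same example data as above, written with data.frame() for brevity.
df1 <- data.frame(ColA = c("a", "a", "b", "c"),
                  ColB = c(1L, 3L, 5L, 9L),
                  ColC = c(2L, 4L, 6L, 10L))
df2 <- data.frame(ColA = c("a", "a", "a", "b", "b", "c"),
                  ColB = c(1L, 31L, 3L, 5L, 11L, 9L),
                  ColE = c(22L, 41L, 63L, 6L, 13L, 20L))

# as.data.table() returns a copy, so df2 stays a plain data.frame.
dt2 <- as.data.table(df2)

# Anti-join: rows of dt2 with no match in df1 on (ColA, ColB).
extra <- dt2[!df1, on = .(ColA, ColB)]

class(df2)  # still "data.frame"
```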