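I have two data frames, each with two columns, that I want to compare. The output should contain only the rows that occur in the first data frame, that is, the difference in the combination of the two columns between the two data frames.
I have tried merge, %in%, interaction, and match, but I can't seem to get the correct output. I have also searched SO extensively and found no similar question.
The closest response I found was: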
newdat <- match(interaction(dfA$colA, dfA$colB), interaction(dfB$colA, dfB$colB))
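But this code is obviously not correct: even if it worked, it would give me what the data frames have in common, whereas I want what differs between them. (It also throws an error when colA and colB are strings.)
Example data: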
#Dataframe A
colA colB
Aspirin Smith, John
Aspirin Doe, Jane
Atorva Smith, John
Simva Doe, Jane
#Dataframe B
colA colB
Aspirin Smith, John
Aspirin Doe, Jane
Atorva Doe, Jane
## GOAL:
#Dataframe
colA colB
Atorva Smith, John
Simva Doe, Jane
Thanks!
Answer 0 (score: 2)
We can use setdiff from the dplyr package.
library(dplyr)
setdiff(datA, datB)
# colA colB
# 1 Atorva Smith, John
# 2 Simva Doe, Jane
Data
datA <- read.table(text = " colA colB
Aspirin 'Smith, John'
Aspirin 'Doe, Jane'
Atorva 'Smith, John'
Simva 'Doe, Jane'",
header = TRUE, stringsAsFactors = FALSE)
datB <- read.table(text = " colA colB
Aspirin 'Smith, John'
Aspirin 'Doe, Jane'
Atorva 'Doe, Jane'",
header = TRUE, stringsAsFactors = FALSE)
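If the data frames ever carry extra columns that should not take part in the comparison, one possible alternative is dplyr's anti_join, which keeps the rows of the first data frame that have no match in the second on the columns named in by. This is only a sketch under that assumption; the name onlyInA is made up for illustration, and the data is rebuilt here so the snippet runs on its own.

```r
library(dplyr)

# Same example data as above, rebuilt so this snippet is self-contained
datA <- data.frame(
  colA = c("Aspirin", "Aspirin", "Atorva", "Simva"),
  colB = c("Smith, John", "Doe, Jane", "Smith, John", "Doe, Jane"),
  stringsAsFactors = FALSE
)
datB <- data.frame(
  colA = c("Aspirin", "Aspirin", "Atorva"),
  colB = c("Smith, John", "Doe, Jane", "Doe, Jane"),
  stringsAsFactors = FALSE
)

# Keep the rows of datA whose (colA, colB) pair does not occur in datB
onlyInA <- anti_join(datA, datB, by = c("colA", "colB"))
onlyInA
```

Unlike setdiff, anti_join does not remove duplicated rows from the first data frame, so the two can differ when datA contains repeats.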
Answer 1 (score: 1)
If you need a base R solution, it is easy to write a setdiffDF function.
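setdiffDF <- function(x, y){
  # Stack y on top of x; a row of x that already appears above it is a duplicate
  dup <- duplicated(rbind(y, x))
  # Keep only the flags for the rows of x (seq_len also handles a zero-row x)
  ix <- !dup[nrow(y) + seq_len(nrow(x))]
  x[ix, , drop = FALSE]
}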
setdiffDF(dfA, dfB)
# colA colB
#3 Atorva Smith, John
#4 Simva Doe, Jane
Data in dput format.
dfA <-
structure(list(colA = structure(c(1L, 1L, 2L, 3L),
.Label = c("Aspirin", "Atorva", "Simva"), class = "factor"),
colB = structure(c(2L, 1L, 2L, 1L), .Label = c("Doe, Jane",
"Smith, John"), class = "factor")), class = "data.frame",
row.names = c(NA, -4L))
dfB <-
structure(list(colA = structure(c(1L, 1L, 2L),
.Label = c("Aspirin", "Atorva"), class = "factor"),
colB = structure(c(2L, 1L, 1L), .Label = c("Doe, Jane",
"Smith, John"), class = "factor")), class = "data.frame",
row.names = c(NA, -3L))
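For completeness, the %in% idea from the question can also be made to work in base R by pasting the two columns into a single key per row. This is a minimal sketch, not the method from either answer: the names keyA, keyB, and onlyInA are hypothetical, and the "|" separator is an assumption that only holds if that character never appears in the data.

```r
# Same example data, rebuilt as plain character columns so the snippet is self-contained
dfA <- data.frame(
  colA = c("Aspirin", "Aspirin", "Atorva", "Simva"),
  colB = c("Smith, John", "Doe, Jane", "Smith, John", "Doe, Jane"),
  stringsAsFactors = FALSE
)
dfB <- data.frame(
  colA = c("Aspirin", "Aspirin", "Atorva"),
  colB = c("Smith, John", "Doe, Jane", "Doe, Jane"),
  stringsAsFactors = FALSE
)

# One key per row; "|" is assumed not to occur in either column
keyA <- paste(dfA$colA, dfA$colB, sep = "|")
keyB <- paste(dfB$colA, dfB$colB, sep = "|")

# Keep the rows of dfA whose key is absent from dfB
onlyInA <- dfA[!(keyA %in% keyB), , drop = FALSE]
onlyInA
```

paste converts factors to their labels, so the same lines should also work on the factor columns in the dput data above.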