Question

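I have two data frames, each with two columns, that I want to compare. The output should contain only the rows that appear in the first data frame, that is, the difference between the two data frames based on the interaction of the two columns.

I have tried merge, %in%, interaction, and match, but I cannot seem to get the correct output. I have also searched SO extensively without finding a similar question.

The closest answer I found was: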

newdat <- match(interaction(dfA$colA, dfA$colB), interaction(dfB$colA, dfB$colB))

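But this code is clearly not right: if it worked, it would give me what the data frames have in common, whereas I want the difference between them (it also throws an error when colA and colB are strings).

Sample data: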

#Dataframe A

    colA     colB
    Aspirin  Smith, John
    Aspirin  Doe, Jane
    Atorva   Smith, John
    Simva    Doe, Jane

#Dataframe B
    colA     colB
    Aspirin  Smith, John
    Aspirin  Doe, Jane
    Atorva   Doe, Jane

## GOAL: 

#Dataframe
    colA     colB
    Atorva   Smith, John
    Simva    Doe, Jane

Thanks!

Answer 1

We can use setdiff from the dplyr package.

library(dplyr)

setdiff(datA, datB)
#     colA        colB
# 1 Atorva Smith, John
# 2  Simva   Doe, Jane

Data

datA <- read.table(text = "    colA     colB
    Aspirin  'Smith, John'
    Aspirin  'Doe, Jane'
    Atorva   'Smith, John'
    Simva    'Doe, Jane'",
                   header = TRUE, stringsAsFactors = FALSE)

datB <- read.table(text = "    colA     colB
    Aspirin  'Smith, John'
    Aspirin  'Doe, Jane'
    Atorva   'Doe, Jane'",
                   header = TRUE, stringsAsFactors = FALSE)
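As an alternative sketch (not part of the original answer), dplyr's anti_join can express the same idea while naming the comparison columns explicitly. This assumes the two data frames share the column names colA and colB; the result name only_in_A is just an illustrative choice.

```r
library(dplyr)

# Same sample data as in the question, built directly for a self-contained example
datA <- data.frame(
  colA = c("Aspirin", "Aspirin", "Atorva", "Simva"),
  colB = c("Smith, John", "Doe, Jane", "Smith, John", "Doe, Jane"),
  stringsAsFactors = FALSE
)
datB <- data.frame(
  colA = c("Aspirin", "Aspirin", "Atorva"),
  colB = c("Smith, John", "Doe, Jane", "Doe, Jane"),
  stringsAsFactors = FALSE
)

# Keep the rows of datA whose (colA, colB) pair has no match in datB
only_in_A <- anti_join(datA, datB, by = c("colA", "colB"))
only_in_A
```

This may be handy when the data frames carry extra columns that should not take part in the comparison, since only the columns listed in by are matched.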

Answer 2

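If you need a base R solution, it is easy to write a setdiffDF function.

setdiffDF <- function(x, y){
  # Stack y on top of x; a row of x that repeats an earlier row is flagged as a duplicate
  ix <- !duplicated(rbind(y, x))[nrow(y) + seq_len(nrow(x))]
  # Keep only the rows of x that were not flagged
  x[ix, , drop = FALSE]
}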


setdiffDF(dfA, dfB)
#    colA        colB
#3 Atorva Smith, John
#4  Simva   Doe, Jane

Data in dput format.

dfA <-
structure(list(colA = structure(c(1L, 1L, 2L, 3L), 
.Label = c("Aspirin", "Atorva", "Simva"), class = "factor"), 
colB = structure(c(2L, 1L, 2L, 1L), .Label = c("Doe, Jane", 
"Smith, John"), class = "factor")), class = "data.frame", 
row.names = c(NA, -4L))

dfB <-
structure(list(colA = structure(c(1L, 1L, 2L), 
.Label = c("Aspirin", "Atorva"), class = "factor"), 
colB = structure(c(2L, 1L, 1L), .Label = c("Doe, Jane", 
"Smith, John"), class = "factor")), class = "data.frame", 
row.names = c(NA, -3L))
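Another base R sketch, closer to the match/interaction attempt in the question, is to build one key per row from the two columns and keep the rows of the first data frame whose key is absent from the second. The names key_a, key_b, and only_in_A are hypothetical, and the "\r" separator is an assumption: it only needs to be a string that never occurs in the data.

```r
# Same sample data as in the question, built directly for a self-contained example
dfA <- data.frame(
  colA = c("Aspirin", "Aspirin", "Atorva", "Simva"),
  colB = c("Smith, John", "Doe, Jane", "Smith, John", "Doe, Jane"),
  stringsAsFactors = FALSE
)
dfB <- data.frame(
  colA = c("Aspirin", "Aspirin", "Atorva"),
  colB = c("Smith, John", "Doe, Jane", "Doe, Jane"),
  stringsAsFactors = FALSE
)

# One key per row; "\r" is an assumed separator that does not appear in the data
key_a <- paste(dfA$colA, dfA$colB, sep = "\r")
key_b <- paste(dfB$colA, dfB$colB, sep = "\r")

# Negate the membership test to get the difference instead of the overlap
only_in_A <- dfA[!(key_a %in% key_b), , drop = FALSE]
only_in_A
```

Unlike setdiffDF, this version keeps repeated rows of dfA if they are absent from dfB, which may or may not be what you want.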

Find the difference based on the interaction between two vectors
