Find the difference between two data frames based on the interaction of two columns

Asked: 2019-04-09 17:38:03

Tags: r dataframe

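I have two data frames, each with two columns, that I want to compare. The output should contain only the rows that appear in the first data frame and not in the second, where two rows count as equal only if the combination (interaction) of both columns matches.

I have tried merge, %in%, interaction and match, but I cannot get the right output. I have also searched SO extensively and found no similar question.

The closest answer I found was: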

newdat <- match(interaction(dfA$colA, dfA$colB), interaction(dfB$colA, dfB$colB))

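This code is clearly not right, though: if it worked, it would give me what the two data frames have in common, whereas I want the difference between them. (It also errors when colA and colB are strings.)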

Example data:

#Dataframe A

    colA     colB
    Aspirin  Smith, John
    Aspirin  Doe, Jane
    Atorva   Smith, John
    Simva    Doe, Jane

#Dataframe B
    colA     colB
    Aspirin  Smith, John
    Aspirin  Doe, Jane
    Atorva   Doe, Jane

## GOAL: 

#Dataframe
    colA     colB
    Atorva   Smith, John
    Simva    Doe, Jane

Thanks!

2 answers:

Answer 0 (score: 2)

We can use setdiff from the dplyr package.

library(dplyr)

setdiff(datA, datB)
#     colA        colB
# 1 Atorva Smith, John
# 2  Simva   Doe, Jane

Data

datA <- read.table(text = "    colA     colB
    Aspirin  'Smith, John'
    Aspirin  'Doe, Jane'
    Atorva   'Smith, John'
    Simva    'Doe, Jane'",
                   header = TRUE, stringsAsFactors = FALSE)

datB <- read.table(text = "    colA     colB
    Aspirin  'Smith, John'
    Aspirin  'Doe, Jane'
    Atorva   'Doe, Jane'",
                   header = TRUE, stringsAsFactors = FALSE)

Answer 1 (score: 1)

If you need a base R solution, it is easy to write a setdiffDF function.

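setdiffDF <- function(x, y){
  # Stack y on top of x; duplicated() flags any x row that repeats an earlier row
  ix <- !duplicated(rbind(y, x))[nrow(y) + seq_len(nrow(x))]
  # Keep the unflagged x rows; seq_len() and drop = FALSE cover empty or one-column input
  x[ix, , drop = FALSE]
}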


setdiffDF(dfA, dfB)
#    colA        colB
#3 Atorva Smith, John
#4  Simva   Doe, Jane
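To make the one-liner inside setdiffDF easier to follow, here is a step-by-step sketch of the same idea. The names x, y, stacked, dup, from_x and keep are illustrative only, and the data is rebuilt with character columns so the block runs on its own.

```r
# Character-column copies of the example data (illustrative names x and y).
x <- data.frame(colA = c("Aspirin", "Aspirin", "Atorva", "Simva"),
                colB = c("Smith, John", "Doe, Jane", "Smith, John", "Doe, Jane"),
                stringsAsFactors = FALSE)
y <- data.frame(colA = c("Aspirin", "Aspirin", "Atorva"),
                colB = c("Smith, John", "Doe, Jane", "Doe, Jane"),
                stringsAsFactors = FALSE)

stacked <- rbind(y, x)                 # rows 1-3 come from y, rows 4-7 from x
dup     <- duplicated(stacked)         # TRUE where an identical row appeared earlier
from_x  <- nrow(y) + seq_len(nrow(x))  # positions of x's rows inside the stack
keep    <- !dup[from_x]                # x rows that did not repeat an earlier row

x[keep, ]
```

Because y is stacked first, any x row that also occurs in y is flagged as a duplicate and dropped. One consequence worth knowing: a row that occurs twice in x and never in y is kept only once, since its second copy is also flagged.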

Data in dput format.

dfA <-
structure(list(colA = structure(c(1L, 1L, 2L, 3L), 
.Label = c("Aspirin", "Atorva", "Simva"), class = "factor"), 
colB = structure(c(2L, 1L, 2L, 1L), .Label = c("Doe, Jane", 
"Smith, John"), class = "factor")), class = "data.frame", 
row.names = c(NA, -4L))

dfB <-
structure(list(colA = structure(c(1L, 1L, 2L), 
.Label = c("Aspirin", "Atorva"), class = "factor"), 
colB = structure(c(2L, 1L, 1L), .Label = c("Doe, Jane", 
"Smith, John"), class = "factor")), class = "data.frame", 
row.names = c(NA, -3L))
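For comparison, the interaction() attempt from the question can probably be turned into a difference by negating a membership test instead of calling match(), which returns positions rather than rows. This is only a sketch: keyA, keyB and only_in_A are made-up names, and the data is rebuilt here with character columns so the block is self-contained.

```r
# Example data with character columns (an assumption; factors behave the same here).
dfA <- data.frame(colA = c("Aspirin", "Aspirin", "Atorva", "Simva"),
                  colB = c("Smith, John", "Doe, Jane", "Smith, John", "Doe, Jane"),
                  stringsAsFactors = FALSE)
dfB <- data.frame(colA = c("Aspirin", "Aspirin", "Atorva"),
                  colB = c("Smith, John", "Doe, Jane", "Doe, Jane"),
                  stringsAsFactors = FALSE)

# One combined key per row, e.g. "Atorva.Smith, John".
keyA <- interaction(dfA$colA, dfA$colB, drop = TRUE)
keyB <- interaction(dfB$colA, dfB$colB, drop = TRUE)

# %in% compares the key labels; negate it to keep rows of dfA absent from dfB.
only_in_A <- dfA[!(keyA %in% keyB), ]
only_in_A
```

Unlike setdiffDF, this keeps repeated rows of dfA. The keys are joined with "." by default, so values that themselves contain "." could in principle collide; the sep argument of interaction() can be set to a string that does not occur in the data.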