Question

I have a df1 like this:

text 1
text 2
text 3
text 4
text 5

And another one, df2, like this:

text 1
text 2
text 3
text 5

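My data frames are almost identical and have many rows. How can I find the extra row that only the first df has, so that I know which one it is?

Is there any way to compare two data frames and find the differences between them?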

Answer 1

You can rbind the two and then find the rows that are not duplicated.

For example, if you have data frames a and b, then

x <- rbind(a, b)
x[!duplicated(x) & !duplicated(x, fromLast = TRUE), ]
#     V1 V2
# 4 text  4

Alternatively, if you prefer, you can use dplyr::setdiff(), which has a data frame method.

dplyr::setdiff(a, b)
#     V1 V2
# 1 text  4

where

a <- read.table(text = "text 1
text 2
text 3
text 4
text 5", header = FALSE)

b <- a[-4, ]
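
The rbind approach above returns rows that occur in only one of the two frames, without saying which frame each row came from. A minimal base R sketch that separates the two directions, assuming whole rows are compared through a pasted string key (the `row_key` helper and the `"\r"` separator are illustrative choices, not part of the answer):

```r
a <- data.frame(V1 = rep("text", 5), V2 = 1:5)
b <- a[-4, ]

# Hypothetical helper: collapse each row into one string so rows can be
# compared with %in%. Assumes no value contains the separator itself.
row_key <- function(d) do.call(paste, c(d, sep = "\r"))

only_in_a <- a[!row_key(a) %in% row_key(b), ]  # rows of a missing from b
only_in_b <- b[!row_key(b) %in% row_key(a), ]  # rows of b missing from a
```

With the example data, `only_in_a` holds the single extra row and `only_in_b` is empty. Both frames are assumed to have the same columns in the same order.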

Answer 2

Here is a data.table solution:

library(data.table)
df1 <- data.frame(V1=rep('text',5),V2=1:5)
df2 <- data.frame(V1=rep('text',4),V2=c(1:3,5))
setkey(setDT(df1))[!df2]
##      V1 V2
## 1: text  4

Answer 3

A base R solution.

 df1[-merge(df1, df2)[,2], ]
    V1 V2
4 text  4

Or:

 df1[-which(df1[ , 2] %in% df2[, 2]), ]
    V1 V2
4 text  4

Edit

After thinking more about the base R solutions, I realized that my previous solutions could break on some data. I think this one is more robust.

 df1[!df1$V2 %in% merge(df1, df2)[, 2, drop = TRUE], ]
    V1 V2
4 text  4
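
To make the weakness of the two earlier one-liners concrete, here is a small sketch on made-up data (the values 10, 20, 30 and the frame `df3` are assumptions chosen for illustration). Negative indexing treats the merged V2 values as row positions, and `-which()` drops every row when nothing matches:

```r
df1 <- data.frame(V1 = rep("text", 3), V2 = c(10, 20, 30))
df2 <- data.frame(V1 = rep("text", 2), V2 = c(10, 30))

# V2 values 10 and 30 are used as row positions; they are out of range
# for a 3-row frame, so no row is removed.
bad_merge <- df1[-merge(df1, df2)[, 2], ]

# Logical indexing compares values rather than positions.
good <- df1[!df1$V2 %in% merge(df1, df2)[, 2, drop = TRUE], ]

# When nothing matches, which() returns integer(0) and the negative
# index selects zero rows instead of keeping all of them.
df3 <- data.frame(V1 = "text", V2 = 99)
bad_which <- df1[-which(df1[, 2] %in% df3[, 2]), ]
```

Here `bad_merge` still contains all three rows, `good` contains only the row with V2 equal to 20, and `bad_which` is empty even though no row of df1 appears in df3.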

Answer 4

Base package, output:

  V2 V1.x V1.y
4  4 text <NA>
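
The code that produced this output is not shown. The V1.x and V1.y column names are the suffixes that merge() adds when both inputs share a non-key column, so one plausible reconstruction is a left merge on V2. This is a sketch under that assumption, not necessarily what the answer's author ran:

```r
df1 <- data.frame(V1 = rep("text", 5), V2 = 1:5)
df2 <- data.frame(V1 = rep("text", 4), V2 = c(1:3, 5))

# Left join on V2: every row of df1 is kept, and rows with no partner
# in df2 get NA in the column that came from df2 (V1.y).
merged <- merge(df1, df2, by = "V2", all.x = TRUE)

# Keep only the rows that found no match in df2.
unmatched <- merged[is.na(merged$V1.y), ]
```

This is the same idea as the LEFT JOIN ... WHERE ... IS NULL query in the sqldf variant of this answer.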

sqldf:

library(sqldf)
sqldf("SELECT * FROM df1 LEFT JOIN df2 USING (V2) WHERE df2.V1 IS NULL")

Output:

    V1 V2   V1
1 text  4 <NA>

