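I have two data frames with the same number of rows but a different number of columns. The column names also differ, but some columns may have identical contents.
For example, df1: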
df1<- data.frame("a"=c("0","1","0","1","0","0","0"),
"b"=c("1","1","1","1","1","0","0"),
"c"=c("1","1","0","0","1","0","0"),
"d"=c("1","1","1","1","1","1","1"))
df2:
df2<- data.frame("e"=c("1","1","0","1","0","0","0"),
"f"=c("1","1","1","1","1","0","0"),
"g"=c("0","0","0","0","1","0","0"),
"h"=c("0","0","0","0","1","1","1"))
As you can see, column "b" of df1 and column "f" of df2 are equal. The result I want is therefore a new data frame like this:
df3 <- data.frame("a"=c("0","1","0","1","0","0","0"),
"c"=c("1","1","0","0","1","0","0"),
"d"=c("1","1","1","1","1","1","1"),
"e"=c("1","1","0","1","0","0","0"),
"g"=c("0","0","0","0","1","0","0"),
"h"=c("0","0","0","0","1","1","1"))
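Note: columns "b" and "f" (the identical ones) are not in the new df3. I searched online but could not find an example. I think the main difficulty is that the merge is driven by column contents rather than column names.
Answer 0 (score: 1)
We can use sapply to check for columns that match exactly.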
mat <- sapply(df1, function(x) sapply(df2, function(y) all(x == y)))
mat
#      a     b     c     d
#e FALSE FALSE FALSE FALSE
#f FALSE  TRUE FALSE FALSE
#g FALSE FALSE FALSE FALSE
#h FALSE FALSE FALSE FALSE
Here we can see that column b of df1 and column f of df2 should be removed. We can do that with:
m2 <- which(mat, arr.ind = TRUE)
cbind(df1[-m2[, 2]], df2[-m2[, 1]])
# a c d e g h
#1 0 1 1 1 0 0
#2 1 1 1 1 0 0
#3 0 0 1 0 0 0
#4 1 0 1 1 0 0
#5 0 1 1 0 1 1
#6 0 0 1 0 0 1
#7 0 0 1 0 0 1
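One thing to watch with the negative indexing above: when no columns match, `m2` has zero rows, and `df1[-integer(0)]` selects no columns at all rather than all of them. The following is a minimal sketch of a guard, assuming both data frames have at least two columns and no missing values; the helper name `drop_matching_columns` is hypothetical and not part of the original answer.

```r
# Hypothetical helper (not from the answer): drop every column of df1 and df2
# that has an exact match in the other data frame.
drop_matching_columns <- function(df1, df2) {
  # Rows of `mat` correspond to df2 columns, columns of `mat` to df1 columns
  mat <- sapply(df1, function(x) sapply(df2, function(y) all(x == y)))
  m2 <- which(mat, arr.ind = TRUE)
  # setdiff() on positions keeps everything when there is no match,
  # unlike negative indexing with an empty index
  keep1 <- setdiff(seq_along(df1), m2[, 2])
  keep2 <- setdiff(seq_along(df2), m2[, 1])
  cbind(df1[keep1], df2[keep2])
}

df1 <- data.frame(a = c("0","1","0","1","0","0","0"),
                  b = c("1","1","1","1","1","0","0"),
                  c = c("1","1","0","0","1","0","0"),
                  d = c("1","1","1","1","1","1","1"))
df2 <- data.frame(e = c("1","1","0","1","0","0","0"),
                  f = c("1","1","1","1","1","0","0"),
                  g = c("0","0","0","0","1","0","0"),
                  h = c("0","0","0","0","1","1","1"))

res <- drop_matching_columns(df1, df2)

# No shared columns here: all four columns should survive
no_match <- drop_matching_columns(df1[c("a", "c")], df2[c("g", "h")])
```

With the question's data, `res` should contain the columns a, c, d, e, g and h, and `no_match` should keep a, c, g and h.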
Answer 1 (score: 1)
This does the job:
df3 <- cbind(df1,df2)
df3 <- t(t(df3)[!(duplicated(t(df3)) | duplicated(t(df3), fromLast = TRUE)),])
df3
# a c d e g h
#1 0 1 1 1 0 0
#2 1 1 1 1 0 0
#3 0 0 1 0 0 0
#4 1 0 1 1 0 0
#5 0 1 1 0 1 1
#6 0 0 1 0 0 1
#7 0 0 1 0 0 1
This gives you a matrix; if needed, you can convert the result back to a data frame with as.data.frame().
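If a data frame is wanted directly, the same idea can be sketched without the double transpose by running `duplicated()` over the list of columns. This is an alternative under stated assumptions, not the answer's own code: it assumes character columns (the default in R 4.0 and later), and, like the transpose version, it also drops columns that are duplicated within a single data frame.

```r
df1 <- data.frame(a = c("0","1","0","1","0","0","0"),
                  b = c("1","1","1","1","1","0","0"),
                  c = c("1","1","0","0","1","0","0"),
                  d = c("1","1","1","1","1","1","1"))
df2 <- data.frame(e = c("1","1","0","1","0","0","0"),
                  f = c("1","1","1","1","1","0","0"),
                  g = c("0","0","0","0","1","0","0"),
                  h = c("0","0","0","0","1","1","1"))

both <- cbind(df1, df2)

# Treat the data frame as a plain list of columns so that duplicated()
# compares whole columns instead of rows
cols <- unname(as.list(both))

# Flag every member of a duplicated group, first occurrence included
dup <- duplicated(cols) | duplicated(cols, fromLast = TRUE)

# Single-bracket indexing with a logical vector selects columns
# and keeps the result a data frame
df3 <- both[!dup]
```

Here `df3` stays a data frame with the original column names, so no conversion step is needed afterwards.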
Answer 2 (score: 1)
We can use outer from base R:
mat <- outer(df1, df2, FUN = Vectorize(function(x, y) all(x == y)))
mat
#      e     f     g     h
#a FALSE FALSE FALSE FALSE
#b FALSE  TRUE FALSE FALSE
#c FALSE FALSE FALSE FALSE
#d FALSE FALSE FALSE FALSE
Now we can get the row/column names of the matching pairs:
m2 <- as.matrix(subset(as.data.frame.table(mat), Freq, select = -Freq))
Next, we use 'm2' to drop the matched column names from 'df1' and 'df2', then cbind what remains:
cbind(df1[setdiff(names(df1), m2[,1])], df2[setdiff(names(df2), m2[,2])])
# a c d e g h
#1 0 1 1 1 0 0
#2 1 1 1 1 0 0
#3 0 0 1 0 0 0
#4 1 0 1 1 0 0
#5 0 1 1 0 1 1
#6 0 0 1 0 0 1
#7 0 0 1 0 0 1
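As a possible shortcut, the name lookup through `as.data.frame.table` can be skipped: a column of df1 has a match exactly when its row of `mat` contains a TRUE, and likewise for the columns of `mat` and df2. The sketch below reuses the `outer` call from the answer and assumes the data contain no missing values (otherwise `mat` could hold NA); the names `keep1` and `keep2` are introduced here for illustration.

```r
df1 <- data.frame(a = c("0","1","0","1","0","0","0"),
                  b = c("1","1","1","1","1","0","0"),
                  c = c("1","1","0","0","1","0","0"),
                  d = c("1","1","1","1","1","1","1"))
df2 <- data.frame(e = c("1","1","0","1","0","0","0"),
                  f = c("1","1","1","1","1","0","0"),
                  g = c("0","0","0","0","1","0","0"),
                  h = c("0","0","0","0","1","1","1"))

# Same comparison matrix as above: rows are df1 columns, columns are df2 columns
mat <- outer(df1, df2, FUN = Vectorize(function(x, y) all(x == y)))

# A column is kept when it matches nothing in the other data frame
keep1 <- rowSums(mat) == 0
keep2 <- colSums(mat) == 0

df3 <- cbind(df1[keep1], df2[keep2])
```

Because the selection uses logical vectors rather than negative positions, it also behaves sensibly when no columns match: everything is kept.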
Answer 3 (score: 1)
Here is a more tidyverse-style solution.
library(dplyr)
library(tidyr)
library(tibble) # provides rownames_to_column()
# based on Ronak's sapply approach
matches <- as.data.frame(sapply(df1, function(x) sapply(df2, function(y) identical(x, y)))) %>%
  rownames_to_column(var = "df2") %>%
  pivot_longer(-df2, names_to = "df1") %>% # one row per (df2 column, df1 column) pair
  filter(value) # keep only the matches
# programmatically build the list of names to remove
vars_remove <- c(matches$df1, matches$df2) # names of all matched columns
df1 %>% bind_cols(df2) %>%
  select(-any_of(vars_remove))
# a c d e g h
#1 0 1 1 1 0 0
#2 1 1 1 1 0 0
#3 0 0 1 0 0 0
#4 1 0 1 1 0 0
#5 0 1 1 0 1 1
#6 0 0 1 0 0 1
#7 0 0 1 0 0 1
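Note that this version compares columns with `identical(x, y)` where the earlier answers use `all(x == y)`. The two are not interchangeable in general, and a small illustration may help when choosing between them. The vectors below are made up for the example and are not part of the question's data.

```r
# Columns containing missing values
x <- c("0", "1", NA)
y <- c("0", "1", NA)

na_all       <- all(x == y)      # NA: the comparison with NA is undecided
na_identical <- identical(x, y)  # TRUE: NA in the same position counts as equal

# Same values, different storage types
num <- c(1, 0)    # double
int <- c(1L, 0L)  # integer

type_all       <- all(num == int)     # TRUE: values are compared after coercion
type_identical <- identical(num, int) # FALSE: the types differ
```

So `identical` is the stricter test on types and attributes but the more forgiving one for NA; which one fits depends on what "same column" should mean for the data at hand.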