R: How to merge two separate data frames only when they have the same contents

Time: 2020-10-08 08:46:14

Tags: r dataframe

Consider this example (my actual dataset is larger) with two datasets: one organized by groups of people, the other by houses.

Dataset 1:

Group   Person 
Group1  Andy   
Group2  Andy
Group2  Richard
Group3  Richard
Group4  Andy
Group4  Richard
Group4  Meg

Dataset 2:

House  Person
HouseA Andy
HouseA Richard
HouseB Andy
HouseB Richard
HouseB Meg

In this example, Group2 and HouseA both contain exactly Andy and Richard, and Group4 and HouseB both contain exactly Andy, Richard, and Meg. My desired output is:

Group  House  Person
Group2 HouseA Andy
Group2 HouseA Richard
Group4 HouseB Andy
Group4 HouseB Richard
Group4 HouseB Meg

Reproducible data:

df1 <- structure(list(Group = c("Group1", "Group2", "Group2", "Group3", 
"Group4", "Group4", "Group4"), Names = c("Andy", "Andy", "Richard", 
"Richard", "Andy", "Richard", "Meg")), class = "data.frame", row.names = c(NA, 
-7L))

df2 <- structure(list(House = c("HouseA", "HouseA", "HouseB", "HouseB", 
"HouseB"), Names = c("Andy", "Richard", "Andy", "Richard", "Meg"
)), class = "data.frame", row.names = c(NA, -5L))

4 answers:

Answer 0 (score: 1)

An alternative using data.table + digest. Hopefully it is readable:

library(digest)
library(data.table)
setDT(df1)
setDT(df2)

out <- merge(
  df1[, .(People = list(sort(Names)), hash = digest(sort(Names))), by = Group],
  df2[, .(hash = digest(sort(Names))), by = House],
  by = "hash")

out[, .(Person = unlist(People)), by = .(Group, House)]

Which yields:

   Group  House  Person
1: Group2 HouseA    Andy
2: Group2 HouseA Richard
3: Group4 HouseB    Andy
4: Group4 HouseB     Meg
5: Group4 HouseB Richard

Answer 1 (score: 1)

A solution using dplyr:

library(dplyr)
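merge(
  df1 %>% group_by(Group) %>% mutate(nGroup = n()),
  df2 %>% group_by(House) %>% mutate(nHouse = n())) %>% 
  # merge() joins on the shared column Names; keep a Group/House pair only
  # when both sides have the same size and every member found a match
  group_by(Group, House) %>% 
  filter(nGroup == nHouse, n() == nGroup) %>% 
  ungroup() %>% 
  arrange(Group, House) %>% 
  select(Group, House, Names) %>% 
  as.data.frame()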

##    Group  House   Names
##1 Group2 HouseA    Andy
##2 Group2 HouseA Richard
##3 Group4 HouseB    Andy
##4 Group4 HouseB     Meg
##5 Group4 HouseB Richard
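
For a join-based approach like this one, it may help to see why comparing sizes by itself cannot establish equal membership. The sketch below uses hypothetical data (GroupX, HouseY, Bob, and Carl are not from the question) and base R only; the column names nGroup, nHouse, and nMatched are illustrative assumptions.

```r
# Hypothetical data: both sides have two members, but only Andy is shared.
g <- data.frame(Group = "GroupX", Names = c("Andy", "Bob"),
                stringsAsFactors = FALSE)
h <- data.frame(House = "HouseY", Names = c("Andy", "Carl"),
                stringsAsFactors = FALSE)

# Size of each group and each house, repeated on every row.
g$nGroup <- ave(seq_along(g$Names), g$Group, FUN = length)
h$nHouse <- ave(seq_along(h$Names), h$House, FUN = length)

# Inner join on the person name.
joined <- merge(g, h, by = "Names")

# Size check only: the Andy row survives although the memberships differ.
size_only <- joined[joined$nGroup == joined$nHouse, ]

# Stricter check: the number of matched rows per Group/House pair
# must also equal the group size.
joined$nMatched <- ave(seq_along(joined$Names), joined$Group, joined$House,
                       FUN = length)
strict <- joined[joined$nGroup == joined$nHouse &
                 joined$nMatched == joined$nGroup, ]
```

Here size_only keeps the single Andy row, while strict is empty because only one of the two members matched.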

Answer 2 (score: 0)

Here is a base R attempt:

#split df2 on house value
tmp <- split(df2, df2$House)  
#split df1 on Group value 
result <- do.call(rbind, by(df1, df1$Group, function(x) {
  #Check which house and group combination has exact same names
  val <- sapply(tmp, function(y) all(y$Names %in% x$Names) & 
                                 all(x$Names %in% y$Names))
  if(any(val))
    #attach group name and combine the result
    do.call(rbind, Map(cbind, tmp[val], Group = x$Group[1]))
}))
#Remove rownames
rownames(result)  <- NULL
result  

#   House   Names  Group
#1 HouseA    Andy Group2
#2 HouseA Richard Group2
#3 HouseB    Andy Group4
#4 HouseB Richard Group4
#5 HouseB     Meg Group4
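
The pair of all(... %in% ...) checks above amounts to a set-equality test, which base R also exposes as setequal(). As a minimal sketch (not the answerer's code), the following builds a logical matrix showing which Group/House pairs have identical members; the names by_group, by_house, and same are illustrative.

```r
# Same data as in the question.
df1 <- data.frame(
  Group = c("Group1", "Group2", "Group2", "Group3",
            "Group4", "Group4", "Group4"),
  Names = c("Andy", "Andy", "Richard", "Richard", "Andy", "Richard", "Meg"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  House = c("HouseA", "HouseA", "HouseB", "HouseB", "HouseB"),
  Names = c("Andy", "Richard", "Andy", "Richard", "Meg"),
  stringsAsFactors = FALSE
)

# One character vector of members per group and per house.
by_group <- split(df1$Names, df1$Group)
by_house <- split(df2$Names, df2$House)

# Compare every group with every house using setequal().
same <- outer(
  seq_along(by_group), seq_along(by_house),
  Vectorize(function(i, j) setequal(by_group[[i]], by_house[[j]]))
)
dimnames(same) <- list(names(by_group), names(by_house))

# Row/column indices of the matching pairs.
matches <- which(same, arr.ind = TRUE)
```

The matrix same can then be used to pick the matching pairs before binding rows, in the same spirit as the split/by code above.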

Answer 3 (score: 0)

One option using tidyr::unnest + subset + aggregate:

tidyr::unnest(
  subset(
    aggregate(Names ~ ., df1, function(x) sort(unique(x))),
    Names %in% aggregate(Names ~ ., df2, function(x) sort(unique(x)))$Names
  ),
  cols = "Names"
)

which gives:

  Group  Names
  <chr>  <chr>
1 Group2 Andy
2 Group2 Richard
3 Group4 Andy
4 Group4 Meg
5 Group4 Richard
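
The output above has no House column, whereas the question asks for Group, House, and the person. One possible way to carry the house along, sketched in base R and still built on aggregate(), is to collapse each membership into a single string and merge on it. This assumes that no name contains the separator "|"; collapse_names, g, h, pairs, and result are illustrative names, not part of the original answer.

```r
# Same data as in the question.
df1 <- data.frame(
  Group = c("Group1", "Group2", "Group2", "Group3",
            "Group4", "Group4", "Group4"),
  Names = c("Andy", "Andy", "Richard", "Richard", "Andy", "Richard", "Meg"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  House = c("HouseA", "HouseA", "HouseB", "HouseB", "HouseB"),
  Names = c("Andy", "Richard", "Andy", "Richard", "Meg"),
  stringsAsFactors = FALSE
)

# Collapse a set of names into one comparable string.
# Assumption: no name contains the separator "|".
collapse_names <- function(x) paste(sort(unique(x)), collapse = "|")

# One row per group / per house, with the collapsed membership in Names.
g <- aggregate(Names ~ Group, data = df1, FUN = collapse_names)
h <- aggregate(Names ~ House, data = df2, FUN = collapse_names)

# Groups and houses with identical membership share the same string.
pairs <- merge(g, h, by = "Names")[, c("Group", "House")]

# Expand back to one row per person and order the columns as requested.
result <- merge(df1, pairs, by = "Group")[, c("Group", "House", "Names")]
```

With the question's data, result should contain the two Group2/HouseA rows and the three Group4/HouseB rows.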