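Consider this example (my actual dataset is larger), where I have two datasets: one lists the people in each group, the other lists the people in each house.
Dataset 1: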
Group Person
Group1 Andy
Group2 Andy
Group2 Richard
Group3 Richard
Group4 Andy
Group4 Richard
Group4 Meg
Dataset 2:
House Person
HouseA Andy
HouseA Richard
HouseB Andy
HouseB Richard
HouseB Meg
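From this example, you can see that Group2 and HouseA both contain exactly Andy and Richard, and Group4 and HouseB both contain exactly Andy, Richard and Meg. In other words, I want to pair each group with the house that has the identical set of people. The output I want is: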
Group House Person
Group2 HouseA Andy
Group2 HouseA Richard
Group4 HouseB Andy
Group4 HouseB Richard
Group4 HouseB Meg
Reproducible data:
df1 <- structure(list(Group = c("Group1", "Group2", "Group2", "Group3",
"Group4", "Group4", "Group4"), Names = c("Andy", "Andy", "Richard",
"Richard", "Andy", "Richard", "Meg")), class = "data.frame", row.names = c(NA,
-7L))
df2 <- structure(list(House = c("HouseA", "HouseA", "HouseB", "HouseB",
"HouseB"), Names = c("Andy", "Richard", "Andy", "Richard", "Meg"
)), class = "data.frame", row.names = c(NA, -5L))
Answer 0 (score: 1)
An alternative using data.table + digest. Hopefully it is readable:
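library(digest)
library(data.table)
setDT(df1)
setDT(df2)
# One row per group/house, keyed by a hash of its sorted member names;
# merging on the hash pairs groups and houses with identical members
out <- merge(
  df1[, .(People = list(sort(Names)), hash = digest(sort(Names))), by = Group],
  df2[, .(hash = digest(sort(Names))), by = House],
  by = "hash")
# Expand each matched pair back to one row per person
out[, .(Person = unlist(People)), by = .(Group, House)]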
Which produces:
Group House Person
1: Group2 HouseA Andy
2: Group2 HouseA Richard
3: Group4 HouseB Andy
4: Group4 HouseB Meg
5: Group4 HouseB Richard
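If the digest dependency is not wanted, a plain string key can stand in for the hash. This is a minimal sketch, not the answer above: it assumes no name contains the separator "|" and that repeated names within a group or house should be ignored (hence unique()); the names key1, key2, pairs and res are illustrative.

```r
library(data.table)

# Example data from the question
df1 <- data.frame(Group = c("Group1", "Group2", "Group2", "Group3",
                            "Group4", "Group4", "Group4"),
                  Names = c("Andy", "Andy", "Richard", "Richard",
                            "Andy", "Richard", "Meg"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(House = c("HouseA", "HouseA", "HouseB", "HouseB", "HouseB"),
                  Names = c("Andy", "Richard", "Andy", "Richard", "Meg"),
                  stringsAsFactors = FALSE)
dt1 <- as.data.table(df1)
dt2 <- as.data.table(df2)

# Plain-string key per group/house: sorted distinct members joined by "|"
# (assumes "|" never appears in a name)
key1 <- dt1[, .(key = paste(sort(unique(Names)), collapse = "|")), by = Group]
key2 <- dt2[, .(key = paste(sort(unique(Names)), collapse = "|")), by = House]

# Equal keys mean identical member sets
pairs <- merge(key1, key2, by = "key")[, .(Group, House)]

# Re-attach one row per person and order the result
res <- merge(pairs, dt1, by = "Group")
setnames(res, "Names", "Person")
setorder(res, Group, House, Person)
res
```

Comparing the joined strings directly instead of hashes also rules out hash collisions, at the cost of longer keys when groups are large.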
Answer 1 (score: 1)
Using dplyr:
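library(dplyr)
merge(
  df1 %>% group_by(Group) %>% mutate(nGroup = n()),
  df2 %>% group_by(House) %>% mutate(nHouse = n())) %>%
  # merge() joins on the shared column Names; keep a (Group, House) pair only
  # if every member is shared and both sets have the same size
  group_by(Group, House) %>%
  filter(n() == nGroup, nGroup == nHouse) %>%
  ungroup() %>%
  arrange(Group, House) %>%
  select(Group, House, Names) %>%
  as.data.frame()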
## Group House Names
##1 Group2 HouseA Andy
##2 Group2 HouseA Richard
##3 Group4 HouseB Andy
##4 Group4 HouseB Meg
##5 Group4 HouseB Richard
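Joining on Names and then comparing only the group and house sizes can, in general, pair two sets that have the same size but share only some members. A minimal sketch with hypothetical data (G1 = Andy, Meg; H1 = Andy, Richard), using dplyr::inner_join in place of merge(); additionally requiring the number of shared names per (Group, House) pair to equal both sizes removes the false match:

```r
library(dplyr)

# Hypothetical counterexample: equal sizes, but only Andy is shared
g <- data.frame(Group = c("G1", "G1"), Names = c("Andy", "Meg"),
                stringsAsFactors = FALSE)
h <- data.frame(House = c("H1", "H1"), Names = c("Andy", "Richard"),
                stringsAsFactors = FALSE)

joined <- inner_join(
  g %>% group_by(Group) %>% mutate(nGroup = n()) %>% ungroup(),
  h %>% group_by(House) %>% mutate(nHouse = n()) %>% ungroup(),
  by = "Names"
)

# Size check only: keeps the Andy row although the two sets differ
size_only <- joined %>% filter(nGroup == nHouse)

# Also require every member to be shared:
# number of shared names == size of group == size of house
exact <- joined %>%
  group_by(Group, House) %>%
  filter(n() == nGroup, nGroup == nHouse) %>%
  ungroup()

nrow(size_only)  # 1
nrow(exact)      # 0
```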
Answer 2 (score: 0)
Here is a base R attempt:
#split df2 on house value
tmp <- split(df2, df2$House)
#split df1 on Group value
result <- do.call(rbind, by(df1, df1$Group, function(x) {
#Check which house and group combination has exactly the same names
val <- sapply(tmp, function(y) all(y$Names %in% x$Names) &
all(x$Names %in% y$Names))
if(any(val))
#attach group name and combine the result
do.call(rbind, Map(cbind, tmp[val], Group = x$Group[1]))
}))
#Remove rownames
rownames(result) <- NULL
result
# House Names Group
#1 HouseA Andy Group2
#2 HouseA Richard Group2
#3 HouseB Andy Group4
#4 HouseB Richard Group4
#5 HouseB Meg Group4
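The two all(... %in% ...) checks together amount to a set-equality test, which base R provides as setequal(). A sketch of the same idea that first builds a Group x House matrix of matches (the names groups, houses, match_mat and pairs are illustrative; with a single group sapply() would return a vector rather than a matrix, which this sketch does not handle):

```r
# Example data from the question
df1 <- data.frame(Group = c("Group1", "Group2", "Group2", "Group3",
                            "Group4", "Group4", "Group4"),
                  Names = c("Andy", "Andy", "Richard", "Richard",
                            "Andy", "Richard", "Meg"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(House = c("HouseA", "HouseA", "HouseB", "HouseB", "HouseB"),
                  Names = c("Andy", "Richard", "Andy", "Richard", "Meg"),
                  stringsAsFactors = FALSE)

# Member vectors per group and per house
groups <- split(df1$Names, df1$Group)
houses <- split(df2$Names, df2$House)

# setequal(x, y) is TRUE when x and y hold the same distinct values, the same
# test as all(x %in% y) & all(y %in% x); rows = groups, columns = houses
match_mat <- sapply(houses, function(h) sapply(groups, function(g) setequal(g, h)))

# Turn the TRUE cells into (Group, House) pairs and re-attach the members
hits <- which(match_mat, arr.ind = TRUE)
pairs <- data.frame(Group = rownames(match_mat)[hits[, "row"]],
                    House = colnames(match_mat)[hits[, "col"]],
                    stringsAsFactors = FALSE)
result <- merge(pairs, df1, by = "Group")
result <- result[order(result$Group, result$Names), ]
rownames(result) <- NULL
result
```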
Answer 3 (score: 0)
An option using tidyr::unnest + subset + aggregate:
tidyr::unnest(
subset(
aggregate(Names ~ ., df1, function(x) sort(unique(x))),
Names %in% aggregate(Names ~ ., df2, function(x) sort(unique(x)))$Names
),
cols = "Names"
)
which gives the matching groups and their members (without the House column):
Group Names
<chr> <chr>
1 Group2 Andy
2 Group2 Richard
3 Group4 Andy
4 Group4 Meg
5 Group4 Richard
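To also report the matching House, one option, sketched under the assumption that no name contains "|", is to aggregate both data sets, turn each member list into a string, and use match() to find the house with the same members for each group (a1, a2, key1 and key2 are illustrative names):

```r
library(tidyr)

# Example data from the question
df1 <- data.frame(Group = c("Group1", "Group2", "Group2", "Group3",
                            "Group4", "Group4", "Group4"),
                  Names = c("Andy", "Andy", "Richard", "Richard",
                            "Andy", "Richard", "Meg"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(House = c("HouseA", "HouseA", "HouseB", "HouseB", "HouseB"),
                  Names = c("Andy", "Richard", "Andy", "Richard", "Meg"),
                  stringsAsFactors = FALSE)

# One row per group/house with its sorted distinct members in a list column;
# simplify = FALSE keeps Names a list even if all sets had the same size
a1 <- aggregate(Names ~ Group, df1, function(x) sort(unique(x)), simplify = FALSE)
a2 <- aggregate(Names ~ House, df2, function(x) sort(unique(x)), simplify = FALSE)

# Compare the sets as strings (assumes "|" never appears in a name)
key1 <- vapply(a1$Names, paste, character(1), collapse = "|")
key2 <- vapply(a2$Names, paste, character(1), collapse = "|")

# For each group, the house with an identical member set (NA if none)
a1$House <- a2$House[match(key1, key2)]

# Keep matched groups and expand back to one row per person
res <- unnest(subset(a1, !is.na(House)), cols = "Names")
res
```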