I have a large data frame like this:
df <- data.frame(group = c("a","a","b","b","b","c"),
                 person = c("Tom","Jerry","Tom","Anna","Sam","Nic"), stringsAsFactors = FALSE)
df
  group person
1     a    Tom
2     a  Jerry
3     b    Tom
4     b   Anna
5     b    Sam
6     c    Nic
and I would like to get this result:
df.output
  pers1 pers2 person_in_common
1  Anna Jerry              Tom
2 Jerry   Sam              Tom
3   Sam   Tom             Anna
4  Anna   Tom              Sam
6  Anna   Sam              Tom
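The resulting data frame is essentially a table of every pair of people who have another person in common, that is, a third person who shares a group with each of them. I found a way to do this in SQL, but it takes a very long time, so I am wondering whether there is an efficient way to do it in R.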
Answer 0 (score: 2)
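Here is one approach using the igraph package. The basic idea is to build a graph in which two people are connected if they share a group, and then, for each node, extract every pair of its neighbouring nodes.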
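library(igraph)
# People per group; keep only groups with at least two members
X1 = split(df$person, df$group)
X2 = X1[lengths(X1) >= 2]
# One row per within-group pair of people (the edge list)
dat = data.frame(do.call(rbind, unlist(lapply(X2, function(x)
    combn(x, 2, sort, FALSE)), recursive = FALSE)))
g = graph_from_data_frame(dat, directed = FALSE)
# Dense adjacency matrix; keep the columns of people with at least two neighbours
mydf = data.frame(as_adjacency_matrix(g, sparse = FALSE))
mydf = mydf[colSums(mydf) > 1]
# For each remaining person, all pairs of their neighbours (lapply always returns a list)
ANS = lapply(mydf, function(x) t(combn(row.names(mydf)[which(x == 1)], 2)))
do.call(rbind, lapply(names(ANS), function(nm) data.frame(ANS[[nm]], nm)))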
# X1 X2 nm
#1 Sam Tom Anna
#2 Anna Tom Sam
#3 Jerry Anna Tom
#4 Jerry Sam Tom
#5 Anna Sam Tom
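For comparison, the same "pairs of neighbours" idea can be sketched in base R without igraph. This is not part of the answer above, only a minimal sketch under the assumption that the data fit comfortably in a dense person-by-person matrix. The names inc, adj and common_pairs are made up for illustration; the output column names are taken from the question.

```r
# Example data from the question
df <- data.frame(group = c("a", "a", "b", "b", "b", "c"),
                 person = c("Tom", "Jerry", "Tom", "Anna", "Sam", "Nic"),
                 stringsAsFactors = FALSE)

# Group-by-person incidence matrix; crossprod() counts shared groups per pair
inc <- unclass(table(df$group, df$person))
adj <- crossprod(inc) > 0   # TRUE where two people share at least one group
diag(adj) <- FALSE          # nobody counts as their own neighbour

# For each person with at least two neighbours, list all pairs of neighbours
common_pairs <- do.call(rbind, lapply(colnames(adj), function(p) {
  nb <- rownames(adj)[adj[, p]]
  if (length(nb) < 2) return(NULL)   # no pair to report for this person
  data.frame(t(combn(nb, 2)), p, stringsAsFactors = FALSE)
}))
names(common_pairs) <- c("pers1", "pers2", "person_in_common")
common_pairs
```

On the example data this yields five rows: one pair for Anna, one for Sam and three for Tom. Because the matrix is dense, memory grows with the square of the number of people, so for a genuinely large data frame the graph-based version is probably the safer choice.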
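OR, querying the neighbours of each person directly:
mynames = unique(do.call(c, X2))
do.call(rbind,
        lapply(mynames, function(x){
            # Names of all vertices adjacent to person x
            L = V(g)$name[unlist(adjacent_vertices(graph = g, v = x))]
            if(length(L) >= 2){
                setNames(data.frame(t(combn(L, 2)), x), c("P1", "P2", "P3"))
            }else{
                # Fewer than two neighbours: keep the person with an NA pair
                setNames(data.frame(NA, NA, x), c("P1", "P2", "P3"))
            }
        }))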
# P1 P2 P3
#1 Jerry Anna Tom
#2 Jerry Sam Tom
#3 Anna Sam Tom
#4 <NA> <NA> Jerry
#5 Sam Tom Anna
#6 Anna Tom Sam
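The second variant keeps people with fewer than two neighbours as an NA row (Jerry above), which the desired output in the question does not contain. A minimal sketch of dropping those rows and renaming the columns follows. It assumes the result has been stored in a data frame, here called res (a hypothetical name), which is re-typed from the printed output above so that the snippet runs on its own.

```r
# Result of the second variant, copied from the printed output above
res <- data.frame(P1 = c("Jerry", "Jerry", "Anna", NA, "Sam", "Anna"),
                  P2 = c("Anna", "Sam", "Sam", NA, "Tom", "Tom"),
                  P3 = c("Tom", "Tom", "Tom", "Jerry", "Anna", "Sam"),
                  stringsAsFactors = FALSE)

# Keep only rows that describe an actual pair, then use the question's column names
res_clean <- res[complete.cases(res), ]
names(res_clean) <- c("pers1", "pers2", "person_in_common")
rownames(res_clean) <- NULL
res_clean
```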