Given a list:
foo <- list(c("a", "b", "d"), c("c", "b"), c("c"),
c("b", "d"), c("e", "f"), c("e", "g"))
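What is an efficient way to obtain a list of the disjoint sets that cover its contents, so that vectors sharing any value end up merged into one set?
Here I would like to get: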
[[1]]
[1] "a" "b" "c" "d"
[[2]]
[1] "e" "f" "g"
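The solution I managed to come up with seems overly complicated and slow (I am working with large lists of 4000+ elements, each containing up to several hundred values).
Thanks!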
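Benchmark of the solutions
Thanks everyone for the input. The igraph approach is very nice. I benchmarked the proposed solutions, and igraph combined with @flodel's suggestion is efficient. The example here (iGrp) has 3170 elements. In the timings below, igraph_method2 has a median roughly 16 to 17 times lower than the other two methods.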
> microbenchmark(igraph_method(iGrp), igraph_method2(iGrp), iterative_method(iGrp), times=10L)
## Unit: milliseconds
## expr min lq median uq max neval
## igraph_method(iGrp) 6892.8534 7140.0287 7229.5569 7396.2458 8044.9796 10
## igraph_method2(iGrp) 381.4555 391.2097 442.3282 472.5641 537.4885 10
## iterative_method(iGrp) 7118.7857 7272.9568 7595.9700 7675.2888 8485.4388 10
#### functions used
igraph_method <- function(lst) {
  # One edge for every pair of values within each vector
  edg <- do.call("rbind", lapply(lst, function(x) {
    if (length(x) > 1) t(combn(x, 2)) else NULL
  }))
  g <- graph.data.frame(edg)
  split(V(g)$name, clusters(g)$membership)
}
igraph_method2 <- function(lst) {
  # One edge between each pair of consecutive values within each vector
  edg <- do.call("rbind", lapply(lst, function(x) {
    if (length(x) > 1) cbind(head(x, -1), tail(x, -1)) else NULL
  }))
  g <- graph.data.frame(edg)
  split(V(g)$name, clusters(g)$membership)
}
iterative_method <- function(lst) {
  Reduce(function(l, x) {
    matches <- sapply(l, function(i) any(x %in% i))
    if (any(matches)) {
      combined <- unique(c(unlist(l[matches]), x))
      l[matches] <- NULL           # Delete old entries
      l <- c(l, list(combined))    # Add combined entries
    } else {
      l <- c(l, list(x))           # New list entry
    }
    l
  }, lst, init=list())
}
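A plausible reason for the gap between igraph_method and igraph_method2 is the number of edges each one builds: all pairs grow quadratically with the length of a vector, while a chain of consecutive values grows linearly and still connects every value in the vector. The sketch below only counts edges for a single vector of 200 values; the helper names all_pairs_edges and chain_edges are made up for this illustration.

```r
# Hypothetical helpers, written only to compare the two edge constructions.
all_pairs_edges <- function(x) if (length(x) > 1) t(combn(x, 2)) else NULL
chain_edges <- function(x) if (length(x) > 1) cbind(head(x, -1), tail(x, -1)) else NULL

x <- as.character(1:200)               # one list element holding 200 values

n_pairs <- nrow(all_pairs_edges(x))    # choose(200, 2) edges
n_chain <- nrow(chain_edges(x))        # 200 - 1 edges

c(all_pairs = n_pairs, chain = n_chain)
```

Both edge sets yield the same connected components, so the smaller one is enough for this problem.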
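Answer 0 (score: 7)
One way to approach this kind of problem is to build a graph in which the nodes are the values in your list and an edge joins two values that appear together in the same vector. The disjoint sets you are after are then the connected components of that graph. The igraph package in R makes this easy. First, build a two-column matrix of edges: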
edges <- do.call(rbind, lapply(foo, function(x) {
  if (length(x) > 1) cbind(head(x, -1), tail(x, -1)) else NULL
}))
edges
#      [,1] [,2]
# [1,] "a"  "b"
# [2,] "b"  "d"
# [3,] "c"  "b"
# [4,] "b"  "d"
# [5,] "e"  "f"
# [6,] "e"  "g"
Then you can build the graph from the edges and compute its connected components:
library(igraph)
g <- graph.data.frame(edges, directed=FALSE)
split(V(g)$name, clusters(g)$membership)
# $`1`
# [1] "a" "b" "c" "d"
#
# $`2`
# [1] "e" "f" "g"
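Two caveats may be worth keeping in mind. Newer igraph releases name these functions graph_from_data_frame() and components(), and a value that only ever appears alone in a length-one vector produces no edge, so it would be missing from the result. The following is a minimal sketch of one way to handle both, assuming a recent igraph is installed; the function name disjoint_sets, the self-loop trick, and the extra "z" entry are illustrative additions, not part of the original answer.

```r
library(igraph)

# Hypothetical wrapper: same chain-edge idea, but a length-one vector
# contributes a self-loop so that its value still becomes a vertex.
disjoint_sets <- function(lst) {
  edges <- do.call(rbind, lapply(lst, function(x) {
    if (length(x) > 1) cbind(head(x, -1), tail(x, -1)) else matrix(c(x, x), ncol = 2)
  }))
  colnames(edges) <- c("from", "to")
  g <- graph_from_data_frame(as.data.frame(edges, stringsAsFactors = FALSE),
                             directed = FALSE)
  # unname() drops the component labels so the result is a plain list
  unname(split(V(g)$name, components(g)$membership))
}

foo2 <- list(c("a", "b", "d"), c("c", "b"), c("c"),
             c("b", "d"), c("e", "f"), c("e", "g"), c("z"))
res <- disjoint_sets(foo2)
res
```

With this input, "z" should come back as a set of its own alongside the two sets from the question.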
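For reasonably large problems, this approach appears to be faster than the iterative approach. In the benchmark below, josilber() wraps the igraph code above and lundberg() wraps the iterative approach from the other answer: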
values = as.character(1:2000)
set.seed(144)
foo <- lapply(1:4000, function(x) sample(values, rbinom(1, 10, .5)))
library(microbenchmark)
microbenchmark(josilber(foo), lundberg(foo))
# Unit: milliseconds
# expr min lq median uq max neval
# josilber(foo) 251.8007 281.0168 297.2446 314.6714 635.7916 100
# lundberg(foo) 640.0575 714.9658 761.3777 827.5415 1118.3517 100
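Answer 1 (score: 2)
Here is an iterative approach that builds up a list of results, merging every existing group that shares a value with the current vector: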
Reduce(function(l, x) {
  matches <- sapply(l, function(i) any(x %in% i))
  if (any(matches)) {
    combined <- unique(c(unlist(l[matches]), x))
    l[matches] <- NULL           # Delete old entries
    l <- c(l, list(combined))    # Add combined entries
  } else {
    l <- c(l, list(x))           # New list entry
  }
  l
}, foo, init=list())
## [[1]]
## [1] "a" "b" "d" "c"
##
## [[2]]
## [1] "e" "f" "g"
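Note that the first group comes back as "a" "b" "d" "c", while the graph-based answer prints "a" "b" "c" "d": the sets are the same, only the element order differs. When checking that two methods agree, it may help to put both results into a canonical form first. The helper normalize_groups below is a hypothetical name introduced for this sketch, and the two inputs are simply the outputs shown in this thread typed in by hand.

```r
# Hypothetical helper: sort the values inside each group, then order the
# groups themselves, so two partitions compare equal regardless of order.
normalize_groups <- function(groups) {
  groups <- lapply(unname(groups), sort)
  keys <- vapply(groups, paste, character(1), collapse = "\t")
  groups[order(keys)]
}

iterative   <- list(c("a", "b", "d", "c"), c("e", "f", "g"))
graph_based <- list(`1` = c("a", "b", "c", "d"), `2` = c("e", "f", "g"))

same <- identical(normalize_groups(iterative), normalize_groups(graph_based))
same
```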