我有一个向量列表,我想识别共享>的向量组。 80%的价值观。我希望使用组中包含的所有向量的名称来命名新组。我希望组中的新元素是分组向量中的所有可能元素。我不确定这是多么可行。
这是一个简单的例子:
a <- c(1,2,3,4,5,6,7,8,9,10,11,12,13,14)
b <- c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15)
c <- c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,16)
d <- c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17)
e <- c(1,2,3,4,5,6)
f <- c(18,19,20,21,22,23,24,25,26)
g <- c(18,19,20,21,22,23,24,25)
h <- c(18,19,20,21,22,23,24)
i <- c(18,19,1,2,3,4,5,6,7)
所以我认为我们会得到两组:
a-b-c-d-e-i
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
f-g-h
[1] 18 19 20 21 22 23 24 25 26
向量中的值不必是数字,它们可以是字符串。如果我还需要指定其他内容,请与我们联系。谢谢!
答案 0 :(得分:2)
假设向量位于本答案末尾的注释中所示的列表V中,计算邻接矩阵adj
,如果第i个和第j个向量至少具有adj[i,j]
为1其中80%的元素是共同的,否则为0。我们将其基于两个向量中较短的一部分。然后使用igraph包将adj
转换为图g
并计算其组件。
library(igraph)
overlap <- function(u, v) length(intersect(u, v)) / min(length(u), length(v)) > 0.8
adj <- sapply(V, function(u) sapply(V, overlap, u)) + 0
g <- graph_from_adjacency_matrix(adj)
memb <- components(g)$membership
memb
## a b c d e f g h i
## 1 1 1 1 1 2 2 2 1
s <- split(V, memb)
groups <- lapply(s, function(x) unique(unlist(x)))
names(groups) <- lapply(s, function(x) paste(names(x), collapse = "-"))
groups
,并提供:
$`a-b-c-d-e-i`
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
$`f-g-h`
[1] 18 19 20 21 22 23 24 25 26
我们也可以绘制图表:
plot(g)
注意:强>
V <- structure(list(a = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,
14), b = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15),
c = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 16),
d = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,
16, 17), e = c(1, 2, 3, 4, 5, 6), f = c(18, 19, 20, 21, 22,
23, 24, 25, 26), g = c(18, 19, 20, 21, 22, 23, 24, 25), h = c(18,
19, 20, 21, 22, 23, 24), i = c(18, 19, 1, 2, 3, 4, 5, 6,
7)), .Names = c("a", "b", "c", "d", "e", "f", "g", "h", "i"
))