Question

I have the following list:

tmp <- list(c(1,2,1,2,11,29),
            c(2,3,2,3,20,21),
            c(10,11,12,13,14,15))

names(tmp) <- c("ID1","ID2","ID3")

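If we write out each of the six position-wise vectors of this list (the i-th element of ID1, ID2 and ID3 taken together), we get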

(1,2,10)
(2,3,11)
(1,2,12)
(2,3,13)
(11,20,14)
(29,21,15)

Note that the combinations (1,2,X) and (2,3,Y) each occur twice. I want to extract the vectors whose first two elements occur n times. So for n=2 we would get

(1,2,10)
(1,2,12)

and

(2,3,11)
(2,3,13)

Answer 1

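We loop through the list elements, check which elements of each vector are %in% 'one' ('two'), take the sum of the resulting logical vector, check whether it is greater than or equal to 2, and then use that logical index to subset the list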

tmp[sapply(tmp, function(x) sum(x %in% tmp$one)>=2)]
tmp[sapply(tmp, function(x) sum(x %in% tmp$two)>=2)]
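To make the logical index visible, here is a minimal sketch on made-up data. The list below is hypothetical: the names one, two, four and five follow the output printed in this answer, but the values are an assumption and not the asker's original input.

```r
# Hypothetical data (an assumption, not the asker's original list)
tmp <- list(one  = 1:6,
            two  = 7:12,
            four = c(1, 2, 15, 16, 17, 18),
            five = c(7, 8, 15, 16, 17, 18))

# For each element, count how many of its values also occur in tmp$one
# and flag the element when that count is at least 2
shares_two <- sapply(tmp, function(x) sum(x %in% tmp$one) >= 2)

# A named logical vector, one entry per list element
print(shares_two)

# Subsetting with it keeps only the flagged elements
print(names(tmp[shares_two]))
```

Note that tmp$one always matches itself, so the reference element is part of the result.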

These can be combined into a single call

lapply(tmp[c("one", "two")], function(x) tmp[sapply(tmp, function(y) sum(y %in% x)>=2)])
#$one
#$one$one
#[1] 1 2 3 4 5 6

#$one$four
#[1]  1  2 15 16 17 18


#$two
#$two$two
#[1]  7  8  9 10 11 12

#$two$five
#[1]  7  8 15 16 17 18

#$two$<NA>
#[1]  7  8 20 21 22 23

Judging by the output shown, the result could also be a matrix

lapply(tmp[c("one", "two")], function(x) 
         do.call(rbind, tmp[sapply(tmp, function(y) sum(y %in% x)>=2)]))

In the general case, if we want to compare the list elements across any number of combinations, we can use combn

lst1 <- combn(tmp, 3, FUN = list)
lst1[sapply(lst1, function(x) length(Reduce(intersect, x))>=3)]
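A small self-contained sketch of the same combn idea, run on hypothetical data. It enumerates combinations of element names rather than of the elements themselves and passes simplify = FALSE so that combn returns a plain list; the names v1 to v4, the data, and the threshold min_common are all assumptions made for illustration.

```r
# Hypothetical data (not from the question)
lst <- list(v1 = c(1, 2, 3, 4),
            v2 = c(2, 3, 4, 5),
            v3 = c(3, 4, 9),
            v4 = c(7, 8, 9))

min_common <- 2  # assumed threshold: values shared by all members

# Every combination of three element names, as a list of character vectors
trios <- combn(names(lst), 3, simplify = FALSE)

# Values common to all three elements of each combination
common <- lapply(trios, function(nm) Reduce(intersect, lst[nm]))
names(common) <- sapply(trios, paste, collapse = "+")

# Keep the combinations that share at least min_common values
res <- common[lengths(common) >= min_common]
print(res)
```

Because intersect() works on sets of values, this compares contents regardless of position within each vector.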

Update

Based on the new question

library(purrr)
tmp1 <- transpose(tmp) %>%
            map(unlist, use.names = FALSE)
lst1 <- combn(tmp1, 2, FUN = list)
lapply(lst1[sapply(lst1, function(x) length(Reduce(intersect, x))==2)], 
               function(x) do.call(rbind, x))
#[[1]]
#     [,1] [,2] [,3]
#[1,]    1    2   10
#[2,]    1    2   12

#[[2]]
#     [,1] [,2] [,3]
#[1,]    2    3   11
#[2,]    2    3   13
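The purrr version above matches pairs through intersect(), which compares values as sets and so ignores position. As a hedged alternative, the sketch below uses only base R and compares the first two elements position by position with identical(). It is a swapped-in technique, not the answer's own code, and like the pairwise combn approach it only covers n = 2.

```r
tmp <- list(ID1 = c(1, 2, 1, 2, 11, 29),
            ID2 = c(2, 3, 2, 3, 20, 21),
            ID3 = c(10, 11, 12, 13, 14, 15))

# One row per position: row i is (ID1[i], ID2[i], ID3[i])
m <- unname(do.call(cbind, tmp))

# All pairs of row indices, returned as a plain list
pairs <- combn(nrow(m), 2, simplify = FALSE)

# TRUE when both rows of a pair agree on their first two elements
same_key <- function(p) identical(m[p[1], 1:2], m[p[2], 1:2])

# Bind each matching pair into a 2 x 3 matrix
res <- lapply(Filter(same_key, pairs), function(p) m[p, , drop = FALSE])
print(res)
```

The number of pairs grows quadratically with the vector length, so for long vectors a key-counting approach is likely the better fit.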

Answer 2

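Suppose the requirement is n = 2.

Create an artificial variable that summarizes the two vectors (using sep="\b" gives more confidence that each key is unique)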

idx = paste(tmp[["ID1"]], tmp[["ID2"]], sep="\b")

Create a table counting the occurrences of each label, select the table entries that satisfy the condition, and get their names

n = 2
nms = names(which(table(idx) == n))

Determine which elements you want to keep, then subset each element of tmp

keep = idx %in% nms
lapply(tmp, `[`, keep)
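Put together on the question's data, the steps above can be run as one snippet. This is a sketch that only adds the data definition, n <- 2, and print calls for the intermediate values; the variable counts is introduced here for readability.

```r
tmp <- list(ID1 = c(1, 2, 1, 2, 11, 29),
            ID2 = c(2, 3, 2, 3, 20, 21),
            ID3 = c(10, 11, 12, 13, 14, 15))
n <- 2

# One key per position, built from the first two vectors
idx <- paste(tmp[["ID1"]], tmp[["ID2"]], sep = "\b")

# How often each key occurs, and which keys occur exactly n times
counts <- table(idx)
nms <- names(which(counts == n))

# Positions whose key qualifies
keep <- idx %in% nms
print(keep)

# Subset every vector of the list with the same logical index
res <- lapply(tmp, `[`, keep)
print(res)
```

The result keeps the original list layout (one vector per ID) rather than one matrix per key, so positions that share a key are not grouped together.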

A function with a bit more generality

fun = function(lst, n, op = `==`, key = 1:2) {
    idx = paste(lst[[ key[1] ]], lst[[ key[2] ]], sep="\b")
    keep = idx %in% names(which(op(table(idx), n)))
    lapply(lst, `[`, keep)
}
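A short usage sketch for the function, using the question's data. The definition of fun is copied unchanged so the snippet runs on its own; the result names (twice, once, at_least_two) are made up for illustration.

```r
# Copied from the answer so this snippet is self-contained
fun = function(lst, n, op = `==`, key = 1:2) {
    idx = paste(lst[[ key[1] ]], lst[[ key[2] ]], sep="\b")
    keep = idx %in% names(which(op(table(idx), n)))
    lapply(lst, `[`, keep)
}

tmp <- list(ID1 = c(1, 2, 1, 2, 11, 29),
            ID2 = c(2, 3, 2, 3, 20, 21),
            ID3 = c(10, 11, 12, 13, 14, 15))

# Keys that occur exactly twice (the default comparison is `==`)
twice <- fun(tmp, 2)

# Keys that occur exactly once
once <- fun(tmp, 1)

# Keys that occur two or more times, by swapping the comparison operator
at_least_two <- fun(tmp, 2, op = `>=`)

print(once)
```

The op argument is what lets the same function express "exactly n", "at least n", and similar conditions without changing its body.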
