Question

OK, my question may be a little weirder than the title suggests. I have this list:

x <- list(
  c("a", "d"),
  c("a", "c"), 
  c("d", "e"),
  c("e", "f"), 
  c("b", "c"), 
  c("f", "c"), # row 6 
  c("c", "e"), 
  c("f", "b"), 
  c("b", "a")
)

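I need to copy these into another list named T. The only condition is that the two letters of a pair must not both already be in T. If one of them is already in T and the other is not, that is fine.

Basically, in this example, I would take the first 5 positions and copy them one after another into T, because in each of them one or both letters are new to T.

Then I would skip the sixth position, because the letter "f" is already in the 4th position of T and the letter "c" is already in the 2nd and 5th positions of T.

Then I would skip the remaining 3 positions for the same reason (at this point the letters "c", "e", "f", "b" and "a" are already in T).

I tried doing this: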

for(i in 1:length(T){
   if (!( *first letter* %in% T && *second letter* %in% T)) {
      T[[i]] <- c(*first letter*, *second letter*)
   }
}

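But it behaves as if the "if" were not even there, and I'm pretty sure I'm using %in% the wrong way.

Any suggestions? I hope what I wrote makes sense; I'm new to R and to this website.

Thanks for your time.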

Answer 1

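Effectively, for each element of the list, you want to lose it if both of its letters already appear in earlier elements. Logical indexing is helpful here.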

# Make a logical vector the length of x.
lose <- logical(length(x))

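Now you can run a loop over the length of lose and compare each element with all previous elements of x. Using seq_len saves us from having to guard against the special case of i = 1 (seq_len(0) returns a zero-length integer rather than 0).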

for (i in seq_along(lose)){
  lose[i] <- all(x[[i]] %in% unique(unlist(x[seq_len(i - 1)])))
}

Now let's use the logical vector to subset x into T:

T <- x[!lose]

T
#> [[1]]
#> [1] "a" "d"
#> 
#> [[2]]
#> [1] "a" "c"
#> 
#> [[3]]
#> [1] "d" "e"
#> 
#> [[4]]
#> [1] "e" "f"
#> 
#> [[5]]
#> [1] "b" "c"

# Created on 2018-07-19 by the [reprex package](http://reprex.tidyverse.org) (v0.2.0).
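The remark about seq_len in this answer can be checked directly. The following is a small sketch, not part of the original answer, comparing seq_len(i - 1) with the colon form 1:(i - 1) for i = 1; the two-element list x2 is made up for illustration.

```r
x2 <- list(c("a", "d"), c("a", "c"))  # illustrative data
i <- 1

# seq_len(0) is an empty integer vector, so nothing is selected
seq_len(i - 1)
n_prev_seq_len <- length(x2[seq_len(i - 1)])

# 1:0 counts down and yields c(1, 0); index 0 is dropped, index 1 is kept,
# so the element would be compared against itself
1:(i - 1)
n_prev_colon <- length(x2[1:(i - 1)])

# With no earlier elements, no letter can have been seen before
first_is_lost <- all(x2[[1]] %in% unlist(x2[seq_len(i - 1)]))
first_is_lost
```

With the colon form the first element would be treated as already seen and dropped, which is the special case the answer avoids.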

Answer 2

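You can put the sets of all previous elements in a list, cum.sets, and then use Map to check whether all elements of the current vector are in the lagged cumulative set.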

cum.sets <- lapply(seq_along(x), function(y) unlist(x[1:y]))
keep <- unlist(
          Map(function(x, y) !all(x %in% y)
              , x
              , c(NA, cum.sets[-length(cum.sets)])))

x[keep]

# [[1]]
# [1] "a" "d"
# 
# [[2]]
# [1] "a" "c"
# 
# [[3]]
# [1] "d" "e"
# 
# [[4]]
# [1] "e" "f"
# 
# [[5]]
# [1] "b" "c"

tidyverse version (same output)

library(tidyverse)

cum.sets <- imap(x, ~ unlist(x[1:.y]))
keep <- map2_lgl(x, lag(cum.sets), ~!all(.x %in% .y))

x[keep]
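To make the "lagged cumulative set" concrete, here is a small sketch on three made-up pairs. The names x3 and seen_before are introduced only for this illustration; the logic mirrors the base R version above.

```r
x3 <- list(c("a", "d"), c("a", "c"), c("d", "c"))  # illustrative data

# Cumulative sets: every letter seen up to and including position i
cum.sets <- lapply(seq_along(x3), function(i) unlist(x3[1:i]))

# Lagged by one position: every letter seen strictly before position i.
# The NA placeholder stands for "nothing seen yet" at position 1.
seen_before <- c(NA, cum.sets[-length(cum.sets)])
str(seen_before)

# Keep a pair unless both of its letters were seen before
keep3 <- unlist(Map(function(pair, seen) !all(pair %in% seen), x3, seen_before))
keep3
```

The third pair is dropped because both "d" and "c" appear in the earlier pairs.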

Answer 3

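You can use Reduce. In this case, if not all of the new values are already in the list, the pair is concatenated to the list; otherwise it is dropped. The initial value is the first element of the list: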

 Reduce(function(i, y) c(i, if(!all(y %in% unlist(i))) list(y)), x[-1],init = x[1])

[[1]]
[1] "a" "d"

[[2]]
[1] "a" "c"

[[3]]
[1] "d" "e"

[[4]]
[1] "e" "f"

[[5]]
[1] "b" "c"
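The same fold can be written with a named step function, which may be easier to read. This is a sketch rather than the answer's own code: add_if_new is a hypothetical helper name, and it starts from an empty list instead of x[1].

```r
x <- list(c("a", "d"), c("a", "c"), c("d", "e"), c("e", "f"), c("b", "c"),
          c("f", "c"), c("c", "e"), c("f", "b"), c("b", "a"))

# One step of the fold: append the pair unless both letters were already kept
add_if_new <- function(kept, pair) {
  seen <- unlist(kept)  # NULL while nothing has been kept yet
  if (all(pair %in% seen)) kept else c(kept, list(pair))
}

result <- Reduce(add_if_new, x, list())
length(result)
```

Starting from an empty list means the first pair goes through the same check as the others.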

Answer 4

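The most straightforward option is to store the unique entries in another vector as you loop through the input data.

Here is a solution that does not consider the position of the letter in the output list (1 or 2) or the order of the input list.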

dat <- list(c('a','d'),c('a','c'),c('d','e'),c('e','f'),c('b','c'),
            c('f','c'),c('c','e'),c('f','b'),c('b','a'))
Dat <- list()
idx <- list()
for(i in dat){
  if(!all(i %in% idx)){
    Dat <- append(Dat, list(i))
    ## append to idx if not previously observed
    if(! i[1] %in% idx) idx <- append(idx, i[1])
    if(! i[2] %in% idx) idx <- append(idx, i[2])
  }
}
print(Dat)
#> [[1]]
#> [1] "a" "d"
#> 
#> [[2]]
#> [1] "a" "c"
#> 
#> [[3]]
#> [1] "d" "e"
#> 
#> [[4]]
#> [1] "e" "f"
#> 
#> [[5]]
#> [1] "b" "c"

On another note, I would advise against using T as the name of your vector, since T is used as shorthand for TRUE in R.
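A minimal sketch of why the name is risky, assuming a fresh session in which T has not been reassigned. T is an ordinary variable bound to TRUE, so it can be masked, whereas TRUE is a reserved word and cannot.

```r
# In a fresh session T evaluates to TRUE
isTRUE(T)

# Assigning to T is legal and silently masks the built-in value
T <- list(c("a", "d"))
masked_is_logical <- is.logical(T)  # FALSE: T is now a list
masked_is_logical

# TRUE <- 0 would be an error, because TRUE is a reserved word

# Removing the masking variable makes the built-in T visible again
rm(T)
restored <- isTRUE(T)
restored
```

Any code that relies on T meaning TRUE, such as an argument written as na.rm = T, would misbehave while the masking variable exists.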

Answer 5

We can unlist, check for repeated values with duplicated, reshape the result into a matrix, and filter out the pairs in which both values are TRUE:

x[colSums(matrix(duplicated(unlist(x)), nrow = 2)) != 2]
# [[1]]
# [1] "a" "d"
# 
# [[2]]
# [1] "a" "c"
# 
# [[3]]
# [1] "d" "e"
# 
# [[4]]
# [1] "e" "f"
# 
# [[5]]
# [1] "b" "c"
#

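And I would advise you not to use T as a variable name: by default it means TRUE (although using it that way is discouraged), and this could lead to unpleasant debugging.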
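To see why the one-liner in this answer works, the intermediate steps can be written out. This is a sketch using names introduced here for illustration (flat, dup, m, n_seen, keep5), and it assumes, as the nrow = 2 argument does, that every element of x has exactly two letters.

```r
x <- list(c("a", "d"), c("a", "c"), c("d", "e"), c("e", "f"), c("b", "c"),
          c("f", "c"), c("c", "e"), c("f", "b"), c("b", "a"))

flat <- unlist(x)             # all letters in order, pair by pair
dup <- duplicated(flat)       # TRUE where a letter has appeared earlier
m <- matrix(dup, nrow = 2)    # one column per pair, one row per letter
n_seen <- colSums(m)          # how many letters of each pair were seen before
keep5 <- n_seen != 2          # drop only pairs where both were seen

n_seen
which(keep5)
```

Because the matrix is filled column by column, each column lines up with one pair of the original list.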

R - How to check if elements are in a list of vectors?
