Question

If you have a list of vectors, what is a good way to determine which list elements contain a particular record?


For example, take this list:

set.seed(8675309)
aList <- list(v1=sample(LETTERS, 20), 
              v2=sample(LETTERS, 10))

The output of aList looks like this:

 > aList
$v1
 [1] "E" "L" "S" "R" "F" "O" "T" "Q" "P" "H" "N" "I" "X" "D" "U" "K" "W" "B" "G" "V"

$v2
 [1] "B" "V" "U" "H" "M" "O" "F" "Z" "C" "N"

Answer 1

names(aList)[sapply(1:2,function(x){"B" %in% aList[[x]]})]
[1] "v1" "v2" 

names(aList)[sapply(1:2,function(x){"E" %in% aList[[x]]})]
[1] "v1"

names(aList)[sapply(1:2,function(x){"C" %in% aList[[x]]})]
[1] "v2"

If your list contains an unknown number of elements, use seq_along instead of the hard-coded 1:2:

names(aList)[sapply(seq_along(aList),function(x){"B" %in% aList[[x]]})]
[1] "v1" "v2"
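
The same lookup can also be written by iterating over the list itself rather than over its indices. The following is only a sketch: the helper name which_contain is hypothetical (it does not appear in the answer), and aList is typed in literally from the printed output in the question so that the result does not depend on the random seed or the R version.

```r
# aList copied from the printed output in the question
aList <- list(
  v1 = c("E", "L", "S", "R", "F", "O", "T", "Q", "P", "H",
         "N", "I", "X", "D", "U", "K", "W", "B", "G", "V"),
  v2 = c("B", "V", "U", "H", "M", "O", "F", "Z", "C", "N")
)

# Hypothetical helper: names of the list elements that contain x.
# vapply() guarantees exactly one logical value per list element.
which_contain <- function(lst, x) {
  names(lst)[vapply(lst, function(v) x %in% v, logical(1))]
}

which_contain(aList, "B")  # "v1" "v2"
which_contain(aList, "E")  # "v1"
which_contain(aList, "A")  # character(0), since no element contains "A"
```

Passing the list directly to vapply avoids indexing with aList[[x]] and works for any number of elements.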

Here is a microbenchmark in response to the comments.

microbenchmark(seq_along(aList),seq_along(names(aList)),1:length(aList),times=100000)
Unit: nanoseconds
                    expr min  lq     mean median   uq    max neval cld
        seq_along(aList) 350 700 659.9117    701  701 208228 1e+05 a  
 seq_along(names(aList)) 351 701 857.1508    701 1051 216977 1e+05  b 
         1:length(aList) 700 701 935.7251   1050 1051 424855 1e+05   c

microbenchmark(etienne(),roland())
Unit: microseconds
      expr    min     lq     mean median     uq     max neval cld
 etienne() 40.597 41.297 45.24751 41.646 41.997 211.378   100   b
  roland() 12.600 13.300 14.40882 14.699 15.049  20.998   100  a

Answer 2

We can use outer with %in% to get a logical matrix ('m1'), split it by row, and pick out the corresponding names of 'aList', instead of checking each value separately.

v1 <- c('B', 'E', 'C')
m1 <- outer(v1, aList, FUN= Vectorize(`%in%`))
lapply(split(m1, row(m1)), function(x) names(aList)[x])
# $`1`
#[1] "v1" "v2"

#$`2`
#[1] "v1"

#$`3`
#[1] "v2"
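
For comparison, a plainer base R variant of the same multi-value lookup might look like the sketch below. It is not the answer's method: it swaps outer for a simple lapply over the search values. The names contains and hits are hypothetical, and aList is again typed in from the printed output in the question.

```r
# aList copied from the printed output in the question
aList <- list(
  v1 = c("E", "L", "S", "R", "F", "O", "T", "Q", "P", "H",
         "N", "I", "X", "D", "U", "K", "W", "B", "G", "V"),
  v2 = c("B", "V", "U", "H", "M", "O", "F", "Z", "C", "N")
)
v1 <- c("B", "E", "C")  # values to search for, as in the answer

# Hypothetical helper: names of the elements of aList that contain x
contains <- function(x) {
  names(aList)[vapply(aList, function(v) x %in% v, logical(1))]
}

# One result per search value, named by the value itself
hits <- setNames(lapply(v1, contains), v1)
hits$B  # "v1" "v2"
hits$E  # "v1"
hits$C  # "v2"
```

Naming the result by search value, rather than by row number as split(m1, row(m1)) does, makes it easier to see which entry belongs to which letter.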

Alternatively, we can melt 'm1' and split the columns of the resulting long format.

library(reshape2)
with(melt(m1), split(as.character(Var2[value]), Var1[value]))
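
If reshape2 is not available, a similar long-format idea can be sketched in base R with stack, which turns the list into a two-column data frame (values, ind). This is an alternative under stated assumptions rather than the answer's own code: the names long and idx are hypothetical, and aList is typed in from the printed output in the question.

```r
# aList copied from the printed output in the question
aList <- list(
  v1 = c("E", "L", "S", "R", "F", "O", "T", "Q", "P", "H",
         "N", "I", "X", "D", "U", "K", "W", "B", "G", "V"),
  v2 = c("B", "V", "U", "H", "M", "O", "F", "Z", "C", "N")
)
v1 <- c("B", "E", "C")  # values to search for

# Long format: one row per (value, list element name) pair
long <- stack(aList)

# Inverted index: for every letter, the names of the elements containing it
idx <- split(as.character(long$ind), long$values)

idx[v1]
# idx[["B"]] is c("v1", "v2"), idx[["E"]] is "v1", idx[["C"]] is "v2"
```

Building the index once pays off when many values are looked up against the same list; a value that occurs in no element is simply absent from idx.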

How to determine which list elements contain a record in R
