Question

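As an example, I have a list of vectors of varying lengths (plus some NULLs), and I would like to find the first list element that contains exactly two elements. As in this post, I know that with a list you can take a similar approach by using sapply() and subsetting to the first result. Since the match()-based solution from the post linked above does not work in this case, I am curious whether there is a more elegant (and more computationally efficient) way to achieve this.

A reproducible example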

# some example data
x <- list(NULL, NULL, NA, rep("foo", 6), c("we want", "this one"),
           c(letters[1:10]), c("foo", "bar"), NULL)
x

# find the first element of length 2 using sapply and sub-setting to result #1
x[sapply(x, FUN=function(i) {length(i)==2})][[1]]

Or, as in @Josh O'Brien's answer to this post,

# get the index of the first element of length 2
# (without the trailing [1] this returns every matching index)
seq_along(x)[sapply(x, FUN=function(i) {length(i)==2})][1]

Any thoughts or ideas?

Answer 1

Is this what you want?

Find(function(i) length(i) == 2, x) # [1] "we want"  "this one"
Position(function(i) length(i) == 2, x) # [1] 5
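
A small sketch of how the two behave at the edges, in case it helps. The helper `has_length_2` and the list `y` are made up for this illustration; `x` is the list from the question.

```r
# x is the example list from the question
x <- list(NULL, NULL, NA, rep("foo", 6), c("we want", "this one"),
          c(letters[1:10]), c("foo", "bar"), NULL)

# hypothetical helper, named only for readability
has_length_2 <- function(i) length(i) == 2

first_value <- Find(has_length_2, x)      # value of the first match
first_index <- Position(has_length_2, x)  # index of the first match

# right = TRUE searches from the end of the list instead
last_value <- Find(has_length_2, x, right = TRUE)
last_index <- Position(has_length_2, x, right = TRUE)

# y is a made-up list with no element of length 2
y <- list(NULL, 1:3, "a")
no_value <- Find(has_length_2, y)      # NULL when nothing matches
no_index <- Position(has_length_2, y)  # NA when nothing matches
```

So a caller would probably want to check for NULL (or NA) before using the result.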

Answer 2

mapply seems to be quite fast:

> library(microbenchmark)
> x <- rep(x, 25000)
> microbenchmark({ x[match(2, mapply(length, x))] })
# Unit: milliseconds
#       min       lq   median       uq      max neval
#  243.7502 275.8941 326.2993 337.9221 405.7011   100

Also check x[mapply(length, x) == 2][[1]]

And a different way, with sapply:

>  x[sapply(x, length) == 2][[1]]
# [1] "we want"  "this one"

This next one is interesting.

> x[ grep("2", summary(x)[,1])[1] ]
# [[1]]
# [1] "we want"  "this one"
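
One caveat with the grep() approach, as far as I can tell: the pattern "2" is matched as text, so a length such as 12 or 20 would match too. A minimal sketch with a made-up list `y`; it assumes `lengths()` is available (R 3.2.0 or later) and uses it to stand in for the length column.

```r
# made-up list: the first element has length 12, the second has length 2
y <- list(1:12, c("we want", "this one"), NULL)

# lengths as text, similar in spirit to matching on a printed length column
len_chr <- as.character(lengths(y))

loose_idx  <- grep("2", len_chr)[1]      # picks element 1, since "12" contains "2"
anchor_idx <- grep("^2$", len_chr)[1]    # anchored pattern matches "2" only
exact_idx  <- which(lengths(y) == 2)[1]  # numeric comparison, no regex needed
```

An anchored pattern or a plain numeric comparison avoids the false match.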

Answer 3

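I benchmarked each of the suggested solutions against a single list of 200,000 elements (28.8 Mb) made with rep(x, 25000). This is just the x list from my example repeated many times. The results are as follows: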

> microbenchmark(Find(function(i) length(i) == 2, x),
                  x[sapply(x, length) == 2][[1]],
                  x[sapply(x, FUN=function(i) {length(i)==2})][[1]],
                  x[[match(2,lapply(x,length))]],
                  x[match(2, mapply(length, x))],
                  x[mapply(length, x) == 2][[1]])
Unit: microseconds
                                                        expr        min         lq      median          uq        max neval
                   Find(function(i) length(i) == 2, x)     89.104    107.531    112.8955    119.6605    466.045   100
                        x[sapply(x, length) == 2][[1]] 166539.621 185113.274 193224.0270 209923.2405 378499.180   100
x[sapply(x, FUN = function(i) {length(i) == 2 })][[1]] 279596.600 301976.512 310928.3845 322857.7610 484233.342   100
                      x[[match(2, lapply(x, length))]] 378391.882 388831.223 398639.1430 415137.0565 591727.647   100
                        x[match(2, mapply(length, x))] 207324.777 225027.221 235982.9895 249744.3525 422451.010   100
                        x[mapply(length, x) == 2][[1]] 205649.537 223045.252 236039.6710 249529.5245 411916.734   100
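
My guess at why Find() wins by such a margin: it stops at the first element that passes the test (position 5 here), while the other approaches compute the length of every element before subsetting. A small sketch that counts predicate calls; `counting_pred`, `calls`, and `big` are names made up for this illustration, and the list is shortened to 8,000 elements.

```r
x <- list(NULL, NULL, NA, rep("foo", 6), c("we want", "this one"),
          c(letters[1:10]), c("foo", "bar"), NULL)
big <- rep(x, 1000)  # 8,000 elements, smaller than the benchmarked list

# hypothetical predicate that records how often it is called
calls <- 0
counting_pred <- function(i) {
  calls <<- calls + 1
  length(i) == 2
}

# Find() returns as soon as the predicate is TRUE
found <- Find(counting_pred, big)
find_calls <- calls

# sapply() evaluates the predicate on every element
calls <- 0
flags <- sapply(big, counting_pred)
sapply_calls <- calls
```

Here `find_calls` ends up at 5 and `sapply_calls` at 8,000, so the advantage depends on how early the first match sits in the list.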

Thanks for the quick and informative replies!

Answer 4

Using match works fine.

match(2,lapply(x,length))
#[1] 5
x[[match(2,lapply(x,length))]]
#[1] "we want"  "this one"
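
A possible variant, offered as a sketch rather than a correction: `vapply()` returns a plain integer vector of lengths, so match() compares integers instead of a list. Since match() returns NA when nothing matches, it may be worth checking before indexing. The list `y` and the variable names are made up for this illustration.

```r
x <- list(NULL, NULL, NA, rep("foo", 6), c("we want", "this one"),
          c(letters[1:10]), c("foo", "bar"), NULL)

# integer vector of element lengths
lens <- vapply(x, length, integer(1))
idx <- match(2L, lens)  # index of the first element of length 2
first_match <- x[[idx]]

# made-up list with no element of length 2
y <- list(NULL, 1:3)
idx_y <- match(2L, vapply(y, length, integer(1)))
# guard against the no-match case before indexing
result_y <- if (is.na(idx_y)) NULL else y[[idx_y]]
```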

Find (and return) the first element of a list that satisfies a (logical) test