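I have a list of vectors (shown below). I want to know which list elements each value in the vectors appears in. In other words, I want to invert the list to create a new list whose names are taken from the vector values.
What is the best way to do this?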
lst <- list(a=c(2, 3, 6, 10, 15, 17), b=c(4, 6, 9, 7, 6, 4, 3, 10),
c=c(9, 2, 1, 4, 3), d=c(3, 6, 17))
lst
$a
[1] 2 3 6 10 15 17
$b
[1] 4 6 9 7 6 4 3 10
$c
[1] 9 2 1 4 3
$d
[1] 3 6 17
I would like to get the following result.
$`1`
[1] "c"
$`10`
[1] "a" "b"
$`15`
[1] "a"
$`17`
[1] "a" "d"
$`2`
[1] "a" "c"
$`3`
[1] "a" "b" "c" "d"
$`4`
[1] "b" "b" "c"
$`6`
[1] "a" "b" "b" "d"
$`7`
[1] "b"
$`9`
[1] "b" "c"
Answer 0 (score: 8)
Here is a base R way using stack and unstack:
unstack(stack(lst), ind ~ values)
# $`1`
# [1] "c"
#
# $`2`
# [1] "a" "c"
#
# $`3`
# [1] "a" "b" "c" "d"
#
# $`4`
# [1] "b" "b" "c"
#
# $`6`
# [1] "a" "b" "b" "d"
#
# $`7`
# [1] "b"
#
# $`9`
# [1] "b" "c"
#
# $`10`
# [1] "a" "b"
#
# $`15`
# [1] "a"
#
# $`17`
# [1] "a" "d"
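To make the intermediate step visible, here is a minimal sketch of what stack() produces before unstack() regroups it. The variable names long and inv are only illustrative and do not come from the answer.

```r
lst <- list(a = c(2, 3, 6, 10, 15, 17), b = c(4, 6, 9, 7, 6, 4, 3, 10),
            c = c(9, 2, 1, 4, 3), d = c(3, 6, 17))

# stack() flattens the named list into a two-column data frame:
# 'values' holds the numbers, 'ind' is a factor holding the originating name.
long <- stack(lst)
head(long, 3)
str(long)

# unstack() with the formula ind ~ values groups the names by value.
inv <- unstack(long, ind ~ values)
inv[["3"]]   # "a" "b" "c" "d"
```

Note that unstack() returns a list here only because the groups have different lengths; when every group has the same length it returns a data frame instead.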
Answer 1 (score: 6)
Here is an approach that uses melt from "reshape2", followed by split from base R:
library(reshape2)
x <- melt(lst)
split(x$L1, x$value)
# $`1`
# [1] "c"
#
# $`2`
# [1] "a" "c"
#
# $`3`
# [1] "a" "b" "c" "d"
#
# $`4`
# [1] "b" "b" "c"
#
# $`6`
# [1] "a" "b" "b" "d"
#
# $`7`
# [1] "b"
#
# $`9`
# [1] "b" "c"
#
# $`10`
# [1] "a" "b"
#
# $`15`
# [1] "a"
#
# $`17`
# [1] "a" "d"
Similarly, in base R with stack:
x <- stack(lapply(lst, c))
split(as.character(x$ind), x$values)
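If the elements of lst are already plain vectors (as they are in the example above), the lapply(lst, c) step is unnecessary and the call is more direct: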
x <- stack(lst)
split(as.character(x$ind), x$values)
To elaborate on my comment, the more efficient way I described is:
# lengths(lst) gives the number of elements in each vector; nrow() returns NULL for plain vectors
split(rep(names(lst), lengths(lst)), unlist(lst, use.names = FALSE))
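The one-liner works because the repeated names and the flattened values line up position by position. The following sketch shows the two intermediate vectors on a small made-up list; labels and values are illustrative names, and lengths() is assumed for counting the elements of each plain vector.

```r
# A small hypothetical list, just to show the alignment
lst <- list(a = c(2, 3), b = c(3, 5, 2))

# Repeat each name once per element of its vector
labels <- rep(names(lst), lengths(lst))    # "a" "a" "b" "b" "b"

# Flatten the values in the same order, without building names
values <- unlist(lst, use.names = FALSE)   # 2 3 3 5 2

# Group the names by the value found at the same position
inv <- split(labels, values)
inv
# $`2`
# [1] "a" "b"
#
# $`3`
# [1] "a" "b"
#
# $`5`
# [1] "b"
```

Skipping the data frame that stack() builds is a plausible reason for the speed difference seen in the timings.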
Applied to a much larger lst, we get the following results:
fun1 <- function() split(rep(names(lst), lengths(lst)), unlist(lst, use.names = FALSE))
fun2 <- function() { x <- stack(lapply(lst, c)) ; split(as.character(x$ind), x$values) }
fun3 <- function() { x <- melt(lst) ; split(x$L1, x$value) }
fun4 <- function() unstack(stack(lapply(lst, as.vector)), ind ~ values)
## Make lst much bigger
lst <- unlist(replicate(10000, lst, simplify = FALSE), recursive=FALSE)
names(lst) <- make.unique(names(lst))
library(microbenchmark)
system.time(fun3())
# user system elapsed
# 48.338 0.000 47.643
microbenchmark(fun1(), fun2(), fun4(), times = 5)
# Unit: milliseconds
# expr min lq median uq max neval
# fun1() 454.5913 456.6793 473.901 555.8954 574.4394 5
# fun2() 922.1282 1028.4972 1034.872 1068.4761 1150.8072 5
# fun4() 1222.5296 1300.0643 1323.253 1339.2037 1421.1546 5
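Timings are only meaningful if the functions return the same thing. As a hedged sanity check, the sketch below runs the three base R variants on the original small list and compares them. The names via_rep, via_stack and via_unstack are illustrative, and the reshape2 variant is left out so the example needs no extra packages.

```r
lst <- list(a = c(2, 3, 6, 10, 15, 17), b = c(4, 6, 9, 7, 6, 4, 3, 10),
            c = c(9, 2, 1, 4, 3), d = c(3, 6, 17))

# rep() + unlist() + split(), counting elements with lengths()
via_rep <- split(rep(names(lst), lengths(lst)), unlist(lst, use.names = FALSE))

# stack() followed by split()
x <- stack(lst)
via_stack <- split(as.character(x$ind), x$values)

# stack() followed by unstack()
via_unstack <- unstack(stack(lst), ind ~ values)

# Compare the results: same names, same members in the same order
identical(via_rep, via_stack)
identical(names(via_rep), names(via_unstack))
all.equal(via_rep, via_unstack)
```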
Answer 2 (score: 0)
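unlist the list to get all the numbers in a single vector. Then use those numbers to split a vector containing the names of the list elements, with each name repeated once per element.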
split( rep(names(lst),times=sapply(lst,length)),
unlist(lst) )
$`1`
[1] "c"
$`2`
[1] "a" "c"
$`3`
[1] "a" "b" "c" "d"
$`4`
[1] "b" "b" "c"
$`6`
[1] "a" "b" "b" "d"
$`7`
[1] "b"
$`9`
[1] "b" "c"
$`10`
[1] "a" "b"
$`15`
[1] "a"
$`17`
[1] "a" "d"