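Using lapply, I pass a vector of inputs to a function that returns, for each input, a list of two vectors: the candidate n-grams and their probabilities. I end up with a list of lists (lol) with the following structure: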
> str(lol)
List of 3
$ :List of 2
..$ np1 : chr [1:7] "a" "years" "the" "my" ...
..$ probs: num [1:7] 0.1481 0.1357 0.0841 0.0698 0.0522 ...
$ :List of 2
..$ np1 : chr [1:167] "the" "a" "my" "years" ...
..$ probs: num [1:167] 0.2745 0.0924 0.0605 0.0437 0.0334 ...
$ :List of 2
..$ np1 : chr [1:9493] "the" "a" "my" "this" ...
..$ probs: num [1:9493] 0.267 0.0777 0.0239 0.0169 0.0158 ...
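But my goal is a single list in which all the $np1 vectors are concatenated and all the $probs vectors are concatenated as well. I tried unlist(..., recursive = F) to get a list of two vectors. Compared with unlist without the recursive flag, it gets closer to what I am looking for.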
> str(unlist(lapply(inputs.list, function(x){...}), recursive = F))
List of 6
$ np1 : chr [1:7] "a" "years" "the" "my" ...
$ probs: num [1:7] 0.1481 0.1357 0.0841 0.0698 0.0522 ...
$ np1 : chr [1:167] "the" "a" "my" "years" ...
$ probs: num [1:167] 0.2745 0.0924 0.0605 0.0437 0.0334 ...
$ np1 : chr [1:9493] "the" "a" "my" "this" ...
$ probs: num [1:9493] 0.267 0.0777 0.0239 0.0169 0.0158 ...
But not quite...
Is there a way to further merge the flattened list into a list of two vectors, as described above?
Here is a reproducible example:
example1 <- list("time in"=list(np1=c("the", "a", "my", "years"), probs=c(0.2745, 0.0924, 0.0605, 0.0437)),"in"=list(np1=c("the", "a", "my", "this"), probs=c(0.267, 0.0777, 0.0239, 0.0169)))
> str(example1)
List of 2
$ time in:List of 2
..$ np1 : chr [1:4] "the" "a" "my" "years"
..$ probs: num [1:4] 0.2745 0.0924 0.0605 0.0437
$ in :List of 2
..$ np1 : chr [1:4] "the" "a" "my" "this"
..$ probs: num [1:4] 0.267 0.0777 0.0239 0.0169
Answer 0 (score: 4)
Two lists can be combined the way you want with Map, as in
Map(c, example1[[1]], example1[[2]])
# $np1
# [1] "the" "a" "my" "years" "the" "a" "my" "this"
#
# $probs
# [1] 0.2745 0.0924 0.0605 0.0437 0.2670 0.0777 0.0239 0.0169
So, to merge a whole list of lists, we can do
Reduce(function(...) Map(c, ...), example1[c(1, 1, 2)])
# $np1
# [1] "the" "a" "my" "years" "the" "a" "my" "years" "the" "a" "my" "this"
#
# $probs
# [1] 0.2745 0.0924 0.0605 0.0437 0.2745 0.0924 0.0605 0.0437 0.2670 0.0777 0.0239 0.0169
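Here I deliberately passed a list of length 3, example1[c(1, 1, 2)], to show that this works for more than two elements. In your case, we need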
Reduce(function(...) Map(c, ...), lol)
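As a possible variation on the Reduce call above, the same element-wise concatenation can be sketched as a single Map call over all sub-lists at once. This is only an illustration using example1 from the question; the name merged is made up here, and unname() is an added precaution so that the outer names ("time in", "in") are not passed to c as argument names.

```r
# Rebuild the example from the question so this snippet runs on its own.
example1 <- list(
  "time in" = list(np1 = c("the", "a", "my", "years"),
                   probs = c(0.2745, 0.0924, 0.0605, 0.0437)),
  "in" = list(np1 = c("the", "a", "my", "this"),
              probs = c(0.267, 0.0777, 0.0239, 0.0169))
)

# Equivalent to Map(c, example1[[1]], example1[[2]], ...):
# do.call spreads the sub-lists out as separate arguments to Map.
# unname() drops the outer names so they do not leak into the result.
merged <- do.call(Map, c(f = c, unname(example1)))

str(merged)
```

Unlike Reduce, this concatenates each component in one call rather than pairwise, which may matter for long lists; that is an expectation, not something measured here.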
Answer 1 (score: 3)
Here is a solution using purrr:
library(tidyverse)
transpose(example1) %>% map(flatten) %>% map(unlist)
Output:
$np1
[1] "the" "a" "my" "years" "the" "a" "my" "this"
$probs
[1] 0.2745 0.0924 0.0605 0.0437 0.2670 0.0777 0.0239 0.0169
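Answer 2 (score: 2)
Here is an unlist solution similar to what you were working on. It relies on the vectors you are interested in always alternating (i.e. it is always np1, then always probs). Good luck, and let me know if it doesn't work for you!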
unlist_ed <- unlist(example1, recursive = F)
list(
np1 = unlist(unlist_ed[c(T, F)]),
probs = unlist(unlist_ed[c(F, T)])
)
$np1
time in.np11 time in.np12 time in.np13 time in.np14 in.np11 in.np12 in.np13 in.np14
"the" "a" "my" "years" "the" "a" "my" "this"
$probs
time in.probs1 time in.probs2 time in.probs3 time in.probs4 in.probs1 in.probs2 in.probs3
0.2745 0.0924 0.0605 0.0437 0.2670 0.0777 0.0239
in.probs4
0.0169
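The positional indexing above depends on strict alternation of the two vectors. A minimal sketch of a variant that selects components by name instead, assuming every sub-list has the same names as the first one; fields and merged_by_name are names made up for this illustration.

```r
# Rebuild the example from the question so this snippet runs on its own.
example1 <- list(
  "time in" = list(np1 = c("the", "a", "my", "years"),
                   probs = c(0.2745, 0.0924, 0.0605, 0.0437)),
  "in" = list(np1 = c("the", "a", "my", "this"),
              probs = c(0.267, 0.0777, 0.0239, 0.0169))
)

# Component names are taken from the first sub-list (assumption: all
# sub-lists share these names, in any order).
fields <- names(example1[[1]])

# For each component name, pull that component out of every sub-list
# and concatenate the pieces; use.names = FALSE avoids the long
# "time in.np11"-style element names.
merged_by_name <- lapply(
  setNames(fields, fields),
  function(nm) unlist(lapply(example1, `[[`, nm), use.names = FALSE)
)

str(merged_by_name)
```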
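Edit: I thought of another solution. It relies on the vectors having the same names, but it is much faster (not that speed was the goal). I wanted to update!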
dplyr::bind_rows(example1)
# A tibble: 8 x 2
np1 probs
<chr> <dbl>
1 the 0.274
2 a 0.0924
3 my 0.0605
4 years 0.0437
5 the 0.267
6 a 0.0777
7 my 0.0239
8 this 0.0169
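The result above is a tibble rather than the list of two vectors asked for in the question. One way to get back to a named list, sketched here on the assumption that dplyr is installed, is as.list(), which returns the columns as list elements; merged_list is a name made up for this illustration.

```r
# Rebuild the example from the question so this snippet runs on its own.
example1 <- list(
  "time in" = list(np1 = c("the", "a", "my", "years"),
                   probs = c(0.2745, 0.0924, 0.0605, 0.0437)),
  "in" = list(np1 = c("the", "a", "my", "this"),
              probs = c(0.267, 0.0777, 0.0239, 0.0169))
)

# Stack the sub-lists row-wise, then turn the columns of the resulting
# tibble back into a plain named list of vectors.
merged_list <- as.list(dplyr::bind_rows(example1))

str(merged_list)
```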
A not-so-perfect benchmark:
example1 <- rapply(example1, function(x) rep(x, 1e4), how = "list")
example1 <- rep(example1, 100)
microbenchmark::microbenchmark(
o1 = {
Reduce(function(...) Map(c, ...), example1)
},
o2 = {
unlist_ed <- unlist(example1, recursive = F)
list(
np1 = unlist(unlist_ed[c(T, F)]),
probs = unlist(unlist_ed[c(F, T)])
)
},
o3 = {
transpose(example1) %>% map(flatten) %>% map(unlist)
},
o4 = {
binded <- dplyr::bind_rows(example1)
list(binded$np1,
binded$probs)
},
times = 1
)
Unit: milliseconds
expr min lq mean median uq max neval
o1 5022.25495 5022.25495 5022.25495 5022.25495 5022.25495 5022.25495 1
o2 5146.75265 5146.75265 5146.75265 5146.75265 5146.75265 5146.75265 1
o3 2491.21422 2491.21422 2491.21422 2491.21422 2491.21422 2491.21422 1
o4 83.32919 83.32919 83.32919 83.32919 83.32919 83.32919 1