R - How to unlist and merge at the same time

Date: 2019-05-11 23:35:37

Tags: r list data-structures

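Using lapply, I feed an input vector to a function that, for each input, returns a list of two vectors: the possible n-grams and their probabilities. I end up with a list of lists (lol) with the following structure: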

> str(lol)
List of 3
 $ :List of 2
  ..$ np1  : chr [1:7] "a" "years" "the" "my" ...
  ..$ probs: num [1:7] 0.1481 0.1357 0.0841 0.0698 0.0522 ...
 $ :List of 2
  ..$ np1  : chr [1:167] "the" "a" "my" "years" ...
  ..$ probs: num [1:167] 0.2745 0.0924 0.0605 0.0437 0.0334 ...
 $ :List of 2
  ..$ np1  : chr [1:9493] "the" "a" "my" "this" ...
  ..$ probs: num [1:9493] 0.267 0.0777 0.0239 0.0169 0.0158 ...
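To reproduce a structure like lol without the asker's model, here is a minimal sketch built around a hypothetical stand-in function, next_word_probs(). Its words and probabilities are invented; only the shape of the return value (a list holding np1 and probs) mirrors the question, and inputs.list here is just an illustrative character vector.

```r
# Hypothetical stand-in for the asker's n-gram function: for each input it
# returns two parallel vectors, candidate next words (np1) and their
# probabilities (probs). The words and numbers are made up.
next_word_probs <- function(input) {
  words <- c("the", "a", "my", "years")
  # Fake scores that depend on the input length, normalised to sum to 1
  scores <- rev(seq_along(words)) + nchar(input)
  list(np1 = words, probs = scores / sum(scores))
}

inputs.list <- c("time in", "in", "years in")

# lapply returns one list(np1, probs) per input, i.e. a list of lists
lol <- lapply(inputs.list, next_word_probs)
str(lol)
```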

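But my goal is a single list in which all the $np1 vectors are concatenated together and all the $probs vectors are concatenated together. I tried unlist(..., recursive = F) to get a list of two vectors; compared with unlist without the recursive flag, it got closer to what I'm looking for: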

> str(unlist(lapply(inputs.list, function(x){...}), recursive = F))
List of 6
 $ np1  : chr [1:7] "a" "years" "the" "my" ...
 $ probs: num [1:7] 0.1481 0.1357 0.0841 0.0698 0.0522 ...
 $ np1  : chr [1:167] "the" "a" "my" "years" ...
 $ probs: num [1:167] 0.2745 0.0924 0.0605 0.0437 0.0334 ...
 $ np1  : chr [1:9493] "the" "a" "my" "this" ...
 $ probs: num [1:9493] 0.267 0.0777 0.0239 0.0169 0.0158 ...

But not quite...

Is there a way to further merge the flattened list into a list of two vectors, as described above?

Here is a reproducible example:

example1 <- list(
  "time in" = list(np1 = c("the", "a", "my", "years"),
                   probs = c(0.2745, 0.0924, 0.0605, 0.0437)),
  "in" = list(np1 = c("the", "a", "my", "this"),
              probs = c(0.267, 0.0777, 0.0239, 0.0169))
)
> str(example1)
List of 2
 $ time in:List of 2
  ..$ np1  : chr [1:4] "the" "a" "my" "years"
  ..$ probs: num [1:4] 0.2745 0.0924 0.0605 0.0437
 $ in     :List of 2
  ..$ np1  : chr [1:4] "the" "a" "my" "this"
  ..$ probs: num [1:4] 0.267 0.0777 0.0239 0.0169
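One base-R way to finish the job that the flattened list started (a sketch, not taken from the answers below) is to group the flattened elements by name with split() and concatenate each group. It assumes every inner list uses exactly the names np1 and probs. unname() is applied first because a named outer list such as example1 would otherwise prefix the flattened names ("time in.np1", "in.probs", ...), so they would no longer group together.

```r
example1 <- list(
  "time in" = list(np1 = c("the", "a", "my", "years"),
                   probs = c(0.2745, 0.0924, 0.0605, 0.0437)),
  "in" = list(np1 = c("the", "a", "my", "this"),
              probs = c(0.267, 0.0777, 0.0239, 0.0169))
)

# Flatten one level; unname() keeps the inner names as plain "np1"/"probs"
flat <- unlist(unname(example1), recursive = FALSE)

# Group the flattened vectors by name, then concatenate each group
merged <- lapply(split(flat, names(flat)), unlist, use.names = FALSE)
str(merged)
```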

3 answers:

Answer 0 (score: 4)

Two such lists can be combined the way you want with Map, as in

Map(c, example1[[1]], example1[[2]])
# $np1
# [1] "the"   "a"     "my"    "years" "the"   "a"     "my"    "this" 
#
# $probs
# [1] 0.2745 0.0924 0.0605 0.0437 0.2670 0.0777 0.0239 0.0169
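Map(c, a, b) is a thin wrapper around mapply(): it walks the two inner lists in parallel, by position, calls c() on each matching pair, and takes the result names from the first list. The sketch below checks that against the element-by-element version written out by hand; it assumes both inner lists store np1 and probs in the same order, since Map matches positions rather than names.

```r
a <- list(np1 = c("the", "a", "my", "years"),
          probs = c(0.2745, 0.0924, 0.0605, 0.0437))
b <- list(np1 = c("the", "a", "my", "this"),
          probs = c(0.267, 0.0777, 0.0239, 0.0169))

# Map pairs a[[1]] with b[[1]] and a[[2]] with b[[2]], calling c() on each pair
via_map <- Map(c, a, b)

# The same result written out by hand
by_hand <- list(np1 = c(a$np1, b$np1), probs = c(a$probs, b$probs))

identical(via_map, by_hand)
```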

So, to merge an entire list of lists, we could do

Reduce(function(...) Map(c, ...), example1[c(1, 1, 2)])
# $np1
#  [1] "the"   "a"     "my"    "years" "the"   "a"     "my"    "years" "the"   "a"     "my"    "this" 
#
# $probs
#  [1] 0.2745 0.0924 0.0605 0.0437 0.2745 0.0924 0.0605 0.0437 0.2670 0.0777 0.0239 0.0169

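Here I deliberately passed a list of length 3 (with the first element repeated) to show that it works for more than two lists. In your case, we need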

Reduce(function(...) Map(c, ...), lol)
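An alternative to Reduce() (a sketch, not part of the answer) is to hand every inner list to a single Map() call via do.call(). Reduce() re-concatenates the growing accumulator at every step, so for long lists the single call may do less copying. unname() is used because do.call() would otherwise forward the outer names ("time in", "in") as argument names to c(), which would then prefix the element names.

```r
example1 <- list(
  "time in" = list(np1 = c("the", "a", "my", "years"),
                   probs = c(0.2745, 0.0924, 0.0605, 0.0437)),
  "in" = list(np1 = c("the", "a", "my", "this"),
              probs = c(0.267, 0.0777, 0.0239, 0.0169))
)

# Same as Map(c, example1[[1]], example1[[2]], ...) for any number of
# inner lists; unname() stops the outer names becoming arguments names of c()
merged <- do.call(Map, c(list(f = c), unname(example1)))

# The Reduce() version from the answer, for comparison
reduced <- Reduce(function(...) Map(c, ...), example1)
identical(merged, reduced)
```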

Answer 1 (score: 3)

Here is a solution using purrr:

library(tidyverse)

transpose(example1) %>% map(flatten) %>% map(unlist)

Output:

$np1
[1] "the"   "a"     "my"    "years" "the"   "a"     "my"    "this" 

$probs
[1] 0.2745 0.0924 0.0605 0.0437 0.2670 0.0777 0.0239 0.0169
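For readers without purrr, the same transpose-then-unlist idea can be sketched in base R by pulling each field out of every inner list with [[ and concatenating the pieces. The field names np1 and probs are hard-coded here as an assumption; unlike Map(), this matches fields by name rather than by position.

```r
example1 <- list(
  "time in" = list(np1 = c("the", "a", "my", "years"),
                   probs = c(0.2745, 0.0924, 0.0605, 0.0437)),
  "in" = list(np1 = c("the", "a", "my", "this"),
              probs = c(0.267, 0.0777, 0.0239, 0.0169))
)

fields <- c("np1", "probs")

# For each field, collect it from every inner list and concatenate
merged <- lapply(setNames(fields, fields), function(field) {
  unlist(lapply(example1, `[[`, field), use.names = FALSE)
})
str(merged)
```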

Answer 2 (score: 2)

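Here is an unlist-based solution similar to what you were working on. It depends on the vectors you are interested in always alternating (i.e., it is always np1, then probs). Good luck, and let me know if it doesn't work for you!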

unlist_ed <- unlist(example1, recursive = FALSE)

list(
  np1 = unlist(unlist_ed[c(TRUE, FALSE)]),
  probs = unlist(unlist_ed[c(FALSE, TRUE)])
)

$np1
time in.np11 time in.np12 time in.np13 time in.np14      in.np11      in.np12      in.np13      in.np14 
       "the"          "a"         "my"      "years"        "the"          "a"         "my"       "this" 

$probs
time in.probs1 time in.probs2 time in.probs3 time in.probs4      in.probs1      in.probs2      in.probs3 
        0.2745         0.0924         0.0605         0.0437         0.2670         0.0777         0.0239 
     in.probs4 
        0.0169 
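Names like "time in.np11" appear because unlist() builds names from the outer and inner list names; if only the values are needed, use.names = FALSE drops them. The c(TRUE, FALSE) index works because logical indices are recycled along the vector, so it keeps positions 1, 3, ... (the np1 elements) and c(FALSE, TRUE) keeps positions 2, 4, ... (the probs elements). This sketch still assumes strict np1/probs alternation.

```r
example1 <- list(
  "time in" = list(np1 = c("the", "a", "my", "years"),
                   probs = c(0.2745, 0.0924, 0.0605, 0.0437)),
  "in" = list(np1 = c("the", "a", "my", "this"),
              probs = c(0.267, 0.0777, 0.0239, 0.0169))
)

unlist_ed <- unlist(example1, recursive = FALSE)

# Logical index recycling: c(TRUE, FALSE) keeps elements 1, 3, ...
result <- list(
  np1   = unlist(unlist_ed[c(TRUE, FALSE)], use.names = FALSE),
  probs = unlist(unlist_ed[c(FALSE, TRUE)], use.names = FALSE)
)
str(result)
```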

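Edit: I thought of another solution that relies on the vectors having the same names, and it is much faster (not that speed was the goal). Wanted to share the update!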

dplyr::bind_rows(example1)
# A tibble: 8 x 2
  np1    probs
  <chr>  <dbl>
1 the   0.274 
2 a     0.0924
3 my    0.0605
4 years 0.0437
5 the   0.267 
6 a     0.0777
7 my    0.0239
8 this  0.0169
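The bind_rows() approach needs dplyr. A base-R sketch of the same "stack the inner lists as rows" idea converts each inner list to a data frame and combines them with rbind(); as.list() then turns the columns back into a list of two vectors. It assumes every inner list has the same names and equal-length np1/probs vectors, and no claim is made about its speed relative to the benchmark below.

```r
example1 <- list(
  "time in" = list(np1 = c("the", "a", "my", "years"),
                   probs = c(0.2745, 0.0924, 0.0605, 0.0437)),
  "in" = list(np1 = c("the", "a", "my", "this"),
              probs = c(0.267, 0.0777, 0.0239, 0.0169))
)

# Each inner list becomes a two-column data frame; stringsAsFactors = FALSE
# keeps np1 as character on R versions before 4.0
frames <- lapply(example1, as.data.frame, stringsAsFactors = FALSE)

# unname() avoids row names such as "time in.1"
stacked <- do.call(rbind, unname(frames))

# The columns of the stacked data frame are the concatenated vectors
merged <- as.list(stacked)
str(merged)
```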

Not a perfect benchmark:

example1 <- rapply(example1, function(x) rep(x, 1e4), how = "list")
example1 <- rep(example1, 100)

# o3 needs purrr's transpose(), map() and flatten() plus %>%,
# e.g. loaded via library(tidyverse) as in the previous answer
microbenchmark::microbenchmark(
  o1 = {
    Reduce(function(...) Map(c, ...), example1)
  },
  o2 = {
    unlist_ed <- unlist(example1, recursive = FALSE)

    list(
      np1 = unlist(unlist_ed[c(TRUE, FALSE)]),
      probs = unlist(unlist_ed[c(FALSE, TRUE)])
    )
  },
  o3 = {
    transpose(example1) %>% map(flatten) %>% map(unlist)
  },
  o4 = {
    binded <- dplyr::bind_rows(example1)

    list(np1 = binded$np1,
         probs = binded$probs)
  },
  times = 1
)

Unit: milliseconds
 expr        min         lq       mean     median         uq        max neval
   o1 5022.25495 5022.25495 5022.25495 5022.25495 5022.25495 5022.25495     1
   o2 5146.75265 5146.75265 5146.75265 5146.75265 5146.75265 5146.75265     1
   o3 2491.21422 2491.21422 2491.21422 2491.21422 2491.21422 2491.21422     1
   o4   83.32919   83.32919   83.32919   83.32919   83.32919   83.32919     1