Using purrr::map() to extract data from an irregular list

Time: 2019-08-15 20:02:37

Tags: r list dictionary purrr

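Given a list containing multiple elements, the goal is to put them into a data frame. The map_dfr function from the purrr package works well for regular lists, but it produces an error for irregular ones.

For example, following this tutorial, the following works: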

library(purrr)
library(repurrrsive) # The data comes from this package


map_dfr(got_chars, magrittr::extract, c("name", "culture", "gender", "id", "born", "alive"))

# A tibble: 30 x 6
   name               culture  gender    id born                                   alive
   <chr>              <chr>    <chr>  <int> <chr>                                  <lgl>
 1 Theon Greyjoy      Ironborn Male    1022 In 278 AC or 279 AC, at Pyke           TRUE 
 2 Tyrion Lannister   ""       Male    1052 In 273 AC, at Casterly Rock            TRUE 
 3 Victarion Greyjoy  Ironborn Male    1074 In 268 AC or before, at Pyke           TRUE 
 4 Will               ""       Male    1109 ""                                     FALSE
 5 Areo Hotah         Norvoshi Male    1166 In 257 AC or before, at Norvos         TRUE 
 6 Chett              ""       Male    1267 At Hag's Mire                          FALSE
 7 Cressen            ""       Male    1295 In 219 AC or 220 AC                    FALSE
 8 Arianne Martell    Dornish  Female   130 In 276 AC, at Sunspear                 TRUE 
 9 Daenerys Targaryen Valyrian Female  1303 In 284 AC, at Dragonstone              TRUE 
10 Davos Seaworth     Westeros Male    1319 In 260 AC or before, at King's Landing TRUE 
# … with 20 more rows

However, if an element is removed from one of the list entries, the function fails.

got_chars[[1]]["gender"]<-NULL
map_dfr(got_chars, magrittr::extract, c("name", "culture", "gender", "id", "born", "alive"))

#Error: Argument 3 is a list, must contain atomic vectors

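The desired output would have NA values for the missing elements. What would be a good solution? I suspect it involves purrr::possibly(), but I haven't figured it out.

3 answers:

Answer 0 (score: 3)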

The development version of tidyr has powerful new "unnest" functions that can handle this kind of problematic data (option 1). Another approach is to tackle the problem column by column, which lets you use the .default argument of purrr::map() to supply a value for missing elements (option 2).

Created on 2019-08-15 by the reprex package (v0.3.0.9000)
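A minimal sketch of both options, not the answer's original code. It assumes a small hypothetical list chars standing in for got_chars (the first entry lacks "gender"), and it picks tidyr::hoist() as one of the tidyr functions that could serve for option 1.

```r
library(purrr)
library(tibble)
library(tidyr)

# Hypothetical stand-in for got_chars; the first entry has no "gender"
chars <- list(
  list(name = "Theon Greyjoy",    id = 1022L, alive = TRUE),
  list(name = "Tyrion Lannister", gender = "Male", id = 1052L, alive = TRUE),
  list(name = "Will",             gender = "Male", id = 1109L, alive = FALSE)
)

# Option 1: put the list in a list-column and hoist the wanted components;
# components that are absent from an entry come out as NA
opt1 <- hoist(
  tibble(char = chars), char,
  name = "name", gender = "gender", id = "id", alive = "alive"
)

# Option 2: build the data frame one column at a time; .default supplies
# the value used when an entry does not contain the requested name
opt2 <- tibble(
  name   = map_chr(chars, "name",   .default = NA_character_),
  gender = map_chr(chars, "gender", .default = NA_character_),
  id     = map_int(chars, "id",     .default = NA_integer_),
  alive  = map_lgl(chars, "alive",  .default = NA)
)
```

Option 2 is more verbose, but it makes the type of every column explicit, which can be useful when the entries are not fully trusted.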

Answer 1 (score: 2)

The underlying issue is how [ (or its alias magrittr::extract) behaves when the element we want to extract is missing:

list(a = 1)["b"]
# $<NA>
# NULL

magrittr::extract(list(a = 1), "b")
# $<NA>
# NULL

We can define:

extract_if_present <- function(x, y) {
  x[intersect(y, names(x))]
}

which behaves like this:

extract_if_present(list(a = 1), "b")
# named list()

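Row binding then "works" for entries with missing elements (got_chars_mutilated here denotes the got_chars list with "gender" removed from the first entry, as in the question):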

map_dfr(
  got_chars_mutilated,
  extract_if_present,
  c("name", "culture", "gender", "id", "born", "alive")
)
# # A tibble: 30 x 6
#    name               culture     id born                                   alive gender
#    <chr>              <chr>    <int> <chr>                                  <lgl> <chr> 
#  1 Theon Greyjoy      Ironborn  1022 In 278 AC or 279 AC, at Pyke           TRUE  NA    
#  2 Tyrion Lannister   ""        1052 In 273 AC, at Casterly Rock            TRUE  Male  
#  3 Victarion Greyjoy  Ironborn  1074 In 268 AC or before, at Pyke           TRUE  Male  
#  4 Will               ""        1109 ""                                     FALSE Male  
#  5 Areo Hotah         Norvoshi  1166 In 257 AC or before, at Norvos         TRUE  Male  
#  6 Chett              ""        1267 At Hag's Mire                          FALSE Male  
#  7 Cressen            ""        1295 In 219 AC or 220 AC                    FALSE Male  
#  8 Arianne Martell    Dornish    130 In 276 AC, at Sunspear                 TRUE  Female
#  9 Daenerys Targaryen Valyrian  1303 In 284 AC, at Dragonstone              TRUE  Female
# 10 Davos Seaworth     Westeros  1319 In 260 AC or before, at King's Landing TRUE  Male  
# # … with 20 more rows

The column order ends up somewhat scrambled; it depends on the order of the rows and on which elements they are missing.
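If a fixed column order matters, one possible follow-up is to reorder the result by the vector of requested names. The sketch below is an illustration under assumptions: chars is a small hypothetical stand-in for the modified got_chars, and dplyr::any_of() is used so that a name absent from every entry does not raise an error.

```r
library(purrr)
library(dplyr)

extract_if_present <- function(x, y) {
  x[intersect(y, names(x))]
}

# Hypothetical stand-in for the modified got_chars
chars <- list(
  list(name = "Theon Greyjoy",    id = 1022L, alive = TRUE),
  list(name = "Tyrion Lannister", gender = "Male", id = 1052L, alive = TRUE)
)
wanted <- c("name", "gender", "id", "alive")

# Row-bind the partial extractions; the column order of this result
# depends on which entries are missing which names
res <- map_dfr(chars, extract_if_present, wanted)

# Restore the requested order
res <- select(res, any_of(wanted))
```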

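Answer 2 (score: 1)

One approach is to use partial() to define a pre-filled version of pluck() that extracts a name of interest and returns NA if that name is missing. Pass the modified pluck() to a double map: the inner map traverses the names to extract, and the outer map traverses the got_chars list: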

v <- set_names(c("name", "culture", "gender", "id", "born", "alive"))
map_dfr( got_chars, ~map(v, partial(pluck, .x, .default=NA)) )
# # A tibble: 30 x 6
#    name             culture  gender    id born                             alive
#    <chr>            <chr>    <chr>  <int> <chr>                            <lgl>
#  1 Theon Greyjoy    Ironborn NA      1022 In 278 AC or 279 AC, at Pyke     TRUE 
#  2 Tyrion Lannister ""       Male    1052 In 273 AC, at Casterly Rock      TRUE 
#  3 Victarion Greyj… Ironborn Male    1074 In 268 AC or before, at Pyke     TRUE 
#  4 Will             ""       Male    1109 ""                               FALSE
#  5 Areo Hotah       Norvoshi Male    1166 In 257 AC or before, at Norvos   TRUE 
#  6 Chett            ""       Male    1267 At Hag's Mire                    FALSE
#  7 Cressen          ""       Male    1295 In 219 AC or 220 AC              FALSE
#  8 Arianne Martell  Dornish  Female   130 In 276 AC, at Sunspear           TRUE 
#  9 Daenerys Targar… Valyrian Female  1303 In 284 AC, at Dragonstone        TRUE 
# 10 Davos Seaworth   Westeros Male    1319 In 260 AC or before, at King's … TRUE 
# # … with 20 more rows

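To clarify, .x iterates over got_chars: it sits inside the lambda function specified with ~, so it belongs to the outer map. The function for the inner map is built by partial(), which pre-fills the got_chars element currently being examined (i.e. .x) as the first argument of pluck(). The modified pluck() then accepts the name to extract as its (new) first argument, so it can be passed to the inner map as is, with no extra wrapper function around pluck().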
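To see the partial() step in isolation, here is a small sketch for a single entry. The list x and the helper name pluck_from_x are hypothetical, chosen only for illustration.

```r
library(purrr)

# Hypothetical single entry without "gender"
x <- list(name = "Theon Greyjoy", id = 1022L)
v <- set_names(c("name", "gender", "id"))

# Pre-fill pluck() with the entry and a default; the resulting function
# takes the name to extract as its first argument
pluck_from_x <- partial(pluck, x, .default = NA)

pluck_from_x("name")    # the stored name
pluck_from_x("gender")  # NA, because "gender" is absent

# This is what the inner map does for one entry of the outer map
row <- map(v, pluck_from_x)
```

Because v is named by set_names(), the inner map returns a named list, which is what lets the outer map_dfr() bind each entry as one row.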