Using the right combination of purrr::map functions to create a df from a multi-level list

Time: 2019-04-17 08:47:17

Tags: r list purrr

I have a fairly complex multi-level list:

my_list <- list(
  list(id = 36L, name = "Marathonbet", odds = list(data = list(
    list(label = "1", value = "1.25", dp3 = "1.250", american = "-400", winning = TRUE, handicap = NULL, total = NULL, bookmaker_event_id = "6938899", last_update = list(date = "2018-08-12 13:12:23.000000", timezone_type = 3L, timezone = "UTC")),
    list(label = "2", value = "13.75", dp3 = "13.750", american = "1275", winning = FALSE, handicap = NULL, total = NULL, bookmaker_event_id = "6938899", last_update = list(date = "2018-08-12 13:12:23.000000", timezone_type = 3L, timezone = "UTC")),
    list(label = "X", value = "7.00", dp3 = "7.000", american = "600", winning = FALSE, handicap = NULL, total = NULL, bookmaker_event_id = "6938899", last_update = list(date = "2018-08-12 13:12:23.000000", timezone_type = 3L, timezone = "UTC"))))),
  list(id = 7L, name = "888Sport", odds = list(data = list(
    list(label = "1", value = "1.23", dp3 = "1.230", american = "-435", winning = TRUE, handicap = NULL, total = NULL, bookmaker_event_id = "1004746417", last_update = list(date = "2018-08-12 13:12:23.000000", timezone_type = 3L, timezone = "UTC")),
    list(label = "2", value = "12.50", dp3 = "12.500", american = "1150", winning = FALSE, handicap = NULL, total = NULL, bookmaker_event_id = "1004746417", last_update = list(date = "2018-08-12 13:12:23.000000", timezone_type = 3L, timezone = "UTC")),
    list(label = "X", value = "6.50", dp3 = "6.500", american = "550", winning = FALSE, handicap = NULL, total = NULL, bookmaker_event_id = "1004746417", last_update = list(date = "2018-08-12 13:12:23.000000", timezone_type = 3L, timezone = "UTC"))))),
  list(id = 9L, name = "BetFred", odds = list(data = list(
    list(label = "1", value = "1.30", dp3 = NULL, american = NULL, winning = TRUE, handicap = NULL, total = NULL, bookmaker_event_id = "1085457020", last_update = list(date = "2018-07-26 08:30:19.000000", timezone_type = 3L, timezone = "UTC")),
    list(label = "2", value = "9.00", dp3 = NULL, american = NULL, winning = FALSE, handicap = NULL, total = NULL, bookmaker_event_id = "1085457020", last_update = list(date = "2018-07-26 08:30:19.000000", timezone_type = 3L, timezone = "UTC")),
    list(label = "X", value = "5.50", dp3 = NULL, american = NULL, winning = FALSE, handicap = NULL, total = NULL, bookmaker_event_id = "1085457020", last_update = list(date = "2018-07-26 08:30:19.000000", timezone_type = 3L, timezone = "UTC")))))
)

I can get rid of the nested levels using a combination of map and map_depth, but I am struggling to bind those levels into data frames while keeping all the data. For example, at the my_list[[1]][["odds"]][["data"]] level there are three sub-lists. When I convert that level to a df, I get only one row of data where there should be 3.

What I want to do is turn the whole list into a data frame in which the common elements of the sub-lists, such as:

my_list[[1]][["odds"]][["data"]][[1]][["bookmaker_event_id"]]
my_list[[2]][["odds"]][["data"]][[1]][["bookmaker_event_id"]]

appear in the same column of the resulting df.

This seems like it should be simple to achieve, but I either lose rows of data or the call fails with Error: Argument 1 must have names. The data frame built from this test list should have 9 rows and around 13 columns.

I would like to use functions from the purrr family and avoid any loops, please.

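2 Answers:

Answer 0 (score: 2)

Here is an lapply solution, if that is acceptable, since I am not very familiar with map:

library(dplyr) # provides bind_rows() and as_tibble()

DF <- bind_rows(lapply(my_list,function(ll){ #lapply over the list and bind result to tibble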
  id <- ll[['id']] #Extract id
  name <- ll[['name']] #Extract name

  #clean up date and unlist sublists
  ll <- lapply(ll[['odds']][['data']],function(il)
  {
    il$last_update <- unlist(il$last_update)
    return(unlist(il))
  })

  df <- as_tibble(do.call(rbind,ll)) #bind the sublists and generate tibble
  df$id <- rep(id,nrow(df)) #add id
  df$name <- rep(name,nrow(df)) #add name
  return(df) #return df
}))

DF

# A tibble: 9 x 11
  label value dp3    american winning bookmaker_event~ last_update.date last_update.tim~ last_update.tim~    id name 
  <chr> <chr> <chr>  <chr>    <chr>   <chr>            <chr>            <chr>            <chr>            <int> <chr>
1 1     1.25  1.250  -400     TRUE    6938899          2018-08-12 13:1~ 3                UTC                 36 Mara~
2 2     13.75 13.750 1275     FALSE   6938899          2018-08-12 13:1~ 3                UTC                 36 Mara~
3 X     7.00  7.000  600      FALSE   6938899          2018-08-12 13:1~ 3                UTC                 36 Mara~
4 1     1.23  1.230  -435     TRUE    1004746417       2018-08-12 13:1~ 3                UTC                  7 888S~
5 2     12.50 12.500 1150     FALSE   1004746417       2018-08-12 13:1~ 3                UTC                  7 888S~
6 X     6.50  6.500  550      FALSE   1004746417       2018-08-12 13:1~ 3                UTC                  7 888S~
7 1     1.30  NA     NA       TRUE    1085457020       2018-07-26 08:3~ 3                UTC                  9 BetF~
8 2     9.00  NA     NA       FALSE   1085457020       2018-07-26 08:3~ 3                UTC                  9 BetF~
9 X     5.50  NA     NA       FALSE   1085457020       2018-07-26 08:3~ 3                UTC                  9 BetF~
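
In the output above every odds column, including winning, is printed as <chr>. That follows from unlist(), which coerces mixed element types to a single common type. A minimal sketch of the effect, using a reduced entry with two fields taken from the question's data (the names il and flat are only illustrative):

```r
# One odds entry, reduced to two fields
il <- list(label = "1", winning = TRUE)

# unlist() returns an atomic vector, so the logical is coerced to character
flat <- unlist(il)
class(flat)
flat[["winning"]]

# The original type can be restored afterwards, column by column
as.logical(flat[["winning"]])
```

If the logical type matters downstream, the same conversion could be applied to the finished column, for example DF$winning <- as.logical(DF$winning).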

Answer 1 (score: 0)

Because unlist and flatten drop NULL elements, first replace NULL with NA using the following function by @shayaa:

replace_null <- function(x) {
  lapply(x, function(x) {
    if (is.list(x)) {
      replace_null(x) # recurse into nested lists
    } else {
      if (is.null(x)) NA else x
    }
  })
}
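
To see what the helper changes, here is a small check on a single hypothetical entry (entry is made up for illustration and is shaped like one element of odds$data). The helper is repeated so the snippet runs on its own:

```r
# Same helper as in the answer, repeated so this snippet is self-contained
replace_null <- function(x) {
  lapply(x, function(x) {
    if (is.list(x)) replace_null(x) else if (is.null(x)) NA else x
  })
}

# Hypothetical entry with NULL at the top level and inside a nested list
entry <- list(label = "1", dp3 = NULL,
              last_update = list(date = "2018-07-26", timezone = NULL))

before <- names(unlist(entry))               # NULL fields vanish
after  <- names(unlist(replace_null(entry))) # NULL fields survive as NA

before
after
```

Without the replacement, rows that lack a field end up with fewer elements than the others, which is why the columns would otherwise not line up.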

Then use tibble and purrr::flatten:

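library(dplyr)
library(purrr)
library(tidyr) # unnest() comes from tidyr

my_list %>% {
  tibble(
    id = map_dbl(., 'id'),
    name = map_chr(., 'name'),
    # per bookmaker: NULL -> NA in every odds entry, then flatten each entry into one row
    odds = map(., 'odds') %>% map(., 'data') %>% map(., . %>% map(replace_null) %>% map_df(flatten))
    #odds = map(., ~ .x[['odds']][['data']] %>% map(replace_null) %>% map_df(flatten))
  )} %>%
  unnest(odds)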

  # A tibble: 9 x 13
        id name        label value dp3    american winning handicap total bookmaker_event_~ date                    timezone_type timezone
      <dbl> <chr>       <chr> <chr> <chr>  <chr>    <lgl>   <lgl>    <lgl> <chr>             <chr>                           <int> <chr>   
  1    36 Marathonbet 1     1.25  1.250  -400     TRUE    NA       NA    6938899           2018-08-12 13:12:23.00~             3 UTC     
  2    36 Marathonbet 2     13.75 13.750 1275     FALSE   NA       NA    6938899           2018-08-12 13:12:23.00~             3 UTC     
  3    36 Marathonbet X     7.00  7.000  600      FALSE   NA       NA    6938899           2018-08-12 13:12:23.00~             3 UTC     
  4     7 888Sport    1     1.23  1.230  -435     TRUE    NA       NA    1004746417        2018-08-12 13:12:23.00~             3 UTC     
  5     7 888Sport    2     12.50 12.500 1150     FALSE   NA       NA    1004746417        2018-08-12 13:12:23.00~             3 UTC     
  6     7 888Sport    X     6.50  6.500  550      FALSE   NA       NA    1004746417        2018-08-12 13:12:23.00~             3 UTC     
  7     9 BetFred     1     1.30  NA     NA       TRUE    NA       NA    1085457020        2018-07-26 08:30:19.00~             3 UTC     
  8     9 BetFred     2     9.00  NA     NA       FALSE   NA       NA    1085457020        2018-07-26 08:30:19.00~             3 UTC     
  9     9 BetFred     X     5.50  NA     NA       FALSE   NA       NA    1085457020        2018-07-26 08:30:19.00~             3 UTC 

For more information, see this purrr tutorial.
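
Since the question asks for a purrr-only approach, the two answers can also be combined into nested map_dfr() calls. The following is only a sketch under assumptions: toy is a made-up, smaller stand-in with the same shape as my_list, and odds_row is a hypothetical helper name, not part of purrr. It assumes dplyr is installed, because map_dfr() relies on it for row binding.

```r
library(purrr)
library(dplyr)

# Made-up stand-in with the same nesting as my_list (values are illustrative)
toy <- list(
  list(id = 1L, name = "A", odds = list(data = list(
    list(label = "1", value = "1.25", handicap = NULL,
         last_update = list(date = "2018-08-12", timezone = "UTC")),
    list(label = "X", value = "7.00", handicap = NULL,
         last_update = list(date = "2018-08-12", timezone = "UTC"))))),
  list(id = 2L, name = "B", odds = list(data = list(
    list(label = "1", value = "1.30", handicap = NULL,
         last_update = list(date = "2018-07-26", timezone = "UTC")))))
)

# Hypothetical helper: turn one odds entry into a one-row tibble
odds_row <- function(o) {
  # Keep NULL fields as NA so every entry yields the same columns
  o <- map(o, ~ if (is.null(.x)) NA else .x)
  # Lift the fields of last_update to the top level
  o <- c(o[names(o) != "last_update"], o$last_update)
  as_tibble(o)
}

# Outer map_dfr: one block of rows per bookmaker; inner map_dfr: one row per entry
result <- map_dfr(toy, function(b) {
  map_dfr(b$odds$data, odds_row) %>%
    mutate(id = b$id, name = b$name) %>%
    select(id, name, everything())
})

result
```

Replacing toy with my_list should give one row per odds entry, with id and name repeated for each bookmaker. Unlike the lapply answer, this keeps winning as a logical column because no unlist() is involved.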