I have a fairly complex multi-level list:
my_list <- list(list(id = 36L, name = "Marathonbet", odds = list(data = list(
list(label = "1", value = "1.25", dp3 = "1.250", american = "-400",
winning = TRUE, handicap = NULL, total = NULL, bookmaker_event_id = "6938899",
last_update = list(date = "2018-08-12 13:12:23.000000",
timezone_type = 3L, timezone = "UTC")), list(label = "2",
value = "13.75", dp3 = "13.750", american = "1275", winning = FALSE,
handicap = NULL, total = NULL, bookmaker_event_id = "6938899",
last_update = list(date = "2018-08-12 13:12:23.000000",
timezone_type = 3L, timezone = "UTC")), list(label = "X",
value = "7.00", dp3 = "7.000", american = "600", winning = FALSE,
handicap = NULL, total = NULL, bookmaker_event_id = "6938899",
last_update = list(date = "2018-08-12 13:12:23.000000",
timezone_type = 3L, timezone = "UTC"))))), list(id = 7L,
name = "888Sport", odds = list(data = list(list(label = "1",
value = "1.23", dp3 = "1.230", american = "-435", winning = TRUE,
handicap = NULL, total = NULL, bookmaker_event_id = "1004746417",
last_update = list(date = "2018-08-12 13:12:23.000000",
timezone_type = 3L, timezone = "UTC")), list(label = "2",
value = "12.50", dp3 = "12.500", american = "1150", winning = FALSE,
handicap = NULL, total = NULL, bookmaker_event_id = "1004746417",
last_update = list(date = "2018-08-12 13:12:23.000000",
timezone_type = 3L, timezone = "UTC")), list(label = "X",
value = "6.50", dp3 = "6.500", american = "550", winning = FALSE,
handicap = NULL, total = NULL, bookmaker_event_id = "1004746417",
last_update = list(date = "2018-08-12 13:12:23.000000",
timezone_type = 3L, timezone = "UTC"))))), list(id = 9L,
name = "BetFred", odds = list(data = list(list(label = "1",
value = "1.30", dp3 = NULL, american = NULL, winning = TRUE,
handicap = NULL, total = NULL, bookmaker_event_id = "1085457020",
last_update = list(date = "2018-07-26 08:30:19.000000",
timezone_type = 3L, timezone = "UTC")), list(label = "2",
value = "9.00", dp3 = NULL, american = NULL, winning = FALSE,
handicap = NULL, total = NULL, bookmaker_event_id = "1085457020",
last_update = list(date = "2018-07-26 08:30:19.000000",
timezone_type = 3L, timezone = "UTC")), list(label = "X",
value = "5.50", dp3 = NULL, american = NULL, winning = FALSE,
handicap = NULL, total = NULL, bookmaker_event_id = "1085457020",
last_update = list(date = "2018-07-26 08:30:19.000000",
timezone_type = 3L, timezone = "UTC"))))))
I can use a combination of map and map_depth to remove levels of nesting, but I am struggling to bind the levels into a data frame while keeping all of the data. For example, at the my_list[[1]][["odds"]][["data"]] level there are three sublists. When I convert that level to a data frame I get only one row of data, when there should be 3.
What I would like to do is convert the whole list to a data frame in which common elements of the sublists, such as:
my_list[[1]][["odds"]][["data"]][[1]][["bookmaker_event_id"]]
and
my_list[[2]][["odds"]][["data"]][[1]][["bookmaker_event_id"]]
appear in the same column of the resulting data frame.
This seems like it should be easy to achieve, but I either lose rows of data or it fails with Error: Argument 1 must have names. The data frame built from this test list should have 9 rows and about 13 columns.
I would like to use the map family of functions; please avoid loops.
Answer 0 (score: 2)
If you are OK with an lapply solution, since I am not too familiar with map:
library(dplyr)   # bind_rows()
library(tibble)  # as_tibble()

# lapply over the bookmakers and bind the per-bookmaker tibbles into one
DF <- bind_rows(lapply(my_list, function(ll) {
  id <- ll[['id']]      # extract id
  name <- ll[['name']]  # extract name
  # flatten last_update, then flatten each odds record to a named character vector
  # (unlist() drops NULL elements, so handicap and total disappear here)
  ll <- lapply(ll[['odds']][['data']], function(il) {
    il$last_update <- unlist(il$last_update)
    return(unlist(il))
  })
  df <- as_tibble(do.call(rbind, ll))  # bind the records row-wise and build a tibble
  df$id <- rep(id, nrow(df))           # add id
  df$name <- rep(name, nrow(df))       # add name
  return(df)
}))
DF
# A tibble: 9 x 11
label value dp3 american winning bookmaker_event~ last_update.date last_update.tim~ last_update.tim~ id name
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <int> <chr>
1 1 1.25 1.250 -400 TRUE 6938899 2018-08-12 13:1~ 3 UTC 36 Mara~
2 2 13.75 13.750 1275 FALSE 6938899 2018-08-12 13:1~ 3 UTC 36 Mara~
3 X 7.00 7.000 600 FALSE 6938899 2018-08-12 13:1~ 3 UTC 36 Mara~
4 1 1.23 1.230 -435 TRUE 1004746417 2018-08-12 13:1~ 3 UTC 7 888S~
5 2 12.50 12.500 1150 FALSE 1004746417 2018-08-12 13:1~ 3 UTC 7 888S~
6 X 6.50 6.500 550 FALSE 1004746417 2018-08-12 13:1~ 3 UTC 7 888S~
7 1 1.30 NA NA TRUE 1085457020 2018-07-26 08:3~ 3 UTC 9 BetF~
8 2 9.00 NA NA FALSE 1085457020 2018-07-26 08:3~ 3 UTC 9 BetF~
9 X 5.50 NA NA FALSE 1085457020 2018-07-26 08:3~ 3 UTC 9 BetF~
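Note that this result has 11 columns rather than the roughly 13 the question expected, and that winning comes back as a character column. A minimal base R sketch of why, using one record with values copied from the question (the names rec and flat are introduced here only for illustration):

```r
# One odds record, trimmed to five fields; values copied from the question
rec <- list(label = "1", value = "1.25", winning = TRUE,
            handicap = NULL, total = NULL)

# unlist() flattens the list into a single atomic vector
flat <- unlist(rec)

names(flat)  # "label" "value" "winning": the NULL fields have length zero and vanish
class(flat)  # "character": TRUE is coerced to "TRUE" to match the strings
```

If the handicap and total columns or the original types matter, the NULL values need to be replaced before flattening.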
Answer 1 (score: 0)
Because unlist and flatten drop NULL elements, first replace every NULL with NA using @shayaa's function (here):
replace_null <- function(x) {
  lapply(x, function(x) {
    if (is.list(x)) {
      replace_null(x)
    } else {
      if (is.null(x)) NA else (x)
    }
  })
}
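As a quick check of what this function does, here is a small sketch on a single hand-made record. The function is repeated (with the inner argument renamed to el) so that the snippet runs on its own. The record rec is hypothetical: its shape follows the question's data, but the NULL timezone is added only to show that nested levels are handled too.

```r
# Same logic as replace_null above, repeated so this snippet is self-contained
replace_null <- function(x) {
  lapply(x, function(el) {
    if (is.list(el)) replace_null(el) else if (is.null(el)) NA else el
  })
}

# Hypothetical record: one NULL at the top level, one NULL in a nested list
rec <- list(
  label = "1",
  dp3 = NULL,
  last_update = list(date = "2018-07-26 08:30:19.000000", timezone = NULL)
)

cleaned <- replace_null(rec)
str(cleaned)  # dp3 and last_update$timezone are now NA instead of NULL
```

Because every field now has length one, flattening the record afterwards keeps one value per field instead of silently dropping the empty ones.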
Then use tibble and purrr::flatten:
library(dplyr)
library(purrr)
library(tidyr)  # unnest()
my_list %>% {
  tibble(
    id = map_dbl(., 'id'),
    name = map_chr(., 'name'),
    # for each bookmaker: take odds$data, replace NULLs, flatten each record into one row
    odds = map(., 'odds') %>% map(., 'data') %>% map(., . %>% map(replace_null) %>% map_df(flatten))
    # odds = map(., ~ .x[['odds']][['data']] %>% map(replace_null) %>% map_df(flatten))
  )} %>%
  unnest(odds)
# A tibble: 9 x 13
id name label value dp3 american winning handicap total bookmaker_event_~ date timezone_type timezone
<dbl> <chr> <chr> <chr> <chr> <chr> <lgl> <lgl> <lgl> <chr> <chr> <int> <chr>
1 36 Marathonbet 1 1.25 1.250 -400 TRUE NA NA 6938899 2018-08-12 13:12:23.00~ 3 UTC
2 36 Marathonbet 2 13.75 13.750 1275 FALSE NA NA 6938899 2018-08-12 13:12:23.00~ 3 UTC
3 36 Marathonbet X 7.00 7.000 600 FALSE NA NA 6938899 2018-08-12 13:12:23.00~ 3 UTC
4 7 888Sport 1 1.23 1.230 -435 TRUE NA NA 1004746417 2018-08-12 13:12:23.00~ 3 UTC
5 7 888Sport 2 12.50 12.500 1150 FALSE NA NA 1004746417 2018-08-12 13:12:23.00~ 3 UTC
6 7 888Sport X 6.50 6.500 550 FALSE NA NA 1004746417 2018-08-12 13:12:23.00~ 3 UTC
7 9 BetFred 1 1.30 NA NA TRUE NA NA 1085457020 2018-07-26 08:30:19.00~ 3 UTC
8 9 BetFred 2 9.00 NA NA FALSE NA NA 1085457020 2018-07-26 08:30:19.00~ 3 UTC
9 9 BetFred X 5.50 NA NA FALSE NA NA 1085457020 2018-07-26 08:30:19.00~ 3 UTC
For more information, see this purrr tutorial.
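Both results shown above keep value and the date as character columns. If numeric odds and real timestamps are needed, a conversion step along these lines could follow. This is only a sketch in base R: odds_df is a hypothetical two-row data frame that mimics the column formats above, not the actual output of either answer.

```r
# Hypothetical stand-in for the flattened result; values copied from the question
odds_df <- data.frame(
  value = c("1.25", "9.00"),
  date = c("2018-08-12 13:12:23.000000", "2018-07-26 08:30:19.000000"),
  stringsAsFactors = FALSE
)

# Convert the decimal odds from character to numeric
odds_df$value <- as.numeric(odds_df$value)

# Parse the timestamps; the data states the timezone is UTC
odds_df$date <- as.POSIXct(odds_df$date, tz = "UTC")

str(odds_df)
```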