How to set names on a list column with the tidyverse: tibble, purrr, dplyr

Date: 2019-07-18 14:06:47

Tags: r dplyr purrr tibble

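Short version: I want to be able to set_names() on the "list column" returned by summarise(). So if I have a list column created with range(), I want to be able to set the names to "min" and "max".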

Below are the details and a reproducible example.

library(tidyverse)

# Consider the following:
msleep %>%
  group_by(vore) %>%
  summarise(
    sleep_total_range = list(range(sleep_total))
  )
#> # A tibble: 5 x 2
#>   vore    sleep_total_range
#>   <chr>   <list>           
#> 1 carni   <dbl [2]>        
#> 2 herbi   <dbl [2]>        
#> 3 insecti <dbl [2]>        
#> 4 omni    <dbl [2]>        
#> 5 <NA>    <dbl [2]>

# I would like to be able to identify and label (i.e., set_names()) for the 
# min and max columns

# Fail 1: No Column, No Labels
msleep %>%
  group_by(vore) %>%
  summarise(
    sleep_total_range = list(range(sleep_total))
  ) %>% 
  unnest()
#> # A tibble: 10 x 2
#>    vore    sleep_total_range
#>    <chr>               <dbl>
#>  1 carni                 2.7
#>  2 carni                19.4
#>  3 herbi                 1.9
#>  4 herbi                16.6
#>  5 insecti               8.4
#>  6 insecti              19.9
#>  7 omni                  8  
#>  8 omni                 18  
#>  9 <NA>                  5.4
#> 10 <NA>                 13.7

# Fail 2: Column, but labels are not correct
msleep %>%
  group_by(vore) %>%
  summarise(
    sleep_total_range = list(range(sleep_total) %>% enframe(name = "range_col"))
  ) %>% 
  unnest()
#> # A tibble: 10 x 3
#>    vore    range_col value
#>    <chr>       <int> <dbl>
#>  1 carni           1   2.7
#>  2 carni           2  19.4
#>  3 herbi           1   1.9
#>  4 herbi           2  16.6
#>  5 insecti         1   8.4
#>  6 insecti         2  19.9
#>  7 omni            1   8  
#>  8 omni            2  18  
#>  9 <NA>            1   5.4
#> 10 <NA>            2  13.7

Desired result

# Success: This is my desired result/output, but it feels verbose, 
# and not very "tidyverse / purrr"
msleep %>%
  group_by(vore) %>%
  summarise(
    sleep_total_range = list(range(sleep_total) %>% enframe(name = "range_col"))
  ) %>% 
  unnest() %>%
  mutate(
    range_col = ifelse(range_col == 1, "min", "max")
  )
#> # A tibble: 10 x 3
#>    vore    range_col value
#>    <chr>   <chr>     <dbl>
#>  1 carni   min         2.7
#>  2 carni   max        19.4
#>  3 herbi   min         1.9
#>  4 herbi   max        16.6
#>  5 insecti min         8.4
#>  6 insecti max        19.9
#>  7 omni    min         8  
#>  8 omni    max        18  
#>  9 <NA>    min         5.4
#> 10 <NA>    max        13.7

Close, but not quite...

# I thought I was close with this
temp <- 
msleep %>%
  group_by(vore) %>%
  summarise(
    sleep_total_range = list(range(sleep_total))
  )

temp$sleep_total_range[[1]] %>% set_names(c("min", "max")) %>% enframe()
#> # A tibble: 2 x 2
#>   name  value
#>   <chr> <dbl>
#> 1 min     2.7
#> 2 max    19.4

# But this obviously does not work...
msleep %>%
  group_by(vore) %>%
  summarise(
    sleep_total_range = list(range(sleep_total)) %>% 
        set_names(c("min", "max")) %>% 
        enframe()
  )
#> `nm` must be `NULL` or a character vector the same length as `x`

Created on 2019-07-18 by the reprex package (v0.3.0)
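The last attempt fails because, inside summarise(), the pipe applies set_names() to list(range(sleep_total)), which is a one-element list per group, so two names cannot be attached to it. A minimal sketch of one way around this, assuming a current dplyr/tidyr/tibble/purrr: name and enframe the length-2 range vector first, and only then wrap the result in list(). The column name range_col is simply carried over from the attempts above.

```r
library(dplyr)    # group_by(), summarise(), %>%
library(tidyr)    # unnest()
library(tibble)   # enframe()
library(purrr)    # set_names()
library(ggplot2)  # provides the msleep data set

res <- msleep %>%
  group_by(vore) %>%
  summarise(
    # Name and enframe the two range values first, then wrap the
    # resulting 2-row tibble in list() so each group gets one element
    sleep_total_range = list(
      range(sleep_total) %>%
        set_names(c("min", "max")) %>%
        enframe(name = "range_col")
    )
  ) %>%
  # Naming the column avoids the "cols is now required" warning in tidyr >= 1.0.0
  unnest(sleep_total_range)

res
```

The result has the columns vore, range_col and value, with range_col already holding "min" and "max", so the ifelse() step from the "desired result" code is no longer needed.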

5 answers:

Answer 0 (score: 3)

The simplest option is to group_by vore and compute min and max for each group.
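A sketch of that simplest option, assuming tidyr >= 1.0.0 for pivot_longer() (gather() is the equivalent in older versions). It computes the two summaries as ordinary columns and then reshapes them into the long min/max layout from the question:

```r
library(dplyr)    # group_by(), summarise(), %>%
library(tidyr)    # pivot_longer(), tidyr >= 1.0.0
library(ggplot2)  # provides the msleep data set

res_long <- msleep %>%
  group_by(vore) %>%
  # One row per vore, with plain (non-list) min and max columns
  summarise(min = min(sleep_total), max = max(sleep_total)) %>%
  # The column names "min" and "max" become the labels in range_col
  pivot_longer(c(min, max), names_to = "range_col", values_to = "value")

res_long
```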

However, if you want to keep using range, one option is to unnest and then assign c("min", "max") within each vore group:

library(tidyverse)

msleep %>%
  group_by(vore) %>%
  summarise(sleep_total_range = list(range(sleep_total))) %>% 
  unnest() %>%
  group_by(vore) %>%
  mutate(column = c("min", "max"))


#   vore    sleep_total_range column
#   <chr>               <dbl> <chr> 
# 1 NA                    5.4 min   
# 2 NA                   13.7 max   
# 3 carni                 2.7 min   
# 4 carni                19.4 max   
# 5 herbi                 1.9 min   
# 6 herbi                16.6 max   
# 7 insecti               8.4 min   
# 8 insecti              19.9 max   
# 9 omni                  8   min   
#10 omni                 18   max   

Answer 1 (score: 3)

Or add a second list column before unnesting:

msleep %>%
  group_by(vore) %>%
  summarise(
    sleep_total_range = list(range(sleep_total))
  ) %>% 
  mutate(column = list(c("min", "max"))) %>%
  unnest()
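Since tidyr 1.0.0, calling unnest() without naming the columns is deprecated and emits a warning. A sketch of the same approach with both list columns named explicitly; they are unnested side by side, which only works because each row holds two values in both columns:

```r
library(dplyr)    # group_by(), summarise(), mutate(), %>%
library(tidyr)    # unnest() with explicit columns
library(ggplot2)  # provides the msleep data set

res_par <- msleep %>%
  group_by(vore) %>%
  summarise(sleep_total_range = list(range(sleep_total))) %>%
  # A length-1 list is recycled to every row
  mutate(column = list(c("min", "max"))) %>%
  # Both list columns are unnested together, element by element
  unnest(c(sleep_total_range, column))

res_par
```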

Answer 2 (score: 2)

If we create a tibble, we can split it into two columns:
library(tidyverse)
msleep %>% 
  group_by(vore) %>% 
  summarise(sleep_total_range = list(setNames(as.list(range(sleep_total)), 
                                              c("min", "max")) %>% as_tibble())) %>% 
  unnest()
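Since this ends up with min and max as separate columns, a possibly simpler sketch for tidyr >= 1.0.0 (newer than this thread) is unnest_wider(), which turns the names of each list element into column names, so the as.list()/as_tibble() step is not needed:

```r
library(dplyr)    # group_by(), summarise(), %>%
library(tidyr)    # unnest_wider(), tidyr >= 1.0.0
library(ggplot2)  # provides the msleep data set

res_wide <- msleep %>%
  group_by(vore) %>%
  summarise(
    # setNames() (base R) attaches the names that become the new column names
    sleep_total_range = list(setNames(range(sleep_total), c("min", "max")))
  ) %>%
  unnest_wider(sleep_total_range)

res_wide
```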

Answer 3 (score: 1)

In this case, mutating in c("min", "max") is the better option, but if you want to avoid that, you can do the following:

library(tidyverse)

msleep %>%
  group_by(vore) %>%
  summarise(sleep_total_range = list(c(min=min(sleep_total), 
                                       max=max(sleep_total)))) %>% 
  mutate(sleep_total_range = map(sleep_total_range, 
                                 ~data.frame(sleep_total_range=.x, dcol=names(.x)))) %>% 
  unnest()

#> # A tibble: 10 x 3
#>    vore    sleep_total_range dcol 
#>    <chr>               <dbl> <fct>
#>  1 <NA>                  5.4 min  
#>  2 <NA>                 13.7 max  
#>  3 carni                 2.7 min  
#>  4 carni                19.4 max  
#>  5 herbi                 1.9 min  
#>  6 herbi                16.6 max  
#>  7 insecti               8.4 min  
#>  8 insecti              19.9 max  
#>  9 omni                  8   min  
#> 10 omni                 18   max
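The factor type of dcol above comes from data.frame(), which converted strings to factors by default before R 4.0.0. A sketch of the same named-vector idea using tidyr's unnest_longer() (tidyr >= 1.0.0, so newer than this thread): it moves each element's name into its own column, here called dcol to match the output above, and keeps it as character.

```r
library(dplyr)    # group_by(), summarise(), %>%
library(tidyr)    # unnest_longer(), tidyr >= 1.0.0
library(ggplot2)  # provides the msleep data set

res_longer <- msleep %>%
  group_by(vore) %>%
  summarise(
    sleep_total_range = list(c(min = min(sleep_total), max = max(sleep_total)))
  ) %>%
  # indices_to names the column that receives each element's name
  unnest_longer(sleep_total_range, indices_to = "dcol")

res_longer
```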

Answer 4 (score: 1)

Inspired by @akrun, you could also take a somewhat unconventional double-unnest approach here:

msleep %>% 
  group_by(vore) %>% 
  summarise(
    sleep_total_range = list(as.list(range(sleep_total)) %>% set_names(c("min", "max")) %>% enframe)
  ) %>%
  unnest() %>%
  unnest()

#> # A tibble: 10 x 3
#>    vore    name  value
#>    <chr>   <chr> <dbl>
#>  1 carni   min     2.7
#>  2 carni   max    19.4
#>  3 herbi   min     1.9
#>  4 herbi   max    16.6
#>  5 insecti min     8.4
#>  6 insecti max    19.9
#>  7 omni    min     8  
#>  8 omni    max    18  
#>  9 NA      min     5.4
#> 10 NA      max    13.7