如何使用purrr以编程方式分类和/或打印看门人Tabyl输出

时间:2018-11-09 15:51:17

标签: r tidyverse purrr janitor

假设您正在使用tidyverse来嵌套()选择一组分类变量:

library(tidyverse)
library(janitor)

nested_df <- mpg %>%
  select(manufacturer, class) %>%
  gather(variable, value) %>%
  group_by(variable) %>%
  nest()

nested_df
# A tibble: 2 x 2
  variable     data              
  <chr>        <list>            
1 manufacturer <tibble [234 x 1]>
2 class        <tibble [234 x 1]>

现在,我们可以添加一个新列,其中包含janitor::tabyl的输出:

nested_df %>%
  mutate(
    table_output = map(data, ~ tabyl(.$value))
  )

# A tibble: 2 x 3
  variable     data               table_output    
  <chr>        <list>             <list>          
1 manufacturer <tibble [234 x 1]> <tabyl [15 x 3]>
2 class        <tibble [234 x 1]> <tabyl [7 x 3]> 

问题:

  1. 我们如何打印或遍历输出以获取variable名称和table_output
  2. 是否有更好的方法(例如,使用split代替group_by %>% nest

有点像打印以下内容...

Variable is: manufacturer

Tabyl Output:

    .$value  n    percent
       audi 18 0.07692308
  chevrolet 19 0.08119658
      dodge 37 0.15811966
       ford 25 0.10683761
 ...more rows...
    mercury  4 0.01709402
     nissan 13 0.05555556
    pontiac  5 0.02136752
     subaru 14 0.05982906
     toyota 34 0.14529915
 volkswagen 27 0.11538462


Variable is: class

Tabyl Output:

    .$value  n    percent
    2seater  5 0.02136752
    compact 47 0.20085470
    midsize 41 0.17521368
    minivan 11 0.04700855
     pickup 33 0.14102564
 subcompact 35 0.14957265
        suv 62 0.26495726

1 个答案:

答案 0 :(得分:2)

我们可以使用pwalkcatprintpwalk的输入是一个data.frame(列表列表),仅包含variabletable_output列。与pmap类似,pwalk同时遍历两列的每个元素,并在匿名函数中被.x.y引用。与pmap不同的是,pwalk执行代码而不返回任何输出。当我们只希望代码执行的副作用时,这很有用:

library(tidyverse)
library(janitor)

nested_df <- mpg %>%
  select(manufacturer, class) %>%
  gather(variable, value) %>%
  group_by(variable) %>%
  nest()

nested_df %>%
  mutate(
    table_output = map(data, ~ tabyl(.$value))
  ) %>%
  select(-data) %>%
  pwalk(~{
    cat(paste0('Variable is: ', .x, '\n\nTabyl Output: \n\n')) 
    print(.y)
    cat('\n\n')
  })

要打印字符串,我们使用cat来避免前面的[1]。要打印表输出,我们使用print"\n"被添加到填充空白行以提高可读性。

输出:

Variable is: manufacturer

Tabyl Output: 

    .$value  n    percent
       audi 18 0.07692308
  chevrolet 19 0.08119658
      dodge 37 0.15811966
       ford 25 0.10683761
      honda  9 0.03846154
    hyundai 14 0.05982906
       jeep  8 0.03418803
 land rover  4 0.01709402
    lincoln  3 0.01282051
    mercury  4 0.01709402
     nissan 13 0.05555556
    pontiac  5 0.02136752
     subaru 14 0.05982906
     toyota 34 0.14529915
 volkswagen 27 0.11538462


Variable is: class

Tabyl Output: 

    .$value  n    percent
    2seater  5 0.02136752
    compact 47 0.20085470
    midsize 41 0.17521368
    minivan 11 0.04700855
     pickup 33 0.14102564
 subcompact 35 0.14957265
        suv 62 0.26495726