Tidyverse approach in R to ordering columns by data type

Time: 2018-06-19 20:04:28

Tags: r dataframe dplyr

 library(tidyverse)
 #> Warning: package 'tidyverse' was built under R version 3.4.4
 #> Warning: package 'forcats' was built under R version 3.4.4

 example <- tibble(
     num1 = sample(1:100, 10),
     categ1 = as.factor(c(sample(letters, 10))),
     num2 = sample(1:100, 10),
     categ2 = as.factor(c(sample(letters, 10)))
 )

 head(example)
 #> # A tibble: 6 x 4
 #>    num1 categ1  num2 categ2
 #>   <int> <fct>  <int> <fct> 
 #> 1     4 c          5 l     
 #> 2    86 u         64 b     
 #> 3    38 z         18 r     
 #> 4    95 e         44 j     
 #> 5    77 w         35 u     
 #> 6    84 y         14 i

Created on 2018-06-19 by the reprex package (v0.2.0).

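The example above shows a basic data frame with integer and factor columns. In a small example like this, it is easy to use dplyr's select(example, categ1, categ2, num1, num2) to choose the column order by hand.

But suppose you have many columns of mixed data types and want to select all the factors first, followed by everything else (or any other specific order based on data type)?

Typing every column name by hand, or using select() helpers such as contains(), quickly becomes tedious as the number of columns grows. I would prefer a tidyverse solution, but I would also like to know how to do this in base R.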

2 answers:

Answer 0: (score: 2)

Example data with three column classes

library(tidyverse)
example <- tibble(
     num1 = as.character(sample(1:100, 10)),
     categ1 = as.factor(c(sample(letters, 10))),
     num2 = sample(1:100, 10),
     categ2 = as.factor(c(sample(letters, 10)))
 )

Suppose you want to sort the columns in this order

my.order <- c('factor', 'integer', 'character')

that is, factors first, then integers, then characters.

You can do

example %>% 
  select(sapply(., class) %>% .[order(match(., my.order))] %>% names)

# # A tibble: 10 x 4
#    categ1 categ2  num2 num1 
#    <fct>  <fct>  <int> <chr>
#  1 y      e         94 46   
#  2 t      b         52 31   
#  3 w      c         32 57   
#  4 k      i         27 89   
#  5 n      d         76 14   
#  6 x      g         67 40   
#  7 c      v         16 20   
#  8 e      z          6 95   
#  9 i      t         70 13   
# 10 g      w         57 42 
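The one-liner above packs several steps into a single pipe. As a rough breakdown of what it does, here is a small sketch on fixed data; the data frame `df` and the intermediate names `col_classes`, `rank_of_col`, and `col_idx` are made up for illustration and are not part of the answer.

```r
# Hypothetical fixed data so every intermediate result is reproducible
df <- data.frame(
  num1   = c("1", "2"),
  categ1 = factor(c("a", "b")),
  num2   = 1:2,
  categ2 = factor(c("c", "d")),
  stringsAsFactors = FALSE
)
my.order <- c("factor", "integer", "character")

# Step 1: class of every column, as a named character vector
col_classes <- sapply(df, class)

# Step 2: position of each column's class within my.order
rank_of_col <- match(col_classes, my.order)

# Step 3: column indices sorted by that position
col_idx <- order(rank_of_col)

# Step 4: column names in the new order, ready to pass to select()
ordered_names <- names(df)[col_idx]
ordered_names
```

Columns of the same class keep their original relative order here, and a class that is missing from `my.order` gets `NA` from `match()`, which `order()` places last.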

As a function (same output)

order_cols <- function(df, col.order){
  df %>% 
    select(sapply(., class) %>% .[order(match(., col.order))] %>% names)
}

example %>% 
  order_cols(c('factor', 'integer', 'character'))
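Since the question also asks about base R, here is a minimal sketch of the same idea without dplyr. The function name `order_cols_base` and the sample data are hypothetical. It swaps `sapply(., class)` for `inherits()`, on the assumption that some columns may carry more than one class (an ordered factor is both "ordered" and "factor"), in which case `sapply(., class)` would return a list instead of a character vector.

```r
# Hypothetical base R helper: reorder columns by the first entry of
# col.order that each column inherits from
order_cols_base <- function(df, col.order) {
  rank_of_col <- vapply(
    df,
    function(x) match(TRUE, inherits(x, col.order, which = TRUE) > 0),
    integer(1)
  )
  # Columns whose class is not listed get NA and are placed last by order()
  df[order(rank_of_col)]
}

# Made-up data with an ordered factor and a Date column
df <- data.frame(
  num1   = c("1", "2"),
  categ1 = factor(c("a", "b")),
  num2   = 1:2,
  when   = as.Date(c("2018-06-19", "2018-06-20")),
  categ2 = factor(c("lo", "hi"), levels = c("lo", "hi"), ordered = TRUE),
  stringsAsFactors = FALSE
)

result <- order_cols_base(df, c("factor", "integer", "character"))
names(result)
```

In this sketch the ordered factor is grouped with the plain factor, and the Date column, which is not listed in `col.order`, ends up at the end.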

Answer 1: (score: 1)

Posting the tidyverse approach I found.

library(tidyverse)
#> Warning: package 'tidyverse' was built under R version 3.4.4
#> Warning: package 'forcats' was built under R version 3.4.4

example <- tibble(
  num1 = sample(1:100, 10),
  categ1 = as.factor(c(sample(letters, 10))),
  num2 = sample(1:100, 10),
  categ2 = as.factor(c(sample(letters, 10)))
  )

head(example)
#> # A tibble: 6 x 4
#>    num1 categ1  num2 categ2
#>   <int> <fct>  <int> <fct> 
#> 1    33 h         94 s     
#> 2    78 x          6 k     
#> 3    82 s         84 i     
#> 4    11 k         20 o     
#> 5    51 v         11 q     
#> 6     5 w         51 b

# Use select_if() to pick columns by data type, then pull their names
# to pass to the outer select().
# intersect() is only needed if you previously dropped some columns and
# do not want those factors (in this case) to creep back in with the
# select_if() call.
example_arranged <- example %>%
  select(intersect(names(select_if(., is.factor)), names(.)), everything())

head(example_arranged)
#> # A tibble: 6 x 4
#>   categ1 categ2  num1  num2
#>   <fct>  <fct>  <int> <int>
#> 1 h      s         33    94
#> 2 x      k         78     6
#> 3 s      i         82    84
#> 4 k      o         11    20
#> 5 v      q         51    11
#> 6 w      b          5    51

Created on 2018-06-19 by the reprex package (v0.2.0).
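For readers on a newer dplyr, the same reordering can probably be written more directly. The sketch below assumes dplyr 1.0.0 or later, where the `where()` selection helper and `relocate()` are available and `select_if()` is superseded. The data frame `df` is made up for illustration.

```r
library(dplyr)

# Hypothetical fixed data mirroring the structure of the example
df <- tibble(
  num1   = 1:3,
  categ1 = factor(c("a", "b", "c")),
  num2   = 4:6,
  categ2 = factor(c("x", "y", "z"))
)

# Factors first, then every remaining column in its original order
arranged <- df %>%
  select(where(is.factor), everything())

# relocate() moves the matching columns to the front by default
relocated <- df %>%
  relocate(where(is.factor))

names(arranged)
names(relocated)
```

Both calls work on whatever columns are present in the piped data frame, so columns dropped in an earlier step cannot reappear.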