Selecting columns by multiple criteria with select_if in dplyr 0.7.0

Time: 2017-06-15 19:15:37

Tags: r dplyr

I'm trying to figure out how to select columns efficiently with dplyr::select_if. The starwars dataset in dplyr 0.7.0 is a good dataset for this:

> starwars
# A tibble: 87 x 13
                 name height  mass    hair_color  skin_color eye_color birth_year gender homeworld species     films  vehicles starships
                <chr>  <int> <dbl>         <chr>       <chr>     <chr>      <dbl>  <chr>     <chr>   <chr>    <list>    <list>    <list>
 1     Luke Skywalker    172    77         blond        fair      blue       19.0   male  Tatooine   Human <chr [5]> <chr [2]> <chr [2]>
 2              C-3PO    167    75          <NA>        gold    yellow      112.0   <NA>  Tatooine   Droid <chr [6]> <chr [0]> <chr [0]>
 3              R2-D2     96    32          <NA> white, blue       red       33.0   <NA>     Naboo   Droid <chr [7]> <chr [0]> <chr [0]>
 4        Darth Vader    202   136          none       white    yellow       41.9   male  Tatooine   Human <chr [4]> <chr [0]> <chr [1]>
 5        Leia Organa    150    49         brown       light     brown       19.0 female  Alderaan   Human <chr [5]> <chr [1]> <chr [0]>
 6          Owen Lars    178   120   brown, grey       light      blue       52.0   male  Tatooine   Human <chr [3]> <chr [0]> <chr [0]>
 7 Beru Whitesun lars    165    75         brown       light      blue       47.0 female  Tatooine   Human <chr [3]> <chr [0]> <chr [0]>
 8              R5-D4     97    32          <NA>  white, red       red         NA   <NA>  Tatooine   Droid <chr [1]> <chr [0]> <chr [0]>
 9  Biggs Darklighter    183    84         black       light     brown       24.0   male  Tatooine   Human <chr [1]> <chr [0]> <chr [1]>
10     Obi-Wan Kenobi    182    77 auburn, white        fair blue-gray       57.0   male   Stewjon   Human <chr [6]> <chr [1]> <chr [5]>

Now say I want to select only the numeric columns (is.numeric matches both integer and double columns). This works:

library(dplyr)

starwars %>%
  select_if(is.numeric)

But what if I want to select based on multiple criteria? For example, I might want both the numeric and the character columns:

starwars %>%
  select_if(c(is.numeric, is.character))

Or I might want all numeric columns plus the name column:

starwars %>%
  select_if(name, is.character)

Neither of the two examples above works, so I'd like to know how to accomplish what I've outlined here.

5 Answers:

Answer 0: (score: 5)

For the first example:

starwars %>%
  select_if(function(col) {is.numeric(col) | is.character(col)})

This is taken directly from the RDocumentation page.

For the second:

# logical vector, TRUE for every numeric column
toKeep <- sapply(starwars, is.numeric)
starwars %>%
  select("name", names(toKeep)[toKeep])

I can't come up with anything prettier at the moment, but I'm sure there is a better way :)
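A slightly more compact variant of the same idea, offered as a sketch rather than a recommended approach: the select_if documentation says its predicate argument may also be a logical vector, so the two criteria can be combined with | before selecting. The names keep and name_and_numeric are made up for this illustration.

```r
library(dplyr)

# TRUE for every numeric column, plus TRUE for the column called "name"
keep <- sapply(starwars, is.numeric) | names(starwars) == "name"

# select_if() accepts a logical vector with one entry per column
name_and_numeric <- starwars %>% select_if(keep)

names(name_and_numeric)
```

The columns come back in their original order, so name stays first.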

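Answer 1: (score: 3)

As stated in the NEWS, starting with version 1.0.0:

select() and rename() use the latest version of the tidyselect interface. In practice, this means that you can now combine selections using Boolean logic (i.e. !, & and |), and use predicate functions (e.g. is.character) to select variables by type (#4680).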

### Install development version on GitHub first until CRAN version is available
# install.packages("devtools")
# devtools::install_github("tidyverse/dplyr")
library(dplyr, warn.conflicts = FALSE)

starwars %>% 
  as_tibble() %>% 
  glimpse()
#> Rows: 87
#> Columns: 14
#> $ name       <chr> "Luke Skywalker", "C-3PO", "R2-D2", "Darth Vader", "Leia...
#> $ height     <int> 172, 167, 96, 202, 150, 178, 165, 97, 183, 182, 188, 180...
#> $ mass       <dbl> 77.0, 75.0, 32.0, 136.0, 49.0, 120.0, 75.0, 32.0, 84.0, ...
#> $ hair_color <chr> "blond", NA, NA, "none", "brown", "brown, grey", "brown"...
#> $ skin_color <chr> "fair", "gold", "white, blue", "white", "light", "light"...
#> $ eye_color  <chr> "blue", "yellow", "red", "yellow", "brown", "blue", "blu...
#> $ birth_year <dbl> 19.0, 112.0, 33.0, 41.9, 19.0, 52.0, 47.0, NA, 24.0, 57....
#> $ sex        <chr> "male", "none", "none", "male", "female", "male", "femal...
#> $ gender     <chr> "masculine", "masculine", "masculine", "masculine", "fem...
#> $ homeworld  <chr> "Tatooine", "Tatooine", "Naboo", "Tatooine", "Alderaan",...
#> $ species    <chr> "Human", "Droid", "Droid", "Human", "Human", "Human", "H...
#> $ films      <list> [<"The Empire Strikes Back", "Revenge of the Sith", "Re...
#> $ vehicles   <list> [<"Snowspeeder", "Imperial Speeder Bike">, <>, <>, <>, ...
#> $ starships  <list> [<"X-wing", "Imperial shuttle">, <>, <>, "TIE Advanced ...

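To select the numeric or character columns:

# the released dplyr 1.0.0 expects predicates to be wrapped in where()
starwars %>%
  select(where(is.numeric) | where(is.character)) %>% 
  glimpse()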
#> Rows: 87
#> Columns: 11
#> $ height     <int> 172, 167, 96, 202, 150, 178, 165, 97, 183, 182, 188, 180...
#> $ mass       <dbl> 77.0, 75.0, 32.0, 136.0, 49.0, 120.0, 75.0, 32.0, 84.0, ...
#> $ birth_year <dbl> 19.0, 112.0, 33.0, 41.9, 19.0, 52.0, 47.0, NA, 24.0, 57....
#> $ name       <chr> "Luke Skywalker", "C-3PO", "R2-D2", "Darth Vader", "Leia...
#> $ hair_color <chr> "blond", NA, NA, "none", "brown", "brown, grey", "brown"...
#> $ skin_color <chr> "fair", "gold", "white, blue", "white", "light", "light"...
#> $ eye_color  <chr> "blue", "yellow", "red", "yellow", "brown", "blue", "blu...
#> $ sex        <chr> "male", "none", "none", "male", "female", "male", "femal...
#> $ gender     <chr> "masculine", "masculine", "masculine", "masculine", "fem...
#> $ homeworld  <chr> "Tatooine", "Tatooine", "Naboo", "Tatooine", "Alderaan",...
#> $ species    <chr> "Human", "Droid", "Droid", "Human", "Human", "Human", "H...

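Or to select the non-list columns:

# the released dplyr 1.0.0 expects predicates to be wrapped in where()
starwars %>%
  select(!where(is.list)) %>% 
  glimpse()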
#> Rows: 87
#> Columns: 11
#> $ name       <chr> "Luke Skywalker", "C-3PO", "R2-D2", "Darth Vader", "Leia...
#> $ height     <int> 172, 167, 96, 202, 150, 178, 165, 97, 183, 182, 188, 180...
#> $ mass       <dbl> 77.0, 75.0, 32.0, 136.0, 49.0, 120.0, 75.0, 32.0, 84.0, ...
#> $ hair_color <chr> "blond", NA, NA, "none", "brown", "brown, grey", "brown"...
#> $ skin_color <chr> "fair", "gold", "white, blue", "white", "light", "light"...
#> $ eye_color  <chr> "blue", "yellow", "red", "yellow", "brown", "blue", "blu...
#> $ birth_year <dbl> 19.0, 112.0, 33.0, 41.9, 19.0, 52.0, 47.0, NA, 24.0, 57....
#> $ sex        <chr> "male", "none", "none", "male", "female", "male", "femal...
#> $ gender     <chr> "masculine", "masculine", "masculine", "masculine", "fem...
#> $ homeworld  <chr> "Tatooine", "Tatooine", "Naboo", "Tatooine", "Alderaan",...
#> $ species    <chr> "Human", "Droid", "Droid", "Human", "Human", "Human", "H...

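To select name and the character columns:

# the released dplyr 1.0.0 expects predicates to be wrapped in where()
starwars %>%
  select(name | where(is.character)) %>% 
  glimpse()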
#> Rows: 87
#> Columns: 8
#> $ name       <chr> "Luke Skywalker", "C-3PO", "R2-D2", "Darth Vader", "Leia...
#> $ hair_color <chr> "blond", NA, NA, "none", "brown", "brown, grey", "brown"...
#> $ skin_color <chr> "fair", "gold", "white, blue", "white", "light", "light"...
#> $ eye_color  <chr> "blue", "yellow", "red", "yellow", "brown", "blue", "blu...
#> $ sex        <chr> "male", "none", "none", "male", "female", "male", "femal...
#> $ gender     <chr> "masculine", "masculine", "masculine", "masculine", "fem...
#> $ homeworld  <chr> "Tatooine", "Tatooine", "Naboo", "Tatooine", "Alderaan",...
#> $ species    <chr> "Human", "Droid", "Droid", "Human", "Human", "Human", "H...

Created on 2020-02-17 by the reprex package (v0.3.0)

Answer 2: (score: 2)

You can write your own function:

 to_keep <- function(x) is.numeric(x) | is.character(x)
 starwars %>% select_if(to_keep)

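Or you can use "quosure-style lambda functions" (note that funs() has been deprecated since dplyr 0.8.0 in favor of formula lambdas such as ~ is.numeric(.) | is.character(.)):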

starwars %>% select_if(funs(is.numeric(.) | is.character(.)))

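I don't know of a good way to combine different kinds of logic for column selection, so I use a hybrid approach (even though it isn't very elegant, because you have to repeat the initial dataset):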

 starwars %>%
    select("name") %>%
    # bind only the numeric columns; also matching is.character here would add name a second time
    bind_cols(select_if(starwars, is.numeric))

Answer 3: (score: 2)

With select_if, the elegant tidyverse syntax for anonymous functions, written with ~, may help:

library(tidyverse)

# numeric and character columns
starwars %>% select_if(~ is.numeric(.) | is.character(.)) 

# all numeric AND the name column
starwars %>% select(name, where(is.numeric))

For some reason, the creators of the tidyverse recommend that predicate functions such as is.numeric be wrapped in where() when used inside select().
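As a rough illustration of how where() covers both cases from the question inside select(), here is a minimal sketch. It assumes dplyr 1.0.0 or later; the result names num_or_chr, name_and_num and same_cols are invented for this example.

```r
library(dplyr)

# numeric OR character columns: union of two where() selections
num_or_chr <- starwars %>%
  select(where(is.numeric) | where(is.character))

# the name column plus every numeric column
name_and_num <- starwars %>%
  select(name, where(is.numeric))

# the same set of columns as num_or_chr, using one formula lambda
same_cols <- starwars %>%
  select(where(~ is.numeric(.x) | is.character(.x)))

names(name_and_num)
```

The two "numeric or character" variants pick the same columns but order them differently: the union lists the numeric columns first, while the single lambda keeps the original column order.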

Answer 4: (score: 0)

For the second part (getting the numeric columns AND the name column):

to_keep <- c(starwars %>% select_if(is.numeric) %>% names,"name")
starwars %>% select(one_of(to_keep))
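In newer releases one_of() is superseded by all_of() and any_of(). A sketch of the same approach with all_of(), assuming dplyr 1.0.0 or later (which re-exports all_of() from tidyselect); the name selected is made up for this example.

```r
library(dplyr)

# names of all numeric columns, followed by "name"
to_keep <- c(starwars %>% select_if(is.numeric) %>% names(), "name")

# all_of() raises an error if any listed column is missing from the data
selected <- starwars %>% select(all_of(to_keep))

names(selected)
```

any_of() could be used instead if missing columns should be ignored silently rather than reported.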