I am trying to use dplyr::select_if to work out how to select columns efficiently. The starwars dataset in dplyr 0.7.0 is a good dataset for this:
> starwars
# A tibble: 87 x 13
name height mass hair_color skin_color eye_color birth_year gender homeworld species films vehicles starships
<chr> <int> <dbl> <chr> <chr> <chr> <dbl> <chr> <chr> <chr> <list> <list> <list>
1 Luke Skywalker 172 77 blond fair blue 19.0 male Tatooine Human <chr [5]> <chr [2]> <chr [2]>
2 C-3PO 167 75 <NA> gold yellow 112.0 <NA> Tatooine Droid <chr [6]> <chr [0]> <chr [0]>
3 R2-D2 96 32 <NA> white, blue red 33.0 <NA> Naboo Droid <chr [7]> <chr [0]> <chr [0]>
4 Darth Vader 202 136 none white yellow 41.9 male Tatooine Human <chr [4]> <chr [0]> <chr [1]>
5 Leia Organa 150 49 brown light brown 19.0 female Alderaan Human <chr [5]> <chr [1]> <chr [0]>
6 Owen Lars 178 120 brown, grey light blue 52.0 male Tatooine Human <chr [3]> <chr [0]> <chr [0]>
7 Beru Whitesun lars 165 75 brown light blue 47.0 female Tatooine Human <chr [3]> <chr [0]> <chr [0]>
8 R5-D4 97 32 <NA> white, red red NA <NA> Tatooine Droid <chr [1]> <chr [0]> <chr [0]>
9 Biggs Darklighter 183 84 black light brown 24.0 male Tatooine Human <chr [1]> <chr [0]> <chr [1]>
10 Obi-Wan Kenobi 182 77 auburn, white fair blue-gray 57.0 male Stewjon Human <chr [6]> <chr [1]> <chr [5]>
Now say I want to select only the numeric columns (integer and double). This works:
library(dplyr)
starwars %>%
select_if(is.numeric)
But what if I want to select based on multiple criteria? For example, I might want both the numeric and the character columns:
starwars %>%
select_if(c(is.numeric, is.character))
Or I want all the numeric columns plus the name column:
starwars %>%
select_if(name, is.character)
Neither of the two examples above works, so I am wondering how to accomplish what I have outlined here.
Answer 0 (score: 5)
For the first example:
starwars %>%
select_if(function(col) {is.numeric(col) | is.character(col)})
This is taken directly from the RDocumentation page.
For the second:
toKeep <- sapply(starwars, is.numeric)
starwars %>%
select("name", names(toKeep)[as.numeric(toKeep) == 1])
I can't come up with anything prettier right now, but I'm sure there is a better way :)
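The second snippet above can be tightened a little. As a minimal sketch (the names is_num, num_cols and result are made up for illustration), the logical vector can index the column names directly, so the as.numeric(...) == 1 comparison is not needed, and plain bracket indexing then does the selection:

```r
library(dplyr)

# Named logical vector: TRUE where the column is numeric
is_num <- vapply(starwars, is.numeric, logical(1))

# A logical vector can index the names directly
num_cols <- names(is_num)[is_num]

# Keep "name" first, followed by the numeric columns
result <- starwars[c("name", num_cols)]
```

vapply() is used instead of sapply() only so that the result is guaranteed to be a logical vector.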
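Answer 1 (score: 3)
As the NEWS notes, starting with version 1.0.0, select() and rename() use the latest version of the tidyselect interface. In practice, this means you can now combine selections using Boolean logic (i.e. !, & and |) and use predicate functions (e.g. is.character) to select variables by type (#4680).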
### Install development version on GitHub first until CRAN version is available
# install.packages("devtools")
# devtools::install_github("tidyverse/dplyr")
library(dplyr, warn.conflicts = FALSE)
starwars %>%
as_tibble() %>%
glimpse()
#> Rows: 87
#> Columns: 14
#> $ name <chr> "Luke Skywalker", "C-3PO", "R2-D2", "Darth Vader", "Leia...
#> $ height <int> 172, 167, 96, 202, 150, 178, 165, 97, 183, 182, 188, 180...
#> $ mass <dbl> 77.0, 75.0, 32.0, 136.0, 49.0, 120.0, 75.0, 32.0, 84.0, ...
#> $ hair_color <chr> "blond", NA, NA, "none", "brown", "brown, grey", "brown"...
#> $ skin_color <chr> "fair", "gold", "white, blue", "white", "light", "light"...
#> $ eye_color <chr> "blue", "yellow", "red", "yellow", "brown", "blue", "blu...
#> $ birth_year <dbl> 19.0, 112.0, 33.0, 41.9, 19.0, 52.0, 47.0, NA, 24.0, 57....
#> $ sex <chr> "male", "none", "none", "male", "female", "male", "femal...
#> $ gender <chr> "masculine", "masculine", "masculine", "masculine", "fem...
#> $ homeworld <chr> "Tatooine", "Tatooine", "Naboo", "Tatooine", "Alderaan",...
#> $ species <chr> "Human", "Droid", "Droid", "Human", "Human", "Human", "H...
#> $ films <list> [<"The Empire Strikes Back", "Revenge of the Sith", "Re...
#> $ vehicles <list> [<"Snowspeeder", "Imperial Speeder Bike">, <>, <>, <>, ...
#> $ starships <list> [<"X-wing", "Imperial shuttle">, <>, <>, "TIE Advanced ...
To select the numeric or character columns:
starwars %>%
select(is.numeric | is.character) %>%
glimpse()
#> Rows: 87
#> Columns: 11
#> $ height <int> 172, 167, 96, 202, 150, 178, 165, 97, 183, 182, 188, 180...
#> $ mass <dbl> 77.0, 75.0, 32.0, 136.0, 49.0, 120.0, 75.0, 32.0, 84.0, ...
#> $ birth_year <dbl> 19.0, 112.0, 33.0, 41.9, 19.0, 52.0, 47.0, NA, 24.0, 57....
#> $ name <chr> "Luke Skywalker", "C-3PO", "R2-D2", "Darth Vader", "Leia...
#> $ hair_color <chr> "blond", NA, NA, "none", "brown", "brown, grey", "brown"...
#> $ skin_color <chr> "fair", "gold", "white, blue", "white", "light", "light"...
#> $ eye_color <chr> "blue", "yellow", "red", "yellow", "brown", "blue", "blu...
#> $ sex <chr> "male", "none", "none", "male", "female", "male", "femal...
#> $ gender <chr> "masculine", "masculine", "masculine", "masculine", "fem...
#> $ homeworld <chr> "Tatooine", "Tatooine", "Naboo", "Tatooine", "Alderaan",...
#> $ species <chr> "Human", "Droid", "Droid", "Human", "Human", "Human", "H...
Or to select the non-list columns:
starwars %>%
select(!is.list) %>%
glimpse()
#> Rows: 87
#> Columns: 11
#> $ name <chr> "Luke Skywalker", "C-3PO", "R2-D2", "Darth Vader", "Leia...
#> $ height <int> 172, 167, 96, 202, 150, 178, 165, 97, 183, 182, 188, 180...
#> $ mass <dbl> 77.0, 75.0, 32.0, 136.0, 49.0, 120.0, 75.0, 32.0, 84.0, ...
#> $ hair_color <chr> "blond", NA, NA, "none", "brown", "brown, grey", "brown"...
#> $ skin_color <chr> "fair", "gold", "white, blue", "white", "light", "light"...
#> $ eye_color <chr> "blue", "yellow", "red", "yellow", "brown", "blue", "blu...
#> $ birth_year <dbl> 19.0, 112.0, 33.0, 41.9, 19.0, 52.0, 47.0, NA, 24.0, 57....
#> $ sex <chr> "male", "none", "none", "male", "female", "male", "femal...
#> $ gender <chr> "masculine", "masculine", "masculine", "masculine", "fem...
#> $ homeworld <chr> "Tatooine", "Tatooine", "Naboo", "Tatooine", "Alderaan",...
#> $ species <chr> "Human", "Droid", "Droid", "Human", "Human", "Human", "H...
To select name and the character columns:
starwars %>%
select(name | is.character) %>%
glimpse()
#> Rows: 87
#> Columns: 8
#> $ name <chr> "Luke Skywalker", "C-3PO", "R2-D2", "Darth Vader", "Leia...
#> $ hair_color <chr> "blond", NA, NA, "none", "brown", "brown, grey", "brown"...
#> $ skin_color <chr> "fair", "gold", "white, blue", "white", "light", "light"...
#> $ eye_color <chr> "blue", "yellow", "red", "yellow", "brown", "blue", "blu...
#> $ sex <chr> "male", "none", "none", "male", "female", "male", "femal...
#> $ gender <chr> "masculine", "masculine", "masculine", "masculine", "fem...
#> $ homeworld <chr> "Tatooine", "Tatooine", "Naboo", "Tatooine", "Alderaan",...
#> $ species <chr> "Human", "Droid", "Droid", "Human", "Human", "Human", "H...
Created on 2020-02-17 by the reprex package (v0.3.0)
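The output above came from a development build. Released versions of dplyr and tidyselect warn when a bare predicate is passed to select() and ask for it to be wrapped in where(). A sketch of the same three selections in that form, assuming dplyr 1.0.0 or later (the result names are made up for illustration):

```r
library(dplyr)

# Numeric or character columns
num_or_chr <- starwars %>% select(where(is.numeric) | where(is.character))

# Everything except the list columns
no_lists <- starwars %>% select(!where(is.list))

# name plus every character column
name_and_chr <- starwars %>% select(name | where(is.character))
```

The Boolean operators combine whole selections here, so each predicate gets its own where() call.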
Answer 2 (score: 2)
You can write your own function:
to_keep <- function(x) is.numeric(x) | is.character(x)
starwars %>% select_if(to_keep)
Or you can use "quosure-style lambda functions":
starwars %>% select_if(funs(is.numeric(.) | is.character(.)))
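I don't know of a good way to combine different kinds of logic for column selection, so I use a hybrid approach (even though it is not very elegant, since you have to repeat the initial dataset):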
starwars %>%
select("name") %>%
bind_cols(select_if(starwars, funs(is.numeric(.) | is.character(.))))
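A possible alternative to the bind_cols() step, sketched on the assumption that select_if() also accepts a logical vector with one entry per column in place of a predicate function (its documentation describes .predicate that way). The names keep and result are made up for illustration, and the dataset is still mentioned more than once:

```r
library(dplyr)

# One TRUE/FALSE per column: the column is called "name" or is numeric
keep <- names(starwars) == "name" |
  vapply(starwars, is.numeric, logical(1))

# Pass the logical vector where a predicate function would normally go
result <- select_if(starwars, keep)
```

Because the name test and the type test are combined into a single mask, each column is selected at most once and the original column order is kept.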
Answer 3 (score: 2)
The elegant tidyverse syntax for anonymous functions, ~, might help when using select_if:
require(tidyverse)
# numeric and character columns
starwars %>% select_if(~ is.numeric(.) | is.character(.))
# all numeric AND the name column
starwars %>% select(name, where(is.numeric))
For some reason, the creators of the tidyverse recommend that predicate functions such as is.numeric be wrapped in where() when used inside select.
Answer 4 (score: 0)
For the second part (getting the numeric AND name columns):
to_keep <- c(starwars %>% select_if(is.numeric) %>% names,"name")
starwars %>% select(one_of(to_keep))
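In newer tidyselect releases one_of() is superseded by all_of() and any_of(). A sketch of the same idea with those helpers, assuming dplyr 1.0.0 or later (the names strict and lenient, and the column "no_such_column", are made up for illustration):

```r
library(dplyr)

# Same vector of names as above: the numeric columns plus "name"
to_keep <- c(starwars %>% select_if(is.numeric) %>% names, "name")

# all_of() raises an error if any requested name is missing from the data
strict <- starwars %>% select(all_of(to_keep))

# any_of() silently skips names that are not present
lenient <- starwars %>% select(any_of(c(to_keep, "no_such_column")))
```

all_of() is the safer choice when a missing column should count as a mistake, and any_of() fits when the list of names is only a wish list.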