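I have a data.frame with nearly 200 variables (columns) of different types (num, int, logi, factor). I now want to remove all variables of type "factor" so that I can run the function cor().
When I use str(), I can see which variables are of type "factor", but I don't know how to select and remove all of them at once, and removing them one by one is time-consuming. To select these variables I tried attr() and typeof(), without success.
Any pointers?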
Answer 0 (score: 8)
Assuming a generic data.frame named df, this will remove the factor columns:
df[,-which(sapply(df, class) == "factor")]
Edit
As @Roland suggested, you can also keep just the columns that are not factor. Whichever you prefer.
df[, sapply(df, class) != "factor"]
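Two details are worth checking before relying on either form. typeof() reports the storage type, which is "integer" for a factor, so it cannot identify factor columns. The -which() form also has an edge case: when there are no factor columns, which() returns an empty vector and the subset keeps zero columns. The following is a minimal sketch, not part of the original answer, that tests with is.factor() instead; the example data and the names is_fct, no_factors, num_only and emptied are hypothetical.

```r
# Hypothetical example data, not from the original question
df <- data.frame(
  x    = c(1.5, 2.5, 3.5),
  n    = 1:3,
  flag = c(TRUE, FALSE, TRUE),
  grp  = factor(c("a", "b", "a")),
  lvl  = factor(c("lo", "hi", "lo"), levels = c("lo", "hi"), ordered = TRUE)
)

# class() returns two values for an ordered factor, so is.factor() is a
# more direct test than comparing class() against "factor"
class(df$lvl)

# One TRUE/FALSE per column; vapply() guarantees a logical vector
is_fct <- vapply(df, is.factor, logical(1))

# drop = FALSE keeps a data.frame even if only one column remains
no_factors <- df[, !is_fct, drop = FALSE]
names(no_factors)

# Edge case of the -which() form: no factor columns means nothing is kept
num_only <- df[, c("x", "n")]
emptied  <- num_only[, -which(sapply(num_only, class) == "factor")]
ncol(emptied)
```

The logical-index form does not have this edge case, because a vector of all TRUE simply keeps every column.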
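Edit 2
Since you are interested in the cor function, @Ista also pointed out that filtering on is.numeric would be safer in this particular case. The code above only removes factor columns.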
df[,sapply(df, is.numeric)]
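As a rough end-to-end illustration of the is.numeric filter feeding into cor(), here is a small sketch. The data frame and the names num_df and cor_mat are made up for this example. Note that is.numeric() is FALSE for logical columns, so this filter drops them as well.

```r
# Hypothetical example data, not from the original question
df <- data.frame(
  x    = c(1, 2, 3, 4),
  y    = c(2, 4, 6, 8),
  flag = c(TRUE, FALSE, TRUE, FALSE),
  grp  = factor(c("a", "b", "a", "b"))
)

# Keep numeric (double and integer) columns only;
# drop = FALSE keeps a data.frame even if a single column remains
num_df <- df[, sapply(df, is.numeric), drop = FALSE]

# Correlation matrix of the remaining columns
cor_mat <- cor(num_df)
cor_mat
```

If the logical columns should take part in the correlation, one option is to convert them explicitly (for example with as.numeric()) before filtering.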
Answer 1 (score: 1)
Here is a very handy tidyverse solution, adapted from here:
library(lubridate)
#>
#> Attaching package: 'lubridate'
#> The following object is masked from 'package:base':
#>
#> date
library(tidyverse)
# Create dummy dataset with multiple variable types
df <-
  tibble::tribble(
    ~var_num_1, ~var_num_2, ~var_char,   ~var_fct, ~var_date,
    1,          10,         "this",      "THIS",   "2019-12-18",
    2,          20,         "is",        "IS",     "2019-12-19",
    3,          30,         "dummy",     "DUMMY",  "2019-12-20",
    4,          40,         "character", "FACTOR", "2019-12-21",
    5,          50,         "text",      "TEXT",   "2019-12-22"
  ) %>%
  mutate(
    var_fct = as_factor(var_fct),
    var_date = as_date(var_date)
  )
# Select numeric variables
df %>% select_if(is.numeric)
#> # A tibble: 5 x 2
#> var_num_1 var_num_2
#> <dbl> <dbl>
#> 1 1 10
#> 2 2 20
#> 3 3 30
#> 4 4 40
#> 5 5 50
# Select character variables
df %>% select_if(is.character)
#> # A tibble: 5 x 1
#> var_char
#> <chr>
#> 1 this
#> 2 is
#> 3 dummy
#> 4 character
#> 5 text
# Select factor variables
df %>% select_if(is.factor)
#> # A tibble: 5 x 1
#> var_fct
#> <fct>
#> 1 THIS
#> 2 IS
#> 3 DUMMY
#> 4 FACTOR
#> 5 TEXT
# Select date variables
df %>% select_if(is.Date)
#> # A tibble: 5 x 1
#> var_date
#> <date>
#> 1 2019-12-18
#> 2 2019-12-19
#> 3 2019-12-20
#> 4 2019-12-21
#> 5 2019-12-22
# Select variables using negation (note the use of `~`)
df %>% select_if(~!is.numeric(.))
#> # A tibble: 5 x 3
#> var_char var_fct var_date
#> <chr> <fct> <date>
#> 1 this THIS 2019-12-18
#> 2 is IS 2019-12-19
#> 3 dummy DUMMY 2019-12-20
#> 4 character FACTOR 2019-12-21
#> 5 text TEXT 2019-12-22
Created on 2019-12-18 by the reprex package (v0.3.0)
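In dplyr 1.0.0 and later, select_if() is superseded by select() combined with where(). The sketch below shows roughly how the selections above might be written in that style. It assumes dplyr >= 1.0.0 is installed, and the tibble dat and the result names are hypothetical, not taken from the answer.

```r
library(dplyr)

# Hypothetical example data (not the df from the answer above)
dat <- tibble(
  a = c(1, 2, 3),
  b = c(10L, 20L, 30L),
  f = factor(c("x", "y", "x")),
  s = c("p", "q", "r")
)

# Keep only numeric columns (double and integer)
num_only <- dat %>% select(where(is.numeric))

# Drop factor columns and keep everything else
no_fct <- dat %>% select(!where(is.factor))

names(num_only)
names(no_fct)
```

The numeric-only result can then be passed to cor() directly.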