How do I remove columns from a data.frame by data type?

Asked: 2015-02-16 18:45:40

Tags: r

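I have a data.frame with nearly 200 variables (columns) of different data types (num, int, logi, factor). Now I would like to remove all variables of type "factor" so that I can run the function cor().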

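Using str() I can see which variables are of type "factor", but I don't know how to select and remove all of them at once, since removing them one by one is time-consuming. To select these variables I tried attr() and typeof(), without success.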

Any pointers?

2 answers:

Answer 0 (score: 8)

Assuming a generic data.frame, this will remove the columns of type factor:

df[,-which(sapply(df, class) == "factor")]
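
One caveat with negative indexing through which(): if the data.frame happens to contain no factor columns, which() returns an empty vector and the subset drops every column instead of none. The sketch below shows this on a small made-up data.frame (the column names a and b are purely illustrative) and contrasts it with a logical mask, which does not have this problem.

```r
# Hypothetical example data: no factor columns at all
df <- data.frame(a = 1:3, b = c(1.5, 2.5, 3.5))

idx <- which(sapply(df, class) == "factor")  # integer(0): nothing matched
dropped <- df[, -idx]                         # -integer(0) selects no columns
ncol(dropped)                                 # 0

# A logical mask keeps everything when there is nothing to remove
kept <- df[, !sapply(df, is.factor), drop = FALSE]
ncol(kept)                                    # 2
```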

Edit

Following @Roland's suggestion, you can also simply keep the columns that are not of type factor. Whichever you prefer.

df[, sapply(df, class) != "factor"]
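
A possible refinement, offered as a sketch and not as part of the original answer: class() can return more than one string per column (an ordered factor has class "ordered" "factor"), so testing with is.factor() is a more robust way to build the mask. Adding drop = FALSE keeps the result a data.frame even when only one column survives. The data below is made up for illustration.

```r
# Hypothetical example data with a plain and an ordered factor
df <- data.frame(
  x = c(1.5, 2.5, 3.5),
  f = factor(c("a", "b", "a")),
  o = factor(c("lo", "hi", "lo"), levels = c("lo", "hi"), ordered = TRUE)
)

class(df$o)                       # two strings: "ordered" "factor"

is_fct <- sapply(df, is.factor)   # one TRUE/FALSE per column
non_factors <- df[, !is_fct, drop = FALSE]
names(non_factors)                # "x"
```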

Edit 2

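Since you are interested in the cor function, @Ista also pointed out that filtering on is.numeric is safer in this particular case. The code above only removes columns of type factor.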

df[,sapply(df, is.numeric)]
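
To connect this back to the original goal, here is a minimal sketch of the full path from a mixed data.frame to a correlation matrix. The data and column names are invented for illustration. Note that is.numeric() is FALSE for logical columns, so any logi variables would be dropped by this filter as well.

```r
# Hypothetical example data mixing numeric, integer and factor columns
df <- data.frame(
  height = c(1.62, 1.75, 1.80, 1.68),
  weight = c(55, 72, 80, 63),
  age    = c(23L, 35L, 41L, 29L),
  group  = factor(c("a", "b", "b", "a"))
)

# Keep only numeric (double and integer) columns, then correlate them
num_df <- df[, sapply(df, is.numeric), drop = FALSE]
cor_mat <- cor(num_df)
dim(cor_mat)   # 3 3
```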

Answer 1 (score: 1)

Here is a very handy tidyverse solution, adapted from here:

library(lubridate)
#> 
#> Attaching package: 'lubridate'
#> The following object is masked from 'package:base':
#> 
#>     date
library(tidyverse)

# Create dummy dataset with multiple variable types
df <- 
  tibble::tribble(
  ~var_num_1, ~var_num_2,   ~var_char, ~var_fct, ~var_date,
           1,         10,      "this",   "THIS", "2019-12-18",
           2,         20,        "is",     "IS", "2019-12-19",
           3,         30,     "dummy",  "DUMMY", "2019-12-20",
           4,         40, "character", "FACTOR", "2019-12-21",
           5,         50,      "text",   "TEXT", "2019-12-22"
  ) %>% 
  mutate(
    var_fct = as_factor(var_fct),
    var_date = as_date(var_date)
  )


# Select numeric variables
df %>% select_if(is.numeric)
#> # A tibble: 5 x 2
#>   var_num_1 var_num_2
#>       <dbl>     <dbl>
#> 1         1        10
#> 2         2        20
#> 3         3        30
#> 4         4        40
#> 5         5        50

# Select character variables
df %>% select_if(is.character)
#> # A tibble: 5 x 1
#>   var_char 
#>   <chr>    
#> 1 this     
#> 2 is       
#> 3 dummy    
#> 4 character
#> 5 text

# Select factor variables
df %>% select_if(is.factor)
#> # A tibble: 5 x 1
#>   var_fct
#>   <fct>  
#> 1 THIS   
#> 2 IS     
#> 3 DUMMY  
#> 4 FACTOR 
#> 5 TEXT

# Select date variables
df %>% select_if(is.Date)
#> # A tibble: 5 x 1
#>   var_date  
#>   <date>    
#> 1 2019-12-18
#> 2 2019-12-19
#> 3 2019-12-20
#> 4 2019-12-21
#> 5 2019-12-22

# Select variables using negation (note the use of `~`)
df %>% select_if(~!is.numeric(.))
#> # A tibble: 5 x 3
#>   var_char  var_fct var_date  
#>   <chr>     <fct>   <date>    
#> 1 this      THIS    2019-12-18
#> 2 is        IS      2019-12-19
#> 3 dummy     DUMMY   2019-12-20
#> 4 character FACTOR  2019-12-21
#> 5 text      TEXT    2019-12-22

Created on 2019-12-18 by the reprex package (v0.3.0)
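
For readers on newer versions of dplyr: select_if() has been superseded since dplyr 1.0.0 in favour of select() combined with where(). The sketch below assumes dplyr 1.0.0 or later is installed and uses a small made-up tibble; it is an equivalent formulation, not part of the original answer.

```r
library(dplyr)

# Hypothetical example data
df <- tibble(
  num = c(1, 2, 3),
  chr = c("a", "b", "c"),
  fct = factor(c("x", "y", "x"))
)

# Keep only numeric columns
only_numeric <- df %>% select(where(is.numeric))

# Drop factor columns by negating the predicate
no_factors <- df %>% select(!where(is.factor))

names(only_numeric)
names(no_factors)
```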