R Conditional Column Type Change

Time: 2018-03-22 23:51:14

Tags: r

I have a dataset "train" that consists of 59 columns. I'm trying to change the column types conditionally, based on the ending substring of the column name. I first define the function, then use apply. The result is that all the variables are erased, and countless warnings like the following are shown:

In if (stri_sub(var, -3, -1) == "cat") { ... : the condition has length > 1 and only the first element will be used

I can't figure out what's wrong with the function, but I'm guessing that's where the problem is since the apply line was given as an approach in another question. What am I doing wrong?

name_change <- function(var){
if (stri_sub(var, -3,-1) == "cat")
{train[,var] <- as.factor(train[,var])}
#else do nothing
}

train[,names(train)] = apply(train[,names(train)], 2,name_change)

2 answers:

Answer 0 (score: 1)

This can easily be done with dplyr. Here's how to change the Sepal.Length and Petal.Length columns from dbl to int (note that as.integer truncates the decimal part).

library(dplyr)
library(magrittr)  # provides the %<>% assignment pipe used below

data(iris)

glimpse(iris, 60)

Observations: 150
Variables: 5
$ Sepal.Length <dbl> 5.1, 4.9, 4.7, 4.6, 5.0, 5.4, 4.6,...
$ Sepal.Width  <dbl> 3.5, 3.0, 3.2, 3.1, 3.6, 3.9, 3.4,...
$ Petal.Length <dbl> 1.4, 1.4, 1.3, 1.5, 1.4, 1.7, 1.4,...
$ Petal.Width  <dbl> 0.2, 0.2, 0.2, 0.2, 0.2, 0.4, 0.3,...
$ Species      <fct> setosa, setosa, setosa, setosa, se...

iris %<>% 
    mutate_at(vars(ends_with("Length")), as.integer) %>% 
    glimpse(60)

Observations: 150
Variables: 5
$ Sepal.Length <int> 5, 4, 4, 4, 5, 5, 4, 5, 4, 4, 5, 4...
$ Sepal.Width  <dbl> 3.5, 3.0, 3.2, 3.1, 3.6, 3.9, 3.4,...
$ Petal.Length <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1...
$ Petal.Width  <dbl> 0.2, 0.2, 0.2, 0.2, 0.2, 0.4, 0.3,...
$ Species      <fct> setosa, setosa, setosa, setosa, se...
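
For the case in the question (columns whose names end in "cat", converted to factors), the same idea might look like the sketch below. It assumes dplyr 1.0.0 or later, where across() supersedes mutate_at(). The toy train data frame and its column names are made up for illustration and are not the asker's real data.

```r
library(dplyr)

# Hypothetical stand-in for the asker's 59-column "train" data
train <- data.frame(
  ps_ind_cat = c(1, 2, 1),
  ps_reg     = c(0.5, 1.5, 2.5),
  ps_car_cat = c("x", "y", "x"),
  stringsAsFactors = FALSE
)

# Convert every column whose name ends in "cat" to a factor;
# all other columns are left untouched
train <- train %>%
  mutate(across(ends_with("cat"), as.factor))

# Inspect the resulting column classes
sapply(train, class)
```

Because ends_with() selects by name, there is no need to write an if test per column.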

Answer 1 (score: 1)

Here is a correction of your approach, in 100% base R:

name_change <- function(var,col){
  suffix <- "Length"
  if (substr(var,nchar(var)-nchar(suffix)+1,nchar(var)) == suffix)
    {col <- as.integer(col)}
  col
}

iris2 <- data.frame(Map(name_change, names(iris), iris), stringsAsFactors = FALSE)
str(iris2)
# 'data.frame': 150 obs. of  5 variables:
#  $ Sepal.Length: int  5 4 4 4 5 5 4 5 4 4 ...
#  $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
#  $ Petal.Length: int  1 1 1 1 1 1 1 1 1 1 ...
#  $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
#  $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...

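In particular:

  • Don't use apply on data.frames: with MARGIN = 2 it first converts them to a matrix, which loses the column types and is inefficient.
  • You can use lapply to loop over the columns of a data.frame, but here you also need to loop over the names, so you need Map.
  • Map returns a list, so I convert it back to a data.frame.
  • Note that I explicitly return the column from the name_change function, whether it was modified or not.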
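
To illustrate the first point, and to apply the same fix to the "cat" suffix from the question, here is a small base R sketch. The toy train data frame is invented for illustration, and endsWith() assumes R 3.3.0 or later.

```r
# apply() coerces the data.frame to a matrix first; because iris has a
# factor column, the whole matrix becomes character
apply(iris, 2, class)

# Hypothetical stand-in for the asker's "train" data
train <- data.frame(
  ps_ind_cat = c(1, 2, 1),
  ps_reg     = c(0.5, 1.5, 2.5),
  ps_car_cat = c("x", "y", "x"),
  stringsAsFactors = FALSE
)

# Logical index of the columns whose names end in "cat"
is_cat <- endsWith(names(train), "cat")

# lapply() works column by column and keeps each column's type;
# assign the converted columns back into the same positions
train[is_cat] <- lapply(train[is_cat], as.factor)

str(train)
```

Selecting the columns by name up front avoids the per-column if test entirely, so no function needs to receive the column name.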