I have a dataset "train" that consists of 59 columns. I'm trying to change the column types conditionally, based on the ending substring of each column name. I first define the function, then call it with apply. The result is that all the variables are erased, and countless warnings like the following are shown:
In if (stri_sub(var, -3, -1) == "cat") { ... : the condition has length > 1 and only the first element will be used
I can't figure out what's wrong with the function, but I'm guessing that's where the problem is since the apply line was given as an approach in another question. What am I doing wrong?
name_change <- function(var){
if (stri_sub(var, -3,-1) == "cat")
{train[,var] <- as.factor(train[,var])}
#else do nothing
}
train[,names(train)] = apply(train[,names(train)], 2,name_change)
Answer 0 (score: 1)
This can easily be done with dplyr. Here's how to change the Sepal.Length and Petal.Length columns from dbl into int.
library(dplyr)
library(magrittr)  # provides the %<>% assignment pipe used below
data(iris)
glimpse(iris, 60)
Observations: 150
Variables: 5
$ Sepal.Length <dbl> 5.1, 4.9, 4.7, 4.6, 5.0, 5.4, 4.6,...
$ Sepal.Width <dbl> 3.5, 3.0, 3.2, 3.1, 3.6, 3.9, 3.4,...
$ Petal.Length <dbl> 1.4, 1.4, 1.3, 1.5, 1.4, 1.7, 1.4,...
$ Petal.Width <dbl> 0.2, 0.2, 0.2, 0.2, 0.2, 0.4, 0.3,...
$ Species <fct> setosa, setosa, setosa, setosa, se...
iris %<>%
mutate_at(vars(ends_with("Length")), as.integer) %>%
glimpse(60)
Observations: 150
Variables: 5
$ Sepal.Length <int> 5, 4, 4, 4, 5, 5, 4, 5, 4, 4, 5, 4...
$ Sepal.Width <dbl> 3.5, 3.0, 3.2, 3.1, 3.6, 3.9, 3.4,...
$ Petal.Length <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1...
$ Petal.Width <dbl> 0.2, 0.2, 0.2, 0.2, 0.2, 0.4, 0.3,...
$ Species <fct> setosa, setosa, setosa, setosa, se...
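Applied to the question's case, the same idea could look like the sketch below. The small train data frame and its column names are invented for illustration, since the real data is not shown. It uses across(), which needs dplyr 1.0.0 or later and is the current replacement for mutate_at().

```r
library(dplyr)

# Hypothetical stand-in for the asker's `train` data; column names are invented.
train <- data.frame(
  a_cat = c(1, 2, 1),
  b_num = c(0.5, 1.5, 2.5),
  c_cat = c(3, 3, 4)
)

# Convert every column whose name ends in "cat" to a factor;
# all other columns are left untouched.
train <- train %>%
  mutate(across(ends_with("cat"), as.factor))

str(train)
```

Selecting by name with ends_with() avoids writing the if test on the column name by hand.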
Answer 1 (score: 1)
Here is a correction of your approach, in 100% base R.
name_change <- function(var,col){
suffix <- "Length"
if (substr(var,nchar(var)-nchar(suffix)+1,nchar(var)) == suffix)
{col <- as.integer(col)}
col
}
iris2 <- data.frame(Map(name_change,names(iris),iris),stringsAsFactors = F)
str(iris2)
# 'data.frame': 150 obs. of 5 variables:
# $ Sepal.Length: int 5 4 4 4 5 5 4 5 4 4 ...
# $ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
# $ Petal.Length: int 1 1 1 1 1 1 1 1 1 1 ...
# $ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
# $ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
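In particular:
Don't use apply on data.frames: with margin = 2 it converts them to a matrix, which is inefficient.
lapply loops over the data.frame columns, but here you also need to loop over the names, so you need Map.
Map returns a list, so the result is wrapped in data.frame.
The name_change function returns the column, modified or not.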
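
To make the points above concrete for the question's "cat" suffix, here is a minimal base R sketch. The train data frame is a made-up stand-in for the real data, and to_factor_if_cat is a hypothetical helper name. endsWith() needs R 3.3.0 or later.

```r
# Hypothetical stand-in for the asker's data; column names are invented.
train <- data.frame(
  a_cat = c(1, 2, 1),
  b_num = c(0.5, 1.5, 2.5),
  c_cat = c("x", "y", "x"),
  stringsAsFactors = FALSE
)

# apply() coerces the data frame to a matrix first. With mixed column types
# every value becomes character, and the function receives the column
# values, not the column name.
seen <- apply(train, 2, function(var) class(var))
print(seen)

# Map() passes each name together with its column, so the name can be tested
# with a single-element condition.
to_factor_if_cat <- function(var, col) {
  if (endsWith(var, "cat")) as.factor(col) else col
}

# Map() returns a named list; data.frame() turns it back into a data frame.
train2 <- data.frame(Map(to_factor_if_cat, names(train), train),
                     stringsAsFactors = FALSE)
str(train2)
```

This also explains the warning in the question: inside apply, var is a whole column of values, so the if condition has more than one element.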