Writing a function that converts variable modes and classes

Date: 2019-04-17 15:09:09

Tags: r function class data-cleaning

I use the following code (LINK) to clean up potentially troublesome aspects of a hypothetical data frame called dataframe:

library(data.table)  # provides fread()

dataframe <- fread(
    "A   B  B.x  C  D   E   iso   year   
     0   3   NA  1  NA  NA  NLD   2009   
     1   4   NA  2  NA  NA  NLD   2009   
     0   5   NA  3  NA  NA  AUS   2011   
     1   5   NA  4  NA  NA  AUS   2011   
     0   0   NA  7  NA  NA  NLD   2008   
     1   1   NA  1  NA  NA  NLD   2008   
     0   1   NA  3  NA  NA  AUS   2012   
     0   NA  1   NA  1  NA  ECU   2009   
     1   NA  0   NA  2  0   ECU   2009   
     0   NA  0   NA  3  0   BRA   2011   
     1   NA  0   NA  4  0   BRA   2011   
     0   NA  1   NA  7  NA  ECU   2008   
     1   NA  0   NA  1  0   ECU   2008   
     0   NA  0   NA  3  2   BRA   2012   
     1   NA  0   NA  4  NA  BRA   2012",
   header = TRUE
)

dataframe <- as.data.frame(dataframe)
## get mode of all vars
var_mode <- sapply(dataframe, mode)
## produce error if complex or raw is found
if (any(var_mode %in% c("complex", "raw"))) stop("complex or raw not allowed!")
## get class of all vars
var_class <- sapply(dataframe, class)
## produce error if an "AsIs" object has "logical" or "character" mode
if (any(var_mode[var_class == "AsIs"] %in% c("logical", "character"))) {
  stop("matrix variables with 'AsIs' class must be 'numeric'")
  }
## identify columns that needs be coerced to factors
ind1 <- which(var_mode %in% c("logical", "character"))
## coerce logical / character to factor with `as.factor`
dataframe[ind1] <- lapply(dataframe[ind1], as.factor)

Since I use this often, I wanted to put it in a function and tried the following:

cleanfunction <- function(dataframe) {
dataframe <- as.data.frame(dataframe)
## get mode of all vars
var_mode <- sapply(dataframe, mode)
## produce error if complex or raw is found
if (any(var_mode %in% c("complex", "raw"))) stop("complex or raw not allowed!")
## get class of all vars
var_class <- sapply(dataframe, class)
## produce error if an "AsIs" object has "logical" or "character" mode
if (any(var_mode[var_class == "AsIs"] %in% c("logical", "character"))) {
  stop("matrix variables with 'AsIs' class must be 'numeric'")
  }
## identify columns that needs be coerced to factors
ind1 <- which(var_mode %in% c("logical", "character"))
## coerce logical / character to factor with `as.factor`
dataframe[ind1] <- lapply(dataframe[ind1], as.factor)
}

dfclean <- cleanfunction(dataframe)

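However, this returns a list of the variables that were converted to factors, rather than the data frame with those variables converted to factors.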

How can I fix this?

1 Answer:

Answer 0: (score: 2)

A function returns the value of the last expression it evaluates. In this case, the last evaluated expression is

dataframe[ind1] <- lapply(dataframe[ind1], as.factor)

The <- operation always returns (invisibly) only the right-hand-side value. So you are returning just the result of lapply, not the updated dataframe.
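To see this behaviour in isolation, here is a minimal sketch. The names f, g and toy are hypothetical and exist only for this illustration; they are not part of the question's code.

```r
# Hypothetical toy function: its last expression is a subset assignment
f <- function(x) {
  x[1] <- 100          # value of this expression is the right-hand side, 100
}
res <- f(c(1, 2, 3))   # res holds 100, not the modified vector

# Same pattern with a data frame and lapply, as in the question
toy <- data.frame(a = c("x", "y"), b = 1:2, stringsAsFactors = FALSE)
g <- function(d) {
  d["a"] <- lapply(d["a"], as.factor)   # value of this expression is the lapply() list
}
out <- g(toy)
class(out)             # "list", not "data.frame"
```

The assignment does modify the local copy inside the function, but that copy is discarded when the function exits, so only the right-hand side reaches the caller.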

You just need to add another line

return(dataframe)

or simply

dataframe

at the end of the function.
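Putting it together, a sketch of the corrected function might look like the following. The body is the question's own code with the data frame returned as the final expression. The small toy data frame is an assumption made so that the example runs without data.table.

```r
cleanfunction <- function(dataframe) {
  dataframe <- as.data.frame(dataframe)
  ## get mode of all vars
  var_mode <- sapply(dataframe, mode)
  ## produce error if complex or raw is found
  if (any(var_mode %in% c("complex", "raw"))) stop("complex or raw not allowed!")
  ## get class of all vars
  var_class <- sapply(dataframe, class)
  ## produce error if an "AsIs" object has "logical" or "character" mode
  if (any(var_mode[var_class == "AsIs"] %in% c("logical", "character"))) {
    stop("matrix variables with 'AsIs' class must be 'numeric'")
  }
  ## identify columns that need to be coerced to factors
  ind1 <- which(var_mode %in% c("logical", "character"))
  ## coerce logical / character to factor with `as.factor`
  dataframe[ind1] <- lapply(dataframe[ind1], as.factor)
  ## return the updated data frame, not the value of the assignment above
  dataframe
}

# Hypothetical toy input standing in for the fread() data
toy <- data.frame(
  A    = c(0, 1, 0),
  B    = c(3, NA, 5),
  iso  = c("NLD", "AUS", "ECU"),
  year = c(2009L, 2011L, 2009L),
  stringsAsFactors = FALSE
)

dfclean <- cleanfunction(toy)
str(dfclean)   # iso is now a factor; the other columns are unchanged
```

Whether to write return(dataframe) or a bare dataframe on the last line is a matter of style; both return the same value.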