I use the following code (LINK) to clean up potentially troublesome aspects of a hypothetical data frame named dataframe:
library(data.table)
dataframe <- fread(
"A B B.x C D E iso year
0 3 NA 1 NA NA NLD 2009
1 4 NA 2 NA NA NLD 2009
0 5 NA 3 NA NA AUS 2011
1 5 NA 4 NA NA AUS 2011
0 0 NA 7 NA NA NLD 2008
1 1 NA 1 NA NA NLD 2008
0 1 NA 3 NA NA AUS 2012
0 NA 1 NA 1 NA ECU 2009
1 NA 0 NA 2 0 ECU 2009
0 NA 0 NA 3 0 BRA 2011
1 NA 0 NA 4 0 BRA 2011
0 NA 1 NA 7 NA ECU 2008
1 NA 0 NA 1 0 ECU 2008
0 NA 0 NA 3 2 BRA 2012
1 NA 0 NA 4 NA BRA 2012",
header = TRUE
)
dataframe <- as.data.frame(dataframe)
## get mode of all vars
var_mode <- sapply(dataframe, mode)
## produce error if complex or raw is found
if (any(var_mode %in% c("complex", "raw"))) stop("complex or raw not allowed!")
## get class of all vars
var_class <- sapply(dataframe, class)
## produce error if an "AsIs" object has "logical" or "character" mode
if (any(var_mode[var_class == "AsIs"] %in% c("logical", "character"))) {
  stop("matrix variables with 'AsIs' class must be 'numeric'")
}
## identify columns that need to be coerced to factors
ind1 <- which(var_mode %in% c("logical", "character"))
## coerce logical / character to factor with `as.factor`
dataframe[ind1] <- lapply(dataframe[ind1], as.factor)
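To see which columns the mode check actually flags, here is a minimal sketch on a small hand-built data frame (the column names num, txt and flag are hypothetical, not from the original). mode() reports "numeric" for integer and double columns, "character" for strings, and "logical" for TRUE/FALSE columns, so only the last two kinds are coerced to factors.

```r
# Small hypothetical data frame: one numeric, one character, one logical column
df <- data.frame(
  num  = c(1.5, 2, 3),
  txt  = c("a", "b", "a"),
  flag = c(TRUE, FALSE, NA),
  stringsAsFactors = FALSE
)

# mode() of each column, exactly as in the cleaning code
var_mode <- sapply(df, mode)
print(var_mode)

# Columns whose mode is logical or character get coerced to factors
ind1 <- which(var_mode %in% c("logical", "character"))
df[ind1] <- lapply(df[ind1], as.factor)

# num stays numeric; txt and flag are now factors
str(df)
```

In the question's data, this means only iso (character) is converted; the numeric columns, including those containing NA, are left alone.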
Since I use this often, I want to put it in a function, so I tried the following:
cleanfunction <- function(dataframe) {
  dataframe <- as.data.frame(dataframe)
  ## get mode of all vars
  var_mode <- sapply(dataframe, mode)
  ## produce error if complex or raw is found
  if (any(var_mode %in% c("complex", "raw"))) stop("complex or raw not allowed!")
  ## get class of all vars
  var_class <- sapply(dataframe, class)
  ## produce error if an "AsIs" object has "logical" or "character" mode
  if (any(var_mode[var_class == "AsIs"] %in% c("logical", "character"))) {
    stop("matrix variables with 'AsIs' class must be 'numeric'")
  }
  ## identify columns that need to be coerced to factors
  ind1 <- which(var_mode %in% c("logical", "character"))
  ## coerce logical / character to factor with `as.factor`
  dataframe[ind1] <- lapply(dataframe[ind1], as.factor)
}
dfclean <- cleanfunction(dataframe)
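However, this returns a list of the variables converted to factors, rather than the data frame with those variables converted to factors.
How can I fix this?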
Answer (score: 2)
A function returns the value of the last expression it evaluates. Here, that last expression is
dataframe[ind1] <- lapply(dataframe[ind1], as.factor)
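and an assignment with <- always returns (invisibly) just the value of its right-hand side. So your function returns only the result of lapply, not the updated dataframe.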
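The point about the assignment's value can be checked with a tiny, self-contained sketch (set_first and to_factor are hypothetical functions written only for this illustration): a function whose last line is a replacement assignment hands back the right-hand side, not the object that was modified.

```r
# Hypothetical toy function whose last expression is a replacement assignment
set_first <- function(x) {
  x[1] <- 99
}

res <- set_first(c(1, 2, 3))
print(res)  # [1] 99 : only the right-hand side, not the modified vector

# The question's pattern behaves the same way: the value of
# d[1] <- lapply(...) is the list produced by lapply, not d itself
to_factor <- function(d) {
  d[1] <- lapply(d[1], as.factor)
}
out <- to_factor(data.frame(a = c("x", "y")))
print(class(out))  # [1] "list"
```

Because the value is returned invisibly, calling either function at the console without assigning or printing the result shows nothing, which makes the problem easy to miss.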
You only need to add one more line at the end of the function, either
return(dataframe)
or simply
dataframe
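Putting the fix together, a sketch of the corrected function might look like this. The cleaning logic is unchanged from the question; only the final line returning dataframe is added. The small test data frame is a hypothetical excerpt in the shape of the question's data, built with base R so the sketch runs without data.table.

```r
cleanfunction <- function(dataframe) {
  dataframe <- as.data.frame(dataframe)
  ## get mode of all vars
  var_mode <- sapply(dataframe, mode)
  ## produce error if complex or raw is found
  if (any(var_mode %in% c("complex", "raw"))) stop("complex or raw not allowed!")
  ## get class of all vars
  var_class <- sapply(dataframe, class)
  ## produce error if an "AsIs" object has "logical" or "character" mode
  if (any(var_mode[var_class == "AsIs"] %in% c("logical", "character"))) {
    stop("matrix variables with 'AsIs' class must be 'numeric'")
  }
  ## identify columns that need to be coerced to factors
  ind1 <- which(var_mode %in% c("logical", "character"))
  ## coerce logical / character to factor with `as.factor`
  dataframe[ind1] <- lapply(dataframe[ind1], as.factor)
  ## return the updated data frame (the missing step)
  dataframe
}

# Hypothetical excerpt in the shape of the question's data
df <- data.frame(
  A    = c(0L, 1L, 0L),
  B    = c(3L, NA, NA),
  iso  = c("NLD", "ECU", "BRA"),
  year = c(2009L, 2009L, 2011L),
  stringsAsFactors = FALSE
)

dfclean <- cleanfunction(df)
str(dfclean)  # a data.frame in which only iso has become a factor
```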