Replace NA factor levels with "None" across multiple columns

Time: 2017-07-03 21:16:50

Tags: r function dataframe na levels

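I'm using the House Prices: Advanced Regression Techniques dataset, which contains several factor variables with NA among their levels. Consider the PoolQC, Alley and MiscFeature columns. I want to replace all of these NA values with None in a single function, but I haven't managed to. So far I have tried this: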

MissingLevels <- function(x){
  for(i in names(x)){
    levels <- levels(x[i])
    levels[length(levels) + 1] <- 'None'
    x[i] <- factor(x[i], levels = levels)
    x[i][is.na(x[i])] <- 'None'
    return(x)
  }
}

MissingLevels(df[,c('Alley', 'Fence')])

apply(df[,c('Alley', 'Fence')], 2, MissingLevels)

https://www.kaggle.com/c/house-prices-advanced-regression-techniques/data
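For comparison, here is a minimal base-R sketch of a working MissingLevels(). It assumes the target columns are factors or character vectors, and the `cols` argument is an illustrative addition that the original function does not have. The key changes are that `x[[i]]` extracts the column itself (the original `x[i]` is a one-column data frame, so `levels(x[i])` is NULL), and that the data frame is returned after the loop rather than inside it, where `return()` stopped after the first column. `apply()` is avoided because it coerces the data frame to a matrix, which drops the factor class.

```r
# Sketch of a corrected MissingLevels(), base R only.
# `cols` is a hypothetical extra argument; the question's version takes only `x`.
MissingLevels <- function(x, cols = names(x)) {
  for (i in cols) {
    # x[[i]] returns the column vector; x[i] would return a one-column
    # data frame, whose levels() is NULL.
    f <- as.factor(x[[i]])
    # Add "None" as a level; union() avoids a duplicated level if it exists.
    levels(f) <- union(levels(f), "None")
    # "None" is now a valid level, so NA entries can be overwritten with it.
    f[is.na(f)] <- "None"
    x[[i]] <- f
  }
  # Return once, after the loop, so every column in `cols` is processed.
  x
}

df <- data.frame(another = 1:3,
                 Alley = c("A", "B", NA),
                 Fence = c("C", NA, NA))
df <- MissingLevels(df, c("Alley", "Fence"))
print(df)
```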

1 answer:

Answer 0 (score: 2)

There are several ways to do this, for example:

Sample data:

x <- data.frame(another = 1:3, Alley = c("A", "B", NA), Fence = c("C", NA, NA))

Option 1: using forcats

library(forcats)
# fct_explicit_na() is deprecated as of forcats 1.0.0 in favour of fct_na_value_to_level()
x[,c("Alley", "Fence")] <- lapply(x[,c("Alley", "Fence")], fct_explicit_na, na_level = "None")

  another Alley Fence
1       1     A     C
2       2     B  None
3       3  None  None

Option 2:

PS: The second option was inspired by @G. Grothendieck's post in replace <NA> in a factor column in R
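The forcats call can also be replaced with base R. The sketch below is an independent alternative, not the second option the PS refers to: `addNA()` turns NA into an explicit factor level, and that level is then renamed. `na_to_none()` is a hypothetical helper name, and the sketch assumes the target columns are factors or character vectors.

```r
# Base-R sketch: make NA an explicit factor level, then rename that level.
# na_to_none() is a hypothetical helper, not part of any package.
na_to_none <- function(f, na_level = "None") {
  # addNA(ifany = TRUE) adds NA as a level only when the column contains NA.
  f <- addNA(as.factor(f), ifany = TRUE)
  # Rename the NA level; entries that were NA now display as na_level.
  levels(f)[is.na(levels(f))] <- na_level
  f
}

x <- data.frame(another = 1:3, Alley = c("A", "B", NA), Fence = c("C", NA, NA))
cols <- c("Alley", "Fence")
x[, cols] <- lapply(x[, cols], na_to_none)
print(x)
```

Because of `ifany = TRUE`, columns without missing values come back unchanged instead of gaining an unused "None" level.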