Replacing NAs in multiple columns of a data frame

Time: 2017-03-12 17:31:00

Tags: r loops na

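I have a data frame (called hp) that contains several columns with NAs. These columns are of class factor. I want to convert them to character, fill the NAs with "NONE", and then convert them back to factor. I have 14 columns, so I would like to do this with a loop, but it does not work.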

Thanks for your help.

The columns:

miss_names <- c("Alley", "MasVnrType", "FireplaceQu", "PoolQC", "Fence", "MiscFeature", "GarageFinish",
                "GarageQual", "GarageCond", "BsmtQual", "BsmtCond", "BsmtExposure", "BsmtFinType1",
                "BsmtFinType2", "Electrical")

The loop:

for (i in miss_names){       
    hp[i]<-as.character(hp[i])
    hp[i][is.na(hp[i])]<-"NONE"
    hp[i]<-as.factor(hp[i])
    print(hp[i])
    }

 Error in sort.list(y) : 'x' must be atomic for 'sort.list'
 Have you called 'sort' on a list? 

2 answers:

Answer 0 (score: 1)

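Use addNA() to add NA as a factor level, then rename that level to whatever you want. You do not need to convert the factor to a character vector first. You can loop over all the factors in the data frame and replace them one by one.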

# Sample data (stringsAsFactors = TRUE is needed in R >= 4.0 to get factor columns)
dd <- data.frame(
  x = sample(c(NA, letters[1:3]), 20, replace = TRUE),
  y = sample(c(NA, LETTERS[1:3]), 20, replace = TRUE),
  stringsAsFactors = TRUE
)

# Loop over the columns
for (i in seq_along(dd)) {
  xx <- addNA(dd[, i])
  levels(xx) <- c(levels(dd[, i]), "none")
  dd[, i] <- xx
}

This gives us

> str(dd)
'data.frame':   20 obs. of  2 variables:
 $ x: Factor w/ 4 levels "a","b","c","none": 1 4 1 4 4 1 4 3 3 3 ...
 $ y: Factor w/ 4 levels "A","B","C","none": 1 1 2 2 1 3 3 3 4 1 ...
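
To apply the same idea to only the columns named in the question, a minimal sketch follows. The small hp data frame is a hypothetical stand-in for the asker's data (only two of the listed columns, plus a numeric column that is an assumption for illustration). The error in the question appears to come from hp[i], which returns a one-column data frame (a list) rather than a vector; hp[[i]] extracts the factor itself.

```r
# Hypothetical stand-in for the asker's data frame `hp`
hp <- data.frame(
  Alley   = factor(c("Grvl", NA, "Pave", NA)),
  Fence   = factor(c(NA, "MnPrv", NA, "GdWo")),
  LotArea = c(8450, 9600, 11250, 9550)   # non-factor column, left untouched
)
miss_names <- c("Alley", "Fence")

for (nm in miss_names) {
  col <- addNA(hp[[nm]])                     # [[ ]] returns the factor; [ ] would return a data frame
  levels(col)[is.na(levels(col))] <- "NONE"  # rename the NA level
  hp[[nm]] <- col
}

str(hp)
```

Renaming the NA level by position (is.na(levels(col))) avoids rebuilding the whole level vector by hand.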

Answer 1 (score: 0)

An alternative solution using the purrr library, with the same data as @Johan Larsson:

library(purrr)

set.seed(15)
dd <- data.frame(
        x = sample(c(NA, letters[1:3]), 20, replace = TRUE),
        y = sample(c(NA, LETTERS[1:3]), 20, replace = TRUE),
        stringsAsFactors = TRUE)  # needed in R >= 4.0 to get factor columns

# Create a function to convert NA to none
convert.to.none <- function(x){
        y <- addNA(x)                      # add NA as an explicit level
        levels(y) <- c(levels(x), "none")  # rename that level to "none"
        return(y) }

# use the map function to cycle through dd's columns
map_df(dd, convert.to.none)

This lets you scale the approach to any number of columns.
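
For readers who prefer to avoid the extra dependency, the same column-wise mapping can be sketched in base R with lapply(). The helper name na_to_none and the extra integer column n are assumptions for illustration, not part of either answer; the helper skips non-factor columns so mixed data frames pass through unchanged.

```r
set.seed(15)
dd <- data.frame(
  x = sample(c(NA, letters[1:3]), 20, replace = TRUE),
  y = sample(c(NA, LETTERS[1:3]), 20, replace = TRUE),
  n = seq_len(20),                 # hypothetical non-factor column
  stringsAsFactors = TRUE
)

# Hypothetical helper: turn NA into an explicit "none" level, factors only
na_to_none <- function(x) {
  if (!is.factor(x)) return(x)
  x <- addNA(x)
  levels(x)[is.na(levels(x))] <- "none"
  x
}

# dd[] keeps dd a data frame while replacing every column
dd[] <- lapply(dd, na_to_none)
str(dd)
```

Assigning into dd[] rather than dd keeps the result a data frame instead of a plain list.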