Replacing NA in factor-type data in R

Time: 2016-10-30 16:44:15

Tags: r

Data Frame X

Data frame X looks like this:

State      code
New Jersey  1
New York    2
Califronia  NA

All columns are factors. I want to replace the NAs with text or 0 so that I can transpose them later.

When I try to run this command:

X[is.na(X)] <- "0"

I get the following warnings:

Warning messages:
1: In `[<-.factor`(`*tmp*`, thisvar, value = "0") :
  invalid factor level, NA generated
2: In `[<-.factor`(`*tmp*`, thisvar, value = "0") :
  invalid factor level, NA generated
3: In `[<-.factor`(`*tmp*`, thisvar, value = "0") :
  invalid factor level, NA generated
4: In `[<-.factor`(`*tmp*`, thisvar, value = "0") :
  invalid factor level, NA generated

The NA values are unchanged.

4 answers:

Answer 0: (score: 3)

Another alternative, using the built-in factor():

df <- data.frame(a=letters[1:3], b=c("d", "e", NA), stringsAsFactors=TRUE)
df
  a    b
1 a    d
2 b    e
3 c <NA>

Now recode the factor with factor(), treating NA as a level and relabelling it:

df$b <- factor(df$b, exclude = NULL, 
               levels = c("d", "e", NA), 
               labels = c("d", "e", "f"))
df
  a b
1 a d
2 b e
3 c f

For many factor columns, the following may be useful:

df[] <- lapply(df, function(x){
  # check if you have a factor first:
  if(!is.factor(x)) return(x)
  # otherwise include NAs into factor levels and change factor levels:
  x <- factor(x, exclude=NULL)
  levels(x)[is.na(levels(x))] <- "0"
  return(x)
  })
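As a rough check of the same idea against the question's data, here is a minimal sketch. The data frame is a reconstruction of X from the question, the helper name na_to_level is made up for illustration, and it uses addNA() instead of factor(x, exclude=NULL) so that NA only becomes a level when the column actually contains NAs.

```r
# Assumed reconstruction of the question's X (all columns are factors)
X <- data.frame(State = c("New Jersey", "New York", "California"),
                code  = c("1", "2", NA),
                stringsAsFactors = TRUE)

# Hypothetical helper: turn NA into an explicit level, then rename that level
na_to_level <- function(x, new_level = "0") {
  if (!is.factor(x)) return(x)        # leave non-factor columns untouched
  x <- addNA(x, ifany = TRUE)         # add NA as a level only if NAs are present
  levels(x)[is.na(levels(x))] <- new_level
  x
}

X[] <- lapply(X, na_to_level)         # X[] keeps the data.frame structure
levels(X$code)
anyNA(X$code)
```

The new level is appended after the existing ones, so the order of the original levels does not change.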

Answer 1: (score: 0)

Simply:

X$code <- as.character(X$code) # go through character: as.numeric() on a factor returns its internal integer codes, not the labels
X[is.na(X)] <- "0"
X$code <- as.factor(as.numeric(X$code))
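One caveat when turning a factor into numbers: as.numeric() applied directly to a factor returns the internal integer codes, which only match the labels by coincidence (as they would for labels "1" and "2"). A small sketch with made-up values shows the difference:

```r
# Made-up factor whose labels differ from its internal codes
f <- factor(c("10", "20", NA, "10"))

codes  <- as.numeric(f)                 # internal codes: 1 2 NA 1
values <- as.numeric(as.character(f))   # the labels as numbers: 10 20 NA 10

codes
values
```

Going through as.character() first is the safe route when the numeric labels matter.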

In a loop over every column after the first, it looks like this:

for (i in 2:ncol(X)) {
  X[,i] <- as.character(X[,i])
  X[which(is.na(X[,i])==TRUE),i] <- "0"
  X[,i] <- as.factor(as.numeric(X[,i]))
}

For character replacement values, like this:

for (i in 2:ncol(X)) {
  X[,i] <- as.character(X[,i])
  X[which(is.na(X[,i])==TRUE),i] <- "Not Assigned"
  X[,i] <- as.factor(X[,i])
}

Or, if you don't want to convert to character first, add a new level to each column:

for (i in 2:ncol(X)) {
  levels(X[,i]) <- c(levels(X[,i]), "Not Assigned")
  X[which(is.na(X[,i])==TRUE),i] <- "Not Assigned"
}
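The loops above assume that every column from the second onward is a factor. If the data frame mixes column types, one option is to pick out the factor columns first. This is only a sketch: the extra numeric column n and the name factor_cols are invented for the example.

```r
# Assumed example data: two factor columns and one numeric column
X <- data.frame(State = c("New Jersey", "New York", "California"),
                code  = c("1", "2", NA),
                stringsAsFactors = TRUE)
X$n <- c(10, NA, 30)   # numeric column that should be left alone

# Names of the columns that are factors
factor_cols <- names(X)[vapply(X, is.factor, logical(1))]

for (col in factor_cols) {
  if (anyNA(X[[col]])) {
    # add the new level first, otherwise the assignment produces NA again
    levels(X[[col]]) <- c(levels(X[[col]]), "Not Assigned")
    X[[col]][is.na(X[[col]])] <- "Not Assigned"
  }
}

str(X)
```

Columns without NAs keep their original levels, and the NA in the numeric column is untouched.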

Answer 2: (score: 0)

If you don't mind converting back and forth, the code you wrote will work on a matrix.

> X
       State code code2
1  NewJersey    1    NA
2    NewYork    2     0
3 Califronia   NA     4

> X<-as.matrix(X)
> X[is.na(X)] <- "0"
> X<-as.data.frame(X)
> X
       State code code2
1  NewJersey    1     0
2    NewYork    2     0
3 Califronia    0     4

> str(X)
'data.frame':   3 obs. of  3 variables:
 $ State: Factor w/ 3 levels "Califronia","NewJersey",..: 2 3 1
 $ code : Factor w/ 3 levels " 1"," 2","0": 1 2 3
 $ code2: Factor w/ 3 levels " 0"," 4","0": 3 1 2
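Note that in the str() output, code2 ends up with " 0" and "0" as separate levels. This appears to come from as.matrix() padding numeric columns when it formats them as text. A possible workaround, sketched here on an assumed copy of the data with numeric code columns, is to strip the whitespace before filling in the NAs:

```r
# Assumed copy of the example data; code and code2 are numeric here
X <- data.frame(State = c("NewJersey", "NewYork", "California"),
                code  = c(1, 2, NA),
                code2 = c(NA, 0, 4),
                stringsAsFactors = TRUE)

M <- as.matrix(X)        # character matrix, NAs are preserved
M[] <- trimws(M)         # drop padding; M[] keeps the matrix dimensions
M[is.na(M)] <- "0"

X2 <- as.data.frame(M, stringsAsFactors = TRUE)
levels(X2$code2)
```

With the padding removed, the existing 0 and the replaced NA fall into the same level.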

Answer 3: (score: 0)

Let's create a random df with a factor column:

df <- data.frame(a=sample(0:10, size=10, replace=TRUE),
                 b=sample(20:30, size=10, replace=TRUE))
df[df$a==0,'a'] <- NA
df$a <- as.factor(df$a)

Another approach is:

#check levels
levels(df$a)
#[1] "3"  "4"  "7"  "9"  "10"

#add new factor level. i.e 88 in our example
df$a = factor(df$a, levels=c(levels(df$a), 88))

#convert all NA's to 88
df$a[is.na(df$a)] = 88

#check levels again
levels(df$a)
#[1] "3"  "4"  "7"  "9"  "10" "88"