The data frame X looks like this:
State       code
New Jersey  1
New York    2
Califronia  NA
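All columns are factors. I want to replace the NA values with text or with 0 so that I can transpose the data later.
When I try to run this command: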
X[is.na(X)] <- "0"
I get the following warnings:
Warning messages:
1: In `[<-.factor`(`*tmp*`, thisvar, value = "0") :
  invalid factor level, NA generated
2: In `[<-.factor`(`*tmp*`, thisvar, value = "0") :
  invalid factor level, NA generated
3: In `[<-.factor`(`*tmp*`, thisvar, value = "0") :
  invalid factor level, NA generated
4: In `[<-.factor`(`*tmp*`, thisvar, value = "0") :
  invalid factor level, NA generated
and the NA values are unchanged.
Answer 0 (score: 3)
An alternative that uses the built-in factor function:
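# stringsAsFactors = TRUE is explicit because R >= 4.0 no longer converts strings to factors by default
df <- data.frame(a=letters[1:3], b=c("d", "e", NA), stringsAsFactors = TRUE)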
df
a b
1 a d
2 b e
3 c <NA>
Now recode the factor with factor, mapping the NA level to a new label:
df$b <- factor(df$b, exclude = NULL,
levels = c("d", "e", NA),
labels = c("d", "e", "f"))
df
a b
1 a d
2 b e
3 c f
For many factor columns, the following may be useful:
df[] <- lapply(df, function(x){
# check if you have a factor first:
if(!is.factor(x)) return(x)
# otherwise include NAs into factor levels and change factor levels:
x <- factor(x, exclude=NULL)
levels(x)[is.na(levels(x))] <- "0"
return(x)
})
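As a rough sketch of how this could be applied to the data frame from the question, the same idea can be wrapped in a small helper. The name `na_to_level` and its `label` argument are made up for illustration; `addNA` is base R and does the same job as `factor(x, exclude = NULL)` here.

```r
# Rebuild the question's data frame. stringsAsFactors = TRUE is assumed
# so that both columns are factors on R >= 4.0 as well.
X <- data.frame(
  State = c("New Jersey", "New York", "Califronia"),
  code  = c("1", "2", NA),
  stringsAsFactors = TRUE
)

# Hypothetical helper: turn NA into an explicit level named `label`.
na_to_level <- function(x, label = "0") {
  if (!is.factor(x)) return(x)  # leave non-factor columns alone
  if (!anyNA(x)) return(x)      # nothing to replace
  x <- addNA(x)                 # make NA a real level
  levels(x)[is.na(levels(x))] <- label
  x
}

X[] <- lapply(X, na_to_level)
as.character(X$code)
```

Columns without any NA are returned unchanged, so no unused "0" level is added to them.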
Answer 1 (score: 0)
Simply:
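X$code <- as.character(X$code) # go through character; as.numeric() on a factor returns the level codes, not the labels
X$code[is.na(X$code)] <- "0"   # only touch the column being converted
X$code <- as.factor(as.numeric(X$code))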
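One caveat about the conversion step, shown as a minimal sketch with a made-up factor `f`: calling `as.numeric()` directly on a factor returns the internal level codes, so it only matches the labels by coincidence (as it happens to for levels "1" and "2").

```r
f <- factor(c("10", "20", NA))

codes  <- as.numeric(f)                # internal level codes
labels <- as.numeric(as.character(f))  # the labels, read as numbers

codes   # 1 2 NA
labels  # 10 20 NA
```

Going through `as.character()` first is therefore the safer route whenever the labels are not simply 1, 2, 3, ...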
The same conversion, in a loop over every column except the first (State), looks like this:
for (i in 2:ncol(X)) {
  X[, i] <- as.character(X[, i])
  X[is.na(X[, i]), i] <- "0"
  X[, i] <- as.factor(as.numeric(X[, i]))
}
For character replacement values, like this:
for (i in 2:ncol(X)) {
  X[, i] <- as.character(X[, i])
  X[is.na(X[, i]), i] <- "Not Assigned"
  X[, i] <- as.factor(X[, i])
}
Or, if you do not want to convert to character first, add a new level to each column:
for (i in 2:ncol(X)) {
  levels(X[, i]) <- c(levels(X[, i]), "Not Assigned")
  X[is.na(X[, i]), i] <- "Not Assigned"
}
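The loops above pick columns by position (2:ncol(X)). A possible variation, sketched here on made-up data shaped like the question's, selects the factor columns by type instead, so the column order does not matter. The variable name `is_fac` is only illustrative.

```r
# Example data; stringsAsFactors = TRUE is assumed so the columns are factors.
X <- data.frame(
  State = c("NewJersey", "NewYork", "Califronia"),
  code  = c("1", "2", NA),
  code2 = c(NA, "0", "4"),
  stringsAsFactors = TRUE
)

# Select factor columns by type rather than by position.
is_fac <- vapply(X, is.factor, logical(1))

X[is_fac] <- lapply(X[is_fac], function(col) {
  # Register the new level first, then the assignment is valid.
  levels(col) <- union(levels(col), "Not Assigned")
  col[is.na(col)] <- "Not Assigned"
  col
})

as.character(X$code)
as.character(X$code2)
```

Note that this adds the "Not Assigned" level to every factor column, including ones such as State that contain no NA.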
Answer 2 (score: 0)
If you do not mind converting back and forth, the code you wrote will work on a matrix.
> X
State code code2
1 NewJersey 1 NA
2 NewYork 2 0
3 Califronia NA 4
> X<-as.matrix(X)
> X[is.na(X)] <- "0"
> X<-as.data.frame(X)
> X
State code code2
1 NewJersey 1 0
2 NewYork 2 0
3 Califronia 0 4
> str(X)
'data.frame': 3 obs. of 3 variables:
$ State: Factor w/ 3 levels "Califronia","NewJersey",..: 2 3 1
$ code : Factor w/ 3 levels " 1"," 2","0": 1 2 3
$ code2: Factor w/ 3 levels " 0"," 4","0": 3 1 2
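One thing to watch in the str output: " 0" and "0" end up as separate levels. In this example the code columns appear to be numeric rather than factors, and as.matrix pads numeric columns to a common width when the data frame also has non-numeric columns. A hedged sketch of one way to avoid that, assuming numeric code columns as in the session above, is to trim the padding before converting back:

```r
# Assumed input: State is a factor, code and code2 are numeric.
X <- data.frame(
  State = c("NewJersey", "NewYork", "Califronia"),
  code  = c(1, 2, NA),
  code2 = c(NA, 0, 4),
  stringsAsFactors = TRUE
)

M <- as.matrix(X)    # character matrix; numeric columns are padded
M[is.na(M)] <- "0"
M[] <- trimws(M)     # strip the padding; M[] keeps the matrix shape

# stringsAsFactors = TRUE is needed on R >= 4.0 to get factors back.
X2 <- as.data.frame(M, stringsAsFactors = TRUE)
levels(X2$code2)
```

Every column comes back as a factor after the round trip, including ones that were numeric before.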
Answer 3 (score: 0)
Let's create a random data frame with a factor column:
df <- data.frame(a=sample(0:10, size=10, replace=TRUE),
b=sample(20:30, size=10, replace=TRUE))
df[df$a==0,'a'] <- NA
df$a <- as.factor(df$a)
Another approach is:
# check the levels (the data are random, so yours will differ)
levels(df$a)
#[1] "3" "4" "7" "9" "10"
# add a new factor level, 88 in our example
df$a = factor(df$a, levels=c(levels(df$a), 88))
# convert all NAs to 88
df$a[is.na(df$a)] = 88
# check the levels again
levels(df$a)
#[1] "3" "4" "7" "9" "10" "88"