I built a factor vector that contains NA as a level.
my_vec <- factor(c(NA,"a","b"),exclude=NULL)
levels(my_vec)
# [1] "a" "b" NA
I change one of the levels.
levels(my_vec)[levels(my_vec) == "b"] <- "c"
The NA level disappears.
levels(my_vec)
# [1] "a" "c"
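How can I keep it?
EDIT
@rawr provided a nice solution that works most of the time. It works for my specific example above, but not for the one I show below. @Hack-R has a practical option using addNA that I could use, but I would rather have a fully general solution.
Consider this generalized problem: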
my_vec <- factor(c(NA,"a","b1","b2"),levels = c("a",NA,"b1","b2"),exclude=NULL)
levels(my_vec)
# [1] "a" NA "b1" "b2"
levels(my_vec)[levels(my_vec) %in% c("b1","b2")] <- "c"
levels(my_vec)
# [1] "a" "c"  (NA disappeared)
@rawr's solution:
my_vec <- factor(c(NA,"a","b1","b2"),levels = c("a",NA,"b1","b2"),exclude=NULL)
levels(my_vec)
# [1] "a" NA "b1" "b2"
attr(my_vec, 'levels')[levels(my_vec) %in% c("b1","b2")] <- "c"
levels(my_vec)
droplevels(my_vec)
# [1] "a" NA "c" "c"  (c is duplicated)
@Hack-R's solution:
my_vec <- factor(c(NA,"a","b1","b2"),levels = c("a",NA,"b1","b2"),exclude=NULL)
levels(my_vec)
# [1] "a" NA "b1" "b2"
levels(my_vec)[levels(my_vec) %in% c("b1","b2")] <- "c"
my_vec <- addNA(my_vec)
levels(my_vec)
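# [1] "a" "c" NA  (NA is at the end)
What I want is for levels(my_vec) to be c("a", NA, "c").
Answer 0 (score: 0)
You have to quote NA (write "NA"); otherwise R treats it as a missing value rather than as a factor level. Factor levels are sorted alphabetically by default, which is clearly not always useful, so you can specify a different order by passing the reordered levels as the levels argument of factor().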
require(plyr)
my_vec <- factor(c("NA","a","b1","b2"))
vec2 <- revalue(my_vec,c("b1"="c","b2"="c"))
# now reorder the levels
my_vec2 <- factor(vec2, levels(vec2)[c(1,3,2)])
# Levels: a NA c
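As a possible alternative to the two-step revalue-then-reorder approach above, here is a minimal base R sketch that relies on the list form of `levels<-`. It keeps the same assumption as this answer, namely that NA is stored as the string "NA" rather than as a true missing level. Because the list order determines the level order, it does not depend on how the locale sorts "NA" relative to lowercase letters.

```r
# Sketch: merge "b1" and "b2" into "c" and set the level order in one step,
# using the list form of `levels<-` from base R (no plyr needed).
my_vec <- factor(c("NA", "a", "b1", "b2"))  # "NA" is an ordinary string here

# Each list name is a new level; its value lists the old levels it absorbs.
# The order of the list is the order of the resulting levels.
levels(my_vec) <- list(a = "a", "NA" = "NA", c = c("b1", "b2"))

levels(my_vec)
# [1] "a"  "NA" "c"
```

Note that any old level left out of the list becomes a missing value in the result, so every level you want to keep has to be listed.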
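Answer 1 (score: -1)
I finally wrote a function that first replaces the NA level with a temporary value (inspired by @lmo), then does the replacement I want in the standard way, and finally puts NA back in place using @rawr's suggestion.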
my_vec <- factor(c(NA,"a","b1","b2"),levels = c("a",NA,"b1","b2"),exclude=NULL)
my_vec <- level_sub(my_vec,c("b1","b2"),"c")
my_vec
# [1] <NA> a    c    c
# Levels: a <NA> c
As a bonus, level_sub can be called with na_rep = NULL, which removes the NA level, and it looks nice in a pipe :).
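level_sub <- function(x, from, to, na_rep = "NA") {
  # Temporarily give the NA level a string label so `levels<-` does not drop it
  if (!is.null(na_rep)) levels(x)[is.na(levels(x))] <- na_rep
  # Standard replacement; levels that end up with the same label are merged
  levels(x)[levels(x) %in% from] <- to
  # Restore NA by writing to the attribute directly, bypassing `levels<-`
  if (!is.null(na_rep)) attr(x, "levels")[levels(x) == na_rep] <- NA
  x
}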
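One caveat with a placeholder string is that it could collide with a real level that happens to be named "NA". The following is a hedged sketch of a variant that avoids the placeholder by remapping the integer codes directly. The name merge_levels is hypothetical and not part of any package. It follows the same match-and-reassign pattern that `levels<-` uses for factors, but without discarding NA from the new levels.

```r
# Hypothetical helper (not from any package): rename or merge factor levels
# while keeping an NA level in its original position.
merge_levels <- function(x, from, to) {
  stopifnot(is.factor(x))
  new <- levels(x)
  new[new %in% from] <- to           # relabel; NA is never matched by %in% here
  keep <- unique(new)                # merged levels, first-occurrence order
  # Map each old level position to its new position, then remap the codes
  codes <- match(new, keep)[as.integer(x)]
  at <- attributes(x)                # keep class and any other attributes
  at$levels <- keep
  attributes(codes) <- at
  codes
}

my_vec <- factor(c(NA, "a", "b1", "b2"),
                 levels = c("a", NA, "b1", "b2"), exclude = NULL)
my_vec <- merge_levels(my_vec, c("b1", "b2"), "c")
levels(my_vec)
# [1] "a" NA  "c"
```

This version never goes through `levels<-`, so the NA level is not touched at any point.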
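However, it seems R really does not want you to have NA as a factor level.
levels(my_vec) <- c(NA,"a")
behaves strangely, and it does not stop there. While subset keeps the NA level in your columns, rbind silently drops it! I would not be surprised if further investigation showed that half of R's functions drop NA factor levels, which makes them quite unsafe to rely on...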
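Since behaviour like this may differ between functions and R versions, one way to find out is to check directly whether the NA level survived a given operation. Below is a small sketch. The helper name has_na_level is an assumption made for illustration, and no expected output is given for the subset and rbind calls because the results are exactly what you would be testing.

```r
# Hypothetical helper: does this factor still carry NA as one of its levels?
has_na_level <- function(f) anyNA(levels(f))

f  <- factor(c(NA, "a"), exclude = NULL)   # NA is a level here
df <- data.frame(g = f)

has_na_level(f)                      # TRUE: the starting point
res_subset <- has_na_level(subset(df, TRUE)$g)   # inspect on your R version
res_rbind  <- has_na_level(rbind(df, df)$g)      # inspect on your R version
```

Running a check like this after each transformation in a pipeline makes a silently dropped NA level visible instead of surprising you later.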