How to keep NA when changing factor levels

Asked: 2017-07-20 13:42:35

Tags: r na factors levels

I built a factor vector that has NA as one of its levels.

my_vec <- factor(c(NA,"a","b"),exclude=NULL)
levels(my_vec)
# [1] "a" "b" NA 

I change one of the levels.

levels(my_vec)[levels(my_vec) == "b"] <- "c"

The NA level disappears.

levels(my_vec)
# [1] "a" "c"
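
A minimal sketch of where the NA seems to get lost, splitting the one-liner above into two steps. The name new_levels is only an illustrative variable, and %in% is used instead of == so that the index contains no NA.

```r
my_vec <- factor(c(NA, "a", "b"), exclude = NULL)

# Step 1: the subassignment itself only edits a plain character vector.
new_levels <- levels(my_vec)
new_levels[new_levels %in% "b"] <- "c"
anyNA(new_levels)   # the NA entry is still present at this point

# Step 2: handing that vector to `levels<-` is where NA gets discarded.
levels(my_vec) <- new_levels
levels(my_vec)
anyNA(my_vec)       # the first element is now an ordinary missing value
```

So the problem does not appear to be the indexing. The factor method of `levels<-` removes NA from whatever vector it is given.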

How can I keep it?

Edit

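@rawr provided a nice solution that works most of the time. It works for my earlier, specific example, but not for the one I show below. @Hack-R has a practical option using addNA, which I could use, but I would rather have a fully general solution.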

Consider this generalized problem:

my_vec <- factor(c(NA,"a","b1","b2"),levels = c("a",NA,"b1","b2"),exclude=NULL)
levels(my_vec)
# [1] "a"  NA   "b1" "b2"
levels(my_vec)[levels(my_vec) %in% c("b1","b2")] <- "c"
levels(my_vec)
# [1] "a" "c"      # NA disappeared

@rawr's solution:

my_vec <- factor(c(NA,"a","b1","b2"),levels = c("a",NA,"b1","b2"),exclude=NULL)
levels(my_vec)
# [1] "a"  NA   "b1" "b2"
attr(my_vec, 'levels')[levels(my_vec) %in% c("b1","b2")] <- "c"
levels(my_vec)
droplevels(my_vec)
# [1] "a" NA  "c" "c" # c is duplicated

@Hack-R's solution:

my_vec <- factor(c(NA,"a","b1","b2"),levels = c("a",NA,"b1","b2"),exclude=NULL)
levels(my_vec)
# [1] "a"  NA   "b1" "b2"
levels(my_vec)[levels(my_vec) %in% c("b1","b2")] <- "c"
my_vec <- addNA(my_vec)
levels(my_vec)
# [1] "a" "c" NA     # NA is at the end

What I want is for levels(my_vec) to be c("a",NA,"c"), in that order.

2 Answers:

Answer 0: (score: 0)

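You have to quote NA; otherwise R treats it as a missing value rather than as a factor level. Factor levels are sorted alphabetically by default, which is obviously not always useful, so you can specify a different order by passing the levels in the new order to factor().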

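library(plyr)
# "NA" is quoted, so it is an ordinary string level, not a missing value
my_vec <- factor(c("NA","a","b1","b2"))
vec2 <- revalue(my_vec,c("b1"="c","b2"="c"))

# now reorder the levels
# (the index c(1,3,2) assumes levels(vec2) is "a" "c" "NA"; the default sort order depends on the locale)
my_vec2 <- factor(vec2, levels(vec2)[c(1,3,2)])
my_vec2
# Levels: a NA c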
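
For comparison, here is a hedged base R sketch of the same idea without plyr. It relies on the named-list form of `levels<-`, where each list name is a new level and each element lists the old levels to merge into it. The levels come out in list order, so no positional reordering is needed. As in the answer above, "NA" here is a plain string, not a real missing value.

```r
my_vec <- factor(c("NA", "a", "b1", "b2"))

# names = new levels (in the desired order), values = old levels to merge
levels(my_vec) <- list(a = "a", "NA" = "NA", c = c("b1", "b2"))

levels(my_vec)
as.character(my_vec)
```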

Answer 1: (score: -1)

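I ended up writing a function that first replaces the NA level with a temporary value (inspired by @lmo), then does the replacement I want in the standard way, and finally puts NA back in place using @rawr's suggestion.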

my_vec <- factor(c(NA,"a","b1","b2"),levels = c("a",NA,"b1","b2"),exclude=NULL)
my_vec <- level_sub(my_vec,c("b1","b2"),"c")
my_vec
# [1] <NA> a    c    c   
# Levels: a <NA> c

As a bonus, level_sub can be called with na_rep = NULL, which drops the NA level, and it looks nice in a pipe :).

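level_sub <- function(x,from,to,na_rep = "NA"){
  # temporarily rename the NA level so that `levels<-` does not drop it
  if(!is.null(na_rep)) {levels(x)[is.na(levels(x))] <- na_rep}
  # standard replacement; identical new levels are merged
  levels(x)[levels(x) %in% from] <- to
  # turn the placeholder back into NA by writing the attribute directly
  if(!is.null(na_rep)) {attr(x, 'levels')[levels(x) == na_rep] <- NA}
  x
}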
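
One caveat with the placeholder approach: if the factor already has a real level spelled "NA", it would be converted to NA as well. Below is a sketch of an alternative that avoids the placeholder by rebuilding the factor with factor(..., exclude = NULL). The name collapse_levels is hypothetical and does not come from any package. It assumes an unordered factor.

```r
# Hypothetical helper: merge the levels in `from` into `to`, keeping an NA level.
collapse_levels <- function(x, from, to) {
  old <- levels(x)
  new <- old
  new[old %in% from] <- to          # plain character vector, NA is untouched
  labels <- new[as.integer(x)]      # translate each element to its new label
  # exclude = NULL keeps NA as a level; unique() preserves the original order.
  # Note: genuine missing values in x would be folded into the NA level here.
  factor(labels, levels = unique(new), exclude = NULL)
}

my_vec <- factor(c(NA, "a", "b1", "b2"),
                 levels = c("a", NA, "b1", "b2"), exclude = NULL)
res <- collapse_levels(my_vec, c("b1", "b2"), "c")
levels(res)
```

The merged level ends up at the position of the first old level it replaces, which gives the c("a", NA, "c") order asked for in the question.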

However, it seems that R really doesn't want you to have NA as a factor level.

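levels(my_vec) <- c(NA,"a") behaves strangely, and it doesn't stop there. While subset keeps the NA levels in your columns, rbind silently drops them! I wouldn't be surprised if further investigation showed that half of R's functions drop NA factor levels, which would make them quite unsafe to use...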
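
A small sketch of what the strange behaviour may look like, on a two-element factor chosen only for illustration. The assignment renames levels by position, and an NA in the new vector appears to mean "drop this level", so the two values end up swapped.

```r
my_vec <- factor(c(NA, "a"), exclude = NULL)
levels(my_vec)                  # "a" comes first, NA second

# Positional rename: old level "a" -> NA (dropped), old level NA -> "a"
levels(my_vec) <- c(NA, "a")

levels(my_vec)
as.character(my_vec)            # the former NA is now "a", the former "a" is missing
```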