The `exclude` argument in `factor()` does not work

Time: 2016-10-02 12:38:15

Tags: r

I am confused about how this code is supposed to work:

foo <- factor(c("a", "b", "a", "c", "a", "a", "c", "c"))
#[1] a b a c a a c c
#Levels: a b c

factor(foo, exclude = "a")
#[1] a b a c a a c c
#Levels: a b c
  

Warning message:
In as.vector(exclude, typeof(x)) : NAs introduced by coercion

Shouldn't this display the factor with every a replaced by NA? If not, how can I achieve that?

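1 answer:

Answer 0: (score: 3)

This bug has been fixed as of R-3.4.0. The answer below is now kept only for historical reference.

As I said in my comment, at the moment exclude only works for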

factor(as.character(foo), exclude = "a")

but not for

factor(foo, exclude = "a")

Note that the documentation ?factor under R 3.3.1 is not at all satisfactory:

exclude: a vector of values to be excluded when forming the set of
         levels.  This should be of the same type as ‘x’, and will be
         coerced if necessary.

The following calls give no warning or error, but they do nothing either:

## foo is a factor with `typeof` being "integer"
factor(foo, exclude = 1L)
factor(foo, exclude = factor("a", levels = levels(foo)))
#[1] a b a c a a c c
#Levels: a b c

In fact, the documentation looks contradictory, because it also reads:

The encoding of the vector happens as follows.  First all the
values in ‘exclude’ are removed from ‘levels’. 

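So it looks like the developers really expect exclude to be of type "character".

This is more likely a bug inside factor. The issue is quite clear: when the input vector x in factor(x, ...) is of class "factor", the following line inside factor goes wrong: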

exclude <- as.vector(exclude, typeof(x))

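because in that case typeof(x) is "integer". If exclude is a character string, trying to convert it to an integer produces NA.

I really don't know why factor contains such a line. Without it, the two lines that follow do the right thing: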
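The coercion can be reproduced outside factor. The following is a minimal sketch rather than the actual internals; the names storage_type and coerced are made up for illustration:

```r
foo <- factor(c("a", "b", "a", "c", "a", "a", "c", "c"))

# A factor is stored as integer codes plus a "levels" attribute
storage_type <- typeof(foo)

# Mirrors the coercion done by the problematic line; suppressWarnings()
# hides the "NAs introduced by coercion" warning
coerced <- suppressWarnings(as.vector("a", storage_type))

print(storage_type)
print(coerced)

# None of the levels matches the coerced value, so nothing gets excluded
print(match(levels(foo), coerced))
```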

    x <- as.character(x)
    levels <- levels[is.na(match(levels, exclude))]
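To see what these two lines do when exclude is left as a character string, here is a small standalone sketch. The variables lev, kept and codes are illustrative names, not part of factor:

```r
x <- c("a", "b", "a", "c", "a", "a", "c", "c")
lev <- c("a", "b", "c")
exclude <- "a"

# Keep only the levels that do NOT appear in exclude
kept <- lev[is.na(match(lev, exclude))]

# Values whose level was dropped no longer match anything and become NA
codes <- match(x, kept)

print(kept)
print(codes)
```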

So the remedy/fix is simply to remove this line:

my_factor <- function (x = character(), levels, labels = levels, exclude = NA, 
                       ordered = is.ordered(x), nmax = NA) 
{
    if (is.null(x)) 
        x <- character()
    nx <- names(x)
    if (missing(levels)) {
        y <- unique(x, nmax = nmax)
        ind <- sort.list(y)
        y <- as.character(y)
        levels <- unique(y[ind])
    }
    force(ordered)
    #exclude <- as.vector(exclude, typeof(x))
    x <- as.character(x)
    levels <- levels[is.na(match(levels, exclude))]
    f <- match(x, levels)
    if (!is.null(nx)) 
        names(f) <- nx
    nl <- length(labels)
    nL <- length(levels)
    if (!any(nl == c(1L, nL))) 
        stop(gettextf("invalid 'labels'; length %d should be 1 or %d", 
            nl, nL), domain = NA)
    levels(f) <- if (nl == nL) 
        as.character(labels)
    else paste0(labels, seq_along(levels))
    class(f) <- c(if (ordered) "ordered", "factor")
    f
}

Let's test it now:

my_factor(foo, exclude = "a")
#[1] <NA> b    <NA> c    <NA> <NA> c    c   
#Levels: b c

my_factor(as.character(foo), exclude = "a")
#[1] <NA> b    <NA> c    <NA> <NA> c    c   
#Levels: b c