I'm confused about how this code is supposed to work:
foo <- factor(c("a", "b", "a", "c", "a", "a", "c", "c"))
#[1] a b a c a a c c
#Levels: a b c
factor(foo, exclude = "a")
#[1] a b a c a a c c
#Levels: a b c
Warning message:
In as.vector(exclude, typeof(x)) : NAs introduced by coercion
Shouldn't this display the factor with every a replaced by NA? If not, how can I achieve that?
Answer 0 (score: 3)
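This bug has been fixed since R-3.4.0. The answer below is now kept only for historical reference.
As I said in my comment, at the moment exclude only works for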
factor(as.character(foo), exclude = "a")
and not for
factor(foo, exclude = "a")
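As a quick check of the workaround, here is a minimal sketch that reuses foo from the question. The name bar is introduced only for illustration. Converting to character first means exclude is compared against strings, so this should behave the same on R versions before and after the fix.

```r
# Same example factor as in the question.
foo <- factor(c("a", "b", "a", "c", "a", "a", "c", "c"))

# Convert to character first so that `exclude` is compared against strings
# rather than coerced to the factor's underlying integer type.
bar <- factor(as.character(foo), exclude = "a")

print(bar)
print(levels(bar))
```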
Note that the documentation ?factor under R 3.3.1 is not at all satisfactory:
exclude: a vector of values to be excluded when forming the set of
levels. This should be of the same type as ‘x’, and will be
coerced if necessary.
The following gives no warning or error, but does nothing either:
## foo is a factor with `typeof` being "integer"
factor(foo, exclude = 1L)
factor(foo, exclude = factor("a", levels = levels(foo)))
#[1] a b a c a a c c
#Levels: a b c
In fact, the documentation looks contradictory, because it also reads:
The encoding of the vector happens as follows. First all the
values in ‘exclude’ are removed from ‘levels’.
So it seems the developers really expect exclude to be of type "character".
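This is more likely a bug inside factor. The problem is evident: when the input vector x of factor(x, ...) is of class "factor", the following line inside factor messes things up: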
exclude <- as.vector(exclude, typeof(x))
In that case typeof(x) is "integer". If exclude is a character string, NA is produced when R tries to convert the string to an integer.
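The coercion step can be reproduced in isolation. This is only a sketch of that single call, not of the full factor source. The names x_type and coerced are made up for the example, and suppressWarnings() is used so the "NAs introduced by coercion" warning does not clutter the output.

```r
foo <- factor(c("a", "b", "a", "c", "a", "a", "c", "c"))

# A factor is stored as integer codes, so typeof() reports "integer".
x_type <- typeof(foo)

# Coercing the string "a" to that type cannot succeed, so the result is NA.
# suppressWarnings() hides the "NAs introduced by coercion" warning here.
coerced <- suppressWarnings(as.vector("a", x_type))

print(x_type)
print(coerced)
```

With exclude reduced to NA, no level matches it, which is consistent with the unchanged output shown in the question.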
I really don't know why factor contains such a line. If it were not there, the two lines that follow it would do the right thing:
x <- as.character(x)
levels <- levels[is.na(match(levels, exclude))]
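To see what those two lines do on their own, here is a small sketch that runs them outside of factor on the example data. The variable names lev and codes are illustrative and do not appear in the factor source.

```r
foo <- factor(c("a", "b", "a", "c", "a", "a", "c", "c"))

x <- as.character(foo)   # work with the labels, not the integer codes
lev <- levels(foo)       # "a" "b" "c"
exclude <- "a"

# Keep only the levels that do not match anything in `exclude`.
lev <- lev[is.na(match(lev, exclude))]

# Values whose label is no longer a level get the code NA.
codes <- match(x, lev)

print(lev)
print(codes)
```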
So the remedy is simply to remove that line:
my_factor <- function (x = character(), levels, labels = levels, exclude = NA,
                       ordered = is.ordered(x), nmax = NA)
{
    if (is.null(x))
        x <- character()
    nx <- names(x)
    if (missing(levels)) {
        y <- unique(x, nmax = nmax)
        ind <- sort.list(y)
        y <- as.character(y)
        levels <- unique(y[ind])
    }
    force(ordered)
    #exclude <- as.vector(exclude, typeof(x))
    x <- as.character(x)
    levels <- levels[is.na(match(levels, exclude))]
    f <- match(x, levels)
    if (!is.null(nx))
        names(f) <- nx
    nl <- length(labels)
    nL <- length(levels)
    if (!any(nl == c(1L, nL)))
        stop(gettextf("invalid 'labels'; length %d should be 1 or %d",
                      nl, nL), domain = NA)
    levels(f) <- if (nl == nL)
        as.character(labels)
    else paste0(labels, seq_along(levels))
    class(f) <- c(if (ordered) "ordered", "factor")
    f
}
Let's test it now:
my_factor(foo, exclude = "a")
#[1] <NA> b <NA> c <NA> <NA> c c
#Levels: b c
my_factor(as.character(foo), exclude = "a")
#[1] <NA> b <NA> c <NA> <NA> c c
#Levels: b c