Consider the following code:
library(dplyr)

foo <- function() {
  if (runif(1) < 0.5) {
    return(data.frame(result="low"))
  } else {
    return(data.frame(result="high"))
  }
}
df <- data.frame(val=c(1,2,3,4,5,6))
df %>% group_by(val) %>% do(foo())
The outcome is random, but whenever both "low" and "high" results are returned, you see warnings like these:
Warning messages:
1: In bind_rows_(x, .id) : Unequal factor levels: coercing to character
2: In bind_rows_(x, .id) :
  binding character and factor vector, coercing into character vector
3: In bind_rows_(x, .id) :
  binding character and factor vector, coercing into character vector
4: In bind_rows_(x, .id) :
  binding character and factor vector, coercing into character vector
5: In bind_rows_(x, .id) :
  binding character and factor vector, coercing into character vector
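I believe the first value returned (e.g., "low") is converted to a factor with a single level, and when a different level shows up later, it provokes dplyr's wrath.
What is the correct way to code this example so that the warnings go away?
Edit: one solution is this: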
foo <- function() {
  if (runif(1) < 0.5) {
    return(data.frame(result=factor("low", levels=c("low", "high"))))
  } else {
    return(data.frame(result=factor("high", levels=c("low", "high"))))
  }
}
But what if I don't know the factor levels ahead of time?
Also, fundamentally, I want to return a character vector, not a factor.
Answer 0 (score: 5)
Either pass stringsAsFactors=FALSE:
return(data.frame(..., stringsAsFactors=FALSE))
Or use data_frame:
return(data_frame(...))
See ?data.frame for more on how factors are handled.
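To make the first option concrete, here is a minimal sketch in base R. It assumes the only goal is a character `result` column, and it stacks the pieces with `rbind` rather than the original `group_by`/`do` pipeline so that it runs without dplyr. The names `low`, `high`, `pieces`, and `combined` are made up for illustration.

```r
# With stringsAsFactors = TRUE (the default before R 4.0.0), each one-row
# result carries a factor whose only level is the value it holds.
low  <- data.frame(result = "low",  stringsAsFactors = TRUE)
high <- data.frame(result = "high", stringsAsFactors = TRUE)
levels(low$result)   # "low"
levels(high$result)  # "high"

# Returning a character column avoids factors entirely, so there are
# no levels to reconcile when the pieces are bound together.
foo <- function() {
  label <- if (runif(1) < 0.5) "low" else "high"
  data.frame(result = label, stringsAsFactors = FALSE)
}

# Illustrative stand-in for the grouped pipeline: call foo() six times
# and stack the one-row results.
pieces   <- lapply(1:6, function(i) foo())
combined <- do.call(rbind, pieces)
is.character(combined$result)  # TRUE
```

The same `foo()` can be dropped into `df %>% group_by(val) %>% do(foo())` unchanged. Note that from R 4.0.0 onward `stringsAsFactors` defaults to FALSE, and that recent tibble releases deprecate `data_frame()` in favor of `tibble()`, which likewise keeps strings as character.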