dplyr join warning: joining factors with different levels

Time: 2015-05-26 20:36:11

Tags: r

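When using the join function in the dplyr package, I get this warning:

joining factors with different levels, coercing to character vector

There is not much information about this online. Any idea what it could be? Thanks!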

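3 answers:

Answer 0 (score: 35)

This is not an error, it is a warning. It tells you that one of the columns you are joining on is a factor, and that this factor has different levels in the two data sets. So that no information is lost, the factors are converted to character values. For example: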

library(dplyr)
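# stringsAsFactors = TRUE is required on R >= 4.0, where data.frame()
# no longer converts strings to factors by default
x <- data.frame(a = letters[1:7], stringsAsFactors = TRUE)
y <- data.frame(a = letters[4:10], stringsAsFactors = TRUE)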

class(x$a) 
# [1] "factor"

# NOTE these are different
levels(x$a)
# [1] "a" "b" "c" "d" "e" "f" "g"
levels(y$a)
# [1] "d" "e" "f" "g" "h" "i" "j"

m <- left_join(x,y)
# Joining by: "a"
# Warning message:
# joining factors with different levels, coercing to character vector 

class(m$a)
# [1] "character"

You can make sure both factors have the same levels before merging:

combined <- sort(union(levels(x$a), levels(y$a)))
n <- left_join(mutate(x, a=factor(a, levels=combined)),
    mutate(y, a=factor(a, levels=combined)))
# Joining by: "a"
class(n$a)
# [1] "factor"
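
The same idea can be wrapped in a small reusable helper. The following is only a minimal sketch in base R: `harmonize_levels()` is a hypothetical name (it is not a dplyr function), and it assumes the key column exists in both data frames.

```r
# Hypothetical helper (not part of dplyr): give the key column the same
# factor levels in both data frames before joining.
harmonize_levels <- function(x, y, col) {
  # Levels of a factor, or the distinct values of a non-factor column
  lv <- function(v) if (is.factor(v)) levels(v) else unique(as.character(v))
  combined <- sort(union(lv(x[[col]]), lv(y[[col]])))
  # Rebuild both columns on the shared, sorted set of levels
  x[[col]] <- factor(as.character(x[[col]]), levels = combined)
  y[[col]] <- factor(as.character(y[[col]]), levels = combined)
  list(x = x, y = y)
}

x <- data.frame(a = letters[1:7], stringsAsFactors = TRUE)
y <- data.frame(a = letters[4:10], stringsAsFactors = TRUE)

h <- harmonize_levels(x, y, "a")
identical(levels(h$x$a), levels(h$y$a))
# [1] TRUE
```

The harmonized data frames can then be passed to the join, for example `left_join(h$x, h$y, by = "a")`.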

Answer 1 (score: 4)

This warning message also appears if the join columns in the two tables have the same levels but in a different order:

library(dplyr)
library(forcats)  # provides fct_relevel()

# tibble() replaces the deprecated data_frame()
tb1 <- tibble(a = c("a", "b", "c")) %>% mutate(a = as.factor(a))
# Change level order of table tb2's col a
tb2 <- tb1 %>% mutate(a = fct_relevel(a, "c"))

# Check both still factors
tb1$a %>% class()
# [1] "factor"
tb2$a %>% class()
# [1] "factor"

# Check level order
tb1$a %>% levels()
# [1] "a" "b" "c"
tb2$a %>% levels()
# [1] "c" "a" "b"

# Try joining
tb1 %>% left_join(tb2)
# Joining, by = "a"
# Column `a` joining factors with different levels, coercing to character vector
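
To tell this case (same levels, different order) apart from genuinely different levels, the two level vectors can be compared directly. This is a small base R sketch with made-up factors `f1` and `f2`; it does not call dplyr and only illustrates the comparison and one way to align the order.

```r
f1 <- factor(c("a", "b", "c"))
f2 <- factor(c("a", "b", "c"), levels = c("c", "a", "b"))

setequal(levels(f1), levels(f2))   # same set of levels
# [1] TRUE
identical(levels(f1), levels(f2))  # but not in the same order
# [1] FALSE

# Re-level f2 to follow f1's order; the values themselves are unchanged
f2_fixed <- factor(f2, levels = levels(f1))
identical(levels(f1), levels(f2_fixed))
# [1] TRUE
```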

Answer 2 (score: 0)

When the data comes from a database, remember that in many cases you can pass stringsAsFactors=FALSE to avoid this warning. (That was my situation.)

sqlExecute(my_database_channel, data=myparam, stringsAsFactors=FALSE )
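
The effect of that argument can be seen without a database connection. The sketch below uses base R's `read.csv()` as a stand-in for the database call (an assumption made purely for illustration; the inline CSV text is invented): with `stringsAsFactors = FALSE` the text column arrives as character, so there are no factor levels to conflict in a later join.

```r
csv_text <- "id,name\n1,a\n2,b\n"

as_factors <- read.csv(text = csv_text, stringsAsFactors = TRUE)
as_strings <- read.csv(text = csv_text, stringsAsFactors = FALSE)

class(as_factors$name)
# [1] "factor"
class(as_strings$name)
# [1] "character"
```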