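Recoding a variable into an ordered factor with "car" in R

Date: 2014-02-03 18:03:41

Tags: r

I am following the documentation of the car package to recode a variable into an ordered factor.

For example, in my data.frame df I have a variable (BG_x) that represents education. I tried to recode it as follows: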

library(car)
df <- data.frame(
    BG_x = sample(1:8)
)
df$education <- recode(df$BG_x,
    "1:2='High school or less';3='Some college';4='College';5:8='Grad degree'",
    levels = c('High school or less', 'Some college', 'College', 'Grad degree'))
table(df$education)

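However, when I check the distribution, the categories appear in alphabetical order rather than in the order I specified in the recode command. Any ideas about what went wrong?

3 answers:

Answer 0: (score: 2)

This is not an answer that uses recode; instead, it shows how to do this in base R with factor plus levels: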

set.seed(1)
df <- data.frame(BG_x = sample(1:8))
df$education <- factor(df$BG_x, ordered = TRUE)
levels(df$education) <- list("High school or less" = 1:2, 
                             "Some college" = 3, "College" = 4, 
                             "Grad degree" = 5:8)
# Row order below is from an older R; since R 3.6.0 the same seed may give a different permutation
df
#   BG_x           education
# 1    3        Some college
# 2    8         Grad degree
# 3    4             College
# 4    5         Grad degree
# 5    1 High school or less
# 6    7         Grad degree
# 7    2 High school or less
# 8    6         Grad degree
table(df$education)
# 
# High school or less        Some college             College         Grad degree
#                   2                   1                   1                   4
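
To confirm that the result really is ordered, and not just displayed in a convenient order, a quick check along these lines may help. This is only a sketch: it uses a fixed 1:8 column instead of sample() so the result is reproducible, and the name at_least_college is made up for illustration.

```r
# Same approach as above, but with a fixed input so the result is reproducible
df <- data.frame(BG_x = 1:8)
df$education <- factor(df$BG_x, ordered = TRUE)
levels(df$education) <- list("High school or less" = 1:2,
                             "Some college" = 3, "College" = 4,
                             "Grad degree" = 5:8)

# The ordered class survives the levels<- assignment
is.ordered(df$education)

# Levels are stored in the order given in the list, not alphabetically
levels(df$education)

# Ordered factors support comparisons against a level name
at_least_college <- df$education >= "College"
df[at_least_college, ]
```

Because the factor is ordered, functions such as min(), max() and sort() also follow the level order rather than the alphabet.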

A while ago I wrote a convenience wrapper for these steps (assigning one level to several values) and posted it as a Gist.

You can use it as follows:

library(devtools)
source_gist("7019545")
df$education <- Factor(df$BG_x, ordered = TRUE, 
                       levels = list("High school or less" = 1:2, 
                                     "Some college" = 3, "College" = 4, 
                                     "Grad degree" = 5:8))

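Answer 1: (score: 1)

Because your original variable is not itself a factor, you need to include:

as.factor.result = TRUE

in the call to recode. Without it, recode returns a character vector, the levels argument has no effect, and table() sorts the values alphabetically. (In newer releases of car, 3.0 and later as far as I know, this argument is spelled as.factor.)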
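
A minimal sketch of a variant that does not depend on how the as.factor.result argument is spelled in the installed version of car: let recode return its default character result for a numeric input, then wrap it in base R's factor() with explicit levels. The wrapping step and the name edu_levels are my own additions, not part of the original answer. Setting ordered = TRUE is also an addition, since recode on its own would give an unordered factor.

```r
library(car)

# Fixed input so the result is reproducible
df <- data.frame(BG_x = 1:8)

# Desired level order, lowest to highest
edu_levels <- c("High school or less", "Some college", "College", "Grad degree")

# For a numeric input, recode() returns a character vector by default
recoded <- car::recode(
  df$BG_x,
  "1:2='High school or less'; 3='Some college'; 4='College'; 5:8='Grad degree'"
)

# Impose the level order explicitly and mark the factor as ordered
df$education <- factor(recoded, levels = edu_levels, ordered = TRUE)

levels(df$education)
table(df$education)
```

Calling car::recode with the namespace prefix avoids picking up dplyr's recode if both packages are attached.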

Answer 2: (score: 0)

Have you considered using the mapvalues function from plyr? I find it easier to use than car's recode.

In your case, it would be:

library(plyr)
edu <- c("High school or less", "Some college", "College", "Grad degree")
# as.factor() alone would sort the levels alphabetically, so set them explicitly
df$education <- factor(mapvalues(df$BG_x, from = 1:8,
                                 to = rep(edu, times = c(2, 1, 1, 4))),
                       levels = edu, ordered = TRUE)

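For this example it looks simpler to my eye, but of course if one factor level has to replace a whole bunch of numbers, recode would be the better choice.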