Question

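I am following the documentation for the car package in order to recode a variable into an ordered factor.

For example, in my data.frame df I have a variable representing education (BG_x). I tried to recode it as: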

library(car)

df <- data.frame(
    BG_x = sample(1:8)
)
df$education <- recode(df$BG_x,
    "1:2='High school or less';3='Some college';4='College';5:8='Grad degree'",
    levels = c('High school or less', 'Some college', 'College', 'Grad degree'))
table(df$education)

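However, when I check the distribution, the levels appear in alphabetical order rather than in the order I specified in the recode command. Any ideas about what went wrong?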

Answer 1

This is not an answer that uses recode, but it shows how to do this with base R's factor + levels:

set.seed(1)
df <- data.frame(BG_x = sample(1:8))
df$education <- factor(df$BG_x, ordered = TRUE)
levels(df$education) <- list("High school or less" = 1:2, 
                             "Some college" = 3, "College" = 4, 
                             "Grad degree" = 5:8)
df
#   BG_x           education
# 1    3        Some college
# 2    8         Grad degree
# 3    4             College
# 4    5         Grad degree
# 5    1 High school or less
# 6    7         Grad degree
# 7    2 High school or less
# 8    6         Grad degree
table(df$education)
# 
# High school or less        Some college             College         Grad degree
#                   2                   1                   1                   4

Some time ago, I wrote a convenience wrapper for these steps (assigning one level to multiple values) and posted it as a Gist.

You can use it as follows:

library(devtools)
source_gist("7019545")
df$education <- Factor(df$BG_x, ordered = TRUE, 
                       levels = list("High school or less" = 1:2, 
                                     "Some college" = 3, "College" = 4, 
                                     "Grad degree" = 5:8))

Answer 2

Because your original variable is not itself a factor, you need to include:

as.factor.result = TRUE

in your call to recode.
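
The suggestion above can be sketched as a complete call. This is a minimal sketch, not a definitive recipe: as far as I know, newer releases of car name this argument as.factor rather than as.factor.result, so the example checks which name the installed version defines. The names recodes, edu_levels, arg_name and args are made up for illustration.

```r
library(car)

set.seed(1)
df <- data.frame(BG_x = sample(1:8))

recodes <- "1:2='High school or less';3='Some college';4='College';5:8='Grad degree'"
edu_levels <- c("High school or less", "Some college", "College", "Grad degree")

# Assumption: the argument is called either as.factor (newer car) or
# as.factor.result (older car); use whichever the installed version has.
arg_name <- if ("as.factor" %in% names(formals(car::recode))) {
  "as.factor"
} else {
  "as.factor.result"
}

args <- list(df$BG_x, recodes, levels = edu_levels)
args[[arg_name]] <- TRUE
df$education <- do.call(car::recode, args)

# The levels now follow edu_levels instead of alphabetical order
levels(df$education)
table(df$education)
```

Note that this gives a plain factor whose levels are in the requested order. If you need a truly ordered factor (one that supports comparisons such as <), wrapping the result in factor(df$education, levels = edu_levels, ordered = TRUE) should do it.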

Answer 3

Have you considered using the mapvalues function from plyr? I find it easier to use than car's recode.

In your case, it would be:

library(plyr)

df$education <- as.factor(mapvalues(df$BG_x, c(1, 2, 3, 4, 5, 6, 7, 8),
    c("High school or less", "High school or less", "Some college", "College",
      "Grad degree", "Grad degree", "Grad degree", "Grad degree")))

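For this example it looks simpler to my eyes, but of course if you have one factor level that should replace a whole bunch of numbers, recode would be better.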
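
One caveat with the approach above: as.factor() on a character vector sorts the levels alphabetically, which is exactly the ordering the question is trying to avoid. A minimal base R sketch that keeps the value-to-label mapping but sets the level order explicitly could look like this. The names edu_levels and lookup are made up for illustration, and the sketch assumes the codes in BG_x are the integers 1 to 8.

```r
set.seed(1)
df <- data.frame(BG_x = sample(1:8))

edu_levels <- c("High school or less", "Some college", "College", "Grad degree")

# lookup[i] is the label for code i: codes 1-2, 3, 4 and 5-8 respectively
lookup <- rep(edu_levels, times = c(2, 1, 1, 4))

# Passing levels = ... fixes the order; ordered = TRUE makes it an ordered factor
df$education <- factor(lookup[df$BG_x], levels = edu_levels, ordered = TRUE)

levels(df$education)
table(df$education)
```

The same factor(..., levels = edu_levels, ordered = TRUE) wrapper can be put around the mapvalues() result in place of as.factor().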

Recode a variable into an ordered factor variable using "car" in R