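I am following the documentation for the car package in order to recode a variable into an ordered factor.
For example, in my data.frame df, I have a variable that represents education (BG_x). I tried to recode it as: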
df <- data.frame(
BG_x = sample(1:8)
)
df$education<-recode(df$BG_x,"1:2='High school or less';3='Some college';4='College';5:8='Grad degree'", levels=c('High school or less','Some college','College','Grad degree'))
table(df$education)
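However, when I check the distribution, the levels appear in alphabetical order rather than in the order I specified in the recode call. Any ideas about what went wrong?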
Answer 0: (score: 2)
This is not an answer that uses recode, but it shows how to do this in base R with factor + levels:
set.seed(1)
df <- data.frame(BG_x = sample(1:8))
df$education <- factor(df$BG_x, ordered = TRUE)
levels(df$education) <- list("High school or less" = 1:2,
"Some college" = 3, "College" = 4,
"Grad degree" = 5:8)
df
# BG_x education
# 1 3 Some college
# 2 8 Grad degree
# 3 4 College
# 4 5 Grad degree
# 5 1 High school or less
# 6 7 Grad degree
# 7 2 High school or less
# 8 6 Grad degree
table(df$education)
#
# High school or less Some college College Grad degree
# 2 1 1 4
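Because the codes in this example are consecutive integers, the same grouping can also be sketched with base R's cut(). This is a different technique from the levels<- approach above, not a restatement of it, and the break points and the edu_levels name are assumptions chosen to match the 1:2, 3, 4, 5:8 grouping in the question.

```r
set.seed(1)
df <- data.frame(BG_x = sample(1:8))

# Labels in the intended order (edu_levels is a name made up for this sketch).
edu_levels <- c("High school or less", "Some college", "College", "Grad degree")

# Intervals are right-closed by default: (0,2], (2,3], (3,4], (4,8].
df$education <- cut(df$BG_x,
                    breaks = c(0, 2, 3, 4, 8),
                    labels = edu_levels,
                    ordered_result = TRUE)

table(df$education)
```

This only works when each group is a contiguous range of numeric codes; for arbitrary groupings the list-based levels<- assignment is the more general tool.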
Some time ago, I wrote a convenience wrapper for these steps (assigning one level to multiple values) and posted it as a Gist.
You can use it as follows:
library(devtools)
source_gist("7019545")
df$education <- Factor(df$BG_x, ordered = TRUE,
levels = list("High school or less" = 1:2,
"Some college" = 3, "College" = 4,
"Grad degree" = 5:8))
Answer 1: (score: 1)
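Because your original variable is not itself a factor, you need to include:
as.factor.result = TRUE
in the call to recode. Without it, recode returns a character vector, so the levels argument has no effect.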
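Applied to the question's data, the call might look like the sketch below. It assumes the car package is installed. It also assumes a recent release: as far as I know, the argument is spelled as.factor in car 3.0 and later, and as.factor.result in older releases, so use whichever name your installed version documents. The edu_levels name is made up for this sketch.

```r
library(car)

df <- data.frame(BG_x = sample(1:8))

# Intended level order (hypothetical helper name).
edu_levels <- c("High school or less", "Some college", "College", "Grad degree")

# car::recode is called explicitly to avoid clashing with dplyr::recode.
# as.factor = TRUE makes recode return a factor, so `levels` is honoured.
df$education <- car::recode(
  df$BG_x,
  "1:2='High school or less';3='Some college';4='College';5:8='Grad degree'",
  as.factor = TRUE,
  levels = edu_levels
)

levels(df$education)
table(df$education)
```

The result is an unordered factor whose levels are stored in the requested order, which is enough for table() to display them that way. If a truly ordered factor is needed, wrapping the result in as.ordered() should keep the same level order.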
Answer 2: (score: 0)
Have you considered using the mapvalues function from plyr? I think it is easier to apply than car's recode.
In your case, it would be:
df$education <- as.factor(mapvalues(df$BG_x, c(1,2,3,4,5,6,7,8),
c('High school or less','High school or less',"Some college","College","Grad degree",
"Grad degree","Grad degree","Grad degree")))
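For this example it looks simpler to my eye, but of course if you have a factor level that should replace a whole range of numbers, recode would be better.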
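One caveat: as.factor() on a character vector sorts the levels alphabetically, which is the very ordering the question is trying to avoid. A minimal sketch of a variant that keeps the intended order is shown here, assuming plyr is installed; the from_codes and to_labels names are made up for illustration.

```r
library(plyr)

df <- data.frame(BG_x = sample(1:8))

# Hypothetical helper vectors: one label per original code.
from_codes <- 1:8
to_labels  <- c("High school or less", "High school or less",
                "Some college", "College",
                "Grad degree", "Grad degree", "Grad degree", "Grad degree")

mapped <- mapvalues(df$BG_x, from = from_codes, to = to_labels)

# unique(to_labels) keeps first-appearance order, which is the intended order.
df$education <- factor(mapped, levels = unique(to_labels), ordered = TRUE)

table(df$education)
```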