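I want to convert a character vector in R to a factor (let's take the example from DataCamp's Introduction to R course) and relabel only some of the factor levels. How can I prevent any unmentioned/undeclared levels from automatically being set to NA?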
speed_vector <- c("fast", "slow", "slow", "fast", "insane")
factor_speed_vector <- factor(speed_vector, ordered = TRUE, levels = c("slow", "insane"), labels = c("Speed < 30 mph", "Speed > 100 mph"))
Result:
> summary(factor_speed_vector)
 Speed < 30 mph Speed > 100 mph            NA's
              2               1               2
> factor_speed_vector
[1] <NA> Speed < 30 mph Speed < 30 mph <NA> Speed > 100 mph
Levels: Speed < 30 mph < Speed > 100 mph
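How can I make sure that any undefined factor levels (e.g. "fast" in this example) carry over their original values instead of being set to NA?
Edit: My earlier comment came from confusing the levels and labels arguments of the factor function. Anyone else who is unsure of the difference can read about it here: Confusion between factor levels and factor labels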
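To make the levels/labels distinction concrete, here is a minimal sketch (not part of the original question). levels lists the input values that factor() should recognise, and labels renames them position by position. Listing every value in levels, with "fast" simply labelled as itself, avoids the NAs. Where "fast" sits in the ordering is an assumption on my part, since the question does not say.

```r
speed_vector <- c("fast", "slow", "slow", "fast", "insane")

# levels: the input values to recognise; anything not listed becomes NA.
# labels: display names, paired with levels by position.
# Assumption: "fast" is placed between "slow" and "insane" in the ordering.
factor_speed_vector <- factor(
  speed_vector,
  ordered = TRUE,
  levels = c("slow", "fast", "insane"),
  labels = c("Speed < 30 mph", "fast", "Speed > 100 mph")
)

factor_speed_vector
levels(factor_speed_vector)
```

The drawback is that every value has to be spelled out by hand, which is what the answers try to avoid.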
Answer 0 (score: 1)
Does this work for you?
speed_vector <- c("fast", "slow", "slow", "fast", "insane")
factor_speed_vector <- factor(speed_vector)
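# Select the level to rename by comparing against the levels themselves, not the factor's elements
levels(factor_speed_vector)[levels(factor_speed_vector) == "slow"] <- "Speed < 30 mph"
levels(factor_speed_vector)[levels(factor_speed_vector) == "insane"] <- "Speed > 100 mph"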
factor_speed_vector
# [1] fast Speed < 30 mph Speed < 30 mph fast Speed > 100 mph
# Levels: fast Speed > 100 mph Speed < 30 mph
Answer 1 (score: 1)
Using levels and match, you can do the following.
Start with a factor variable:
factor_speed_vector <- factor(c("fast", "slow", "slow", "fast", "insane"), ordered = TRUE)
Then use match to replace only the levels you want to rename:
levels(factor_speed_vector)[match(c("slow", "insane"), levels(factor_speed_vector))] <-
c("Speed < 30 mph", "Speed > 100 mph")
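Here, match(c("slow", "insane"), levels(factor_speed_vector)) finds the indices of the factor levels that match "slow" and "insane" (3 and 2, because the levels are sorted alphabetically as fast, insane, slow). These indices are used to subset the levels, and the new labels are assigned to those positions; "fast" is left untouched.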
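The same idea can be wrapped in a small function. This is only a sketch: relabel_levels is a hypothetical helper name, not a base R or package function, and the old = "new" direction of the mapping is my own choice. It also skips names that are not levels of the factor, because match returns NA for those.

```r
# Hypothetical helper (not from base R or any package): rename selected levels
# of a factor and leave every other level untouched.
# `mapping` is a named character vector of the form c(old_level = "new label").
relabel_levels <- function(f, mapping) {
  idx <- match(names(mapping), levels(f))  # position of each old level, NA if absent
  found <- !is.na(idx)                     # ignore names that are not levels of f
  levels(f)[idx[found]] <- unname(mapping[found])
  f
}

f <- factor(c("fast", "slow", "slow", "fast", "insane"), ordered = TRUE)
g <- relabel_levels(f, c(slow = "Speed < 30 mph", insane = "Speed > 100 mph"))
g
```

Note that the level order is still the alphabetical order of the original values (fast, insane, slow), so with ordered = TRUE the comparisons may not reflect actual speed.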
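Answer 2 (score: 1)
The forcats package has some nice helper functions for working with factors. The fct_recode() function lets you change factor levels by hand. You supply a sequence of named character vectors where the name gives the new level and the value gives the old level. Levels not otherwise mentioned will be left as is. (From ?fct_recode, emphasis mine.)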
speed_vector <- c("fast", "slow", "slow", "fast", "insane")
speed_vector
[1] "fast" "slow" "slow" "fast" "insane"
forcats::fct_recode(speed_vector, "Speed < 30 mph" = "slow", "Speed > 100 mph" = "insane")
[1] fast Speed < 30 mph Speed < 30 mph fast Speed > 100 mph
Levels: fast Speed > 100 mph Speed < 30 mph
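If the renames already live in a named vector, they can be spliced into fct_recode() with !!!. A minimal sketch, assuming forcats is installed; the variable names mapping and recoded are mine:

```r
library(forcats)  # assumes the forcats package is installed

speed_vector <- c("fast", "slow", "slow", "fast", "insane")

# Names are the new levels, values are the old levels (the direction fct_recode expects)
mapping <- c("Speed < 30 mph" = "slow", "Speed > 100 mph" = "insane")

# !!! splices the named vector into fct_recode()'s arguments
recoded <- fct_recode(speed_vector, !!!mapping)
levels(recoded)
```

This keeps the mapping in one place, which helps when more than a couple of levels need new labels.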
Answer 3 (score: 0)
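# Assumes speed_vector is defined as in the question
factor_speed_vector <- as.factor(speed_vector)
# > levels(factor_speed_vector)
# [1] "fast" "insane" "slow"
# Positions 3 and 2 are "slow" and "insane" in the alphabetical level order
levels(factor_speed_vector)[3:2] <- c("Speed < 30 mph", "Speed > 100 mph")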
# > factor_speed_vector
# [1] fast Speed < 30 mph Speed < 30 mph fast Speed > 100 mph
# Levels: fast Speed > 100 mph Speed < 30 mph