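Convert a vector to a factor in R and keep undeclared levels

Time: 2017-08-30 14:48:30

Tags: r

I want to convert a character vector in R to a factor (let's take the example from DataCamp's Introduction to R course) and relabel only some of the factor levels. How can I prevent any level that is not mentioned/declared from being set to NA automatically?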

speed_vector <- c("fast", "slow", "slow", "fast", "insane")

factor_speed_vector <- factor(speed_vector, ordered = TRUE, levels = c("slow", "insane"), labels = c("Speed < 30 mph", "Speed > 100 mph"))

Result:

> summary(factor_speed_vector)
 Speed < 30 mph Speed > 100 mph            NA's 
              2               1               2 
> factor_speed_vector
[1] <NA>            Speed < 30 mph  Speed < 30 mph  <NA>            Speed > 100 mph
Levels: Speed < 30 mph < Speed > 100 mph

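How can I make sure that any undefined factor level (e.g. "fast" in this example) is carried over with its original value instead of being set to NA?

Edit: My earlier comment came from confusing the levels and labels options of the factor function. Anyone else who is unsure about the difference can read about it here: Confusion between factor levels and factor labels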
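To make the levels/labels distinction concrete, here is a minimal base R sketch. The variable names only_two and all_three are made up for illustration, and the ranking slow < fast < insane is an assumption about the intended order, not something stated in the question.

```r
speed_vector <- c("fast", "slow", "slow", "fast", "insane")

# `levels` lists the input values that are allowed; anything else becomes NA.
only_two <- factor(speed_vector, levels = c("slow", "insane"))
sum(is.na(only_two))  # 2: both "fast" entries are turned into NA

# Listing every value in `levels` and repeating "fast" unchanged in `labels`
# keeps it, while the other two levels are relabelled.
# Assumption: slow < fast < insane is the intended ordering.
all_three <- factor(
  speed_vector,
  ordered = TRUE,
  levels = c("slow", "fast", "insane"),
  labels = c("Speed < 30 mph", "fast", "Speed > 100 mph")
)
levels(all_three)
anyNA(all_three)  # FALSE
```

The drawback is that every level has to be typed out, which is what the question is trying to avoid.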

4 answers:

Answer 0 (score: 1)

Does this work for you?

speed_vector <- c("fast", "slow", "slow", "fast", "insane")
factor_speed_vector <- factor(speed_vector)
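# Compare against the level names, not the factor's elements, so that only the matching level is renamed
levels(factor_speed_vector)[levels(factor_speed_vector) == "slow"]   <- "Speed < 30 mph"
levels(factor_speed_vector)[levels(factor_speed_vector) == "insane"] <- "Speed > 100 mph"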
factor_speed_vector
# [1] fast            Speed < 30 mph  Speed < 30 mph  fast            Speed > 100 mph
# Levels: fast Speed > 100 mph Speed < 30 mph

Answer 1 (score: 1)

Using levels and match, you can do the following.

Start with a factor variable:

factor_speed_vector <- factor(c("fast", "slow", "slow", "fast", "insane"), ordered = TRUE)

Then use match to change the levels at the correct indices:

levels(factor_speed_vector)[match(c("slow", "insane"), levels(factor_speed_vector))] <-
c("Speed < 30 mph", "Speed > 100 mph")

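Here, match(c("slow", "insane"), levels(factor_speed_vector)) finds the indices of the factor levels that match "slow" and "insane". These indices are used to subset the levels, and the new labels are then assigned to those positions.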
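The intermediate values can be inspected step by step. This is a small sketch in which f and idx are illustrative names, not part of the answer:

```r
f <- factor(c("fast", "slow", "slow", "fast", "insane"), ordered = TRUE)

# Without an explicit `levels` argument the levels are sorted alphabetically.
levels(f)  # "fast" "insane" "slow"

# Positions of the levels to rename, in the order they were requested.
idx <- match(c("slow", "insane"), levels(f))
idx  # 3 2

# Rename only those positions; "fast" is left untouched.
levels(f)[idx] <- c("Speed < 30 mph", "Speed > 100 mph")
levels(f)  # "fast" "Speed > 100 mph" "Speed < 30 mph"
```

Because the factor is ordered, the alphabetical default also fixes the ranking (fast < insane < slow before renaming). If a different ranking is wanted, it would probably have to be set through the levels argument when the factor is created.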

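Answer 2 (score: 1)

The forcats package has some nice helper functions for working with factors. The fct_recode() function lets you change factor levels by hand. You specify a sequence of named character vectors where the name gives the new level and the value gives the old level. Importantly, levels not otherwise mentioned will be left as is (from ?fct_recode).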

speed_vector <- c("fast", "slow", "slow", "fast", "insane")
speed_vector
# [1] "fast"   "slow"   "slow"   "fast"   "insane"
forcats::fct_recode(speed_vector, "Speed < 30 mph" = "slow", "Speed > 100 mph" = "insane")
# [1] fast            Speed < 30 mph  Speed < 30 mph  fast            Speed > 100 mph
# Levels: fast Speed > 100 mph Speed < 30 mph

Answer 3 (score: 0)

factor_speed_vector = as.factor(speed_vector)

# > levels(factor_speed_vector)
# [1] "fast"   "insane" "slow"

# Positions 3 and 2 are "slow" and "insane" in the alphabetical level order shown above
levels(factor_speed_vector)[3:2] = c("Speed < 30 mph", "Speed > 100 mph")

# > factor_speed_vector
# [1] fast            Speed < 30 mph  Speed < 30 mph  fast            Speed > 100 mph
# Levels: fast Speed > 100 mph Speed < 30 mph
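Hard-coded positions such as 3:2 break silently if the set of levels changes. One possible way to guard against that is a small wrapper that looks the positions up by name. Note that relabel_levels is a hypothetical helper written for this sketch; it is not part of base R or of any package.

```r
# Hypothetical helper: rename selected levels via a named vector
# c(old = "new") and leave every other level as it is.
relabel_levels <- function(f, map) {
  f <- as.factor(f)
  idx <- match(names(map), levels(f))
  if (anyNA(idx)) {
    # Fail loudly instead of silently skipping a misspelled level name.
    stop("unknown level(s): ", paste(names(map)[is.na(idx)], collapse = ", "))
  }
  levels(f)[idx] <- unname(map)
  f
}

speed_vector <- c("fast", "slow", "slow", "fast", "insane")
relabelled <- relabel_levels(
  speed_vector,
  c(slow = "Speed < 30 mph", insane = "Speed > 100 mph")
)
levels(relabelled)  # "fast" "Speed > 100 mph" "Speed < 30 mph"
```

Stopping on an unknown name is a design choice for this sketch; forcats::fct_recode handles the same task and may be preferable when adding a dependency is acceptable.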