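Subsetting a large data frame leaves us with a factor variable whose levels need to be reordered and whose unused levels need to be dropped. Here is a reprex: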
library(tidyverse)
set.seed(1234)
data <- c("10th Std. Pass", "11th Std. Pass", "12th Std. Pass", "5th Std. Pass",
          "6th Std. Pass", "Diploma / certificate course", "Graduate", "No Education")
education <- factor(sample(data, size = 5, replace = TRUE),
                    levels = c(data, "Data not available"))
survey <- tibble(education)
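The code below (as per this answer) achieves what we want, but we would like to integrate the reordering and dropping of factor levels into the piped recoding of survey.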
recoded_s <- survey %>%
  mutate(education = fct_collapse(
    education,
    "None" = "No Education",
    "Primary" = c("5th Std. Pass", "6th Std. Pass"),
    "Secondary" = c("10th Std. Pass", "11th Std. Pass", "12th Std. Pass"),
    "Tertiary" = c("Diploma / certificate course", "Graduate")
  ))
recoded_s$education
#> [1] Secondary Primary Primary Primary Tertiary
#> Levels: Secondary Primary Tertiary None Data not available
# Re-ordering the levels and dropping the unused one
factor(recoded_s$education, levels = c("None", "Primary", "Secondary", "Tertiary"))
#> [1] Secondary Primary Primary Primary Tertiary
#> Levels: None Primary Secondary Tertiary
Any pointers would be much appreciated!
Answer 0 (score: 2)
I'm not sure I understand. Could you elaborate on why wrapping everything in a mutate call is not enough?
library(tidyverse)
library(forcats)
survey %>%
  mutate(
    education = fct_collapse(
      education,
      "None" = "No Education",
      "Primary" = c("5th Std. Pass", "6th Std. Pass"),
      "Secondary" = c("10th Std. Pass", "11th Std. Pass", "12th Std. Pass"),
      "Tertiary" = c("Diploma / certificate course", "Graduate")),
    education = factor(education, levels = c("None", "Primary", "Secondary", "Tertiary")))
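If you would rather stay within forcats for the whole step, the following is a minimal sketch of one way to do it, not necessarily what the asker had in mind. It assumes a small hand-built stand-in for survey (the values are made up so the example runs on its own) and chains fct_collapse, fct_relevel and fct_drop inside a single mutate. The only argument is used so that just "Data not available" is removed; a plain fct_drop() would also remove "None" whenever no respondent falls into it.

```r
library(dplyr)
library(forcats)

# Hand-built stand-in for the question's `survey`; the values are illustrative only.
raw_levels <- c("10th Std. Pass", "11th Std. Pass", "12th Std. Pass", "5th Std. Pass",
                "6th Std. Pass", "Diploma / certificate course", "Graduate", "No Education")
survey <- tibble(
  education = factor(
    c("12th Std. Pass", "5th Std. Pass", "6th Std. Pass", "Graduate"),
    levels = c(raw_levels, "Data not available")
  )
)

recoded <- survey %>%
  mutate(
    education = education %>%
      # Merge the detailed levels into four broad groups.
      fct_collapse(
        "None" = "No Education",
        "Primary" = c("5th Std. Pass", "6th Std. Pass"),
        "Secondary" = c("10th Std. Pass", "11th Std. Pass", "12th Std. Pass"),
        "Tertiary" = c("Diploma / certificate course", "Graduate")
      ) %>%
      # Move the four groups to the front, in the desired order.
      fct_relevel("None", "Primary", "Secondary", "Tertiary") %>%
      # Drop only the unused "Data not available" level; keep "None" even if empty.
      fct_drop(only = "Data not available")
  )

levels(recoded$education)
```

Compared with the factor(education, levels = ...) call, this version does not silently turn unexpected values into NA: a level that is not listed in only and still has observations simply stays in place.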
Alternatively, using dplyr::recode:
lvls <- list(
  "No Education" = "None",
  "5th Std. Pass" = "Primary",
  "6th Std. Pass" = "Primary",
  "10th Std. Pass" = "Secondary",
  "11th Std. Pass" = "Secondary",
  "12th Std. Pass" = "Secondary",
  "Diploma / certificate course" = "Tertiary",
  "Graduate" = "Tertiary")
survey %>%
  mutate(
    education = factor(recode(education, !!!lvls), unique(map_chr(lvls, 1))))
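To see why the outer factor() call is still needed in the recode version, here is a small sketch on a shortened, made-up lookup list (three entries instead of eight, chosen only for illustration). It shows that recode() renames the levels but keeps the unused "Data not available" level, and that unique(map_chr(lvls, 1)) yields the target levels in the order they first appear in the list.

```r
library(dplyr)
library(purrr)

# Shortened, illustrative lookup: old level name -> new level name.
lvls <- list(
  "No Education" = "None",
  "5th Std. Pass" = "Primary",
  "Graduate" = "Tertiary"
)

# Target levels, in order of first appearance in the lookup.
target <- unique(map_chr(lvls, 1))

# Toy factor with one level that never occurs in the data.
x <- factor(c("5th Std. Pass", "Graduate"),
            levels = c(names(lvls), "Data not available"))

# recode() renames levels but leaves the unused level in place.
renamed <- recode(x, !!!lvls)
levels(renamed)

# Re-creating the factor with explicit levels fixes the order and drops the extra level.
final <- factor(renamed, levels = target)
levels(final)
```

Because the level order comes from the order of the list entries, putting "No Education" first in lvls is what makes "None" the first level of the result.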