Automatically reorder factor levels in dplyr

Date: 2019-11-11 15:06:53

Tags: r dplyr

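I have many factor columns scattered throughout a data.frame. Each of these factor columns has six levels, ranging from "Very Strongly Disagree" to "Very Strongly Agree".

I'm looking for a way to automatically relevel every factor variable that has these levels into a predefined order. At the moment I'm doing this by hand for about 30 columns, for example: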

data$immigration <- factor(data$immigration,
    levels = c("Very Strongly Disagree", "Strongly Disagree", "Disagree",
               "Moderately Agree", "Strongly Agree", "Very Strongly Agree"))

Is there an efficient way, within a dplyr chain, to automatically relevel all factors that contain the levels above (or indeed just one of them)?

2 Answers:

Answer 0 (score: 4)

library(dplyr)

l <- c("Very Strongly Disagree", "Strongly Disagree", "Disagree",
       "Moderately Agree", "Strongly Agree", "Very Strongly Agree")

set.seed(1)
df <- 
  tibble(
    A = factor(sample(l, 10, TRUE)),
    B = factor(sample(l, 10, TRUE)),
    N = 1:10,
    S = letters[1:10],
    Z = factor(letters[1:10])
  )
levels(df$A)
#> [1] "Disagree"               "Moderately Agree"      
#> [3] "Strongly Agree"         "Strongly Disagree"     
#> [5] "Very Strongly Agree"    "Very Strongly Disagree"
levels(df$B)
#> [1] "Strongly Agree"         "Strongly Disagree"     
#> [3] "Very Strongly Agree"    "Very Strongly Disagree"


df2 <-
  df %>% 
  mutate_if(~is.factor(.) & all(levels(.) %in% l), factor, levels = l)

levels(df2$A)
#> [1] "Very Strongly Disagree" "Strongly Disagree"     
#> [3] "Disagree"               "Moderately Agree"      
#> [5] "Strongly Agree"         "Very Strongly Agree"
levels(df2$B)
#> [1] "Very Strongly Disagree" "Strongly Disagree"     
#> [3] "Disagree"               "Moderately Agree"      
#> [5] "Strongly Agree"         "Very Strongly Agree"

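Here mutate_if is used with the condition ~is.factor(.) & all(levels(.) %in% l), which means the releveling is applied only to factor columns whose levels all appear in the predefined level vector l. Column Z is a factor too, but its levels are letters, so it is left untouched.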
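In dplyr 1.0.0 and later, the scoped verbs such as mutate_if are superseded by across(), usually combined with where(). The same idea might be sketched as follows, assuming dplyr >= 1.0.0 is installed. The helper is_likert and the toy data frame df_demo are names made up for this illustration and are not part of the answer above.

```r
library(dplyr)

# Target order of the levels (same values as the vector l in the answer).
l <- c("Very Strongly Disagree", "Strongly Disagree", "Disagree",
       "Moderately Agree", "Strongly Agree", "Very Strongly Agree")

# Hypothetical helper: TRUE only for factors whose levels all appear in l.
is_likert <- function(x) is.factor(x) && all(levels(x) %in% l)

# Toy data, made up for this example.
df_demo <- tibble(
  A = factor(c("Disagree", "Strongly Agree", "Very Strongly Disagree")),
  Z = factor(c("a", "b", "c")),
  N = 1:3
)

# Rebuild every matching factor with the levels in the order given by l.
df_demo2 <- df_demo %>%
  mutate(across(where(is_likert), ~ factor(.x, levels = l)))

levels(df_demo2$A)  # the six levels, in the order of l
levels(df_demo2$Z)  # still "a" "b" "c"
```

Using && rather than & in the helper keeps the predicate a single TRUE or FALSE, which is what where() expects.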

Answer 1 (score: 1)

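We can do this with mutate_at and a helper function (you don't have to define a function; it just makes the code cleaner, IMO). Here we specify the column names ahead of time. If you want to convert all character columns in the data, you can use mutate_if(is.character, create_factor) instead.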

create_factor <- function(x,
                          levels = c("Very Strongly Disagree",
                                     "Strongly Disagree", "Disagree",
                                     "Moderately Agree", "Strongly Agree",
                                     "Very Strongly Agree")) {
  # Rebuild x as a factor with the levels in the given order.
  factor(x, levels = levels)
}

column_names <- c("immigration")

data %>%
  mutate_at(column_names, create_factor)
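One thing worth checking when converting columns this way: factor() silently turns any value that is not listed in levels into NA. A minimal base R sketch of that behaviour, using made-up responses (the names lv, x and f are only for illustration):

```r
lv <- c("Very Strongly Disagree", "Strongly Disagree", "Disagree",
        "Moderately Agree", "Strongly Agree", "Very Strongly Agree")

# Made-up responses; "Agree" is not one of the six levels.
x <- c("Disagree", "Agree", "Strongly Agree")

f <- factor(x, levels = lv)

is.na(f)      # TRUE only for the second element
x[is.na(f)]   # the original values that matched no level
```

Comparing the NA count before and after the conversion is a quick way to spot misspelled or unexpected categories.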