Question

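When analyzing data, I sometimes need to recode values into a factor for group analysis. I want the order of the factor levels to be the same as the order of the conversions specified in case_when. In this case, the order should be "Excellent" "Good" "Fail". How can I achieve this without tediously repeating the labels in levels=c('Excellent', 'Good', 'Fail')?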

Thank you very much.

library(dplyr, warn.conflicts = FALSE)

set.seed(1234)
score <- runif(100, min = 0, max = 100)

Performance <- function(x) {
  case_when(
    is.na(x) ~ NA_character_,
    x > 80   ~ 'Excellent',
    x > 50   ~ 'Good',
    TRUE     ~ 'Fail'
  ) %>% factor(levels = c('Excellent', 'Good', 'Fail'))  # labels repeated here
}

performance <- Performance(score)                  
levels(performance)                                
#> [1] "Excellent" "Good"      "Fail"
table(performance)                                 
#> performance
#> Excellent      Good      Fail 
#>        15        30        55

Edit: My solution

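In the end, I came up with a solution. For anyone interested, here it is. I wrote a function fct_case_when (pretending it is a function from forcats). It is simply a wrapper around case_when that returns a factor. The order of the levels is the same as the order of the arguments.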

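fct_case_when <- function(...) {
  args <- as.list(match.call())
  # Extract the RHS of each formula. This reads the call literally, so it
  # assumes every RHS is a literal (a string or NA_character_), not a variable.
  levels <- sapply(args[-1], function(f) f[[3]])
  levels <- levels[!is.na(levels)]  # NA must not become a level
  factor(dplyr::case_when(...), levels = levels)
}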
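Because fct_case_when reads the right-hand sides straight from match.call(), a RHS such as lab[1] would come back as an unevaluated expression rather than a string. A possible variant, sketched here under the assumption that every argument is a two-sided formula whose RHS evaluates to a single value (no .default argument), evaluates each RHS in the environment where its formula was created. The names fct_case_when2, lab, x and res are hypothetical.

```r
library(dplyr, warn.conflicts = FALSE)

# Hypothetical variant: evaluate each formula's RHS in the formula's own
# environment, so the RHS may be a variable or expression, not only a literal.
# Assumes every argument is a two-sided formula with a single-value RHS.
fct_case_when2 <- function(...) {
  formulas <- list(...)                                  # formula objects
  rhs <- lapply(formulas, function(f) eval(f[[3]], environment(f)))
  levels <- unique(unlist(rhs))                          # keep argument order
  levels <- levels[!is.na(levels)]                       # NA is never a level
  factor(dplyr::case_when(...), levels = levels)
}

lab <- c("Excellent", "Good", "Fail")
x <- c(90, 60, 10, NA)
res <- fct_case_when2(
  is.na(x) ~ NA_character_,
  x > 80   ~ lab[1],
  x > 50   ~ lab[2],
  TRUE     ~ lab[3]
)
levels(res)
```

With the literal-only original, the same call would not produce usable level names; here the levels should come out as "Excellent", "Good", "Fail".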

Now I can use fct_case_when instead of case_when, and the result is the same as with the previous implementation (but less tedious).

Performance <- function(x) {                       
  fct_case_when(                                         
    is.na(x) ~ NA_character_,                          
    x > 80   ~ 'Excellent',                            
    x > 50   ~ 'Good',                                 
    TRUE     ~ 'Fail'                                  
  )
}      
performance <- Performance(score)                  
levels(performance)                       
#> [1] "Excellent" "Good"      "Fail"
table(performance)                
#> performance
#> Excellent      Good      Fail 
#>        15        30        55
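One side benefit of passing explicit levels to factor() is that a category that never occurs is still kept as a level, so table() reports it with a count of 0 instead of dropping it. A small check with made-up scores (the vector low and the object res are hypothetical), repeating fct_case_when so the snippet runs on its own:

```r
library(dplyr, warn.conflicts = FALSE)

# Same fct_case_when() as above, repeated so this snippet is self-contained.
fct_case_when <- function(...) {
  args <- as.list(match.call())
  levels <- sapply(args[-1], function(f) f[[3]])  # literal RHS of each formula
  levels <- levels[!is.na(levels)]
  factor(dplyr::case_when(...), levels = levels)
}

low <- c(10, 55, 70, NA)   # made-up scores, none above 80
res <- fct_case_when(
  is.na(low) ~ NA_character_,
  low > 80   ~ "Excellent",
  low > 50   ~ "Good",
  TRUE       ~ "Fail"
)
table(res)
```

Here "Excellent" should still appear in the table with a count of 0, and the NA score is excluded from the counts.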

Answer 1

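By default, factor levels are set in lexicographic (alphabetical) order. If you don't want to specify them, you can write the case_when branches in alphabetical order of their labels, so that the factor levels match the branch order (Performance1), or create the levels vector once and use it both in case_when and when building the factor (Performance2). I don't know how much effort or tedium these will save you, but here they are. Take a look at my third suggestion, which I think is the least tedious way.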
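To illustrate the default behavior with a made-up vector (grades is hypothetical): factor() without a levels argument sorts the distinct values, so the order in which they first appear is not kept.

```r
# factor() without `levels` sorts the unique values; first-appearance order
# ("Good", "Excellent", "Fail") is discarded.
grades <- c("Good", "Excellent", "Fail", "Good")
levels(factor(grades))
#> [1] "Excellent" "Fail"      "Good"
```

Sorting follows the current locale's collation in general, but for these three capitalized ASCII words the result is the same in the C locale and common English locales.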

Performance1 <- function(x) {
  # Branches are written so their labels appear in alphabetical order
  case_when(
    is.na(x) ~ NA_character_,
    x > 80   ~ 'Excellent',
    x <= 50  ~ 'Fail',
    TRUE     ~ 'Good'
  ) %>% factor()
}

Performance2 <- function(x, levels = c("Excellent", "Good", "Fail")) {
  # The labels are defined once and reused for both the recoding and the levels
  case_when(
    is.na(x) ~ NA_character_,
    x > 80   ~ levels[1],
    x > 50   ~ levels[2],
    TRUE     ~ levels[3]
  ) %>% factor(levels = levels)
}
performance1 <- Performance1(score)
levels(performance1)
# [1] "Excellent" "Fail"     "Good"
table(performance1)
# performance1
# Excellent      Fail      Good 
#        15        55        30 

performance2 <- Performance2(score)
levels(performance2)
# [1] "Excellent" "Good"      "Fail"  
table(performance2)
# performance2
# Excellent      Good      Fail 
#        15        30        55

If I may suggest an even less tedious way:

performance <- cut(score, breaks = c(0, 50, 80, 100), 
                   labels = c("Fail", "Good", "Excellent"))
levels(performance)
# [1] "Fail"      "Good"      "Excellent"
table(performance)
# performance
#      Fail      Good Excellent 
#        55        30        15
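cut() names the intervals from lowest to highest, so the levels come out as "Fail", "Good", "Excellent", the reverse of the order asked for in the question. A minimal sketch of flipping them afterwards with base R, assuming the same score vector:

```r
set.seed(1234)
score <- runif(100, min = 0, max = 100)

# cut() uses right-closed intervals: (0,50], (50,80], (80,100],
# matching x > 80 / x > 50 / otherwise in the case_when version.
performance <- cut(score, breaks = c(0, 50, 80, 100),
                   labels = c("Fail", "Good", "Excellent"))

# Reverse the level order so it reads "Excellent", "Good", "Fail".
performance <- factor(performance, levels = rev(levels(performance)))
levels(performance)
#> [1] "Excellent" "Good"      "Fail"
```

If forcats is installed, forcats::fct_rev(performance) should do the same reversal. Note that a score of exactly 0 would fall outside the breaks and become NA unless include.lowest = TRUE is added; runif() never returns the endpoints, so it does not matter here.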

Answer 2

Although my solution replaces your pipe with a messy intermediate variable, this works:

library(dplyr, warn.conflicts = FALSE)

set.seed(1234)
score <- runif(100, min = 0, max = 100)

Performance <- function(x) {
  lab <- case_when(
    is.na(x) ~ NA_character_,
    x > 80   ~ 'Excellent',
    x > 50   ~ 'Good',
    TRUE     ~ 'Fail'
  )
  # Position of the first occurrence of each label
  first <- !duplicated(lab)
  # Order the unique labels by the score at their first occurrence, highest first
  factor(lab, levels = lab[first][order(x[first], decreasing = TRUE)])
}
performance <- Performance(score)                
levels(performance)

Edit: fixed!
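The same idea (ordering the levels by score rather than listing them) can also be expressed by ordering the labels on the mean score of each group. This is a sketch of a swapped-in variant, not the approach above; it assumes, as above, that higher scores always map to "better" labels. group_mean and lab are hypothetical names.

```r
library(dplyr, warn.conflicts = FALSE)

set.seed(1234)
score <- runif(100, min = 0, max = 100)

# Variant: order labels by the mean score of each group, highest first,
# instead of by the score at each label's first occurrence.
Performance <- function(x) {
  lab <- case_when(
    is.na(x) ~ NA_character_,
    x > 80   ~ "Excellent",
    x > 50   ~ "Good",
    TRUE     ~ "Fail"
  )
  group_mean <- tapply(x, lab, mean)   # elements with an NA label are ignored
  factor(lab, levels = names(sort(group_mean, decreasing = TRUE)))
}

levels(Performance(score))
#> [1] "Excellent" "Good"      "Fail"
```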

R: Convert to a factor with the same levels as case_when