Using case_when() in R when the number at the end of one column name is greater than the number at the end of another

Time: 2019-06-14 18:06:19

Tags: r dplyr conditional-statements purrr

I want to use case_when() in dplyr to create a new categorical column that shows a person's current status in a training course.

I have a tibble like this:

library(dplyr)
problem <- tibble(name = c("Angela", "Claire", "Justin"),
                  status_1 = c("Registered", "No Action", "Completed"),
                  status_2 = c("Withdrawn", "No Action", "Registered"),
                  status_3 = c("No Action", "Registered", "Withdrawn"))

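If the person has ever completed the course, their status should be Completed (even if they accidentally registered for the course again later, as Justin does in this example). If they have not completed the course, their status should be Registered, provided that no later status undoes the registration, such as "No Action" or "Withdrawn". Otherwise their status should be Not Taken: either they never registered, or the registration was undone at a later point.

In this example, the final dataset should look like this: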

library(dplyr)
solution <- tibble(name = c("Angela", "Claire", "Justin"),
                   status_1 = c("Registered", "No Action", "Completed"),
                   status_2 = c("Withdrawn", "No Action", "Registered"),
                   status_3 = c("No Action", "Registered", "Withdrawn"),
                   current = c("Not Taken", "Registered", "Completed"))

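Justin is Completed because he completed the course at some point. Angela is Not Taken because she withdrew her registration. Claire is Registered because Registered is the status she reached last.

Here is what I have so far. It classifies Justin and Claire correctly but misclassifies Angela. I know why it misclassifies her, but I don't know how to handle the Registered case, because it involves checking which status column has the later number, and R (rightly) treats the variable names as character strings.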

library(dplyr)
library(purrr)
library(stringr)  # str_detect() comes from stringr
solution <- problem %>%
  mutate(current_status = pmap_chr(select(., contains("status")), ~
                                     case_when(any(str_detect(c(...), "(?i)Completed")) ~ "Completed",
                                               any(str_detect(c(...), "(?i)Registered")) ~ "Registered", 
                                               any(str_detect(c(...), "(?i)No Action")) | any(str_detect(c(...), "(?i)Withdrawn")) ~ "Not Taken",
                                               TRUE ~ "NA"))) 
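
A small sketch of why the attempt above labels Angela as Registered: any() only reports whether a match exists somewhere in the row, so the order of the statuses is lost. The vector name angela is introduced here purely for illustration.

```r
# Angela's three statuses, in column order (status_1, status_2, status_3)
angela <- c("Registered", "Withdrawn", "No Action")

# any() only asks whether a match exists somewhere in the vector
any(angela == "Registered")       # TRUE, even though she later withdrew

# Reversing the order gives the same answer, so position is ignored
any(rev(angela) == "Registered")  # TRUE

# which() keeps the positions, which is what the ordering rule needs
which(angela == "Registered")                   # 1
which(angela %in% c("No Action", "Withdrawn"))  # 2 3
```

Because the "Registered" branch fires before the "Not Taken" branch is ever reached, any rule based on any() alone cannot express "registered and not undone afterwards".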

Thanks!

1 Answer:

Answer 0: (score: 3)

Here is one way using apply and case_when. apply goes through the rows of problem one at a time and computes the result for each row from the case_when conditions.

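problem %>%
  mutate(
    current =
      # apply() with MARGIN = 1 passes each row's statuses to the function as a character vector
      apply(select(., starts_with("status")), 1, function(x) {
        case_when(
          # a completion at any point wins
          "Completed" %in% x ~ "Completed",
          # the first "Registered" comes after the first "No Action"/"Withdrawn"
          which.max(x == "Registered") > which.max(x %in% c("No Action", "Withdrawn")) ~ "Registered",
          TRUE ~ "Not Taken"
        )
      })
  )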

# A tibble: 3 x 5
  name   status_1   status_2   status_3   current   
  <chr>  <chr>      <chr>      <chr>      <chr>     
1 Angela Registered Withdrawn  No Action  Not Taken 
2 Claire No Action  No Action  Registered Registered
3 Justin Completed  Registered Withdrawn  Completed 
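
To make the second condition concrete, here is a sketch of what which.max() returns for single rows. On a logical vector it gives the position of the first TRUE, so the condition compares the first "Registered" with the first "No Action"/"Withdrawn". The vectors claire and all_registered are illustrative only; the second one is not in the example data and is there to show an edge case.

```r
# Claire's row from the example data
claire <- c("No Action", "No Action", "Registered")

which.max(claire == "Registered")                   # 3
which.max(claire %in% c("No Action", "Withdrawn"))  # 1
# 3 > 1 is TRUE, so Claire is classified as "Registered"

# Edge case (hypothetical row): nothing ever undoes the registration
all_registered <- c("Registered", "Registered", "Registered")

which.max(all_registered == "Registered")                   # 1
# With no TRUE at all, which.max() still returns 1 (the first FALSE)
which.max(all_registered %in% c("No Action", "Withdrawn"))  # 1
# 1 > 1 is FALSE, so this row would fall through to "Not Taken"
```

For the three rows in the question this does not matter, but it is worth checking if the real data contains rows like the second one.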

Outside the pipe, you can simply do:

problem$current <- select(problem, starts_with("status")) %>% 
  apply(., 1, function(x) {
    case_when(
      "Completed" %in% x ~ "Completed",
      which.max(x == "Registered") > which.max(x %in% c("No Action", "Withdrawn")) ~ "Registered",
      TRUE ~ "Not Taken"
    )
  })
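
If the real data can contain rows where a registration follows a withdrawal, or rows with no "No Action"/"Withdrawn" at all, a variant that compares last positions may be safer. This is a sketch under that assumption, not the method from the answer above; last_pos() and current_status() are hypothetical helper names introduced for illustration.

```r
library(dplyr)

problem <- tibble(name = c("Angela", "Claire", "Justin"),
                  status_1 = c("Registered", "No Action", "Completed"),
                  status_2 = c("Withdrawn", "No Action", "Registered"),
                  status_3 = c("No Action", "Registered", "Withdrawn"))

# Hypothetical helper: position of the last TRUE, or 0 if there is none
last_pos <- function(hit) {
  if (any(hit)) max(which(hit)) else 0L
}

# Hypothetical helper: classify one row of statuses (a character vector)
current_status <- function(x) {
  if ("Completed" %in% x) return("Completed")
  # %in% never returns NA, so missing statuses simply count as "no match"
  last_registered <- last_pos(x %in% "Registered")
  last_undone     <- last_pos(x %in% c("No Action", "Withdrawn"))
  # Registered only if the latest registration comes after the latest undo
  if (last_registered > last_undone) "Registered" else "Not Taken"
}

# Convert the status columns to a character matrix and classify row by row
status_cols <- as.matrix(select(problem, starts_with("status")))
problem$current <- unname(apply(status_cols, 1, current_status))

problem$current
# "Not Taken" "Registered" "Completed"
```

This relies on starts_with("status") returning the columns in chronological order, which holds for the example data because status_1, status_2, status_3 are already stored in that order.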