Using case_when across columns to create a new column

Time: 2019-06-03 17:40:41

Tags: r string dplyr conditional-statements case-when

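I have a large dataset with many columns that each hold a status. I want to create a new column containing each participant's current status. I am trying to use case_when in dplyr, but I am not sure how to apply it across columns. The dataset has too many columns for me to type each one out. Here is a sample of the data: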

library(dplyr)
problem <- tibble(name = c("sally", "jane", "austin", "mike"),
                  status1 = c("registered", "completed", "registered", "no action"),
                  status2 = c("completed", "completed", "registered", "no action"),
                  status3 = c("completed", "completed", "withdrawn", "no action"),
                  status4 = c("withdrawn", "completed", "no action", "registered"))

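For the code, I want a new column that gives the participant's final status; however, if their status was ever completed, I want it to say completed regardless of their final status. For this data, the answer would look like this: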


answer <- tibble(name = c("sally", "jane", "austin", "mike"),
                 status1 = c("registered", "completed", "registered", "no action"),
                 status2 = c("completed", "completed", "registered", "no action"),
                 status3 = c("completed", "completed", "withdrawn", "no action"),
                 status4 = c("withdrawn", "completed", "no action", "registered"),
                 finalstatus = c("completed", "completed", "no action", "registered"))

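Also, if you could provide any explanation of the code, I would really appreciate it! It would be especially helpful if your solution could also use contains("status"), because in my real dataset the status columns are very messy (e.g., summary_status_5292019, sum_status_07012018, etc.).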

Thanks!

2 answers:

Answer 0: (score: 3)

An option with pmap:

library(tidyverse)
problem %>%
  mutate(finalstatus = pmap_chr(select(., starts_with('status')), ~
    case_when(any(c(...) == "completed") ~ "completed",
              any(c(...) == "withdrawn") ~ "no action",
              TRUE ~ "registered")))
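
Since the question asks for contains("status") and for the last status to be used unless "completed" ever appears, here is a minimal sketch of a variant of the pmap approach. It is not the answerer's code: the anonymous function and the names status_cols and result are illustrative, and it assumes the status columns already sit in chronological order from left to right, which may not hold for messy names such as summary_status_5292019.

```r
library(dplyr)
library(purrr)

problem <- tibble(name = c("sally", "jane", "austin", "mike"),
                  status1 = c("registered", "completed", "registered", "no action"),
                  status2 = c("completed", "completed", "registered", "no action"),
                  status3 = c("completed", "completed", "withdrawn", "no action"),
                  status4 = c("withdrawn", "completed", "no action", "registered"))

# Pick the status columns by a substring of their names (assumed to be in time order).
status_cols <- select(problem, contains("status"))

# pmap_chr calls the function once per row, passing that row's statuses as arguments.
result <- problem %>%
  mutate(finalstatus = pmap_chr(status_cols, function(...) {
    s <- c(...)                    # this row's statuses as a character vector
    if (any(s == "completed")) {
      "completed"                  # ever completed wins
    } else {
      s[[length(s)]]               # otherwise take the right-most status
    }
  }))

result$finalstatus
```

Because status_cols is built before finalstatus exists, the new column is not picked up by contains("status"); re-running this on a tibble that already has a finalstatus column would include it, so that column would need to be dropped first.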

Answer 1: (score: 2)

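Here is a function that performs this kind of "row matching" operation. As with case_when, you put the checks vector in a specific order, so that once a match is found for one element, e.g. 'completed' in the data, matches against later elements are not considered.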

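row_match <- function(data, checks, labels){
  # position of each cell's value in checks (NA if absent), in column-major order
  matches <- match(unlist(data), checks)
  # reshape to the same rows x columns layout as data
  dim(matches) <- dim(data)
  # per row, the smallest index is the highest-priority match
  labels[apply(matches, 1, min, na.rm = TRUE)]
}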

problem %>% 
  mutate(final.stat = row_match(
                        data = select(problem, starts_with('status')),
                        checks = c('completed', 'withdrawn', 'registered'),
                        labels = c('completed', 'no action', 'registered')))

# # A tibble: 4 x 6
#   name   status1    status2    status3   status4    final.stat
#   <chr>  <chr>      <chr>      <chr>     <chr>      <chr>     
# 1 sally  registered completed  completed withdrawn  completed 
# 2 jane   completed  completed  completed completed  completed 
# 3 austin registered registered withdrawn no action  no action 
# 4 mike   no action  no action  no action registered registered
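
To see why the order of checks matters in row_match, the following sketch walks through the same match-then-min step for two single rows in base R. The variable names (sally, mike, idx_sally, and so on) are illustrative only and are not part of the answer.

```r
checks <- c("completed", "withdrawn", "registered")
labels <- c("completed", "no action", "registered")

# sally's four statuses, taken from the example data
sally <- c("registered", "completed", "completed", "withdrawn")
idx_sally <- match(sally, checks)       # position of each status in checks
final_sally <- labels[min(idx_sally, na.rm = TRUE)]

# mike's statuses: "no action" is not in checks, so those cells become NA
mike <- c("no action", "no action", "no action", "registered")
idx_mike <- match(mike, checks)
final_mike <- labels[min(idx_mike, na.rm = TRUE)]

final_sally
final_mike
```

For sally the indices are 3, 1, 1, 2, so the minimum is 1 and the label is "completed"; for mike only the last cell matches (index 3), giving "registered". Putting 'completed' first in checks is what gives it priority over the other statuses.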