Multiple case_when matches in one row

Time: 2019-12-13 07:59:44

Tags: r dplyr

Data

I have a data frame that looks like this:

structure(list(EndoscopyEventRaw = c("", "", "oesophagus:rfa;oesophagus:nac", 
"oesophagus:rfa;oesophagus:nac", "oesophagus:brushings", "oesophagus:rfa;oesophagus:emr;oesophagus:nac", 
"oesophagus:apc", "oesophagus:apc;oesophagus:nac", "oesophagus:apc", 
"")), row.names = c(NA, 10L), class = "data.frame")

Goal

I want to extract its contents into a new column, possibly using case_when with the following rules:

dataframe <- dataframe %>%
  mutate(OPCS4ZCode2 = case_when(
    grepl("nac", EndoscopyEventRaw) ~ "CodeForNAC",
    grepl("apc", EndoscopyEventRaw) ~ "CodeForAPC",
    grepl("rfa", EndoscopyEventRaw) ~ "CodeForRFA",
    grepl("grasp", EndoscopyEventRaw) ~ "CodeForGrasp",
    TRUE ~ ""   # fallback; must sit inside case_when()
  ))

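Problem and desired result

However, some rows contain several elements that should each be encoded in the new column, so the final result should be: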

1
2
3 CodeForRFA,CodeForNAC
4 CodeForRFA,CodeForNAC
5 
6 CodeForRFA,CodeForNAC
7 CodeForAPC
8 CodeForAPC,CodeForNAC
9 CodeForAPC
10

When I use case_when, it stops looking as soon as it finds the first match. Is it possible, with or without case_when, to match all of the targets above?
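
The first-match behaviour can be seen in a minimal sketch, using one value taken from the data above (the variable names `x` and `res` are only for illustration):

```r
library(dplyr)

# One raw value that contains both "apc" and "nac"
x <- "oesophagus:apc;oesophagus:nac"

# case_when evaluates the conditions in order and keeps only the first one that is TRUE
res <- case_when(
  grepl("nac", x) ~ "CodeForNAC",
  grepl("apc", x) ~ "CodeForAPC",
  TRUE ~ ""
)

print(res)  # "CodeForNAC"; the "apc" rule is never applied
```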

1 Answer:

Answer 0: (score: 2)

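Yes, case_when skips the remaining conditions once it finds a match. One approach is to split the data into separate rows, apply the conditions with case_when to each row, and then aggregate the data back to one row per original record.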

library(dplyr)

# df is the data frame shown in the question
df %>%
  mutate(row = row_number()) %>%                          # remember the original row
  tidyr::separate_rows(EndoscopyEventRaw, sep = ";") %>%  # one event per row
  mutate(OPCS4ZCode2 = case_when(
    grepl("nac", EndoscopyEventRaw) ~ "CodeForNAC",
    grepl("apc", EndoscopyEventRaw) ~ "CodeForAPC",
    grepl("rfa", EndoscopyEventRaw) ~ "CodeForRFA",
    grepl("grasp", EndoscopyEventRaw) ~ "CodeForGrasp",
    TRUE ~ "")) %>%
  group_by(row) %>%
  summarise(OPCS4ZCode2 = toString(OPCS4ZCode2)) %>%      # collapse back to one row each
  select(-row)

# A tibble: 10 x 1
#   OPCS4ZCode2             
#   <chr>                   
# 1 ""                      
# 2 ""                      
# 3 CodeForRFA, CodeForNAC  
# 4 CodeForRFA, CodeForNAC  
# 5 ""                      
# 6 CodeForRFA, , CodeForNAC
# 7 CodeForAPC              
# 8 CodeForAPC, CodeForNAC  
# 9 CodeForAPC              
#10 ""
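
Note that row 6 in the output above contains an empty entry ("CodeForRFA, , CodeForNAC"), because "oesophagus:emr" matches none of the rules. If that is unwanted, one possible variant that avoids case_when entirely is sketched below in base R. It is only a sketch under some assumptions: the lookup vector `codes` is a hypothetical name introduced here, and the codes are emitted in the order of that lookup vector rather than in the order they appear in the string.

```r
# Data from the question
df <- data.frame(
  EndoscopyEventRaw = c("", "", "oesophagus:rfa;oesophagus:nac",
                        "oesophagus:rfa;oesophagus:nac", "oesophagus:brushings",
                        "oesophagus:rfa;oesophagus:emr;oesophagus:nac",
                        "oesophagus:apc", "oesophagus:apc;oesophagus:nac",
                        "oesophagus:apc", ""),
  stringsAsFactors = FALSE
)

# Hypothetical lookup: search pattern -> code, in the order the codes should be listed
codes <- c(rfa = "CodeForRFA", apc = "CodeForAPC",
           nac = "CodeForNAC", grasp = "CodeForGrasp")

# For every row, test all patterns and keep the codes whose pattern matches
df$OPCS4ZCode2 <- vapply(df$EndoscopyEventRaw, function(x) {
  hits <- vapply(names(codes), function(p) grepl(p, x, fixed = TRUE), logical(1))
  paste(codes[hits], collapse = ",")  # no matches gives ""
}, character(1), USE.NAMES = FALSE)

print(df$OPCS4ZCode2)
```

Because every pattern is tested against every row, unmatched events such as "emr" simply contribute nothing, and the separator is a plain comma as in the desired result.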