Data
I have a data frame that looks like this:
structure(list(EndoscopyEventRaw = c("", "", "oesophagus:rfa;oesophagus:nac",
"oesophagus:rfa;oesophagus:nac", "oesophagus:brushings", "oesophagus:rfa;oesophagus:emr;oesophagus:nac",
"oesophagus:apc", "oesophagus:apc;oesophagus:nac", "oesophagus:apc",
"")), row.names = c(NA, 10L), class = "data.frame")
Goal
I want to extract its contents into a new column, possibly with case_when, based on rules like the following:
dataframe <- dataframe %>% mutate(OPCS4ZCode2 = case_when(
  grepl("nac", EndoscopyEventRaw) ~ "CodeForNAC",
  grepl("apc", EndoscopyEventRaw) ~ "CodeForAPC",
  grepl("rfa", EndoscopyEventRaw) ~ "CodeForRFA",
  grepl("grasp", EndoscopyEventRaw) ~ "CodeForGrasp",
  TRUE ~ ""
))
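Problem and desired result
However, some rows contain several elements that should each be coded into the new column, so the final result should be: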
1
2
3 CodeForRFA,CodeForNAC
4 CodeForRFA,CodeForNAC
5
6 CodeForRFA,CodeForNAC
7 CodeForAPC
8 CodeForAPC,CodeForNAC
9 CodeForAPC
10
When I use case_when, it stops looking as soon as it finds the first match. Is there a way, with or without case_when, to match all of the targets above?
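The first-match behaviour is easy to reproduce on a single string. A minimal sketch, assuming dplyr is installed; the names `x` and `first_only` are illustrative and not part of the original post:

```r
library(dplyr)

# One value that contains both "rfa" and "nac"
x <- "oesophagus:rfa;oesophagus:nac"

# case_when evaluates the conditions in order and, for each element,
# returns the right-hand side of the first condition that is TRUE
first_only <- case_when(
  grepl("nac", x) ~ "CodeForNAC",
  grepl("rfa", x) ~ "CodeForRFA",
  TRUE ~ ""
)

print(first_only)  # only "CodeForNAC"; the "rfa" match is never reported
```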
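Answer 0 (score: 2)
Yes, case_when skips the remaining conditions once one of them matches. One approach is to split the data into separate rows, apply the conditions with case_when, and then summarise the data back to one row per original row.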
library(dplyr)

# df is the data frame from the question
df %>%
  mutate(row = row_number()) %>%                          # remember the original row
  tidyr::separate_rows(EndoscopyEventRaw, sep = ";") %>%  # one element per row
  mutate(OPCS4ZCode2 = case_when(
    grepl("nac", EndoscopyEventRaw) ~ "CodeForNAC",
    grepl("apc", EndoscopyEventRaw) ~ "CodeForAPC",
    grepl("rfa", EndoscopyEventRaw) ~ "CodeForRFA",
    grepl("grasp", EndoscopyEventRaw) ~ "CodeForGrasp",
    TRUE ~ "")) %>%
  group_by(row) %>%
  summarise(OPCS4ZCode2 = toString(OPCS4ZCode2)) %>%      # collapse back to one row each
  select(-row)
# A tibble: 10 x 1
# OPCS4ZCode2
# <chr>
# 1 ""
# 2 ""
# 3 CodeForRFA, CodeForNAC
# 4 CodeForRFA, CodeForNAC
# 5 ""
# 6 CodeForRFA, , CodeForNAC
# 7 CodeForAPC
# 8 CodeForAPC, CodeForNAC
# 9 CodeForAPC
#10 ""
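Note that row 6 in the output above contains an empty entry, because the `emr` element matches none of the conditions and falls through to `TRUE ~ ""`. As an alternative that does not use case_when at all, here is a minimal base R sketch. The lookup vector `codes` and the helper `collect_codes` are hypothetical names introduced for illustration, and the codes are reported in the order the elements appear in each string:

```r
# The data from the question
df <- data.frame(
  EndoscopyEventRaw = c("", "", "oesophagus:rfa;oesophagus:nac",
    "oesophagus:rfa;oesophagus:nac", "oesophagus:brushings",
    "oesophagus:rfa;oesophagus:emr;oesophagus:nac",
    "oesophagus:apc", "oesophagus:apc;oesophagus:nac", "oesophagus:apc", ""),
  stringsAsFactors = FALSE
)

# Hypothetical lookup table: pattern -> code
codes <- c(nac = "CodeForNAC", apc = "CodeForAPC",
           rfa = "CodeForRFA", grasp = "CodeForGrasp")

# Split one string on ";" and collect the code of every pattern that matches
collect_codes <- function(x) {
  parts <- strsplit(x, ";", fixed = TRUE)[[1]]
  hits <- character(0)
  for (p in parts) {
    for (pat in names(codes)) {
      if (grepl(pat, p, fixed = TRUE)) hits <- c(hits, codes[[pat]])
    }
  }
  # Elements without a match contribute nothing, so no empty entries appear
  paste(unique(hits), collapse = ",")
}

df$OPCS4ZCode2 <- vapply(df$EndoscopyEventRaw, collect_codes, character(1),
                         USE.NAMES = FALSE)
print(df$OPCS4ZCode2)
```

Because unmatched elements are simply skipped, row 6 comes out as CodeForRFA,CodeForNAC, and the comma separator without a space matches the format shown in the question.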