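I'm working with a dataset that contains many columns named status1, status2, and so on. Each of these columns records whether a person is exempt, completed, registered, etc.
Unfortunately, the exempt entries are inconsistent. Here is an example: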
library(dplyr)
problem <- tibble(person = c("Corey", "Sibley", "Justin", "Ruth"),
status1 = c("7EXEMPT", "Completed", "Completed", "Pending"),
status2 = c("exempt", "Completed", "Completed", "Pending"),
status3 = c("EXEMPTED", "Completed", "Completed", "ExempT - 14"))
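I'm trying to use case_when() to create a new column holding each person's final status. If a row says Completed, they are completed. If it says exempt and does not say completed, they are exempt.
The important part is that I want my code to use contains("status"), or an equivalent that targets only the status columns without typing them all out, and I want it to need only a partial string match for exempt.
Regarding using contains inside case_when, I saw this example but couldn't apply it to my case: mutate with case_when and contains
Here is what I've tried so far, but as you can guess, it doesn't work: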
library(purrr)
library(dplyr)
library(stringr)
solution <- problem %>%
mutate(final= case_when(pmap_chr(select(., contains("status")), ~
any(c(...) == str_detect(., "Exempt") ~ "Exclude",
TRUE ~ "Complete"
))))
This is what I'd like the final result to look like:
solution <- tibble(person = c("Corey", "Sibley", "Justin", "Ruth"),
status1 = c("7EXEMPT", "Completed", "Completed", "Pending"),
status2 = c("exempt", "Completed", "Completed", "Pending"),
status3 = c("EXEMPTED", "Completed", "Completed", "ExempT - 14"),
final = c("Exclude", "Completed", "Completed", "Exclude"))
Thanks!
Answer 0: (score: 2)
I think you have it backwards. Put case_when inside pmap_chr, not the other way around:
library(dplyr)
library(purrr)
library(stringr)
problem %>%
  mutate(final = pmap_chr(select(., contains("status")),
                          ~ case_when(any(str_detect(c(...), "(?i)Exempt")) ~ "Exclude",
                                      TRUE ~ "Completed")))
For each pmap iteration (each row of the problem dataset), we want to use case_when to check whether the string Exempt is present. The (?i) in str_detect makes the match case-insensitive. This is the same as writing str_detect(c(...), regex("Exempt", ignore_case = TRUE)).
Output:
# A tibble: 4 x 5
  person status1   status2   status3     final
  <chr>  <chr>     <chr>     <chr>       <chr>
1 Corey  7EXEMPT   exempt    EXEMPTED    Exclude
2 Sibley Completed Completed Completed   Completed
3 Justin Completed Completed Completed   Completed
4 Ruth   Pending   Pending   ExempT - 14 Exclude
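As a possible alternative to iterating row by row, the same check can probably be written with if_any(), which is not part of the original answer and assumes dplyr 1.0.4 or later. This is only a sketch: the name result is illustrative, and like the answer's code it labels every row without an exempt match as "Completed".

```r
library(dplyr)
library(stringr)

# Example data copied from the question.
problem <- tibble(
  person  = c("Corey", "Sibley", "Justin", "Ruth"),
  status1 = c("7EXEMPT", "Completed", "Completed", "Pending"),
  status2 = c("exempt", "Completed", "Completed", "Pending"),
  status3 = c("EXEMPTED", "Completed", "Completed", "ExempT - 14")
)

# if_any() is TRUE for a row when the predicate holds in at least one of the
# selected columns, so no explicit row-wise iteration is needed.
# `result` is a hypothetical name chosen for this sketch.
result <- problem %>%
  mutate(final = if_else(
    if_any(contains("status"),
           ~ str_detect(.x, regex("exempt", ignore_case = TRUE))),
    "Exclude",
    "Completed"
  ))

print(result)
```

Because str_detect is vectorised over each column, this version avoids calling a function once per row, which may matter on larger datasets.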