I have a data frame df.

df <- data.frame(
  ID = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 4),
  process = c("inspection", "evaluation", "result",
              "inspection", "result", "evaluation",
              "result", "inspection", "result",
              "evaluation", "result", "result", "evaluation")
)

I need to add a column true_process so that, if "evaluation" comes before "result" for a particular ID, the value is true. If it comes after "result" or is missing, the value should be false.

The code I have tried:

library(dplyr)
df %>%
  group_by(ID) %>%
  mutate(true_process = case_when(
    !any(process == "evaluation") ~ "False",
    length(process == "evaluation")[[1]] > length(process == "result")[[1]] ~ "False",
    TRUE ~ "True"
  ))

The expected output is as follows:

# A tibble: 13 x 3
# Groups:   ID [4]
      ID process    true_process
   <dbl> <fct>      <chr>
 1     1 inspection True
 2     1 evaluation True
 3     1 result     True
 4     2 inspection True
 5     2 result     True
 6     2 evaluation True
 7     3 result     False
 8     3 inspection False
 9     3 result     False
10     4 evaluation True
11     4 result     True
12     4 result     True
13     4 evaluation True
Answer 0 (score: 3)

Based on the updated data, you can check whether the index of the last instance of "evaluation" is less than any index of "result".

library(dplyr)
df %>%
  group_by(ID) %>%
  mutate(true_process = any(tail(which(process == "evaluation"), 1) < which(process == "result")))

# A tibble: 13 x 3
# Groups:   ID [4]
      ID process    true_process
   <dbl> <chr>      <lgl>
 1     1 inspection TRUE
 2     1 evaluation TRUE
 3     1 result     TRUE
 4     2 inspection FALSE
 5     2 result     FALSE
 6     2 evaluation FALSE
 7     3 result     FALSE
 8     3 inspection FALSE
 9     3 result     FALSE
10     4 evaluation FALSE
11     4 result     FALSE
12     4 result     FALSE
13     4 evaluation FALSE
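
To see why the one-liner also covers an ID with no "evaluation" at all, it may help to unpack it in base R. The following is only a sketch: the helper name eval_before_result and the variable flags are made up for illustration and are not part of the answer above.

```r
# Hypothetical helper (illustrative name): TRUE when the last "evaluation"
# in a group comes before at least one "result".
eval_before_result <- function(process) {
  last_eval <- tail(which(process == "evaluation"), 1)  # integer(0) if absent
  result_at <- which(process == "result")
  # integer(0) < result_at gives logical(0), and any(logical(0)) is FALSE,
  # so a group without "evaluation" needs no special case.
  any(last_eval < result_at)
}

df <- data.frame(
  ID = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 4),
  process = c("inspection", "evaluation", "result",
              "inspection", "result", "evaluation",
              "result", "inspection", "result",
              "evaluation", "result", "result", "evaluation")
)

# One logical flag per ID, named by ID.
flags <- vapply(split(df$process, df$ID), eval_before_result, logical(1))

# Look up each row's flag by its ID to fill the new column.
df$true_process <- unname(flags[as.character(df$ID)])
print(df)
```

This should give the same column as the dplyr version. As a side note, length(process == "evaluation") in the attempted code counts the elements of the logical vector, which is the group size, so comparing it with length(process == "result") is always FALSE and only the "no evaluation" branch can ever produce "False".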