I want to distinguish between 3 cases:
1 - Events A and B happened in the same session ("ID") - "flag 1".
2 - Event B happened without A - "flag 2".
3 - Else - "flag 0".
For example:
ID EVENT
1 A
1 B
2 D
2 E
2 C
3 B
4 A
I want to get:
ID FLAG
1 1
2 0
3 2
4 0
Answer:
One can use dplyr::case_when to summarise the events for each ID. Here, any and all determine whether a session contains both A and B, or only B. The solution:
library(dplyr)
# Note: do not load plyr after dplyr. plyr::summarise would mask
# dplyr::summarise, ignore the grouping, and return a single row.
df %>%
  group_by(ID) %>%
  summarise(FLAG = case_when(
    any(EVENT == "A") & any(EVENT == "B") ~ 1,  # both A and B in the session
    all(EVENT == "B") ~ 2,                      # only B in the session
    TRUE ~ 0                                    # everything else
  )) %>%
  as.data.frame()
#   ID FLAG
# 1  1    1
# 2  2    0
# 3  3    2
# 4  4    0
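The second condition, all(EVENT == "B"), assigns flag 2 only when every event in a session is B. The question asks for "B happened without A", which would also cover a session containing B plus some other event. The base-R snippet shows where the two readings diverge; session_mixed and the helper b_without_a() are hypothetical and not part of the question's data. If the broader reading is intended, replacing the second condition with any(EVENT == "B") & !any(EVENT == "A") should handle it.

```r
# Events of session 3 from the question's data.
session_3 <- c("B")
# Hypothetical session: B occurs with another event, but without A.
session_mixed <- c("B", "C")

# The answer's second condition: TRUE only if every event is B.
all(session_3 == "B")      # TRUE
all(session_mixed == "B")  # FALSE, so this session would get flag 0

# Hypothetical helper matching "B happened without A" literally.
b_without_a <- function(events) any(events == "B") && !any(events == "A")
b_without_a(session_3)      # TRUE
b_without_a(session_mixed)  # TRUE, so this session would get flag 2
```

For the question's data the two conditions give identical flags, since session 3 contains only B.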
Data:
df <- read.table(text=
"ID EVENT
1 A
1 B
2 D
2 E
2 C
3 B
4 A",
header = TRUE, stringsAsFactors = FALSE)
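If loading dplyr is not desirable, a similar result can likely be obtained in base R with tapply. This is a sketch under assumptions: flag_session() is a hypothetical helper, and it reads "B without A" as "any B and no A" rather than "all events are B". On the question's data both readings produce the same flags.

```r
# Hypothetical helper: classify one session's events.
flag_session <- function(events) {
  has_a <- any(events == "A")
  has_b <- any(events == "B")
  if (has_a && has_b) 1 else if (has_b) 2 else 0
}

df <- read.table(text =
"ID EVENT
1 A
1 B
2 D
2 E
2 C
3 B
4 A",
header = TRUE, stringsAsFactors = FALSE)

# Apply the helper to the events of each ID; names(flags) holds the IDs.
flags <- tapply(df$EVENT, df$ID, flag_session)
result <- data.frame(ID = as.integer(names(flags)), FLAG = as.vector(flags))
print(result)
```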