Flag rows by group

Time: 2018-07-15 13:19:17

Tags: r

I want to distinguish 3 cases:

1 - Events A and B both happened in the same session ("ID") - "flag 1".
2 - Event B happened without A - "flag 2".
3 - Anything else - "flag 0".

For example:

ID   EVENT
1      A
1      B
2      D
2      E
2      C
3      B
4      A

I want to get:

ID   FLAG 
1      1
2      0
3      2
4      0

1 answer:

Answer 0 (score: 2)

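One option is to group by ID and summarise each group with dplyr::case_when. Logical reductions such as any() collapse a group's events into a single TRUE/FALSE, so case_when can check whether the group contains both A and B, or B without A. The solution is: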

library(dplyr)
# Note: if plyr is loaded after dplyr, plyr::summarise masks
# dplyr::summarise and ignores the grouping, returning a single row.

df %>%
  group_by(ID) %>%
  summarise(FLAG = case_when(
    any(EVENT == "A") & any(EVENT == "B")  ~ 1,  # A and B in the same session
    any(EVENT == "B") & !any(EVENT == "A") ~ 2,  # B without A
    TRUE                                   ~ 0   # anything else
  )) %>%
  as.data.frame()

#   ID FLAG
# 1  1    1
# 2  2    0
# 3  3    2
# 4  4    0

Data:

df <- read.table(text=
"ID   EVENT
1      A
1      B
2      D
2      E
2      C
3      B
4      A",
header = TRUE, stringsAsFactors = FALSE)
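For readers who would rather not depend on dplyr, the same per-ID logic can be sketched in base R with tapply(). This is an alternative sketch, not part of the original answer; flag_events() is a hypothetical helper name used only for illustration, and the data frame is rebuilt inline from the question's example so the block runs on its own.

```r
# Base-R sketch of the per-ID flag logic (no dplyr needed).
# flag_events() is a hypothetical helper, named only for this example.
flag_events <- function(events) {
  has_a <- any(events == "A")
  has_b <- any(events == "B")
  if (has_a && has_b) {
    1L  # A and B in the same session
  } else if (has_b) {
    2L  # B without A
  } else {
    0L  # anything else
  }
}

# Example data from the question
df <- data.frame(
  ID    = c(1, 1, 2, 2, 2, 3, 4),
  EVENT = c("A", "B", "D", "E", "C", "B", "A"),
  stringsAsFactors = FALSE
)

# Apply flag_events() to the EVENT values of each ID group
flags <- tapply(df$EVENT, df$ID, flag_events)
result <- data.frame(ID = as.integer(names(flags)), FLAG = as.vector(flags))
print(result)
```

tapply() returns a named vector keyed by ID, which is then turned back into a two-column data frame matching the desired output.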