Select all rows in subgroups where a value is present

Time: 2016-06-26 05:33:24

Tags: r

I have a data frame as follows:

    df1 <- structure(list(HospNum_Id = c(1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 
    3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L), VisitDate = c("13/02/03", 
    "13/04/05", "13/05/12", "13/12/06", "13/04/12", "13/05/13", "13/06/14", 
    "13/04/15", "03/04/15", "04/05/16", "04/06/16", "13/05/03", "13/06/04", 
    "13/04/05", "03/04/15", "04/05/16", "04/06/16"), EVENT = c("EMR", 
    "RFA", "EMR", "nothing", "EMR", "nothing", "EMR", "EMR", "RFA", 
    "EMR", "nothing", "RFA", "EMR", "EMR", "RFA", "EMR", "nothing"
    )), .Names = c("HospNum_Id", "VisitDate", "EVENT"), class = "data.frame", row.names = c(NA, 
    -17L))

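I want to select only those HospNum_Ids where an RFA occurs before an EMR in the EVENT column. The RFA can occur in any row before the EMR, not just the row immediately before it. Expected output:
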
    HospNum_Id  VisitDate  EVENT
    1           13/02/03   EMR
    1           13/04/05   RFA
    1           13/05/12   EMR
    3           03/04/15   RFA
    3           04/05/16   EMR
    3           04/06/16   nothing
    4           13/05/03   RFA
    4           13/06/04   EMR
    4           13/04/05   EMR
    4           03/04/15   RFA
    4           04/05/16   EMR
    4           04/06/16   nothing

@akrun very kindly gave me a solution for the case of consecutive rows, but this case is different.

1 answer:

Answer 0 (score: 3)

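We can try data.table. Convert the 'data.frame' to a 'data.table' (setDT(df1)) and, grouped by 'HospNum_Id', order by 'VisitDate' (after converting it to Date class). If there are any "RFA" elements in 'EVENT', we take the index of the first "RFA" element ('i1'). Using that, we can get the row indices (.I) of all rows for each 'HospNum_Id' that satisfies the condition, and then subset the dataset.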

    library(data.table)
    v1 <- setDT(df1)[order(as.Date(VisitDate, "%d/%m/%y")),
            if(any(EVENT == "RFA")) {
               i1 <- which(EVENT == "RFA")[1]
               .I[any(EVENT[(i1+1):.N] == "EMR")]
            }, by = HospNum_Id]$V1
    df1[v1]
    #    HospNum_Id VisitDate   EVENT
    # 1:          1  13/02/03     EMR
    # 2:          1  13/04/05     RFA
    # 3:          1  13/05/12     EMR
    # 4:          3  03/04/15     RFA
    # 5:          3  04/05/16     EMR
    # 6:          3  04/06/16 nothing
    # 7:          4  13/05/03     RFA
    # 8:          4  13/06/04     EMR
    # 9:          4  13/04/05     EMR
    #10:          4  03/04/15     RFA
    #11:          4  04/05/16     EMR
    #12:          4  04/06/16 nothing
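To make the per-group test easier to follow, here is a minimal base R sketch of the same idea applied to a single patient's events (patient 4 from the question). The helper name rfa_then_emr is hypothetical and not part of data.table. It uses tail(ev, -i1) rather than (i1 + 1):length(ev), on the assumption that you want an empty result when the first RFA is the last event; in R, (n + 1):n counts down instead of being empty.

```r
# Patient 4's rows from the question, used here as a single-group example
visit <- c("13/05/03", "13/06/04", "13/04/05", "03/04/15", "04/05/16", "04/06/16")
event <- c("RFA", "EMR", "EMR", "RFA", "EMR", "nothing")

# "%d/%m/%y" reads "13/05/03" as day 13, month 05, year 2003
dates <- as.Date(visit, format = "%d/%m/%y")
event <- event[order(dates)]

# Hypothetical helper: TRUE if some EMR comes after the first RFA
rfa_then_emr <- function(ev) {
  i1 <- which(ev == "RFA")[1]        # position of the first RFA, NA if there is none
  if (is.na(i1)) return(FALSE)
  # tail(ev, -i1) drops the first i1 elements, so it is empty
  # when the RFA is the last event
  any(tail(ev, -i1) == "EMR")
}

rfa_then_emr(event)                  # TRUE:  an EMR follows the first RFA
rfa_then_emr(c("EMR", "RFA"))        # FALSE: RFA is the last event
rfa_then_emr(c("EMR", "nothing"))    # FALSE: no RFA at all
```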

Or, more compactly, using dplyr:

    library(dplyr)
    df1 %>%
       # sort by patient, then by visit date
       arrange(HospNum_Id, as.Date(VisitDate, "%d/%m/%y")) %>% 
       group_by(HospNum_Id) %>%
       # keep only patients with at least one RFA
       filter(any(EVENT =="RFA")) %>% 
       mutate(i1 = EVENT=="RFA" ) %>% 
       # keep patients with an EMR at or after the first RFA
       filter( any(EVENT[which(i1)[1]:n()]=="EMR")) %>%
       select(-i1)
    #  HospNum_Id VisitDate   EVENT
    #        <int>     <chr>   <chr>
    #1           1  13/02/03     EMR
    #2           1  13/04/05     RFA
    #3           1  13/05/12     EMR
    #4           3  03/04/15     RFA
    #5           3  04/05/16     EMR
    #6           3  04/06/16 nothing
    #7           4  13/05/03     RFA
    #8           4  13/06/04     EMR
    #9           4  13/04/05     EMR
    #10          4  03/04/15     RFA
    #11          4  04/05/16     EMR
    #12          4  04/06/16 nothing
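
For comparison, the same selection can probably be done without any packages. The following is only a base R sketch, not part of the original answer; the names d, df2, has_rfa_then_emr, ok, ids and res are made up for illustration. It rebuilds the question's data so that it runs on its own.

```r
# Data from the question, rebuilt as a plain data.frame
df1 <- data.frame(
  HospNum_Id = c(1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L,
                 4L, 4L, 4L, 4L, 4L, 4L),
  VisitDate = c("13/02/03", "13/04/05", "13/05/12", "13/12/06", "13/04/12",
                "13/05/13", "13/06/14", "13/04/15", "03/04/15", "04/05/16",
                "04/06/16", "13/05/03", "13/06/04", "13/04/05", "03/04/15",
                "04/05/16", "04/06/16"),
  EVENT = c("EMR", "RFA", "EMR", "nothing", "EMR", "nothing", "EMR", "EMR",
            "RFA", "EMR", "nothing", "RFA", "EMR", "EMR", "RFA", "EMR",
            "nothing"),
  stringsAsFactors = FALSE
)

# Sort by patient, then by visit date (day/month/two-digit year)
d   <- as.Date(df1$VisitDate, format = "%d/%m/%y")
df2 <- df1[order(df1$HospNum_Id, d), ]

# Per-patient test: is there an EMR after the first RFA?
has_rfa_then_emr <- function(ev) {
  i1 <- which(ev == "RFA")[1]
  !is.na(i1) && any(tail(ev, -i1) == "EMR")
}

# One TRUE/FALSE per HospNum_Id, named by the id
ok  <- vapply(split(df2$EVENT, df2$HospNum_Id), has_rfa_then_emr, logical(1))
ids <- as.integer(names(ok)[ok])

# Keep every row of the qualifying patients
res <- df2[df2$HospNum_Id %in% ids, ]
res
```

Here ids should contain 1, 3 and 4, since patient 2 has no RFA at all, and res holds the 12 rows belonging to those patients.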