I have a data frame as follows:
df1 <- structure(list(HospNum_Id = c(1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L,
3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L), VisitDate = c("13/02/03",
"13/04/05", "13/05/12", "13/12/06", "13/04/12", "13/05/13", "13/06/14",
"13/04/15", "03/04/15", "04/05/16", "04/06/16", "13/05/03", "13/06/04",
"13/04/05", "03/04/15", "04/05/16", "04/06/16"), EVENT = c("EMR",
"RFA", "EMR", "nothing", "EMR", "nothing", "EMR", "EMR", "RFA",
"EMR", "nothing", "RFA", "EMR", "EMR", "RFA", "EMR", "nothing"
)), .Names = c("HospNum_Id", "VisitDate", "EVENT"), class = "data.frame", row.names = c(NA,
-17L))
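I want to select only the HospNum_Ids where an RFA appears before an EMR in the EVENT column. The RFA can appear in any row before the EMR, not only in the row immediately before it. The desired output is: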
HospNum_Id VisitDate EVENT
1 13/02/03 EMR
1 13/04/05 RFA
1 13/05/12 EMR
3 03/04/15 RFA
3 04/05/16 EMR
3 04/06/16 nothing
4 13/05/03 RFA
4 13/06/04 EMR
4 13/04/05 EMR
4 03/04/15 RFA
4 04/05/16 EMR
4 04/06/16 nothing
@akrun very kindly provided me with a solution for consecutive runs here, but this problem is different.
Answer
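We can try data.table. Convert the 'data.frame' to a 'data.table' (setDT(df1)), order the rows by 'VisitDate' (after converting it to Date class), and group by 'HospNum_Id'. If there is any "RFA" in 'EVENT', we get the index of the first "RFA" ('i1'). Using that, we return the row indices (.I) of all rows of each 'HospNum_Id' that has an "EMR" after its first "RFA", and then subset the dataset with those indices.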
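library(data.table)
# Order visits by date (VisitDate is dd/mm/yy); keyby returns the groups sorted by HospNum_Id
v1 <- setDT(df1)[order(as.Date(VisitDate, "%d/%m/%y")), if (any(EVENT == "RFA")) {
  i1 <- which(EVENT == "RFA")[1]           # position of the first RFA in the group
  # -seq_len(i1) drops rows 1..i1; it is empty (not NA) when the RFA is the last row
  .I[any(EVENT[-seq_len(i1)] == "EMR")]
}, keyby = HospNum_Id]$V1
df1[v1]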
# HospNum_Id VisitDate EVENT
# 1: 1 13/02/03 EMR
# 2: 1 13/04/05 RFA
# 3: 1 13/05/12 EMR
# 4: 3 03/04/15 RFA
# 5: 3 04/05/16 EMR
# 6: 3 04/06/16 nothing
# 7: 4 13/05/03 RFA
# 8: 4 13/06/04 EMR
# 9: 4 13/04/05 EMR
#10: 4 03/04/15 RFA
#11: 4 04/05/16 EMR
#12: 4 04/06/16 nothing
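The same "first RFA, then any later EMR" test can also be sketched in base R without packages. This is a hedged sketch, not part of the original answer: the helper name first_rfa_then_emr is hypothetical, and df1 is rebuilt from the question's data with data.frame().

```r
# Data from the question, rebuilt with data.frame()
df1 <- data.frame(
  HospNum_Id = c(1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L),
  VisitDate = c("13/02/03", "13/04/05", "13/05/12", "13/12/06", "13/04/12",
                "13/05/13", "13/06/14", "13/04/15", "03/04/15", "04/05/16",
                "04/06/16", "13/05/03", "13/06/04", "13/04/05", "03/04/15",
                "04/05/16", "04/06/16"),
  EVENT = c("EMR", "RFA", "EMR", "nothing", "EMR", "nothing", "EMR", "EMR",
            "RFA", "EMR", "nothing", "RFA", "EMR", "EMR", "RFA", "EMR", "nothing"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE if an "EMR" occurs after the first "RFA"
# in a vector of events that is already sorted by date
first_rfa_then_emr <- function(events) {
  i1 <- match("RFA", events)          # index of the first "RFA", NA if none
  if (is.na(i1)) return(FALSE)
  any(events[-seq_len(i1)] == "EMR")  # empty, hence FALSE, when the RFA is last
}

# Sort by patient, then by visit date (dd/mm/yy)
df1 <- df1[order(df1$HospNum_Id, as.Date(df1$VisitDate, "%d/%m/%y")), ]

# split() keeps the sorted row order inside each patient's group
keep <- vapply(split(df1$EVENT, df1$HospNum_Id), first_rfa_then_emr, logical(1))
result <- df1[df1$HospNum_Id %in% as.integer(names(keep)[keep]), ]
result
```

Using -seq_len(i1) rather than (i1 + 1):length(events) avoids indexing past the end when the first RFA is the last visit.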
Or, a more compact approach with dplyr:
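library(dplyr)
df1 %>%
  arrange(HospNum_Id, as.Date(VisitDate, "%d/%m/%y")) %>%  # sort by patient, then visit date
  group_by(HospNum_Id) %>%
  filter(any(EVENT == "RFA")) %>%                # keep patients with at least one RFA
  mutate(i1 = EVENT == "RFA") %>%
  filter(any(EVENT[which(i1)[1]:n()] == "EMR")) %>%  # an EMR somewhere after the first RFA
  select(-i1)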
# HospNum_Id VisitDate EVENT
# <int> <chr> <chr>
#1 1 13/02/03 EMR
#2 1 13/04/05 RFA
#3 1 13/05/12 EMR
#4 3 03/04/15 RFA
#5 3 04/05/16 EMR
#6 3 04/06/16 nothing
#7 4 13/05/03 RFA
#8 4 13/06/04 EMR
#9 4 13/04/05 EMR
#10 4 03/04/15 RFA
#11 4 04/05/16 EMR
#12 4 04/06/16 nothing
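Another way to phrase the condition, offered as a sketch rather than as part of the answer: a patient qualifies if some EMR row has a running count of RFA rows (up to and including that row) greater than zero. In base R, ave() with FUN = cumsum computes that running count within each HospNum_Id; the names rfa_seen and keep_ids are illustrative.

```r
# Data from the question, rebuilt with data.frame()
df1 <- data.frame(
  HospNum_Id = c(1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L),
  VisitDate = c("13/02/03", "13/04/05", "13/05/12", "13/12/06", "13/04/12",
                "13/05/13", "13/06/14", "13/04/15", "03/04/15", "04/05/16",
                "04/06/16", "13/05/03", "13/06/04", "13/04/05", "03/04/15",
                "04/05/16", "04/06/16"),
  EVENT = c("EMR", "RFA", "EMR", "nothing", "EMR", "nothing", "EMR", "EMR",
            "RFA", "EMR", "nothing", "RFA", "EMR", "EMR", "RFA", "EMR", "nothing"),
  stringsAsFactors = FALSE
)

# Sort by patient, then by visit date (dd/mm/yy)
df1 <- df1[order(df1$HospNum_Id, as.Date(df1$VisitDate, "%d/%m/%y")), ]

# Per-row flag: has this patient had an RFA on this visit or any earlier one?
rfa_seen <- ave(as.integer(df1$EVENT == "RFA"), df1$HospNum_Id, FUN = cumsum) > 0

# An EMR row with rfa_seen TRUE must follow an RFA, since the RFA row itself is not an EMR
keep_ids <- unique(df1$HospNum_Id[df1$EVENT == "EMR" & rfa_seen])
result <- df1[df1$HospNum_Id %in% keep_ids, ]
result
```

This avoids computing an index per group, so there is no out-of-range case to guard against.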