Compare consecutive rows and select rows where the subsequent row is a specific value

Time: 2016-06-26 04:54:08

Tags: r

I have a data frame as follows:

structure(list(HospNum_Id = c(1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 
3L, 3L, 3L), VisitDate = c("13/02/03", "13/04/05", "13/05/12", 
"13/12/06", "13/04/12", "13/05/13", "13/06/14", "13/04/15", "03/04/15", 
"04/05/16", "04/06/16"), EVENT = c("EMR", "RFA", "nothing", "nothing", 
"EMR", "nothing", "EMR", "EMR", "RFA", "EMR", "nothing")), .Names = c("HospNum_Id", 
"VisitDate", "EVENT"), class = "data.frame", row.names = c(NA, 
-11L))

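For each HospNum_Id, I want to select only the rows where the current row's EVENT is "EMR" and the row before it (in ascending date order) is "nothing", together with that preceding "nothing" row.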

My desired output is:

 HospNum_Id VisitDate EVENT
    2   13/12/06    nothing
    2   13/04/12    EMR
    2   13/05/13    nothing
    2   13/06/14    EMR

But my current output is:

  HospNum_Id VisitDate EVENT
       (int)     (chr) (chr)
1          2  13/04/12   EMR
2          2  13/06/14   EMR
3          2  13/04/15   EMR

At the moment I have the following code, but it fails, I think because I am using first in the filter rather than a term meaning "before the row that has EMR in the EVENT":

Upstaging<-Therap %>% 
  arrange(HospNum_Id, as.Date(Therap$VisitDate, '%d/%m/%y')) %>% 
  group_by(HospNum_Id) %>% 
  filter(first(EVENT == "nothing") & EVENT == "EMR")

2 answers:

Answer 0: (score: 1)

We can use data.table. Convert the 'data.frame' to a 'data.table' (setDT(df1)). Grouped by 'HospNum_Id', we get the index ('i1') where 'EVENT' is "EMR" and the previous value is "nothing". We combine that index with the index of the previous element ('i1-1'), sort, and get the row index (.I). With that, we subset the rows.

library(data.table)
v1 <- setDT(df1)[,  {i1 <- which(EVENT == "EMR" & shift(EVENT)=="nothing")
              .I[sort(c(i1, i1-1))] } , by = HospNum_Id]$V1
df1[v1]
#   HospNum_Id VisitDate   EVENT
#1:          2  13/12/06 nothing
#2:          2  13/04/12     EMR
#3:          2  13/05/13 nothing
#4:          2  13/06/14     EMR
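
The index logic above can be illustrated on a single group. This is only a rough sketch in base R: the vector below is HospNum_Id 2's EVENT column copied from the question, and `prev` is a hypothetical helper that stands in for what shift(EVENT) produces (the previous element, with NA in the first position).

```r
# EVENT values for HospNum_Id 2, in date order (copied from the question)
EVENT <- c("nothing", "EMR", "nothing", "EMR", "EMR")

# Previous element of each position; NA for the first one.
# 'prev' is an illustrative stand-in for data.table's shift(EVENT).
prev <- c(NA, head(EVENT, -1))

# Positions where the current value is "EMR" and the previous one is "nothing"
i1 <- which(EVENT == "EMR" & prev == "nothing")

# Keep each match together with the position just before it
keep <- sort(c(i1, i1 - 1))
keep
# [1] 1 2 3 4
```

The fifth element is dropped because its previous value is "EMR", not "nothing".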

Or a similar approach using dplyr.

library(dplyr)
df1 %>%
    group_by(HospNum_Id) %>% 
    mutate(ind = EVENT=="nothing" & lead(EVENT)=="EMR") %>% 
    slice(sort(c(which(ind),which(ind)+1))) %>% 
    select(-ind)
#   HospNum_Id VisitDate   EVENT   
#      <int>     <chr>   <chr>
#1          2  13/12/06 nothing
#2          2  13/04/12     EMR
#3          2  13/05/13 nothing
#4          2  13/06/14     EMR
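
A possible variant, not part of the original answer, is to sort by date first and then test each row against its neighbours directly inside filter() with lag() and lead(). The sketch below assumes the dates are in day/month/two-digit-year form ("%d/%m/%y"); it rebuilds the question's data as df1 so that it runs on its own.

```r
library(dplyr)

# The question's data, rebuilt here so the sketch is self-contained
df1 <- data.frame(
  HospNum_Id = c(1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L),
  VisitDate  = c("13/02/03", "13/04/05", "13/05/12", "13/12/06", "13/04/12",
                 "13/05/13", "13/06/14", "13/04/15", "03/04/15", "04/05/16",
                 "04/06/16"),
  EVENT      = c("EMR", "RFA", "nothing", "nothing", "EMR", "nothing", "EMR",
                 "EMR", "RFA", "EMR", "nothing"),
  stringsAsFactors = FALSE
)

res <- df1 %>%
  # Assumption: VisitDate is day/month/two-digit year
  arrange(HospNum_Id, as.Date(VisitDate, format = "%d/%m/%y")) %>%
  group_by(HospNum_Id) %>%
  # Keep an "EMR" row whose previous row is "nothing",
  # and a "nothing" row whose next row is "EMR"
  filter((EVENT == "EMR" & lag(EVENT, default = "") == "nothing") |
         (EVENT == "nothing" & lead(EVENT, default = "") == "EMR")) %>%
  ungroup()

res
```

Using default = "" keeps the comparisons from returning NA at the first and last row of each group.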

Answer 1: (score: 0)

You can get the desired result using only basic operations.

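Step 1. Load the data.

Step 2. Arrange the data frame in ascending date order.

Step 3. Select the rows with EVENT == "EMR" into one data frame, and create a second data frame containing the rows just before them.

Step 4. Remove duplicates and sort by date.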

Code:

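a <- df1  # step 1: the loaded data frame (called df1 here)
# step 2: order by patient and visit date (two-digit year, hence %y)
a <- a[order(a$HospNum_Id, as.Date(a$VisitDate, format = "%d/%m/%y")), , drop = FALSE]
n <- nrow(a)
# step 3: "EMR" rows whose previous row is "nothing" for the same patient
i <- which(a$EVENT[-1] == "EMR" & a$EVENT[-n] == "nothing" &
           a$HospNum_Id[-1] == a$HospNum_Id[-n]) + 1
b <- a[i, ]      # the "EMR" rows
c <- a[i - 1, ]  # the rows just before them
# step 4: combine, remove duplicates, and sort again
d <- rbind(b, c)
e <- d[!duplicated(d), ]
f <- e[order(e$HospNum_Id, as.Date(e$VisitDate, format = "%d/%m/%y")), , drop = FALSE]
f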