I have a data frame as follows:
structure(list(HospNum_Id = c(1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L,
3L, 3L, 3L), VisitDate = c("13/02/03", "13/04/05", "13/05/12",
"13/12/06", "13/04/12", "13/05/13", "13/06/14", "13/04/15", "03/04/15",
"04/05/16", "04/06/16"), EVENT = c("EMR", "RFA", "nothing", "nothing",
"EMR", "nothing", "EMR", "EMR", "RFA", "EMR", "nothing")), .Names = c("HospNum_Id",
"VisitDate", "EVENT"), class = "data.frame", row.names = c(NA,
-11L))
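For each HospNum_Id, I want to select only the rows where the current EVENT is "EMR" and the previous row (in ascending date order) is "nothing", together with that preceding "nothing" row.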
The output I want is:
HospNum_Id VisitDate EVENT
2 13/12/06 nothing
2 13/04/12 EMR
2 13/05/13 nothing
2 13/06/14 EMR
But my current output is:
HospNum_Id VisitDate EVENT
(int) (chr) (chr)
1 2 13/04/12 EMR
2 2 13/06/14 EMR
3 2 13/04/15 EMR
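At the moment I have the following code, but it doesn't work. I think that is because I use first() in the filter rather than something that means "the row before the row that has EMR in the EVENT":
library(dplyr)
Upstaging <- Therap %>%
  arrange(HospNum_Id, as.Date(VisitDate, '%d/%m/%y')) %>%
  group_by(HospNum_Id) %>%
  # first() only checks the first row of each group, so every EMR row in a
  # group whose first event is "nothing" is kept
  filter(first(EVENT == "nothing") & EVENT == "EMR")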
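Following the reasoning above, one way to express "the row before" in dplyr is lag(), which looks one row back within each group (and lead() one row forward). The sketch below is a minimal illustration, assuming dplyr is installed; it rebuilds the question's data as df1 (an illustrative name) and keeps an EMR row when the row before it is "nothing", plus a "nothing" row when the row after it is EMR.

```r
library(dplyr)

# The question's data (same values as the dput() output above)
df1 <- data.frame(
  HospNum_Id = c(1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L),
  VisitDate  = c("13/02/03", "13/04/05", "13/05/12", "13/12/06", "13/04/12",
                 "13/05/13", "13/06/14", "13/04/15", "03/04/15", "04/05/16",
                 "04/06/16"),
  EVENT      = c("EMR", "RFA", "nothing", "nothing", "EMR", "nothing",
                 "EMR", "EMR", "RFA", "EMR", "nothing"),
  stringsAsFactors = FALSE
)

res <- df1 %>%
  # sort by patient, then by visit date (two-digit years, hence %y)
  arrange(HospNum_Id, as.Date(VisitDate, "%d/%m/%y")) %>%
  group_by(HospNum_Id) %>%
  # lag()/lead() look one row back/forward within each patient; at a group
  # boundary the comparison gives NA or FALSE, and filter() keeps only TRUE
  filter((EVENT == "EMR" & lag(EVENT) == "nothing") |
           (EVENT == "nothing" & lead(EVENT) == "EMR")) %>%
  ungroup()

print(res)
```

Checking both directions means each qualifying pair is kept as two rows, matching the desired output, while an EMR row preceded by another EMR (13/04/15) is dropped.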
Answer 0 (score: 1)
We can use data.table. Convert the 'data.frame' to a 'data.table' (setDT(df1)), group by 'HospNum_Id', and get the indices ('i1') where 'EVENT' is "EMR" and the previous value is "nothing". Use those indices to also get the index of the preceding element ('i1-1'), sort them, and convert them to row indices (.I). With those, we subset the rows.
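library(data.table)
# per patient: positions of EMR rows whose previous row is "nothing"
# (assumes rows are already in ascending VisitDate order within each patient)
v1 <- setDT(df1)[, {i1 <- which(EVENT == "EMR" & shift(EVENT) == "nothing")
  # add the preceding positions and map them to overall row numbers via .I
  .I[sort(c(i1, i1-1))] } , by = HospNum_Id]$V1
df1[v1]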
# HospNum_Id VisitDate EVENT
#1: 2 13/12/06 nothing
#2: 2 13/04/12 EMR
#3: 2 13/05/13 nothing
#4: 2 13/06/14 EMR
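The code above relies on the rows already being in ascending VisitDate order within each HospNum_Id, which happens to be true for the sample data. A minimal sketch, assuming it is acceptable to sort the table in place, adds that step with setorder() before applying the same .I-based subset; the helper column date and the names dt and res are illustrative additions, not part of the answer.

```r
library(data.table)

# The question's data (same values as the dput() output above)
dt <- data.table(
  HospNum_Id = c(1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L),
  VisitDate  = c("13/02/03", "13/04/05", "13/05/12", "13/12/06", "13/04/12",
                 "13/05/13", "13/06/14", "13/04/15", "03/04/15", "04/05/16",
                 "04/06/16"),
  EVENT      = c("EMR", "RFA", "nothing", "nothing", "EMR", "nothing",
                 "EMR", "EMR", "RFA", "EMR", "nothing")
)

# Parse the two-digit-year dates and sort by patient, then date (in place)
dt[, date := as.Date(VisitDate, format = "%d/%m/%y")]
setorder(dt, HospNum_Id, date)

# Row numbers (.I) of each qualifying EMR row and of the "nothing" row before it
v1 <- dt[, {
  i1 <- which(EVENT == "EMR" & shift(EVENT) == "nothing")
  .I[sort(c(i1, i1 - 1L))]
}, by = HospNum_Id]$V1

# Subset and drop the helper date column
res <- dt[v1, .(HospNum_Id, VisitDate, EVENT)]
print(res)
```

Sorting on a parsed Date rather than on the VisitDate string matters because the strings are day-first, so their alphabetical order is not chronological.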
Or a similar approach using dplyr:
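library(dplyr)
df1 %>%
  group_by(HospNum_Id) %>%
  # flag "nothing" rows that are immediately followed by an EMR row
  mutate(ind = EVENT=="nothing" & lead(EVENT)=="EMR") %>%
  # keep each flagged row plus the EMR row right after it
  slice(sort(c(which(ind),which(ind)+1))) %>%
  select(-ind)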
# HospNum_Id VisitDate EVENT
# <int> <chr> <chr>
#1 2 13/12/06 nothing
#2 2 13/04/12 EMR
#3 2 13/05/13 nothing
#4 2 13/06/14 EMR
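The dplyr version depends on two details: lead() returns NA for the last row, because it has no successor, and which() returns only the positions of TRUE values, so that NA is silently ignored. A small illustration on a single hypothetical patient's EVENT vector (ev and rows are illustrative names, not from the answer):

```r
library(dplyr)

# One patient's events, already in date order (illustrative values)
ev <- c("nothing", "EMR", "nothing")

# lead() shifts values up by one; the last element has no successor, so it is NA
ind <- ev == "nothing" & lead(ev) == "EMR"
print(ind)   # TRUE FALSE NA

# which() keeps only TRUE positions; adding 1 picks up the EMR row that follows
rows <- sort(c(which(ind), which(ind) + 1))
print(rows)  # 1 2
```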
Answer 1 (score: 0)
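You can get the desired result using only base R operations.
Step 1. Load the data.
Step 2. Sort the data frame in ascending date order.
Step 3. Select the rows with EVENT = "EMR" into one data frame, and the rows immediately before them into another.
Step 4. Combine them, remove duplicates and sort by date.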
Code:
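a <- df1  # Step 1: the data frame from the question
# Step 2: sort by patient, then ascending date (two-digit years, hence %y)
a <- a[order(a$HospNum_Id, as.Date(a$VisitDate, format = "%d/%m/%y")), , drop = FALSE]
# Step 3: EMR rows whose previous row belongs to the same patient and is "nothing"
prev_event <- c(NA, head(a$EVENT, -1))
prev_id <- c(NA, head(a$HospNum_Id, -1))
idx <- which(a$EVENT == "EMR" & prev_event == "nothing" & prev_id == a$HospNum_Id)
emr_rows <- a[idx, ]       # the qualifying EMR rows
prev_rows <- a[idx - 1, ]  # the "nothing" rows just before them
# Step 4: combine, drop duplicates and sort again by patient and date
d <- rbind(emr_rows, prev_rows)
e <- d[!duplicated(d), ]
f <- e[order(e$HospNum_Id, as.Date(e$VisitDate, format = "%d/%m/%y")), , drop = FALSE]
f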