I have a data set:
ID<-c(111,111,222,222,222,222,222,222)
TreatmentDate<-as.Date(c("2010-12-12","2011-12-01","2009-8-7","2010-5-7","2011-3-7","2011-8-5","2013-8-27","2016-9-3"))
Treatment<-c("AA","BB","CC","DD","AA","BB","BB","CC")
df<-data.frame(ID,TreatmentDate,Treatment)
df
ID   TreatmentDate  Treatment
111  12/12/2010     AA
111  01/12/2011     BB
222  07/08/2009     CC
222  07/05/2010     DD
222  07/03/2011     AA
222  05/08/2011     BB
222  27/08/2013     BB
222  03/09/2016     CC
I also have another data frame that shows the test date for each subject:
UID<-c(111,222)
Testdate<-as.Date(c("2012-12-31","2014-12-31"))
SubjectTestDate<-data.frame(UID,Testdate)
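I am trying to summarise the data. For example, if I want to see how many treatments of each type a subject had before the test date, I would like to get something like the following, which I then want to output to a spreadsheet.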
ID   Prior_to_date  TreatmentAA  TreatmentBB  TreatmentCC  TreatmentDD
111  31/12/2012     1            1            0            0
222  31/12/2014     1            2            1            1
Any help would be greatly appreciated!!
Answer 0 (score: 2)
We can join the two datasets on 'ID', create a column ('indx') that checks the condition, and use dcast to convert from 'long' to 'wide' format.
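library(data.table) # v1.9.5+
# Key 'df' by ID and join it to 'SubjectTestDate'; 'indx' is the number of
# treatments on or before the test date within each ID/Treatment group.
# Note that 'length' as the aggregate counts the rows in each cell.
dcast(setkey(setDT(df), ID)[SubjectTestDate][,
        indx := sum(TreatmentDate <= Testdate), list(ID, Treatment)],
      ID + Testdate ~ paste0('Treatment', Treatment),
      value.var = 'indx', length)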
# ID Testdate TreatmentAA TreatmentBB TreatmentCC TreatmentDD
#1: 111 2012-12-31 1 1 0 0
#2: 222 2014-12-31 1 2 2 1
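The three steps named above (join, condition column, cast) can also be written out one at a time, which makes the intermediate tables easier to inspect. This is only a sketch and a variation on the code above, not the same code: it uses a per-row flag that is summed during the cast instead of 'indx', it works on copies so 'df' is not changed by reference, and the names `joined`, `prior`, `TreatmentCol` and `wide` are made up for illustration.

```r
library(data.table)

# Data as given in the question
df <- data.frame(
  ID = c(111, 111, 222, 222, 222, 222, 222, 222),
  TreatmentDate = as.Date(c("2010-12-12", "2011-12-01", "2009-08-07", "2010-05-07",
                            "2011-03-07", "2011-08-05", "2013-08-27", "2016-09-03")),
  Treatment = c("AA", "BB", "CC", "DD", "AA", "BB", "BB", "CC")
)
SubjectTestDate <- data.frame(
  UID = c(111, 222),
  Testdate = as.Date(c("2012-12-31", "2014-12-31"))
)

# Step 1: join each treatment row to its subject's test date (ID matches UID)
joined <- merge(as.data.table(df), as.data.table(SubjectTestDate),
                by.x = "ID", by.y = "UID")

# Step 2: per-row flag, 1 if the treatment is on or before the test date
joined[, prior := as.integer(TreatmentDate <= Testdate)]

# Step 3: build the wide column names, then cast long to wide, summing the flag
joined[, TreatmentCol := paste0("Treatment", Treatment)]
wide <- dcast(joined, ID + Testdate ~ TreatmentCol,
              value.var = "prior", fun.aggregate = sum, fill = 0L)

print(wide)
```

Summing a 0/1 flag means treatments after the test date still contribute a column but add nothing to the count, so for the question's data the counts should agree with the desired table.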
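Based on the modified 'df', we join 'df' with 'SubjectTestDate', create the 'indx' column as before along with a sequence column 'Seq', grouped by 'ID' and 'Treatment', cast with dcast, and then apply unique by 'ID'.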
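# 'indx' counts treatments on or before the test date per ID/Treatment;
# 'Seq' numbers the rows within each group, and unique(..., by = 'ID')
# keeps the first row of the cast result for each ID
unique(dcast(setkey(setDT(df), ID)[SubjectTestDate][,
         c('indx', 'Seq') := list(sum(TreatmentDate <= Testdate), 1:.N),
         .(ID, Treatment)],
       ID + Seq + Testdate ~ paste0('Treatment', Treatment),
       value.var = 'indx', fill = 0), by = 'ID')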
# ID Seq Testdate TreatmentAA TreatmentBB TreatmentCC TreatmentDD
#1: 111 1 2012-12-31 1 1 0 0
#2: 222 1 2014-12-31 1 2 1 1
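The question also asks for spreadsheet output, which the code above does not cover. One possible sketch, assuming a CSV file is acceptable (spreadsheet programs can open it): it uses base R only rather than data.table, and the names `merged`, `prior`, `counts` and `out_file` are illustrative, as is the choice of a temporary directory for the file.

```r
# Data as given in the question
df <- data.frame(
  ID = c(111, 111, 222, 222, 222, 222, 222, 222),
  TreatmentDate = as.Date(c("2010-12-12", "2011-12-01", "2009-08-07", "2010-05-07",
                            "2011-03-07", "2011-08-05", "2013-08-27", "2016-09-03")),
  Treatment = c("AA", "BB", "CC", "DD", "AA", "BB", "BB", "CC")
)
SubjectTestDate <- data.frame(
  UID = c(111, 222),
  Testdate = as.Date(c("2012-12-31", "2014-12-31"))
)

# Attach each subject's test date to that subject's treatment rows
merged <- merge(df, SubjectTestDate, by.x = "ID", by.y = "UID")

# 1 if the treatment happened on or before the test date, else 0
merged$prior <- as.integer(merged$TreatmentDate <= merged$Testdate)

# Sum the flag for every ID/Treatment combination (empty cells become 0)
tab <- xtabs(prior ~ ID + Treatment, data = merged)
counts <- as.data.frame.matrix(tab)
names(counts) <- paste0("Treatment", names(counts))
counts <- cbind(ID = as.numeric(rownames(counts)), counts)

# Add the test date back and use the column names from the question
counts <- merge(SubjectTestDate, counts, by.x = "UID", by.y = "ID")
names(counts)[1:2] <- c("ID", "Prior_to_date")

# Write a CSV file; the path here is a placeholder
out_file <- file.path(tempdir(), "treatment_counts.csv")
write.csv(counts, out_file, row.names = FALSE)
```

The same `write.csv()` call should work on the data.table result above as well, since a data.table is also a data.frame.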