我有一个数据框,每行记录一次孩子对健康机构的访问:
head(cred2015)
DNI_PN COD_ESTAB FEC_ATEN SEXO FEC_NAC CONTROL
60099999 3697 2015-08-04 2 2015-07-28 2
60099999 3697 2015-08-14 2 2015-07-28 3
60099999 3697 2015-08-28 2 2015-07-28 1
60099999 3697 2015-09-28 2 2015-07-28 2
60099999 3697 2015-10-28 2 2015-07-28 3
60999999 3697 2015-11-28 2 2015-07-28 4
DNI_PN
识别个人,FEC_NAC
是他们的出生日期,FEC_ATEN
是访问日期。
有一个儿童必须遵循的特定访问时间表,我希望检查这些访问中的每一个,如果孩子是否参加。这就是我目前的尝试:
#Create a vector with unique identifiers to loop over
dni<-unique(cred2015$DNI_PN)
#The first scheduled visit must happen 7 days after birth.
#My custom function takes a lower and upper date limit for wiggle room.
conRN1<-sapply(dni,checkCRED,5,9)
#The function in question:
checkCRED<-function(idnum,llim,ulim){
require('magrittr')
#Examine visits from one individual at a time
subcred<-cred2015 %>% subset(DNI_PN == idnum)
#Check if the lower limit would fall inside the year 2015
if(as.Date("2015-12-31","%Y-%m-%d")-unique(subcred$FEC_NAC)>llim){
#Note if a visit happened within the time window
if((subcred %>% subset(FEC_ATEN-FEC_NAC>=llim & FEC_ATEN-FEC_NAC<=ulim))$CONTROL %>% length > 0){
1 #On time
} else {
0 #Wasn't on time
}
}else{
NA #Exclude if CRED date not in database range
}
}
目前,为1000名不同的人做这件事需要花费近一分钟的时间,这是不可接受的,因为我需要为数十万人评估超过12次不同的预定访问。
请注意,CONTROL
变量会跟踪观察对应的访问次数,但会根据子计划进行重置(第一个月最多2次访问,第一年最多11次,最多第二年6,等等)所以我找不到它的任何用处,但我离开它以防万一比我更聪明地找出涉及它的解决方案。
答案不必限于基础R。
数据
structure(list(DNI_PN = c(60099999, 60099999, 60099999, 60099999,
60099999, 60999999), COD_ESTAB = c(3697, 3697, 3697, 3697, 3697,
3697), FEC_ATEN = structure(c(16651, 16661, 16675, 16706, 16736,
16767), class = "Date"), SEXO = c(2L, 2L, 2L, 2L, 2L, 2L), FEC_NAC = structure(c(16644,
16644, 16644, 16644, 16644, 16644), class = "Date"), CONTROL = c(2L,
3L, 1L, 2L, 3L, 4L)), .Names = c("DNI_PN", "COD_ESTAB", "FEC_ATEN",
"SEXO", "FEC_NAC", "CONTROL"), class = "data.frame", row.names = c(NA,
-6L))