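How to merge datasets based on a date range

Time: 2017-07-20 18:07:33

Tags: r

I want to merge two datasets on ID and Date. However, some dates fall 1-2 days after the date in the row they should be merged with, so those rows are dropped. How can I merge on date while allowing a difference of up to 2 days between the dates?

My data: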

#Random letters to fill out a pathology report
pathRep<-replicate(20,paste(sample(LETTERS,50,replace=T),collapse=""))
pathDate<-as.Date(c("1993-12-22","1994-05-16","1992-07-20","1996-06-02","1992-04-20","1996-08-30","1992-01-26","1991-03-23","1995-12-28","1995-07-15","1993-04-04","1994-01-11","1999-08-21","1993-11-10","1994-02-26","1992-08-06","1993-06-29","1997-03-08","1998-03-03","1998-04-17"))
#Random Numbers
pathHospitalNum<-c("H432243","T662272","G424284","W787634","H432243","Y980037","H432243","W787634","Y980037","E432243","U874287","Y980037","U874287","W787634","Y980037","H432243","Y980037","E432243","W787634","W787634")
#Create the dataframe
pathdf<-data.frame(pathRep,pathDate,pathHospitalNum)

#Random letters to fill out an endoscopy report
EndoRep<-replicate(20,paste(sample(LETTERS,50,replace=T),collapse=""))
EndoDate<-as.Date(c("1993-12-22","1994-05-14","1992-07-19","1996-06-01","1992-04-20","1996-08-30","1992-01-24","1991-03-21","1995-12-28","1995-07-15","1993-04-02","1994-01-10","1999-08-21","1993-11-10","1994-02-26","1992-08-05","1993-06-29","1997-03-07","1998-03-03","1998-04-17"))
#Random Numbers
EndoHospitalNum<-c("H432243","T662272","G424284","W787634","H432243","Y980037","H432243","W787634","Y980037","E432243","U874287","Y980037","U874287","W787634","Y980037","H432243","Y980037","E432243","W787634","W787634")
#Create the dataframe:
Endodf<-data.frame(EndoRep,EndoDate,EndoHospitalNum)

This only merges on exact dates:

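#The two data frames use different column names, so map them with by.x/by.y
merge(Endodf,pathdf,by.x=c("EndoDate","EndoHospitalNum"),by.y=c("pathDate","pathHospitalNum"))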

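I thought I could perhaps create a difftime column, but I think I would end up comparing every date with every other date, which could be time-consuming?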
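For illustration, the difftime idea can be sketched in base R as below. Merging on the hospital number first means dates are only compared within the same patient rather than across the whole dataset. The three rows are a small subset of the data above, and the names `pairs`, `dayDiff` and `within2` are made up for this sketch.

```r
# Subset of the question's data (report text columns left out for brevity)
Endodf <- data.frame(
  EndoHospitalNum = c("H432243", "H432243", "T662272"),
  EndoDate = as.Date(c("1993-12-22", "1992-04-20", "1994-05-14")),
  stringsAsFactors = FALSE
)
pathdf <- data.frame(
  pathHospitalNum = c("H432243", "H432243", "T662272"),
  pathDate = as.Date(c("1993-12-22", "1992-04-20", "1994-05-16")),
  stringsAsFactors = FALSE
)

# Merge on the ID only: each endoscopy row is paired with every
# pathology row that belongs to the same patient
pairs <- merge(Endodf, pathdf,
               by.x = "EndoHospitalNum", by.y = "pathHospitalNum")

# Absolute difference in days between the two dates of each pair
pairs$dayDiff <- abs(as.numeric(difftime(pairs$pathDate, pairs$EndoDate,
                                         units = "days")))

# Keep only the pairs that are at most 2 days apart
within2 <- pairs[pairs$dayDiff <= 2, ]
```

The intermediate `pairs` table grows with the number of reports per patient, so this is only practical when each ID has a modest number of rows.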

2 Answers:

Answer 0: (score: 2)

While @alaybourn's answer using a data.table rolling join is very good, I'm going to add another option that addresses the "allow up to 2 days of date difference" part of the question (but mostly to share some love for the fuzzyjoin package).
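The code that went with this answer is not preserved here, so the following is only a minimal sketch of how such a join might look with fuzzyjoin, not the answerer's original code. It assumes `fuzzy_inner_join()` with one matching function per column pair: exact equality on the hospital number and a 2-day tolerance on the date. The data are a three-row subset of the question's, plus one made-up patient (`Z000000`) whose dates are 9 days apart to show a non-match.

```r
library(fuzzyjoin)

Endodf <- data.frame(
  EndoHospitalNum = c("H432243", "T662272", "G424284", "Z000000"),
  EndoDate = as.Date(c("1993-12-22", "1994-05-14", "1992-07-19", "2000-01-01")),
  stringsAsFactors = FALSE
)
pathdf <- data.frame(
  pathHospitalNum = c("H432243", "T662272", "G424284", "Z000000"),
  pathDate = as.Date(c("1993-12-22", "1994-05-16", "1992-07-20", "2000-01-10")),
  stringsAsFactors = FALSE
)

# One match function per pair in `by`:
#   hospital numbers must be identical,
#   dates may differ by at most 2 days in either direction
matched <- fuzzy_inner_join(
  Endodf, pathdf,
  by = c("EndoHospitalNum" = "pathHospitalNum", "EndoDate" = "pathDate"),
  match_fun = list(
    `==`,
    function(x, y) abs(as.numeric(x - y)) <= 2
  )
)
```

Because this is an inner join, the made-up `Z000000` row is dropped, while the three real rows are kept with both date columns side by side.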

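Answer 1: (score: 1)

If you use the roll="nearest" option from data.table, it will work for this dataset, but it will fail if you need to avoid joining dates that are more than 2 days apart.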

library(data.table)

#Random letters to fill out a pathology report
pathRep<-replicate(20,paste(sample(LETTERS,50,replace=T),collapse=""))
pathDate<-as.Date(c("1993-12-22","1994-05-16","1992-07-20","1996-06-02","1992-04-20","1996-08-30","1992-01-26","1991-03-23","1995-12-28","1995-07-15","1993-04-04","1994-01-11","1999-08-21","1993-11-10","1994-02-26","1992-08-06","1993-06-29","1997-03-08","1998-03-03","1998-04-17"))
#Random Numbers
pathHospitalNum<-c("H432243","T662272","G424284","W787634","H432243","Y980037","H432243","W787634","Y980037","E432243","U874287","Y980037","U874287","W787634","Y980037","H432243","Y980037","E432243","W787634","W787634")
#Create the data table and set key fields
pathdt<-data.table(pathRep,pathDate,pathHospitalNum)
setkey(pathdt, pathHospitalNum, pathDate)


#Random letters to fill out an endoscopy report
EndoRep<-replicate(20,paste(sample(LETTERS,50,replace=T),collapse=""))
EndoDate<-as.Date(c("1993-12-22","1994-05-14","1992-07-19","1996-06-01","1992-04-20","1996-08-30","1992-01-24","1991-03-21","1995-12-28","1995-07-15","1993-04-02","1994-01-10","1999-08-21","1993-11-10","1994-02-26","1992-08-05","1993-06-29","1997-03-07","1998-03-03","1998-04-17"))
#Random Numbers
EndoHospitalNum<-c("H432243","T662272","G424284","W787634","H432243","Y980037","H432243","W787634","Y980037","E432243","U874287","Y980037","U874287","W787634","Y980037","H432243","Y980037","E432243","W787634","W787634")
#Create the data table and set keys
Endodt<-data.table(EndoRep,EndoDate,EndoHospitalNum)
setkey(Endodt, EndoHospitalNum, EndoDate)

#run the join
Endodt[pathdt,roll="nearest"]
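One possible way around the 2-day limitation, sketched under a few assumptions: keep a copy of the endoscopy date before joining (in the join result the date column carries the pathology dates), then filter on the difference afterwards. The columns `EndoDateOrig` and `dayDiff`, the `on=` form of the join, and the patient `Z000000` are additions made for this sketch and are not part of the answer above.

```r
library(data.table)

# Small subset of the data above, plus one made-up patient (Z000000)
# whose two dates are 9 days apart
Endodt <- data.table(
  EndoHospitalNum = c("H432243", "T662272", "G424284", "Z000000"),
  EndoDate = as.Date(c("1993-12-22", "1994-05-14", "1992-07-19", "2000-01-01"))
)
pathdt <- data.table(
  pathHospitalNum = c("H432243", "T662272", "G424284", "Z000000"),
  pathDate = as.Date(c("1993-12-22", "1994-05-16", "1992-07-20", "2000-01-10"))
)

# Keep the original endoscopy date: after the join, the EndoDate
# column holds the pathology dates it was matched against
Endodt[, EndoDateOrig := EndoDate]

# Rolling join to the nearest date within each hospital number
joined <- Endodt[pathdt,
                 on = .(EndoHospitalNum = pathHospitalNum, EndoDate = pathDate),
                 roll = "nearest"]
setnames(joined, "EndoDate", "pathDate")

# Drop matches whose dates are more than 2 days apart
joined[, dayDiff := abs(as.numeric(pathDate - EndoDateOrig))]
within2 <- joined[dayDiff <= 2]
```

Here `joined` still contains the `Z000000` pair, because roll="nearest" matches it regardless of distance, and the final filter removes it.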