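I want to merge df1 (which contains patients' treatment intervals) with df2 (which contains lab values) by patient ID and date, so that each lab value's date falls within 5 days of the medication start date. See below.
For df1: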
ID = c(2, 2, 2, 2, 3, 5)
Medication = c("aspirin", "aspirin", "aspirin", "tylenol", "lipitor", "advil")
Start.Date = c("05/01/2017", "05/05/2017", "06/20/2017", "05/01/2017", "05/06/2017", "05/28/2017")
Stop.Date = c("05/04/2017", "05/10/2017", "06/27/2017", "05/15/2017", "05/12/2017", "06/13/2017")
df1 = data.frame(ID, Medication, Start.Date, Stop.Date)
ID Medication Start.Date Stop.Date
2 aspirin 05/01/2017 05/30/2017
2 tylenol 05/01/2017 05/15/2017
3 lipitor 05/06/2017 05/18/2017
5 advil 05/28/2017 06/13/2017
For df2:
ID = c(2,2,2,3,3,5)
Lab.date = c("04/30/2017", "05/03/2017", "05/15/2017", "05/05/2017", "05/18/2017", "05/15/2017")
Lab.wbc = c(5.4, 3.2, 7.1, 6.0, 10.8, 11.3)
df2 = data.frame(ID, Lab.date, Lab.wbc)
ID Lab.date Lab.wbc
2 04/30/2017 5.4
2 05/03/2017 3.2
2 05/15/2017 7.1
3 05/05/2017 6.0
3 05/18/2017 10.8
5 05/15/2017 11.3
The merge should produce the following, where Lab.date is within plus or minus 5 days of the medication start date:
ID Medication Start.Date Stop.Date Lab.date Lab.wbc
2 aspirin 05/01/2017 05/30/2017 04/30/2017 5.4
2 aspirin 05/01/2017 05/30/2017 05/03/2017 3.2
2 tylenol 05/01/2017 05/15/2017 04/30/2017 5.4
2 tylenol 05/01/2017 05/15/2017 05/03/2017 3.2
3 lipitor 05/06/2017 05/18/2017 05/05/2017 6.0
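To make the plus-or-minus 5-day criterion concrete, here is a minimal base-R check for the aspirin row starting 05/01/2017 against patient 2's three lab dates. It is an illustration only, and it assumes the window is inclusive, which matches the `Day.diff <= 5` comparison used in the answer below.

```r
# Aspirin start date and patient 2's lab dates, parsed with the
# month/day/year format used in the question
start <- as.Date("05/01/2017", format = "%m/%d/%Y")
labs  <- as.Date(c("04/30/2017", "05/03/2017", "05/15/2017"), format = "%m/%d/%Y")

# Absolute difference in whole days between each lab date and the start date
day_diff <- abs(as.numeric(difftime(labs, start, units = "days")))
day_diff
# [1]  1  2 14

# TRUE where the lab falls within +/- 5 days (inclusive) of the start date
within_window <- day_diff <= 5
within_window
# [1]  TRUE  TRUE FALSE
```

This is why the 05/15/2017 lab (14 days after the start) does not appear next to the 05/01/2017 aspirin row in the expected output.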
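Answer 0 (score: 0)
Below is one possible solution. Note that the final data frame contains additional matches that the expected output at the end of your question does not include: for example, the aspirin course starting 05/05/2017 is also within 5 days of the labs on 04/30/2017 and 05/03/2017.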
library(dplyr)
# reproducing your setup
ID = c(2, 2, 2, 2, 3, 5)
Medication = c("aspirin", "aspirin", "aspirin", "tylenol", "lipitor", "advil")
Start.Date = c("05/01/2017", "05/05/2017", "06/20/2017", "05/01/2017", "05/06/2017", "05/28/2017")
Stop.Date = c("05/04/2017", "05/10/2017", "06/27/2017", "05/15/2017", "05/12/2017", "06/13/2017")
df1 = data.frame(ID, Medication, Start.Date, Stop.Date)
ID = c(2,2,2,3,3,5)
Lab.date = c("04/30/2017", "05/03/2017", "05/15/2017", "05/05/2017", "05/18/2017", "05/15/2017")
Lab.wbc = c(5.4, 3.2, 7.1, 6.0, 10.8, 11.3)
df2 = data.frame(ID, Lab.date, Lab.wbc)
# full join by patient ID: each medication row is paired with every lab row of the same patient
# (dplyr >= 1.1.0 may warn about a many-to-many relationship here; that pairing is intended)
full_df <- full_join(df1, df2, by = "ID")
# note that the correct result has more rows than the expected output given in the question
result <- full_df %>%
  # include the absolute day difference for reference
  mutate(Day.diff = abs(as.Date(Start.Date, format = "%m/%d/%Y") - as.Date(Lab.date, format = "%m/%d/%Y"))) %>%
  # keep only labs within 5 days (inclusive) of the medication start date
  filter(Day.diff <= 5)
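If you would rather avoid the dplyr dependency, the same join-then-filter idea can be sketched in base R. This is a swapped-in alternative, not the answerer's code: `merge()` performs an inner join on `ID`, pairing every medication row with every lab row of the same patient, and `subset()` keeps the pairs within 5 days. For this filter the inner join gives the same rows as the full join above, because `filter()` also discards rows whose `Day.diff` is `NA`. The example data are rebuilt (with the lab date written as 05/18/2017) so the block runs on its own.

```r
# Medication intervals (df1) and lab values (df2), as in the question's code
df1 <- data.frame(
  ID = c(2, 2, 2, 2, 3, 5),
  Medication = c("aspirin", "aspirin", "aspirin", "tylenol", "lipitor", "advil"),
  Start.Date = c("05/01/2017", "05/05/2017", "06/20/2017", "05/01/2017", "05/06/2017", "05/28/2017"),
  Stop.Date = c("05/04/2017", "05/10/2017", "06/27/2017", "05/15/2017", "05/12/2017", "06/13/2017"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  ID = c(2, 2, 2, 3, 3, 5),
  Lab.date = c("04/30/2017", "05/03/2017", "05/15/2017", "05/05/2017", "05/18/2017", "05/15/2017"),
  Lab.wbc = c(5.4, 3.2, 7.1, 6.0, 10.8, 11.3),
  stringsAsFactors = FALSE
)

# Inner join on ID: every medication row is paired with every lab row
# of the same patient (a many-to-many match within each ID)
pairs <- merge(df1, df2, by = "ID")

# Absolute day difference between the lab date and the medication start date
pairs$Day.diff <- abs(as.numeric(
  as.Date(pairs$Lab.date, format = "%m/%d/%Y") -
    as.Date(pairs$Start.Date, format = "%m/%d/%Y")
))

# Keep only labs taken within +/- 5 days (inclusive) of the start date,
# then sort for readability
result <- subset(pairs, Day.diff <= 5)
result <- result[order(result$ID, result$Medication, result$Start.Date, result$Lab.date), ]
rownames(result) <- NULL
result
```

With this data the result has 7 rows: both aspirin courses in May and the tylenol course for patient 2 each match the 04/30/2017 and 05/03/2017 labs, and the lipitor course for patient 3 matches the 05/05/2017 lab. The June aspirin course and the advil course have no lab within 5 days.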