Merging data frames on date columns that fall within a range of each other

Time: 2018-07-07 14:50:49

Tags: r dataframe merge date-range

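I would like to merge df1 (which contains patients' treatment intervals) with df2 (which contains lab values) by patient ID and date, so that each lab value's date is within 5 days of the medication start date. See below.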

For df1:

ID = c(2, 2, 2, 2, 3, 5) 
Medication = c("aspirin", "aspirin", "aspirin", "tylenol", "lipitor", "advil") 
Start.Date = c("05/01/2017", "05/05/2017", "06/20/2017", "05/01/2017", "05/06/2017", "05/28/2017")
Stop.Date = c("05/04/2017", "05/10/2017", "06/27/2017", "05/15/2017", "05/12/2017", "06/13/2017")
df1 = data.frame(ID, Medication, Start.Date, Stop.Date) 

  ID Medication Start.Date  Stop.Date
   2    aspirin 05/01/2017 05/30/2017
   2    tylenol 05/01/2017 05/15/2017
   3    lipitor 05/06/2017 05/18/2017
   5      advil 05/28/2017 06/13/2017

For df2:

ID = c(2,2,2,3,3,5)
Lab.date = c("04/30/2017", "05/03/2017", "05/15/2017", "05/05/2017", "05/18/2017", "05/15/2017")
Lab.wbc = c(5.4, 3.2, 7.1, 6.0, 10.8, 11.3)
df2 = data.frame(ID, Lab.date, Lab.wbc)

  ID   Lab.date Lab.wbc
   2 04/30/2017     5.4
   2 05/03/2017     3.2
   2 05/15/2017     7.1
   3 05/05/2017     6.0
   3 05/18/2017    10.8
   5 05/15/2017    11.3

The merge should produce the following, where Lab.date is within plus or minus 5 days of the medication Start.Date:

  ID Medication Start.Date  Stop.Date   Lab.date Lab.wbc
   2    aspirin 05/01/2017 05/30/2017 04/30/2017     5.4
   2    aspirin 05/01/2017 05/30/2017 05/03/2017     3.2
   2    tylenol 05/01/2017 05/15/2017 04/30/2017     5.4
   2    tylenol 05/01/2017 05/15/2017 05/03/2017     3.2
   3    lipitor 05/06/2017 05/18/2017 05/05/2017     6.0

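1 answer:

Answer 0: (score: 0)

Below is one possible solution. Note that the final data frame contains additional valid rows that you did not account for at the end of your question.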

library(dplyr)

# reproducing your setup
ID = c(2, 2, 2, 2, 3, 5) 
Medication = c("aspirin", "aspirin", "aspirin", "tylenol", "lipitor", "advil") 
Start.Date = c("05/01/2017", "05/05/2017", "06/20/2017", "05/01/2017", "05/06/2017", "05/28/2017")
Stop.Date = c("05/04/2017", "05/10/2017", "06/27/2017", "05/15/2017", "05/12/2017", "06/13/2017")
df1 = data.frame(ID, Medication, Start.Date, Stop.Date) 

ID = c(2,2,2,3,3,5)
Lab.date = c("04/30/2017", "05/03/2017", "05/15/2017", "05/05/2017", "05/18/2017", "05/15/2017")
Lab.wbc = c(5.4, 3.2, 7.1, 6.0, 10.8, 11.3)
df2 = data.frame(ID, Lab.date, Lab.wbc)

# full join on patient ID: each medication row is paired with every lab row of the same patient
full_df <- full_join(df1, df2, by = "ID")

# note: with this df1, the correct result has more rows than the table given in the question
result <- full_df %>%
  # keep the absolute day difference as a column for reference
  mutate(Day.diff = abs(as.Date(Start.Date, "%m/%d/%Y") - as.Date(Lab.date, "%m/%d/%Y"))) %>%
  # keep only the pairs whose dates are at most 5 days apart
  filter(Day.diff <= 5)
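
If you would rather avoid the dplyr dependency, the same join-then-filter idea can probably be sketched in base R as below. This is only an illustration, not part of the original answer: the helper name `merge_within_days` and its `window` argument are made up for this example, and the data is copied from the question with every year written as four digits so that it matches the `%Y` format.

```r
# Data copied from the question (all years written with four digits).
df1 <- data.frame(
  ID = c(2, 2, 2, 2, 3, 5),
  Medication = c("aspirin", "aspirin", "aspirin", "tylenol", "lipitor", "advil"),
  Start.Date = c("05/01/2017", "05/05/2017", "06/20/2017", "05/01/2017", "05/06/2017", "05/28/2017"),
  Stop.Date = c("05/04/2017", "05/10/2017", "06/27/2017", "05/15/2017", "05/12/2017", "06/13/2017"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  ID = c(2, 2, 2, 3, 3, 5),
  Lab.date = c("04/30/2017", "05/03/2017", "05/15/2017", "05/05/2017", "05/18/2017", "05/15/2017"),
  Lab.wbc = c(5.4, 3.2, 7.1, 6.0, 10.8, 11.3),
  stringsAsFactors = FALSE
)

# Hypothetical helper (name and arguments are assumptions for this sketch):
# pair every medication row with every lab row of the same patient, then
# keep the pairs whose lab date lies within `window` days of the start date.
merge_within_days <- function(meds, labs, window = 5) {
  pairs <- merge(meds, labs, by = "ID")  # all combinations per ID
  start <- as.Date(pairs$Start.Date, format = "%m/%d/%Y")
  lab <- as.Date(pairs$Lab.date, format = "%m/%d/%Y")
  pairs$Day.diff <- abs(as.numeric(lab - start))  # difference in whole days
  # which() drops rows where the difference is NA (unparseable dates)
  pairs[which(pairs$Day.diff <= window), ]
}

result_base <- merge_within_days(df1, df2)
print(result_base)
```

With this df1, `result_base` has seven rows: the 05/05/2017 aspirin interval also pairs with the 04/30/2017 and 05/03/2017 labs, which is where the extra rows relative to the table in the question come from.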