Merge two datasets based on a date range in R

Asked: 2021-05-17 15:13:33

Tags: r join merge data.table date-range

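I want to merge two datasets using date ranges. Dataset 1 contains patient stays in the hospital over time. Dataset 2 contains room information over time. My goal is to identify the type of each stay in Dataset 1. This can get complicated because the room type may change during a stay. For example, one of patient 101's stays was part ICU and part Emergency.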

Dataset 1

  PatientID Hospital Room           StartDate             EndDate
1       101     ODCC  4SW 2020-06-04 16:21:47 2020-06-22 15:12:39
2       101     ODCC   1W 2020-06-22 15:12:40 2020-09-08 14:03:34
3       101     ODCC   1N 2020-09-08 14:03:35 2020-10-02 06:50:24
4       101     ODCC   1W 2020-10-02 06:50:25 2020-10-05 14:25:54 

Dataset 2

  Hospital Room      Type    StartDT      EndDT
1     ODCC  11A     Other 2020-01-01 2021-05-12
2     ODCC   1W       ICU 2020-06-01 2020-07-30
3     ODCC   1W Emergency 2020-08-01 2021-05-12
4     ODCC   1N Emergency 2020-11-05 2021-02-07

Desired output

  PatientID Hospital Room           StartDate             EndDate      Type    StartDT      EndDT
1       101     ODCC  4SW 2020-06-04 16:21:47 2020-06-22 15:12:39      <NA>       <NA>       <NA>
2       101     ODCC   1W 2020-06-22 15:12:40 2020-09-08 14:03:34       ICU 2020-06-01 2020-07-30
3       101     ODCC   1W 2020-06-22 15:12:40 2020-09-08 14:03:34 Emergency 2020-08-01 2021-05-12
4       101     ODCC   1N 2020-09-08 14:03:35 2020-10-02 06:50:24      <NA>       <NA>       <NA>
5       101     ODCC   1W 2020-10-02 06:50:25 2020-10-05 14:25:54 Emergency 2020-08-01 2021-05-12

The code below reproduces my datasets.

stays <- structure(list(
  PatientID = c(101, 101, 101, 101),
  Hospital = c("ODCC", "ODCC", "ODCC", "ODCC"),
  Room = c("4SW", "1W", "1N", "1W"), 
  StartDate = structure(c(1591287707, 1592838760, 1599573815, 1601621425),
                        class = c("POSIXct", "POSIXt"), tzone = "UTC"), 
  EndDate = structure(c(1592838759, 1599573814, 1601621424, 1601907954), 
                      class = c("POSIXct", "POSIXt"), tzone = "UTC")),
  class = "data.frame", row.names = c(NA, -4L))

type <- structure(list(
  Hospital = c("ODCC", "ODCC", "ODCC", "ODCC"), 
  Room = c("11A", "1W", "1W", "1N"), Type = c("Other", "ICU", "Emergency",
                                              "Emergency"), 
  StartDT = structure(c(1577836800, 1590969600, 1596240000, 1604534400), 
                      class = c("POSIXct", "POSIXt"), tzone = "UTC"),
  EndDT = structure(c(1620777600, 1596067200, 1620777600, 1612656000),
                    class = c("POSIXct","POSIXt"), tzone = "UTC")), 
  class = "data.frame", row.names = c(NA, -4L))

Thanks! Marvin

1 answer:

Answer 0 (score: 0)

You can use foverlaps from data.table:

library(data.table)

# Convert both data frames to data.tables by reference
setDT(stays)
setDT(type)

# foverlaps() needs the lookup table keyed, with the interval columns last
setkey(type, Room, StartDT, EndDT)

foverlaps(stays, type,
          by.x = c("Room", "StartDate", "EndDate"),
          by.y = c("Room", "StartDT", "EndDT"),
          type = "any")[
          , .(
            PatientID,
            Hospital = i.Hospital,  # Hospital from stays; the copy from type is NA when nothing matches
            Room,
            StartDate,
            EndDate,
            Type,
            StartDT,
            EndDT)]

   PatientID Hospital Room           StartDate             EndDate      Type    StartDT      EndDT
1:       101     ODCC   1N 2020-09-08 14:03:35 2020-10-02 06:50:24      <NA>       <NA>       <NA>
2:       101     ODCC   1W 2020-06-22 15:12:40 2020-09-08 14:03:34       ICU 2020-06-01 2020-07-30
3:       101     ODCC   1W 2020-06-22 15:12:40 2020-09-08 14:03:34 Emergency 2020-08-01 2021-05-12
4:       101     ODCC   1W 2020-10-02 06:50:25 2020-10-05 14:25:54 Emergency 2020-08-01 2021-05-12
5:       101     ODCC  4SW 2020-06-04 16:21:47 2020-06-22 15:12:39      <NA>       <NA>       <NA>
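
One caveat with a Room-only key: rooms are matched by name alone, so two hospitals that reuse a room name would be cross-matched. A possible variation, offered as a sketch rather than a tested drop-in, adds Hospital to the key and clips each match to the part of the stay that falls inside the room-type period. The data is rebuilt from the question so the snippet runs on its own; OverlapStart and OverlapEnd are column names made up here for illustration.

```r
library(data.table)

# Same values as in the question, rebuilt compactly (all times in UTC).
stays <- data.table(
  PatientID = 101,
  Hospital  = "ODCC",
  Room      = c("4SW", "1W", "1N", "1W"),
  StartDate = as.POSIXct(c("2020-06-04 16:21:47", "2020-06-22 15:12:40",
                           "2020-09-08 14:03:35", "2020-10-02 06:50:25"), tz = "UTC"),
  EndDate   = as.POSIXct(c("2020-06-22 15:12:39", "2020-09-08 14:03:34",
                           "2020-10-02 06:50:24", "2020-10-05 14:25:54"), tz = "UTC")
)
type <- data.table(
  Hospital = "ODCC",
  Room     = c("11A", "1W", "1W", "1N"),
  Type     = c("Other", "ICU", "Emergency", "Emergency"),
  StartDT  = as.POSIXct(c("2020-01-01", "2020-06-01", "2020-08-01", "2020-11-05"), tz = "UTC"),
  EndDT    = as.POSIXct(c("2021-05-12", "2020-07-30", "2021-05-12", "2021-02-07"), tz = "UTC")
)

# Key on the equality columns first, then the two interval columns.
setkey(type, Hospital, Room, StartDT, EndDT)

res <- foverlaps(stays, type,
                 by.x = c("Hospital", "Room", "StartDate", "EndDate"),
                 by.y = c("Hospital", "Room", "StartDT", "EndDT"),
                 type = "any")

# Assumption: the part of a stay spent under a given Type is the
# intersection of the stay with the room-type period (NA when unmatched).
res[, `:=`(OverlapStart = pmax(StartDate, StartDT),
           OverlapEnd   = pmin(EndDate, EndDT))]

# Sort chronologically instead of relying on the join's row order.
setorder(res, StartDate, StartDT)
print(res[, .(PatientID, Hospital, Room, StartDate, EndDate,
              Type, OverlapStart, OverlapEnd)])
```

For the first 1W stay this gives two rows, one clipped to the ICU period and one to the Emergency period. Stays with no matching room-type period keep NA in Type and in both overlap columns.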