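I want to merge two tables, dt_program and dt_sale, using the common keys CH and ITEM_ID to look up START and END, subject to the following conditions:
(1) ORDER_TIME must fall between START and END, or
(2) ORDER_TIME may occur after END (in which case take the END closest to ORDER_TIME).
Data provided: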
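Two tables are involved: a schedule table that lists the programs on each channel, and a sales transaction table that records data whenever a customer buys a product. The schedule table: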
dt_program <- structure(list(CH = c("CH1", "CH1", "CH1", "CH1", "CH1", "CH2",
"CH2", "CH2", "CH3", "CH3", "CH3", "CH3"), ITEM_ID = c(110, 111,
110, 111, 110, 110, 111, 112, 114, 113, 110, 112), START = structure(c(1514791800,
1514799000, 1514806200, 1514813400, 1514820600, 1518602400, 1518609600,
1518616800.005, 1517560200, 1517565600, 1517570999.995, 1517576399.995
), class = c("POSIXct", "POSIXt"), tzone = "UTC"), END = structure(c(1514795400,
1514802600, 1514809800.005, 1514817000.01, 1514824200.015, 1518604200,
1518611400, 1518618600, 1517563800, 1517569200, 1517574600, 1517580000
), class = c("POSIXct", "POSIXt"), tzone = "UTC")), row.names = c(NA,
-12L), class = c("data.table", "data.frame"))
This returns:
     CH ITEM_ID               START                 END
 1: CH1     110 2018-01-01 07:30:00 2018-01-01 08:30:00
 2: CH1     111 2018-01-01 09:30:00 2018-01-01 10:30:00
 3: CH1     110 2018-01-01 11:30:00 2018-01-01 12:30:00
 4: CH1     111 2018-01-01 13:30:00 2018-01-01 14:30:00
 5: CH1     110 2018-01-01 15:30:00 2018-01-01 16:30:00
 6: CH2     110 2018-02-14 10:00:00 2018-02-14 10:30:00
 7: CH2     111 2018-02-14 12:00:00 2018-02-14 12:30:00
 8: CH2     112 2018-02-14 14:00:00 2018-02-14 14:30:00
 9: CH3     114 2018-02-02 08:30:00 2018-02-02 09:30:00
10: CH3     113 2018-02-02 10:00:00 2018-02-02 11:00:00
11: CH3     110 2018-02-02 11:29:59 2018-02-02 12:30:00
12: CH3     112 2018-02-02 12:59:59 2018-02-02 14:00:00
The sales transaction table, which my expected output extends with START and END:
dt_sale <- structure(list(CUST_ID = c("A001", "A001", "A001", "A002", "A002",
"A003"), CH = c("CH1", "CH3", "CH2", "CH2", "CH3", "CH1"), ORDER_TIME = structure(c(1514793600,
1514813400, 1518619200, 1514816100, 1517565600, 1514803200), class = c("POSIXct",
"POSIXt"), tzone = "UTC"), ITEM_ID = c(110, 110, 112, 112, 114,
111)), row.names = c(NA, -6L), class = c("data.table", "data.frame"
))
Could you please suggest an approach?
Answer 0 (score: 2)
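The output shown in the question does not match the description at the start of the question. Rows 2 and 4 should not contain values for START and END.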
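To see why rows 2 and 4 cannot match, it may help to list the candidate programs for those two sales. The sketch below rebuilds only the relevant rows by hand, with the program START times rounded to the second; the names sale_rows, prog_rows, candidates and ORDER_BEFORE_START are illustrative and do not come from the question. It joins on CH and ITEM_ID alone and checks whether the order precedes the program:

```r
library(data.table)

# Sales rows 2 and 4 from the question, rebuilt by hand for illustration.
sale_rows <- data.table(
  CUST_ID    = c("A001", "A002"),
  CH         = c("CH3", "CH2"),
  ORDER_TIME = as.POSIXct(c("2018-01-01 13:30:00", "2018-01-01 14:15:00"), tz = "UTC"),
  ITEM_ID    = c(110, 112)
)

# The only schedule rows sharing CH and ITEM_ID with those sales
# (START values rounded to the second; an assumption of this sketch).
prog_rows <- data.table(
  CH      = c("CH3", "CH2"),
  ITEM_ID = c(110, 112),
  START   = as.POSIXct(c("2018-02-02 11:29:59", "2018-02-14 14:00:00"), tz = "UTC"),
  END     = as.POSIXct(c("2018-02-02 12:30:00", "2018-02-14 14:30:00"), tz = "UTC")
)

# Plain equi-join on the two keys: one candidate program per sale.
candidates <- prog_rows[sale_rows, on = .(CH, ITEM_ID)]

# TRUE means the order was placed before the candidate program started.
candidates[, ORDER_BEFORE_START := ORDER_TIME < START]
print(candidates)
```

Both orders were placed in January, weeks before the only matching program aired in February, so neither condition from the question can hold for them.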
A possible solution using a double join:
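# First join: sales whose ORDER_TIME lies strictly inside a program's START-END window
dt_sale[dt_program
        , on = .(CH, ITEM_ID, ORDER_TIME > START, ORDER_TIME < END)
        , `:=` (START = i.START, END = i.END)
        ][dt_program  # second join: sales placed after a program's END
          , on = .(CH, ITEM_ID, ORDER_TIME > END)
          , `:=` (START = i.START, END = i.END)][]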
which gives:
> dt_sale
   CUST_ID  CH          ORDER_TIME ITEM_ID               START                 END
1:    A001 CH1 2018-01-01 08:00:00     110 2018-01-01 07:30:00 2018-01-01 08:30:00
2:    A001 CH3 2018-01-01 13:30:00     110                <NA>                <NA>
3:    A001 CH2 2018-02-14 14:40:00     112 2018-02-14 14:00:00 2018-02-14 14:30:00
4:    A002 CH2 2018-01-01 14:15:00     112                <NA>                <NA>
5:    A002 CH3 2018-02-02 10:00:00     114 2018-02-02 08:30:00 2018-02-02 09:30:00
6:    A003 CH1 2018-01-01 10:40:00     111 2018-01-01 09:30:00 2018-01-01 10:30:00
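As an alternative to the double join (not the method used above), a rolling join might cover both conditions in one step, assuming programs with the same CH and ITEM_ID never overlap. Under that assumption, the most recent program that started at or before ORDER_TIME either contains the order or is the one whose END is closest before it. A minimal sketch on a hand-made CH1 subset; prog, sale and rolled are illustrative names:

```r
library(data.table)

# Small hand-made subset of the question's data (CH1 only), for illustration.
prog <- data.table(
  CH      = "CH1",
  ITEM_ID = c(110, 111, 110),
  START   = as.POSIXct(c("2018-01-01 07:30:00", "2018-01-01 09:30:00",
                         "2018-01-01 11:30:00"), tz = "UTC"),
  END     = as.POSIXct(c("2018-01-01 08:30:00", "2018-01-01 10:30:00",
                         "2018-01-01 12:30:00"), tz = "UTC")
)
sale <- data.table(
  CUST_ID    = c("A001", "A003"),
  CH         = "CH1",
  ORDER_TIME = as.POSIXct(c("2018-01-01 08:00:00", "2018-01-01 10:40:00"), tz = "UTC"),
  ITEM_ID    = c(110, 111)
)

# roll = TRUE matches each sale to the program with the latest START that is
# at or before ORDER_TIME, within the same CH and ITEM_ID.
# x.START / x.END are the program's values; i.* are the sale's values.
rolled <- prog[sale,
               on = .(CH, ITEM_ID, START = ORDER_TIME),
               roll = TRUE,
               .(CUST_ID, CH = i.CH, ORDER_TIME = i.ORDER_TIME,
                 ITEM_ID = i.ITEM_ID, START = x.START, END = x.END)]
print(rolled)
```

Note that this variant uses START <= ORDER_TIME rather than the strict inequalities of the double join, so results can differ for orders placed exactly at a program boundary.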