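I am trying to join two tables, one containing transaction data and the other containing travel orders. My goal is to associate each record in the transactions table with a single record in the travel orders table: the one with the smallest difference between the transaction's purchase date and the travel order's start date. The tables are joined on SSN, and I want to use a left join because I want to keep every record in the transaction data, even when there is no travel order for that SSN.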
Here is some sample data:
/* Create and populate transactions */
CREATE TABLE transactions (
    ssn INT,
    purchase_date DATE,
    price FLOAT
);
INSERT INTO transactions VALUES
(1111, "2018-12-31", 12.20),
(1111, "2018-11-01", 22.23),
(2222, "2018-08-17", 99.23),
(4444, "2018-06-07", 13.22),
(5555, "2018-03-05", 22.22),
(6666, "2018-05-29", 11.11),
(7777, "2018-10-10", 23.32),
(8888, "2018-06-21", 44.44),
(8888, "2018-01-19", 55.55),
(8888, "2018-02-25", 66.53);
/* Create and populate travel orders */
CREATE TABLE travel_orders (
    ssn_id INT NOT NULL,
    start_date DATE
);
INSERT INTO travel_orders VALUES
(1111, "2018-12-28"),
(1111, "2018-12-07"),
(2222, "2018-08-12"),
(7777, "2018-10-10"),
(7777, "2018-10-14"),
(8888, "2018-06-18"),
(8888, "2018-01-19"),
(8888, "2018-02-22");
For example, the transactions record
(8888, "2018-06-21", 44.44)
would be joined to the travel orders record
(8888, "2018-06-18")
and so on for the remaining records.
Edit: the expected output would look something like this:
+------+---------------+-------+--------+------------+
| ssn | purchase_date | price | ssn_id | start_date |
+------+---------------+-------+--------+------------+
| 1111 | 2018-12-31 | 12.2 | 1111 | 2018-12-28 |
| 1111 | 2018-11-01 | 22.23 | 1111 | 2018-12-07 |
| 2222 | 2018-08-17 | 99.23 | 2222 | 2018-08-12 |
| 4444 | 2018-06-07 | 13.22 | 4444 | NULL |
| 5555 | 2018-03-05 | 22.22 | 5555 | NULL |
| 6666 | 2018-05-29 | 11.11 | 6666 | NULL |
| 7777 | 2018-10-10 | 23.32 | 7777 | 2018-10-10 |
| 8888 | 2018-06-21 | 44.44 | 8888 | 2018-06-18 |
| 8888 | 2018-01-19 | 55.55 | 8888 | 2018-01-19 |
| 8888 | 2018-02-25 | 66.53 | 8888 | 2018-02-22 |
+------+---------------+-------+--------+------------+
I have some basic starter code:
SELECT t.*,
       o.*
FROM transactions AS t
LEFT JOIN travel_orders AS o
    ON t.ssn = o.ssn_id;
But I need to add a filter so that records are matched on the minimum difference between the purchase date and the start date.
Answer 0: (score: 2)
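This query will give you the results you want. For each combination of ssn and purchase date, the subquery finds the smallest difference between the purchase date and any start date. That result is then JOINed back to transactions and travel_orders to produce the desired output: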
SELECT t.*,
       o.*
FROM transactions t
JOIN (SELECT t.ssn,
             t.purchase_date,
             MIN(ABS(DATEDIFF(o.start_date, t.purchase_date))) AS dd
      FROM transactions t
      LEFT JOIN travel_orders o
          ON t.ssn = o.ssn_id
      GROUP BY t.ssn, t.purchase_date) d
    ON d.ssn = t.ssn AND d.purchase_date = t.purchase_date
LEFT JOIN travel_orders o
    ON o.ssn_id = t.ssn AND ABS(DATEDIFF(o.start_date, t.purchase_date)) = d.dd
ORDER BY t.ssn, t.price
Output:
ssn   purchase_date  price  ssn_id  start_date
1111  2018-12-31     12.2   1111    2018-12-28
1111  2018-11-01     22.23  1111    2018-12-07
2222  2018-08-17     99.23  2222    2018-08-12
4444  2018-06-07     13.22  (null)  (null)
5555  2018-03-05     22.22  (null)  (null)
6666  2018-05-29     11.11  (null)  (null)
7777  2018-10-10     23.32  7777    2018-10-10
8888  2018-06-21     44.44  8888    2018-06-18
8888  2018-01-19     55.55  8888    2018-01-19
8888  2018-02-25     66.53  8888    2018-02-22
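
One caveat with the query above: if two travel orders are equally close to a purchase date (say, five days before and five days after), the equality join on dd matches both, so that transaction appears twice. One possible way to guarantee a single match is to rank the candidate orders for each transaction with ROW_NUMBER() and keep only the first. The following is a rough sketch under several assumptions, not part of the original answer. It runs through Python's sqlite3 module so that it can be executed as-is, uses SQLite's julianday() in place of MySQL's DATEDIFF(), breaks ties by the earlier start_date (an arbitrary choice), and treats (ssn, purchase_date, price) as identifying a transaction because the table has no key. The helper name closest_orders is made up for illustration. Window functions need SQLite 3.25 or later; on MySQL 8.0+ the same shape should work with ABS(DATEDIFF(o.start_date, t.purchase_date)) in the ORDER BY.

```python
import sqlite3

# Load the sample data from the question into an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE transactions (ssn INT, purchase_date DATE, price FLOAT);
INSERT INTO transactions VALUES
    (1111, '2018-12-31', 12.20),
    (1111, '2018-11-01', 22.23),
    (2222, '2018-08-17', 99.23),
    (4444, '2018-06-07', 13.22),
    (5555, '2018-03-05', 22.22),
    (6666, '2018-05-29', 11.11),
    (7777, '2018-10-10', 23.32),
    (8888, '2018-06-21', 44.44),
    (8888, '2018-01-19', 55.55),
    (8888, '2018-02-25', 66.53);
CREATE TABLE travel_orders (ssn_id INT NOT NULL, start_date DATE);
INSERT INTO travel_orders VALUES
    (1111, '2018-12-28'),
    (1111, '2018-12-07'),
    (2222, '2018-08-12'),
    (7777, '2018-10-10'),
    (7777, '2018-10-14'),
    (8888, '2018-06-18'),
    (8888, '2018-01-19'),
    (8888, '2018-02-22');
""")

# Rank each transaction's candidate orders by absolute day difference.
# The second sort key (start_date) is an assumed tie-breaker.
QUERY = """
SELECT ssn, purchase_date, price, ssn_id, start_date
FROM (
    SELECT t.ssn, t.purchase_date, t.price, o.ssn_id, o.start_date,
           ROW_NUMBER() OVER (
               PARTITION BY t.ssn, t.purchase_date, t.price
               ORDER BY ABS(julianday(o.start_date) - julianday(t.purchase_date)),
                        o.start_date
           ) AS rn
    FROM transactions AS t
    LEFT JOIN travel_orders AS o
        ON t.ssn = o.ssn_id
) AS ranked
WHERE rn = 1
ORDER BY ssn, price
"""


def closest_orders(connection):
    """Return one row per transaction with its nearest travel order, if any."""
    return connection.execute(QUERY).fetchall()


rows = closest_orders(conn)
for row in rows:
    print(row)
```

Because the LEFT JOIN happens inside the ranked subquery, transactions without any travel order still get a single row with rn = 1 and NULLs in the order columns, so they survive the final filter.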