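I need to select rows that have a start date and an end date and, where some of the date ranges overlap, check whether the rest of the row is identical, then merge those rows using min(start_date) and max(end_date). I think I first need to group the overlapping rows, and then I can aggregate with a GROUP BY on that group.
Each row has an ID, a start_date, an end_date, and some data. Some rows have overlapping date ranges and some do not; I want to merge the rows that have the same ID and data and whose date ranges overlap.
When I tried the suggested answer with only the top two rows, I got three rows back; they are shown at the end of the question.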
id valid_from valid_to
900101 06-MAY-13 02-FEB-14
900101 03-FEB-14 23-JUL-14
900102 01-JAN-10 01-DEC-10
900102 01-JAN-11 23-JAN-13
900102 01-AUG-11 23-JAN-15
900102 01-SEP-11 15-DEC-14
After running the query, the result should be:
id valid_from valid_to
900101 06-MAY-13 02-FEB-14
900101 03-FEB-14 23-JUL-14
900102 01-JAN-10 01-DEC-10
900102 01-JAN-11 23-JAN-15
where the bottom three rows have been merged.
With only the top two rows, the suggested code returned this:
900101 06-MAY-13 02-FEB-14
900101 06-MAY-13 23-JUL-14
900101 03-FEB-14 23-JUL-14
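To pin down the merge rule the question asks for, here is a minimal sketch in Python rather than SQL. It is not taken from either answer. It assumes rows are `(id, valid_from, valid_to)` tuples, that both dates are inclusive, and that two ranges overlap only when the later one starts on or before the running end date. The name `merge_overlapping` is hypothetical. The sample data has no extra columns, so the sketch groups by `id` alone; any additional data columns would have to become part of the grouping key.

```python
from datetime import date
from itertools import groupby
from operator import itemgetter


def merge_overlapping(rows):
    """Collapse rows that share an id and whose date ranges overlap.

    Assumption: rows are (id, valid_from, valid_to) with inclusive dates.
    """
    merged = []
    # Sorting by id, then start date, makes overlapping candidates adjacent.
    for row_id, group in groupby(sorted(rows), key=itemgetter(0)):
        current_from = current_to = None
        for _, valid_from, valid_to in group:
            if current_from is not None and valid_from <= current_to:
                # Overlaps the running range: keep the later end date.
                current_to = max(current_to, valid_to)
            else:
                if current_from is not None:
                    merged.append((row_id, current_from, current_to))
                current_from, current_to = valid_from, valid_to
        merged.append((row_id, current_from, current_to))
    return merged


# The six rows from the question.
rows = [
    (900101, date(2013, 5, 6), date(2014, 2, 2)),
    (900101, date(2014, 2, 3), date(2014, 7, 23)),
    (900102, date(2010, 1, 1), date(2010, 12, 1)),
    (900102, date(2011, 1, 1), date(2013, 1, 23)),
    (900102, date(2011, 8, 1), date(2015, 1, 23)),
    (900102, date(2011, 9, 1), date(2014, 12, 15)),
]

for row in merge_overlapping(rows):
    print(row)
```

Under these assumptions the two 900101 rows stay separate, because the second starts one day after the first ends, and the last three 900102 rows collapse into a single range from 01-JAN-11 to 23-JAN-15, matching the expected output above.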
Answer 0 (score: 0)
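If you are working with tables that contain start_date and end_date columns, you would probably benefit from reading Richard Snodgrass's Developing Time-Oriented Database Applications in SQL. People have been studying problems like yours for more than 20 years, and the book is a good introduction to the academic literature for working programmers. You can get an older copy on Amazon or read it for free online (in the "Books" section).
Your specific problem is discussed in Section 6.5. For example, given this table: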
ssn | pcn | start_date | end_date
----------+--------+------------+-----------
111223333 | 120033 | 1996-01-01 | 1996-06-01
111223333 | 120033 | 1996-04-01 | 1996-10-01
111223333 | 120033 | 1996-04-01 | 1996-10-01
111223333 | 120033 | 1996-10-01 | 1998-01-01
111223333 | 120033 | 1997-12-01 | 1998-01-01
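You can merge adjacent/overlapping periods and remove duplicates with this SQL (lightly adapted from the book to use a CTE instead of a temporary table):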
WITH temp AS (
  SELECT ssn, pcn, start_date, end_date
  FROM incumbents
)
SELECT DISTINCT f.ssn, f.pcn, f.start_date, l.end_date
FROM temp AS f,
     temp AS l
WHERE f.start_date < l.end_date
  AND f.ssn = l.ssn
  AND f.pcn = l.pcn
  AND NOT EXISTS (SELECT 1
                  FROM temp AS m
                  WHERE m.ssn = f.ssn
                    AND m.pcn = f.pcn
                    AND f.end_date < m.start_date
                    AND m.start_date < l.start_date
                    AND NOT EXISTS (SELECT 1
                                    FROM temp AS t1
                                    WHERE t1.ssn = f.ssn
                                      AND t1.pcn = f.pcn
                                      AND t1.start_date < m.start_date
                                      AND m.start_date <= t1.end_date))
  AND NOT EXISTS (SELECT 1
                  FROM temp AS t2
                  WHERE t2.ssn = f.ssn
                    AND t2.pcn = f.pcn
                    AND ((t2.start_date < f.start_date
                          AND f.start_date <= t2.end_date)
                      OR (t2.start_date <= l.end_date
                          AND l.end_date < t2.end_date)))
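This is the Postgres dialect, but I'm sure you can adapt it to Oracle (or any other database). You should also change ssn and pcn to whatever key you are using (probably id, as long as the same id is allowed to appear in multiple records at different times).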
Answer 1 (score: 0)
This works in Oracle using hierarchical queries, and it reads the original data only twice:
WITH d AS
(
--
SELECT DATE '2016-01-01' effective_start_date, DATE '2016-02-01' - 1 effective_end_date, 1 contract_id
FROM dual
UNION ALL --
SELECT DATE '2016-02-01', DATE '2016-04-01' - 1, 1
FROM dual
UNION ALL --
SELECT DATE '2016-04-01', DATE '2016-04-30', 1
FROM dual
UNION ALL --
SELECT DATE '2016-06-01', DATE '2016-07-01' - 1, 1
FROM dual
UNION ALL -- gap
SELECT DATE '2016-07-01' + 1, DATE '2016-07-31', 1
FROM dual
UNION ALL --
-- other contract
SELECT DATE '2016-02-01', DATE '2016-03-01' - 1, 3
FROM dual
UNION ALL --
SELECT DATE '2016-03-01', DATE '2016-03-31', 3
FROM dual
--
),
q1 AS
(
-- walk the chain backwards and get the "root" start
SELECT d.*, connect_by_root effective_start_date contract_start, LEVEL
FROM d
CONNECT BY PRIOR contract_id = contract_id
AND PRIOR effective_end_date + 1 = effective_start_date),
q2 AS
(
-- walk the chain forward and get the "root" end
SELECT d.*, connect_by_root effective_end_date contract_end, LEVEL
FROM d
CONNECT BY PRIOR contract_id = contract_id
AND PRIOR effective_start_date = effective_end_date + 1)
-- join the forward and backward data to get the contiguous contract start and end
SELECT DISTINCT MIN(a.contract_start) contract_start, MAX(b.contract_end) contract_end, a.contract_id
FROM q1 a
JOIN q2 b
ON a.contract_id = b.contract_id
AND a.effective_start_date = b.effective_start_date
GROUP BY a.effective_start_date, a.effective_end_date, a.contract_id
And it gives the desired result:
+-----+----------------+--------------+-------------+
|     | CONTRACT_START | CONTRACT_END | CONTRACT_ID |
+-----+----------------+--------------+-------------+
| 1   | 2016-01-01     | 2016-04-30   | 1           |
| 2   | 2016-06-01     | 2016-06-30   | 1           |
| 3   | 2016-07-02     | 2016-07-31   | 1           |
| 4   | 2016-02-01     | 2016-03-31   | 3           |
+-----+----------------+--------------+-------------+
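As a cross-check on that result, the same "contiguous chain" rule can be sketched outside the database. The Python below is only an illustration, not a translation of the hierarchical query: the function name `merge_contiguous` and the `(contract_id, start, end)` tuple layout are assumptions, and it treats a period as continuing the previous one when it starts no later than one day after the previous end. That also merges overlapping periods, whereas the `CONNECT BY` conditions above link only periods that are exactly adjacent.

```python
from datetime import date, timedelta


def merge_contiguous(periods, max_gap=timedelta(days=1)):
    """Merge periods of the same contract that touch or overlap.

    Assumption: periods are (contract_id, start, end) with inclusive dates.
    """
    merged = []
    for contract_id, start, end in sorted(periods):
        last = merged[-1] if merged else None
        if last and last[0] == contract_id and start <= last[2] + max_gap:
            # Continues the previous period: extend its end date.
            merged[-1] = (contract_id, last[1], max(last[2], end))
        else:
            merged.append((contract_id, start, end))
    return merged


# Same sample data as the CTE "d" above, with the "- 1" arithmetic applied
# (2016 is a leap year, so DATE '2016-03-01' - 1 is 2016-02-29).
periods = [
    (1, date(2016, 1, 1), date(2016, 1, 31)),
    (1, date(2016, 2, 1), date(2016, 3, 31)),
    (1, date(2016, 4, 1), date(2016, 4, 30)),
    (1, date(2016, 6, 1), date(2016, 6, 30)),
    (1, date(2016, 7, 2), date(2016, 7, 31)),  # one-day gap before this row
    (3, date(2016, 2, 1), date(2016, 2, 29)),
    (3, date(2016, 3, 1), date(2016, 3, 31)),
]

for contract_id, start, end in merge_contiguous(periods):
    print(contract_id, start, end)
```

On this data the sketch yields the same four periods as the table above. Passing `max_gap=timedelta(0)` would instead require a true overlap before two periods are merged.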