Oracle SQL: select rows with start and end dates and merge rows if some overlap

Time: 2016-03-01 15:15:59

Tags: sql oracle

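I need to select rows that have a start date and an end date. If some of the date ranges overlap, I want to check that the rest of the row is identical and then merge those rows into one, using min(start_date) and max(end_date). I think I first need to group the overlapping rows, and then I can aggregate over each group.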

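Each row has an ID, a start_date, an end_date and some data. Some rows have overlapping date ranges and some do not. I want to merge the rows that have the same ID, the same data, and overlapping date ranges.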

When I tried the suggested answer with only the two top rows, I got three rows back; they are shown at the end of this question.

id      valid_from  valid_to
900101  06-MAY-13   02-FEB-14
900101  03-FEB-14   23-JUL-14
900102  01-JAN-10   01-DEC-10
900102  01-JAN-11   23-JAN-13
900102  01-AUG-11   23-JAN-15
900102  01-SEP-11   15-DEC-14

After running the query, the result should be:

id      valid_from  valid_to
900101  06-MAY-13   02-FEB-14
900101  03-FEB-14   23-JUL-14
900102  01-JAN-10   01-DEC-10
900102  01-JAN-11   23-JAN-15  

where the three bottom rows have been merged.
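To make the intended behaviour concrete, here is a minimal Python sketch of the merge itself. It is not Oracle SQL and is not taken from either answer. It assumes the ranges are inclusive, that two-digit years mean 20xx, and that rows merge only when they share an id and their ranges actually overlap (ranges that are merely adjacent stay separate, as in the expected output). The name `merge_overlapping` and the tuple layout are hypothetical; any extra data columns would have to be added to the grouping key.

```python
from datetime import date
from itertools import groupby


def merge_overlapping(rows):
    """Merge (id, valid_from, valid_to) rows whose inclusive ranges overlap.

    Hypothetical helper for illustration only. Rows are sorted by id and
    start date, then folded into the current range while they overlap it.
    """
    merged = []
    for row_id, group in groupby(sorted(rows), key=lambda r: r[0]):
        current_from = current_to = None
        for _, valid_from, valid_to in group:
            if current_from is not None and valid_from <= current_to:
                # Overlaps the running range: extend its end if needed.
                current_to = max(current_to, valid_to)
            else:
                # No overlap: emit the finished range and start a new one.
                if current_from is not None:
                    merged.append((row_id, current_from, current_to))
                current_from, current_to = valid_from, valid_to
        merged.append((row_id, current_from, current_to))
    return merged


# Sample rows from the question (years assumed to be 20xx).
rows = [
    (900101, date(2013, 5, 6), date(2014, 2, 2)),
    (900101, date(2014, 2, 3), date(2014, 7, 23)),
    (900102, date(2010, 1, 1), date(2010, 12, 1)),
    (900102, date(2011, 1, 1), date(2013, 1, 23)),
    (900102, date(2011, 8, 1), date(2015, 1, 23)),
    (900102, date(2011, 9, 1), date(2014, 12, 15)),
]
result = merge_overlapping(rows)
```

Sorting first means each row only has to be compared with the running end date of its group, which is the same idea a "group the overlapping rows, then aggregate" query would need to express.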

With only the two top rows, the suggested code returned this:

900101  06-MAY-13   02-FEB-14 
900101  06-MAY-13   23-JUL-14 
900101  03-FEB-14   23-JUL-14

2 Answers:

Answer 0 (score: 0)

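If you are writing tables that contain start_date and end_date, you might benefit from reading Richard Snodgrass's Developing Time-Oriented Database Applications in SQL. People have been studying problems like yours for more than 20 years, and this book is a good introduction to the academic literature for working programmers. You can get a used copy on Amazon or read it for free online (in the "Books" section).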

Your specific problem is discussed in section 6.5. For example, given this table:

   ssn    |  pcn   | start_date | end_date
----------+--------+------------+-----------
111223333 | 120033 | 1996-01-01 | 1996-06-01
111223333 | 120033 | 1996-04-01 | 1996-10-01
111223333 | 120033 | 1996-04-01 | 1996-10-01
111223333 | 120033 | 1996-10-01 | 1998-01-01
111223333 | 120033 | 1997-12-01 | 1998-01-01

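you can merge adjacent/overlapping time periods and remove duplicates with this SQL (slightly adapted from the book to use a CTE instead of a temporary table):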

WITH temp AS (
  SELECT ssn, pcn, start_date, end_date
  FROM   incumbents
)
SELECT DISTINCT f.ssn, f.pcn, f.start_date, l.end_date
FROM   temp AS f,
       temp AS l
WHERE  f.start_date < l.end_date
AND    f.ssn = l.ssn
AND    f.pcn = l.pcn
AND NOT EXISTS (SELECT 1
                FROM   temp AS m
                WHERE  m.ssn = f.ssn
                AND    m.pcn = f.pcn
                AND    f.end_date < m.start_date
                AND    m.start_date < l.start_date
                AND NOT EXISTS (SELECT 1
                                FROM   temp AS t1
                                WHERE  t1.ssn = f.ssn
                                AND    t1.pcn = f.pcn
                                AND    t1.start_date < m.start_date
                                AND    m.start_date <= t1.end_date))
AND NOT EXISTS (SELECT 1
                FROM   temp AS t2
                WHERE  t2.ssn = f.ssn
                AND    t2.pcn = f.pcn
                AND    ((t2.start_date < f.start_date
                         AND f.start_date <= t2.end_date)
                OR      (t2.start_date <= l.end_date
                         AND l.end_date < t2.end_date)))

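This is in the Postgres dialect, but I'm sure you can adapt it to Oracle (or any other database). For one thing, Oracle does not accept AS before a table alias, so temp AS f would become temp f. Also, you should change ssn and pcn to whatever key you are using (probably id, as long as the same id is allowed to appear in multiple records at different times).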
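The answer does not list the coalesced result for the sample table. As a rough cross-check of what merging adjacent/overlapping periods and removing duplicates should yield, here is a small Python sketch. It is not the book's method, and `coalesce` is a hypothetical helper; it assumes that a period starting exactly where another ends counts as adjacent and is therefore merged.

```python
from datetime import date


def coalesce(periods):
    """Coalesce (ssn, pcn, start_date, end_date) rows per (ssn, pcn) key.

    Hypothetical helper: duplicates are dropped via set(), and a period is
    folded into the previous one when it starts on or before that one's end.
    """
    out = []
    for ssn, pcn, start, end in sorted(set(periods)):
        if out and out[-1][:2] == (ssn, pcn) and start <= out[-1][3]:
            prev = out[-1]
            out[-1] = (ssn, pcn, prev[2], max(prev[3], end))
        else:
            out.append((ssn, pcn, start, end))
    return out


# The sample table from the answer, including its duplicate row.
incumbents = [
    (111223333, 120033, date(1996, 1, 1), date(1996, 6, 1)),
    (111223333, 120033, date(1996, 4, 1), date(1996, 10, 1)),
    (111223333, 120033, date(1996, 4, 1), date(1996, 10, 1)),
    (111223333, 120033, date(1996, 10, 1), date(1998, 1, 1)),
    (111223333, 120033, date(1997, 12, 1), date(1998, 1, 1)),
]
coalesced = coalesce(incumbents)
```

Under these assumptions the five rows collapse into a single period from 1996-01-01 to 1998-01-01, which is a handy reference value when porting the query to another dialect.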

Answer 1 (score: 0)

This uses a hierarchical query in Oracle and only queries the original data twice:

WITH d AS
 (
  --
  SELECT DATE '2016-01-01' effective_start_date, DATE '2016-02-01' - 1 effective_end_date, 1 contract_id
    FROM dual
  UNION ALL --
  SELECT DATE '2016-02-01', DATE '2016-04-01' - 1, 1
    FROM dual
  UNION ALL --
  SELECT DATE '2016-04-01', DATE '2016-04-30', 1
    FROM dual
  UNION ALL -- gap
  SELECT DATE '2016-06-01', DATE '2016-07-01' - 1, 1
    FROM dual
  UNION ALL -- gap
  SELECT DATE '2016-07-01' + 1, DATE '2016-07-31', 1
    FROM dual
  UNION ALL --
  -- other contract
  SELECT DATE '2016-02-01', DATE '2016-03-01' - 1, 3
    FROM dual
  UNION ALL --
  SELECT DATE '2016-03-01', DATE '2016-03-31', 3
    FROM dual
  --
  ),
q1 AS
 (
  -- walk the chain backwards and get the "root" start
  SELECT d.*, connect_by_root effective_start_date contract_start, LEVEL
    FROM d
  CONNECT BY PRIOR contract_id = contract_id
         AND PRIOR effective_end_date + 1 = effective_start_date),
q2 AS
 (
  -- walk the chain forward and get the "root" end
  SELECT d.*, connect_by_root effective_end_date contract_end, LEVEL
    FROM d
  CONNECT BY PRIOR contract_id = contract_id
         AND PRIOR effective_start_date = effective_end_date + 1)
-- join the forward and backward data to get the contiguous contract start and end
SELECT DISTINCT MIN(a.contract_start) contract_start, MAX(b.contract_end) contract_end, a.contract_id
  FROM q1 a
  JOIN q2 b
    ON a.contract_id = b.contract_id
   AND a.effective_start_date = b.effective_start_date
 GROUP BY a.effective_start_date, a.effective_end_date, a.contract_id

and it gives the desired result:

+-----+----------------+--------------+-------------+
|     | CONTRACT_START | CONTRACT_END | CONTRACT_ID |
+-----+----------------+--------------+-------------+
|   1 | 2016-01-01     | 2016-04-30   |           1 |
|   2 | 2016-06-01     | 2016-06-30   |           1 |
|   3 | 2016-07-02     | 2016-07-31   |           1 |
|   4 | 2016-02-01     | 2016-03-31   |           3 |
+-----+----------------+--------------+-------------+
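Note that the CONNECT BY conditions above link a row to the next one only when the next start is exactly one day after the previous end, so the query chains contiguous periods rather than overlapping ones. A minimal Python sketch of that same rule, using the sample data from the query, is given below as a way to check the result table. `merge_contiguous` is a hypothetical name, and the sketch assumes inclusive, day-granular dates.

```python
from datetime import date, timedelta

ONE_DAY = timedelta(days=1)


def merge_contiguous(rows):
    """Chain (contract_id, start, end) rows where start == previous end + 1 day.

    Hypothetical helper mirroring the CONNECT BY condition; overlapping rows
    are deliberately NOT merged, only exactly contiguous ones.
    """
    result = []
    for contract_id, start, end in sorted(rows):
        last = result[-1] if result else None
        if last and last[0] == contract_id and last[2] + ONE_DAY == start:
            result[-1] = (contract_id, last[1], end)
        else:
            result.append((contract_id, start, end))
    return result


# Same sample data as the CTE "d" in the query.
d = [
    (1, date(2016, 1, 1), date(2016, 2, 1) - ONE_DAY),
    (1, date(2016, 2, 1), date(2016, 4, 1) - ONE_DAY),
    (1, date(2016, 4, 1), date(2016, 4, 30)),
    (1, date(2016, 6, 1), date(2016, 7, 1) - ONE_DAY),
    (1, date(2016, 7, 1) + ONE_DAY, date(2016, 7, 31)),
    (3, date(2016, 2, 1), date(2016, 3, 1) - ONE_DAY),
    (3, date(2016, 3, 1), date(2016, 3, 31)),
]
contracts = merge_contiguous(d)
```

For the overlapping ranges in the original question, the comparison would need to be relaxed (for example, start on or before the previous end), which this sketch intentionally does not do.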