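Merging records with overlapping dates

Date: 2018-06-09 10:09:21

Tags: sql oracle

I have the data below and want to merge the records whose date ranges overlap. The start and end dates of a merged record should be the MIN start date and the MAX end date of the overlapping records.

Before merging: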

Item Code               Start_date       End_date
==============          ===========      ===========
111                     15-May-2004      20-Jun-2004
111                     22-May-2004      07-Jun-2004
111                     20-Jun-2004      13-Aug-2004
111                     27-May-2004      30-Aug-2004
111                     02-Sep-2004      23-Dec-2004
222                     21-May-2004      19-Aug-2004 

Required output:

Item Code               Start_date       End_date
==============          ===========      ===========
111                     15-May-2004      30-Aug-2004
111                     02-Sep-2004      23-Dec-2004
222                     21-May-2004      19-Aug-2004 

You can create the sample data with:
create table item(item_code  number, start_date date, end_date date);

insert into item values (111,to_date('15-May-2004','DD-Mon-YYYY'),to_date('20-Jun-2004','DD-Mon-YYYY'));
insert into item values (111,to_date('22-May-2004','DD-Mon-YYYY'),to_date('07-Jun-2004','DD-Mon-YYYY'));
insert into item values (111,to_date('20-Jun-2004','DD-Mon-YYYY'),to_date('13-Aug-2004','DD-Mon-YYYY'));
insert into item values (111,to_date('27-May-2004','DD-Mon-YYYY'),to_date('30-Aug-2004','DD-Mon-YYYY'));
insert into item values (111,to_date('02-Sep-2004','DD-Mon-YYYY'),to_date('23-Dec-2004','DD-Mon-YYYY'));
insert into item values (222,to_date('21-May-2004','DD-Mon-YYYY'),to_date('19-Aug-2004','DD-Mon-YYYY'));

commit;

4 Answers:

Answer 0 (score: 3)

The code for this kind of problem is fairly tricky. Here is one reasonably efficient approach:

with item (item_code, start_date, end_date) as (
      select 111,to_date('15-05-2004','DD-MM-YYYY'),to_date('20-06-2004','DD-MM-YYYY') from dual union all
      select 111,to_date('22-05-2004','DD-MM-YYYY'),to_date('07-06-2004','DD-MM-YYYY') from dual union all
      select 111,to_date('20-06-2004','DD-MM-YYYY'),to_date('13-08-2004','DD-MM-YYYY') from dual union all
      select 111,to_date('27-05-2004','DD-MM-YYYY'),to_date('30-08-2004','DD-MM-YYYY') from dual union all
      select 111,to_date('02-09-2004','DD-MM-YYYY'),to_date('23-12-2004','DD-MM-YYYY') from dual union all
      select 222,to_date('21-05-2004','DD-MM-YYYY'),to_date('19-08-2004','DD-MM-YYYY') from dual
     ),
     id as (
      select item_code, start_date as dte, count(*) as inc
      from item
      group by item_code, start_date
      union all
      select item_code, end_date, - count(*) as inc
      from item
      group by item_code, end_date
     ),
     id2 as (
      select id.*, sum(inc) over (partition by item_code order by dte) as running_inc
      from id
     ),
     id3 as (
      select id2.*, sum(case when running_inc = 0 then 1 else 0 end) over (partition by item_code order by dte desc) as grp
      from id2
     )
select item_code, min(dte) as start_date, max(dte) as end_date
from id3
group by item_code, grp;

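A rextester demo validates it.

What does this do? Good question. The idea in these problems is to define groups of adjacent rows. This method does that by counting the number of "starts" and "ends" up to a given date. When the count is 0, a group ends.

The specific steps are:

(1) Break all the dates out into separate rows, with an indicator of whether each date is a start or an end. This indicator is the key to defining the ranges: +1 for an "enter" and -1 for an "exit".

(2) Compute a running total of the indicator. A 0 in this total marks the end of an overlapping range.

(3) Do a reverse cumulative sum of the 0s to identify the groups.

(4) Aggregate to get the final result.

You can look at each CTE to see what is happening in the data.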
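
To make the four steps concrete, here is a minimal Python sketch of the same counting idea, run on the sample data from the question. It is only an illustration and not part of the original answer: the names `SAMPLE_ROWS` and `merge_by_counting` and the tuple layout are assumptions. It also walks the dates forward and closes a group when the running total returns to 0, where the SQL uses the reverse cumulative sum from step (3).

```python
from collections import defaultdict
from datetime import date

# Sample data from the question as (item_code, start_date, end_date).
SAMPLE_ROWS = [
    (111, date(2004, 5, 15), date(2004, 6, 20)),
    (111, date(2004, 5, 22), date(2004, 6, 7)),
    (111, date(2004, 6, 20), date(2004, 8, 13)),
    (111, date(2004, 5, 27), date(2004, 8, 30)),
    (111, date(2004, 9, 2), date(2004, 12, 23)),
    (222, date(2004, 5, 21), date(2004, 8, 19)),
]


def merge_by_counting(rows):
    """Merge overlapping ranges per item by counting starts and ends."""
    # Step 1: net indicator per (item, date): +1 per start, -1 per end.
    events = defaultdict(lambda: defaultdict(int))
    for item, start, end in rows:
        events[item][start] += 1
        events[item][end] -= 1

    merged = []
    for item in sorted(events):
        running = 0
        group_start = None
        for day in sorted(events[item]):
            if group_start is None:
                group_start = day  # first date of a new group
            # Step 2: running total of the indicator.
            running += events[item][day]
            # Steps 3 and 4: a total of 0 closes the current group.
            if running == 0:
                merged.append((item, group_start, day))
                group_start = None
    return merged
```

Calling `merge_by_counting(SAMPLE_ROWS)` yields three ranges: 15-May-2004 to 30-Aug-2004 and 02-Sep-2004 to 23-Dec-2004 for item 111, and 21-May-2004 to 19-Aug-2004 for item 222. Netting the +1 and -1 that fall on the same date (20-Jun-2004 here) is what lets touching ranges merge.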

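Answer 1 (score: 1)

This is a variation of the gaps-and-islands problem. First compute, for each row, the maximum end date of all previous rows. Then keep only the rows whose start date is greater than that maximum: each such row starts a new group, and the end date of the group is found on the next such row.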

WITH max_dates AS
 (
   SELECT
      item_code  
     ,start_date 
     ,Max(end_date)  -- get the maximum previous end_date
      Over (PARTITION BY item_code  
            ORDER BY start_date 
            ROWS BETWEEN Unbounded Preceding AND 1 Preceding) AS max_prev_date 
     ,Max(end_date)  -- get the maximum overall date (only needed for the last group)
      Over (PARTITION BY item_code) AS max_date 
   FROM   item
 )   
SELECT  
   item_code  
  ,start_date 
  ,Coalesce(Lead(max_prev_date)     -- next row got the end date for the current row
            Over (PARTITION BY item_code  
                  ORDER BY start_date) 
           ,max_date ) AS end_date  -- no next row for the last row --> overall maximum end_date

FROM max_dates
WHERE max_prev_date  < start_date -- maximum previous end date is less than current start date --> start of a new group
   OR max_prev_date  IS NULL      -- first row
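
The same idea can be traced outside the database. The following Python sketch is an illustration under assumptions, not the answer's code: `SAMPLE_ROWS` and `merge_by_running_max` are hypothetical names, and the rows are plain `(item_code, start_date, end_date)` tuples. It sorts by item and start date, keeps the maximum end date seen so far, and opens a new group whenever that maximum is earlier than the current start date.

```python
from datetime import date
from itertools import groupby

# Sample data from the question as (item_code, start_date, end_date).
SAMPLE_ROWS = [
    (111, date(2004, 5, 15), date(2004, 6, 20)),
    (111, date(2004, 5, 22), date(2004, 6, 7)),
    (111, date(2004, 6, 20), date(2004, 8, 13)),
    (111, date(2004, 5, 27), date(2004, 8, 30)),
    (111, date(2004, 9, 2), date(2004, 12, 23)),
    (222, date(2004, 5, 21), date(2004, 8, 19)),
]


def merge_by_running_max(rows):
    """Merge overlapping ranges per item using the running maximum end date."""
    merged = []
    ordered = sorted(rows)  # by item_code, then start_date
    for item, group in groupby(ordered, key=lambda row: row[0]):
        group_start = None
        max_end = None  # maximum end date of the rows seen so far
        for _, start, end in group:
            if max_end is None or max_end < start:
                # Previous maximum end is before this start: new group.
                if group_start is not None:
                    merged.append((item, group_start, max_end))
                group_start = start
                max_end = end
            else:
                max_end = max(max_end, end)
        # The last group of the item ends at the overall maximum end date.
        merged.append((item, group_start, max_end))
    return merged
```

For the sample data, `merge_by_running_max(SAMPLE_ROWS)` returns the three ranges listed in the required output. The strict `<` comparison mirrors `max_prev_date < start_date` in the query, so a range that starts on the day the previous one ends is merged into it.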

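Answer 2 (score: 0)

In SQL Server you can try something like the query below. On the sample data it returns the three expected start dates, but it only selects existing rows whose start_date is the earliest or latest start date of some item, so the end dates are not merged: the first row for item 111 comes back ending 20-Jun-2004 rather than 30-Aug-2004. From a performance point of view, the nested subqueries may also become slow when a large amount of data has to be checked.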

DECLARE @item Table(item_code  int, start_date date, end_date date);

insert into @item values (111,'15-May-2004','20-Jun-2004');
insert into @item values (111,'22-May-2004','07-Jun-2004');
insert into @item values (111,'20-Jun-2004','13-Aug-2004');
insert into @item values (111,'27-May-2004','30-Aug-2004');
insert into @item values (111,'02-Sep-2004','23-Dec-2004');
insert into @item values (222,'21-May-2004','19-Aug-2004');


SELECT * FROM @item WHERE item_code IN (SELECT item_code FROM @item GROUP BY item_code) AND 
(start_date IN (SELECT max(start_date) FROM @item GROUP BY item_code) or start_date In (SELECT min(start_date) FROM @item GROUP BY item_code))

Answer 3 (score: 0)

With the help of the answers above, I was able to simplify it as follows:

WITH max_dates AS
 (
   SELECT
      item_code  
     ,start_date 
     ,end_date
     ,Max(end_date)  
      Over (PARTITION BY item_code  
            ORDER BY start_date 
            ) AS max_date 
   FROM   item
 )  ,
 max_dates1 as 
 (
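 select max_dates.* , lag(max_date) over(partition by item_code order by start_date) as MPD from max_dates
 ),
 max_dates2 as
 (
 -- number the groups: a new group starts where MPD is null or earlier than start_date
 select max_dates1.* , sum(case when MPD is null or MPD < start_date then 1 else 0 end)
        over(partition by item_code order by start_date) as grp from max_dates1
 )
 select item_code, min(start_date) as start_date, max(end_date) as end_date from max_dates2
 group by item_code, grp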