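I have the data below and want to merge the records whose date ranges overlap. The MIN of the start dates and the MAX of the end dates of the overlapping records should become the start and end dates of the merged record.
Before merging: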
Item Code Start_date End_date
============== =========== ===========
111 15-May-2004 20-Jun-2004
111 22-May-2004 07-Jun-2004
111 20-Jun-2004 13-Aug-2004
111 27-May-2004 30-Aug-2004
111 02-Sep-2004 23-Dec-2004
222 21-May-2004 19-Aug-2004
Required output:
Item Code Start_date End_date
============== =========== ===========
111 15-May-2004 30-Aug-2004
111 02-Sep-2004 23-Dec-2004
222 21-May-2004 19-Aug-2004
You can create the sample data with:
create table item(item_code number, start_date date, end_date date);
insert into item values (111,to_date('15-May-2004','DD-Mon-YYYY'),to_date('20-Jun-2004','DD-Mon-YYYY'));
insert into item values (111,to_date('22-May-2004','DD-Mon-YYYY'),to_date('07-Jun-2004','DD-Mon-YYYY'));
insert into item values (111,to_date('20-Jun-2004','DD-Mon-YYYY'),to_date('13-Aug-2004','DD-Mon-YYYY'));
insert into item values (111,to_date('27-May-2004','DD-Mon-YYYY'),to_date('30-Aug-2004','DD-Mon-YYYY'));
insert into item values (111,to_date('02-Sep-2004','DD-Mon-YYYY'),to_date('23-Dec-2004','DD-Mon-YYYY'));
insert into item values (222,to_date('21-May-2004','DD-Mon-YYYY'),to_date('19-Aug-2004','DD-Mon-YYYY'));
commit;
Answer 0: (score: 3)
The code for this type of problem is rather tricky. Here is one approach that works quite well:
with item (item_code, start_date, end_date) as (
select 111,to_date('15-05-2004','DD-MM-YYYY'),to_date('20-06-2004','DD-MM-YYYY') from dual union all
select 111,to_date('22-05-2004','DD-MM-YYYY'),to_date('07-06-2004','DD-MM-YYYY') from dual union all
select 111,to_date('20-06-2004','DD-MM-YYYY'),to_date('13-08-2004','DD-MM-YYYY') from dual union all
select 111,to_date('27-05-2004','DD-MM-YYYY'),to_date('30-08-2004','DD-MM-YYYY') from dual union all
select 111,to_date('02-09-2004','DD-MM-YYYY'),to_date('23-12-2004','DD-MM-YYYY') from dual union all
select 222,to_date('21-05-2004','DD-MM-YYYY'),to_date('19-08-2004','DD-MM-YYYY') from dual
),
id as (
select item_code, start_date as dte, count(*) as inc
from item
group by item_code, start_date
union all
select item_code, end_date, - count(*) as inc
from item
group by item_code, end_date
),
id2 as (
select id.*, sum(inc) over (partition by item_code order by dte) as running_inc
from id
),
id3 as (
select id2.*, sum(case when running_inc = 0 then 1 else 0 end) over (partition by item_code order by dte desc) as grp
from id2
)
select item_code, min(dte) as start_date, max(dte) as end_date
from id3
group by item_code, grp;
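You can verify it on rextester.
What is this doing? Good question. The idea with these problems is to define the adjacent groups. This method does so by counting the number of "starts" and "ends" up to a given date. When that count is 0, a group ends.
The specific steps are:
(1) Break all the dates out into separate rows, with an indicator of whether each date is a start or an end. This indicator is the key to defining the ranges: +1 for "enter" and -1 for "exit".
(2) Calculate the running sum of the indicators. The 0s in this sum mark the ends of the overlapping ranges.
(3) Do a reverse cumulative sum of the 0s to identify the groups.
(4) Aggregate to get the final result.
You can look at each CTE to see what is happening to the data.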
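For readers who find the CTE chain hard to follow, the same counting idea can be sketched outside SQL. The Python below is only an illustration of steps (1) to (4), not a replacement for the query; the function name `merge_ranges` and the tuple layout `(item_code, start, end)` are assumptions made for this sketch. It nets the +1/-1 indicators per date, which mirrors how the running sum in the query treats a start and an end that fall on the same day.

```python
from collections import defaultdict
from datetime import date


def merge_ranges(rows):
    """Merge overlapping (item_code, start, end) rows with a +1/-1 sweep."""
    # Step 1: one entry per (item_code, date) holding the net indicator.
    events = defaultdict(lambda: defaultdict(int))
    for code, start, end in rows:
        events[code][start] += 1  # a range is entered
        events[code][end] -= 1    # a range is exited

    merged = []
    for code in sorted(events):
        running = 0          # Step 2: running sum of the indicators
        group_start = None
        for dte in sorted(events[code]):
            if group_start is None:
                group_start = dte
            running += events[code][dte]
            if running == 0:  # Steps 3 and 4: a 0 closes the current group
                merged.append((code, group_start, dte))
                group_start = None
    return merged


sample = [
    (111, date(2004, 5, 15), date(2004, 6, 20)),
    (111, date(2004, 5, 22), date(2004, 6, 7)),
    (111, date(2004, 6, 20), date(2004, 8, 13)),
    (111, date(2004, 5, 27), date(2004, 8, 30)),
    (111, date(2004, 9, 2), date(2004, 12, 23)),
    (222, date(2004, 5, 21), date(2004, 8, 19)),
]
for row in merge_ranges(sample):
    print(row)
```

On the sample data from the question this should yield the three merged rows listed under the required output.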
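Answer 1: (score: 1)
This is a variation of the gaps-and-islands problem. First calculate the maximum previous end date for each row. Then keep only the rows whose start date is greater than that maximum: each of these is the start of a new group, and the group's end date is found in the next such row.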
WITH max_dates AS
(
SELECT
item_code
,start_date
,Max(end_date) -- get the maximum previous end_date
Over (PARTITION BY item_code
ORDER BY start_date
ROWS BETWEEN Unbounded Preceding AND 1 Preceding) AS max_prev_date
,Max(end_date) -- get the maximum overall date (only needed for the last group)
Over (PARTITION BY item_code) AS max_date
FROM item
)
SELECT
item_code
,start_date
,Coalesce(Lead(max_prev_date) -- the next row holds the end date for the current row
Over (PARTITION BY item_code
ORDER BY start_date)
,max_date ) AS end_date -- no next row for the last row --> overall maximum end_date
FROM max_dates
WHERE max_prev_date < start_date -- maximum previous end date is less than current start date --> start of a new group
OR max_prev_date IS NULL -- first row
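The "maximum previous end date" idea can also be traced step by step in ordinary code. The following Python is a rough sketch under the assumption that rows are `(item_code, start, end)` tuples with `end >= start`; the name `merge_by_max_prev_end` is made up for this illustration. A row opens a new group when the largest end date seen so far for its item is earlier than its own start date, which matches the `max_prev_date < start_date` filter in the query.

```python
from datetime import date
from itertools import groupby


def merge_by_max_prev_end(rows):
    """Merge overlapping rows by tracking the maximum previous end date."""
    merged = []
    ordered = sorted(rows)  # by item_code, then start date
    for code, group in groupby(ordered, key=lambda r: r[0]):
        max_prev_end = None  # largest end date among the earlier rows
        group_start = None
        for _, start, end in group:
            if max_prev_end is None or max_prev_end < start:
                # Gap found: close the previous group, if there is one.
                if group_start is not None:
                    merged.append((code, group_start, max_prev_end))
                group_start = start
            max_prev_end = end if max_prev_end is None else max(max_prev_end, end)
        # The last group ends at the overall maximum end date for the item.
        merged.append((code, group_start, max_prev_end))
    return merged


sample = [
    (111, date(2004, 5, 15), date(2004, 6, 20)),
    (111, date(2004, 5, 22), date(2004, 6, 7)),
    (111, date(2004, 6, 20), date(2004, 8, 13)),
    (111, date(2004, 5, 27), date(2004, 8, 30)),
    (111, date(2004, 9, 2), date(2004, 12, 23)),
    (222, date(2004, 5, 21), date(2004, 8, 19)),
]
for row in merge_by_max_prev_end(sample):
    print(row)
```

As in the query, a range that starts on the same day another one ends is treated as overlapping, because the comparison is strictly "less than".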
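Answer 2: (score: 0)
In SQL Server you could try this. It will give the output you want, but performance-wise the query may become slow when there is a large amount of data to check.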
DECLARE @item Table(item_code int, start_date date, end_date date);
insert into @item values (111,'15-May-2004','20-Jun-2004');
insert into @item values (111,'22-May-2004','07-Jun-2004');
insert into @item values (111,'20-Jun-2004','13-Aug-2004');
insert into @item values (111,'27-May-2004','30-Aug-2004');
insert into @item values (111,'02-Sep-2004','23-Dec-2004');
insert into @item values (222,'21-May-2004','19-Aug-2004');
SELECT * FROM @item WHERE item_code IN (SELECT item_code FROM @item GROUP BY item_code) AND
(start_date IN (SELECT max(start_date) FROM @item GROUP BY item_code) or start_date In (SELECT min(start_date) FROM @item GROUP BY item_code))
Answer 3: (score: 0)
With the help of the answers above, I was able to simplify it as follows:
WITH max_dates AS
(
SELECT
item_code
,start_date
,end_date
,Max(end_date)
Over (PARTITION BY item_code
ORDER BY start_date
) AS max_date
FROM item
) ,
max_dates1 as
(
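-- order the window by start_date; "order by 1" sorts by a constant, so the row picked by lag() would be arbitrary
select max_dates.* , lag(max_date) over(partition by item_code order by start_date) as MPD from max_dates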
)
select ITEM_CODE,start_date,end_date from max_dates1
WHERE MPD < start_date
OR MPD IS NULL