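This question is closely related to another question I posted recently, but I'm posting it as a new one because it adds more complexity to the solution. I'm looking for help from some Oracle ninjas and rock stars; I think this would be a good challenge and workout for their expertise.
Basically I have two tables, TableA and TableB.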
-- For TableA
CREATE TABLE TableA
(
ID VARCHAR2(10),
LOCN VARCHAR2(10),
START_DATE DATE,
END_DATE DATE
)
STORAGE (
BUFFER_POOL DEFAULT
)
LOGGING
NOCOMPRESS
NOCACHE
NOPARALLEL
NOMONITORING
/
-- Populate TableA
INSERT INTO TableA(ID, LOCN, START_DATE, END_DATE)
VALUES('1P1', '01', TO_DATE('02/04/1996', 'MM/DD/YYYY'), TO_DATE('02/22/1996', 'MM/DD/YYYY'));
INSERT INTO TableA(ID, LOCN, START_DATE, END_DATE)
VALUES('1P1', '01', TO_DATE('02/23/1996', 'MM/DD/YYYY'), TO_DATE('05/28/2002', 'MM/DD/YYYY'));
INSERT INTO TableA(ID, LOCN, START_DATE, END_DATE)
VALUES('1P1', '01', TO_DATE('05/29/2002', 'MM/DD/YYYY'), TO_DATE('05/03/2005', 'MM/DD/YYYY'));
INSERT INTO TableA(ID, LOCN, START_DATE, END_DATE)
VALUES('1P1', '01', TO_DATE('05/04/2005', 'MM/DD/YYYY'), TO_DATE('05/04/2005', 'MM/DD/YYYY'));
INSERT INTO TableA(ID, LOCN, START_DATE, END_DATE)
VALUES('1P2', '30', TO_DATE('01/31/1996', 'MM/DD/YYYY'), TO_DATE('02/06/1996', 'MM/DD/YYYY'));
INSERT INTO TableA(ID, LOCN, START_DATE, END_DATE)
VALUES('1P2', '02', TO_DATE('02/07/1996', 'MM/DD/YYYY'), TO_DATE('02/13/1996', 'MM/DD/YYYY'));
INSERT INTO TableA(ID, LOCN, START_DATE, END_DATE)
VALUES('1P2', '02', TO_DATE('02/14/1996', 'MM/DD/YYYY'), TO_DATE('01/01/2099', 'MM/DD/YYYY'));
INSERT INTO TableA(ID, LOCN, START_DATE, END_DATE)
VALUES('1P3', '03', TO_DATE('02/07/1996', 'MM/DD/YYYY'), TO_DATE('02/13/1996', 'MM/DD/YYYY'));
INSERT INTO TableA(ID, LOCN, START_DATE, END_DATE)
VALUES('1P3', '03', TO_DATE('02/14/1996', 'MM/DD/YYYY'), TO_DATE('01/01/2099', 'MM/DD/YYYY'));
INSERT INTO TableA(ID, LOCN, START_DATE, END_DATE)
VALUES('1S4', '42', TO_DATE('11/06/2001', 'MM/DD/YYYY'), TO_DATE('01/01/2099', 'MM/DD/YYYY'));
INSERT INTO TableA(ID, LOCN, START_DATE, END_DATE)
VALUES('3S4', '42', TO_DATE('11/06/2001', 'MM/DD/YYYY'), TO_DATE('01/01/2099', 'MM/DD/YYYY'));
-- For TableB
CREATE TABLE TableB
(
ID VARCHAR2(10),
POSTING VARCHAR2(20),
DESCRIPTION VARCHAR2(100),
OTHER_ID VARCHAR2(10),
START_DATE DATE,
END_DATE DATE
)
STORAGE (
BUFFER_POOL DEFAULT
)
LOGGING
NOCOMPRESS
NOCACHE
NOPARALLEL
NOMONITORING
/
-- Populate TableB (explicit TO_DATE so the inserts do not depend on NLS_DATE_FORMAT)
INSERT INTO TableB(ID, POSTING, DESCRIPTION, OTHER_ID, START_DATE, END_DATE)
VALUES('1P1', 'PROFESSOR', 'Sch 1 Quad 1 Area', 'P1', TO_DATE('02/04/1996', 'MM/DD/YYYY'), TO_DATE('01/01/2099', 'MM/DD/YYYY'));
INSERT INTO TableB(ID, POSTING, DESCRIPTION, OTHER_ID, START_DATE, END_DATE)
VALUES('1P2', 'PROFESSOR', 'Sch 1 Quad 2 Area', 'P2', TO_DATE('01/31/1996', 'MM/DD/YYYY'), TO_DATE('01/01/2099', 'MM/DD/YYYY'));
INSERT INTO TableB(ID, POSTING, DESCRIPTION, OTHER_ID, START_DATE, END_DATE)
VALUES('1P3', 'PROFESSOR', 'Sch 1 Quad 3 Area', 'P3', TO_DATE('02/05/1996', 'MM/DD/YYYY'), TO_DATE('01/01/2099', 'MM/DD/YYYY'));
INSERT INTO TableB(ID, POSTING, DESCRIPTION, OTHER_ID, START_DATE, END_DATE)
VALUES('1S4', 'SUPERVISOR', 'Sch 1 CO Supervisor 4', '1S4', TO_DATE('02/05/1996', 'MM/DD/YYYY'), TO_DATE('03/18/2002', 'MM/DD/YYYY'));
INSERT INTO TableB(ID, POSTING, DESCRIPTION, OTHER_ID, START_DATE, END_DATE)
VALUES('1S4', 'SUPERINTENDENT', 'Sch 1 CD Superintendent', '1S4', TO_DATE('03/19/2002', 'MM/DD/YYYY'), TO_DATE('06/09/2009', 'MM/DD/YYYY'));
INSERT INTO TableB(ID, POSTING, DESCRIPTION, OTHER_ID, START_DATE, END_DATE)
VALUES('1S4', 'SUPERVISOR', 'Sch 1 CO Supervisor 4', '1S4', TO_DATE('06/10/2009', 'MM/DD/YYYY'), TO_DATE('01/01/2099', 'MM/DD/YYYY'));
INSERT INTO TableB(ID, POSTING, DESCRIPTION, OTHER_ID, START_DATE, END_DATE)
VALUES('2S5', 'SUPERVISOR', 'Sch 2 CAO Supervisor 5', '2S5', TO_DATE('10/26/2002', 'MM/DD/YYYY'), TO_DATE('06/09/2009', 'MM/DD/YYYY'));
INSERT INTO TableB(ID, POSTING, DESCRIPTION, OTHER_ID, START_DATE, END_DATE)
VALUES('2S5', 'SUPERINTENDENT', 'Sch 2 CAO Superintendent 5', '2S5', TO_DATE('06/10/2009', 'MM/DD/YYYY'), TO_DATE('07/14/2009', 'MM/DD/YYYY'));
INSERT INTO TableB(ID, POSTING, DESCRIPTION, OTHER_ID, START_DATE, END_DATE)
VALUES('2S5', 'SUPERINTENDENT', 'Sch 2 CAO Superintendent 5', 'S5', TO_DATE('07/15/2009', 'MM/DD/YYYY'), TO_DATE('01/01/2099', 'MM/DD/YYYY'));
INSERT INTO TableB(ID, POSTING, DESCRIPTION, OTHER_ID, START_DATE, END_DATE)
VALUES('3S4', 'SUPERVISOR', 'Sch 3 CO Supervisor 4', '3S4', TO_DATE('02/05/1996', 'MM/DD/YYYY'), TO_DATE('03/18/2002', 'MM/DD/YYYY'));
INSERT INTO TableB(ID, POSTING, DESCRIPTION, OTHER_ID, START_DATE, END_DATE)
VALUES('3S4', 'SUPERINTENDENT', 'Sch 3 CD Superintendent', '3S4', TO_DATE('03/19/2002', 'MM/DD/YYYY'), TO_DATE('06/09/2009', 'MM/DD/YYYY'));
INSERT INTO TableB(ID, POSTING, DESCRIPTION, OTHER_ID, START_DATE, END_DATE)
VALUES('3S4', 'SUPERVISOR', 'Sch 3 CO Supervisor 4', '3S4', TO_DATE('06/10/2009', 'MM/DD/YYYY'), TO_DATE('01/01/2099', 'MM/DD/YYYY'));
The process is as follows. In TableA, all records that have the same ID and LOCN and contiguous START_DATE and END_DATE ranges are merged:
ID LOCN START_DATE END_DATE
1P1 01 02/04/1996 05/04/2005
1P2 30 01/31/1996 02/06/1996
1P2 02 02/07/1996 01/01/2099
1P3 03 02/07/1996 01/01/2099
1S4 42 11/06/2001 01/01/2099
3S4 42 11/06/2001 01/01/2099
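In TableB, all records that have the same ID, POSTING and OTHER_ID and contiguous START_DATE and END_DATE ranges are merged as well. (I believe nothing in this sample data can actually be merged.)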
ID POSTING DESCRIPTION OTHER_ID START_DATE END_DATE
1P1 PROFESSOR Sch 1 Quad 1 Area P1 02/04/1996 01/01/2099
1P2 PROFESSOR Sch 1 Quad 2 Area P2 01/31/1996 01/01/2099
1P3 PROFESSOR Sch 1 Quad 3 Area P3 02/05/1996 01/01/2099
1S4 SUPERVISOR Sch 1 CO Supervisor 4 1S4 02/05/1996 03/18/2002
1S4 SUPERINTENDENT Sch 1 CD Superintendent 1S4 03/19/2002 06/09/2009
1S4 SUPERVISOR Sch 1 CO Supervisor 4 1S4 06/10/2009 01/01/2099
2S5 SUPERVISOR Sch 2 CAO Supervisor 5 2S5 10/26/2002 06/09/2009
2S5 SUPERINTENDENT Sch 2 CAO Superintendent 5 2S5 06/10/2009 07/14/2009
2S5 SUPERINTENDENT Sch 2 CAO Superintendent 5 S5 07/15/2009 01/01/2099
3S4 SUPERVISOR Sch 3 CO Supervisor 4 3S4 02/05/1996 03/18/2002
3S4 SUPERINTENDENT Sch 3 CD Superintendent 3S4 03/19/2002 06/09/2009
3S4 SUPERVISOR Sch 3 CO Supervisor 4 3S4 06/10/2009 01/01/2099
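Then merge the records from TableA and TableB based on ID. The LOCN column is added to the TableB rows and is populated only for the date ranges covered by TableA. The resulting data should look like this: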
ID UNIT_TYPE DESCRIPTION OTHER_ID START_DATE END_DATE LOCN
1P1 PROFESSOR Sch 1 Quad 1 Area P1 02/04/1996 05/04/2005 01
1P1 PROFESSOR Sch 1 Quad 1 Area P1 05/05/2005 01/01/2099 {NULL}
1P2 PROFESSOR Sch 1 Quad 2 Area P2 01/31/1996 02/06/1996 30
1P2 PROFESSOR Sch 1 Quad 2 Area P2 02/07/1996 01/01/2099 02
1P3 PROFESSOR Sch 1 Quad 3 Area P3 02/05/1996 02/06/1996 {NULL}
1P3 PROFESSOR Sch 1 Quad 3 Area P3 02/07/1996 01/01/2099 03
1S4 SUPERVISOR Sch 1 CO Supervisor 4 1S4 02/05/1996 11/05/2001 {NULL}
1S4 SUPERVISOR Sch 1 CO Supervisor 4 1S4 11/06/2001 03/18/2002 42
1S4 SUPERINTENDENT Sch 1 CD Superintendent 1S4 03/19/2002 06/09/2009 42
1S4 SUPERVISOR Sch 1 CO Supervisor 4 1S4 06/10/2009 01/01/2099 42
2S5 SUPERVISOR Sch 2 CAO Supervisor 5 2S5 10/26/2002 06/09/2009 {NULL}
2S5 SUPERINTENDENT Sch 2 CAO Superintendent 5 2S5 06/10/2009 07/14/2009 {NULL}
2S5 SUPERINTENDENT Sch 2 CAO Superintendent 5 S5 07/15/2009 01/01/2099 {NULL}
3S4 SUPERVISOR Sch 3 CO Supervisor 4 3S4 02/05/1996 11/05/2001 {NULL}
3S4 SUPERVISOR Sch 3 CO Supervisor 4 3S4 11/06/2001 03/18/2002 42
3S4 SUPERINTENDENT Sch 3 CD Superintendent 3S4 03/19/2002 06/09/2009 42
3S4 SUPERVISOR Sch 3 CO Supervisor 4 3S4 06/10/2009 01/01/2099 42
I would love to hear any approach to solving this. Thanks a lot.
Added: here is a query I have written so far to collapse the records in TableA:
SELECT ID, LOCN, TO_CHAR(MIN(START_DATE), 'MM/DD/YYYY') START_DATE, TO_CHAR(MAX(END_DATE), 'MM/DD/YYYY') END_DATE
FROM
(
SELECT ID, LOCN, START_DATE, END_DATE, MAX(GRP) OVER (ORDER BY ID, START_DATE) GRP
FROM
(
SELECT ID, LOCN,
CASE WHEN START_DATE - LAG(END_DATE) OVER (PARTITION BY ID, LOCN ORDER BY START_DATE ASC) <= 1 THEN
NULL
ELSE
ROWNUM
END GRP,
START_DATE,
NVL(END_DATE, SYSDATE) END_DATE
FROM TableA
ORDER BY ID ASC, START_DATE ASC
)
)
GROUP BY ID, LOCN, GRP
ORDER BY ID ASC, START_DATE ASC;
Answer 0 (score: 2)
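Since the rock stars are busy leading their decadent (if hard-earned) lifestyles, and the ninjas look like they will be busy for a while, I'll have a go...
As you've laid it out, you want to collapse the contiguous records in `TableA` first and then use that result against the (possibly already collapsed) `TableB`. I'm not sure that doing this as a separate step is ideal for the overall problem, but I'll go with it for now. The general approach I find easiest for collapsing rows is: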
select id, locn, min(start_date) as start_date, max(end_date) as end_date
from (
select id, locn,
case when start_date = lag_end_date + interval '1' day then null
else start_date end as start_date,
case when end_date = lead_start_date - interval '1' day then null
else end_date end as end_date,
row_number() over (partition by id order by start_date)
- row_number() over (partition by id, locn
order by start_date) as chain
from (
select id, locn, start_date, end_date,
lead(start_date) over (partition by id, locn
order by start_date) as lead_start_date,
lag(end_date) over (partition by id, locn
order by start_date) as lag_end_date
from TableA
)
)
group by id, locn, chain
order by 1, 3, 2;
ID LOCN START_DATE END_DATE
---------- ---------- ---------- ----------
1P1 01 02/04/1996 05/04/2005
1P2 02 02/07/1996 01/01/2099
1P2 30 01/31/1996 02/06/1996
1P3 03 02/07/1996 01/01/2099
1S4 42 11/06/2001 01/01/2099
3S4 42 11/06/2001 01/01/2099
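The innermost `select` uses `lead` and `lag` to peek at the adjacent rows (you hinted at this in your previous question).
The next layer sets contiguous values (that is, where one row's start date is the day after the previous row's end date) to null; if you run just that part you will see only the start and the end of each contiguous range. It also adds a `chain` pseudo-column, which lets it cope with an `id` switching back to a `locn` it has used before; say `1P2` going back to `locn=30`. (This is an approach I first saw elsewhere; there is more to read on it under the name gaps and islands.) Without it, all the "islands" for an `id`/`locn` pair would be treated as one block, and you would end up with overlapping date ranges.
The outer layer uses `min` and `max` to discard the nulls and produce the final result.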
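To see what the `chain` column buys you, here is a small sketch on made-up rows (not the question's data) in which `1P2` goes from 30 to 02 and back to 30. The inline `sample_a` data and the `rn_id` / `rn_id_locn` column names are assumptions of mine, purely for illustration:

```sql
-- Hypothetical rows: 1P2 moves 30 -> 02 -> 30, so (1P2, 30) has two separate islands.
with sample_a as (
  select '1P2' as id, '30' as locn,
         date '1996-01-31' as start_date, date '1996-02-06' as end_date
  from dual
  union all
  select '1P2', '02', date '1996-02-07', date '1996-02-13' from dual
  union all
  select '1P2', '30', date '1996-02-14', date '1996-02-20' from dual
)
select id, locn, start_date, end_date,
       -- position of the row among all rows for the id
       row_number() over (partition by id order by start_date) as rn_id,
       -- position of the row among rows for the same id and locn
       row_number() over (partition by id, locn order by start_date) as rn_id_locn,
       -- the difference stays constant while the id remains at one locn
       row_number() over (partition by id order by start_date)
         - row_number() over (partition by id, locn order by start_date) as chain
from sample_a
order by id, start_date;
```

The two `locn = 30` rows get `chain` values 0 and 1, and the `locn = 02` row gets 1, so grouping by `id, locn, chain` keeps the two 30 islands apart. Grouping by `id, locn` alone would merge them into 01/31/1996 to 02/20/1996, which overlaps the 02 range.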
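Using that you can, if you are on 11gR2, use a recursive CTE to join recursively and get all the combinations. This is only my second real attempt at one of these, so others may be able to point out flaws or improvements, if they can tear themselves away from the M&Ms... but it might give you some pointers.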
with a as (
select id, locn, min(start_date) as start_date, max(end_date) as end_date
from (
select id, locn,
case when start_date = lag_end_date + interval '1' day then null
else start_date end as start_date,
case when end_date = lead_start_date - interval '1' day then null
else end_date end as end_date,
row_number() over (partition by id order by start_date)
- row_number() over (partition by id, locn
order by start_date) as chain
from (
select id, locn, start_date, end_date,
lead(start_date) over (partition by id, locn
order by start_date) as lead_start_date,
lag(end_date) over (partition by id, locn
order by start_date) as lag_end_date
from TableA
)
)
group by id, locn, chain
),
b as (
select id, posting, description, other_id, start_date, end_date,
row_number() over (partition by id, posting, description,
other_id order by start_date, end_date) as rn
from TableB
),
r (id, posting, description, other_id, rn, start_date, end_date, locn) as (
select b.id, b.posting, b.description, b.other_id, b.rn,
b.start_date,
case
when not (a.start_date > b.end_date or a.end_date < b.start_date)
and a.start_date <= b.end_date and a.end_date < b.end_date
then a.end_date
when not (a.start_date > b.end_date or a.end_date < b.start_date)
and a.start_date <= b.end_date and a.start_date > b.start_date
then a.start_date - interval '1' day
else b.end_date
end as end_date,
case
when a.start_date <= b.start_date and a.end_date >= b.start_date
then a.locn
end
from b
left join (
select id, locn, start_date, end_date,
row_number() over (partition by id order by start_date) as rn
from a
) a on a.id = b.id
and a.rn = 1
union all
select b.id, b.posting, b.description, b.other_id, b.rn,
case
when a.start_date is null then r.end_date + interval '1' day
else a.start_date
end as start_date,
case
when a.start_date is null then b.end_date
when not (a.start_date > r.end_date or a.end_date < r.start_date)
then least(a.end_date, b.end_date)
when a.end_date < b.end_date then a.start_date - interval '1' day
else b.end_date
end as end_date,
a.locn
from b
join r on r.id = b.id
and r.posting = b.posting
and r.description = b.description
and r.other_id = b.other_id
and r.rn = b.rn
and r.start_date = b.start_date
and r.end_date < b.end_date
left join a on a.id = r.id
and a.start_date > r.end_date
)
select id, posting as unit_type, description, other_id,
start_date, end_date, locn
from r
order by id, start_date;
This gets the result you wanted:
ID UNIT_TYPE DESCRIPTION OTHER_ID START_DATE END_DATE LOCN
---------- -------------------- ------------------------------ ---------- ---------- ---------- ----------
1P1 PROFESSOR Sch 1 Quad 1 Area P1 02/04/1996 05/04/2005 01
1P1 PROFESSOR Sch 1 Quad 1 Area P1 05/05/2005 01/01/2099
1P2 PROFESSOR Sch 1 Quad 2 Area P2 01/31/1996 02/06/1996 30
1P2 PROFESSOR Sch 1 Quad 2 Area P2 02/07/1996 01/01/2099 02
1P3 PROFESSOR Sch 1 Quad 3 Area P3 02/05/1996 02/06/1996
1P3 PROFESSOR Sch 1 Quad 3 Area P3 02/07/1996 01/01/2099 03
1S4 SUPERVISOR Sch 1 CO Supervisor 4 1S4 02/05/1996 11/05/2001
1S4 SUPERVISOR Sch 1 CO Supervisor 4 1S4 11/06/2001 03/18/2002 42
1S4 SUPERINTENDENT Sch 1 CD Superintendent 1S4 03/19/2002 06/09/2009 42
1S4 SUPERVISOR Sch 1 CO Supervisor 4 1S4 06/10/2009 01/01/2099 42
2S5 SUPERVISOR Sch 2 CAO Supervisor 5 2S5 10/26/2002 06/09/2009
2S5 SUPERINTENDENT Sch 2 CAO Superintendent 5 2S5 06/10/2009 07/14/2009
2S5 SUPERINTENDENT Sch 2 CAO Superintendent 5 S5 07/15/2009 01/01/2099
3S4 SUPERVISOR Sch 3 CO Supervisor 4 3S4 02/05/1996 11/05/2001
3S4 SUPERVISOR Sch 3 CO Supervisor 4 3S4 11/06/2001 03/18/2002 42
3S4 SUPERINTENDENT Sch 3 CD Superintendent 3S4 03/19/2002 06/09/2009 42
3S4 SUPERVISOR Sch 3 CO Supervisor 4 3S4 06/10/2009 01/01/2099 42
17 rows selected.
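This uses three CTEs. `a`, as above, is the collapsed version of `TableA`. `b` is `TableB` with an added row-number column, which I thought I would need later to keep track of records during the recursion. `r` is where the fun starts.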
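If recursive subquery factoring is new to you, the skeleton that `r` follows is easier to see in a toy example. This is only a sketch and has nothing to do with the tables here; the counter column `n` is hypothetical:

```sql
-- Minimal recursive CTE in the Oracle 11gR2 style.
with r (n) as (
  -- anchor member: the starting row
  select 1 from dual
  union all
  -- recursive member: refers back to r and stops once the predicate fails
  select n + 1 from r where n < 5
)
select n from r order by n;
```

The column list after the CTE name is required for the recursive form. The first branch is the anchor, and the branch after `union all` keeps referring back to `r` until its `where` clause stops producing rows; here that yields 1 through 5.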
The first part of `r` produces the initial data for every `TableB` entry, with matching values from `TableA` where appropriate, but only from the first `TableA` record for that `id`, even though there may be more than one. The tricky bit here is working out what the `end_date` should be. If there is no overlapping `TableA` record at all, it can be the `TableB` end date; if there is one but it starts after the `TableB` record starts, then the row needs to end immediately before the `TableA` record starts. Otherwise it depends on whether the `TableA` or the `TableB` record ends first. Running just that part:
with a as (...), b as (...)
select b.id, b.posting, b.description, b.other_id, b.rn,
b.start_date,
case
when not (a.start_date > b.end_date or a.end_date < b.start_date)
and a.start_date <= b.end_date and a.end_date < b.end_date
then a.end_date
when not (a.start_date > b.end_date or a.end_date < b.start_date)
and a.start_date <= b.end_date and a.start_date > b.start_date
then a.start_date - interval '1' day
else b.end_date
end as end_date,
case
when a.start_date <= b.start_date and a.end_date >= b.start_date
then a.locn
end
from b
left join (
select id, locn, start_date, end_date,
row_number() over (partition by id order by start_date) as rn
from a
) a on a.id = b.id
and a.rn = 1
order by id, start_date;
...gives this (with the description suppressed for readability):
ID UNIT_TYPE OTHER_ID START_DATE END_DATE LOCN
---------- -------------------- ---------- ---------- ---------- ----------
1P1 PROFESSOR P1 02/04/1996 05/04/2005 01
1P2 PROFESSOR P2 01/31/1996 02/06/1996 30
1P3 PROFESSOR P3 02/05/1996 02/06/1996
1S4 SUPERVISOR 1S4 02/05/1996 11/05/2001
1S4 SUPERINTENDENT 1S4 03/19/2002 06/09/2009 42
1S4 SUPERVISOR 1S4 06/10/2009 01/01/2099 42
2S5 SUPERVISOR 2S5 10/26/2002 06/09/2009
2S5 SUPERINTENDENT 2S5 06/10/2009 07/14/2009
2S5 SUPERINTENDENT S5 07/15/2009 01/01/2099
3S4 SUPERVISOR 3S4 02/05/1996 11/05/2001
3S4 SUPERINTENDENT 3S4 03/19/2002 06/09/2009 42
3S4 SUPERVISOR 3S4 06/10/2009 01/01/2099 42
12 rows selected.
For `1P3` there is no matching `TableA` record initially, but notice that the `end_date` is set to the day before the later matching record starts.
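The second part of `r`, after the `union all`, is the recursive part. For each `TableB` record it joins back to `r`, looking for rows where the generated `end_date` is earlier than the original record's, as is the case for `1P3`, which means there is a period of time still to be filled in. It then looks for a suitable `TableA` record and generates suitable values for `start_date` and `end_date`, depending on whether and how the records overlap. It's entirely possible I have missed some edge cases here.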
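You mentioned that `TableB` might also have contiguous ranges to collapse, and you can give it similar treatment to what I showed for `TableA`. I'm not sure that doing it this way is necessarily the best or the clearest, even if only one table needs it; I have really only done it like that because that is how you described the process.
If you changed the recursive CTE to work against the base tables (possibly simplifying it slightly in the process), you could apply the gaps-and-islands method to that result set rather than to the individual tables, so it would not matter which table the gaps came from.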
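For comparison, here is a rough sketch of a different, non-recursive technique: collect every date on which something changes for an `id`, turn consecutive boundaries into segments, and look up which `TableB` and `TableA` rows cover each segment. The CTE names `bounds` and `segs` are hypothetical, and the sketch assumes the DATE columns have no time component and that `TableB` ranges for one `id` never overlap each other. It has not been run against a database, so treat it as a starting point only:

```sql
with a as (
  -- collapsed TableA, the same gaps-and-islands query as earlier
  select id, locn, min(start_date) as start_date, max(end_date) as end_date
  from (
    select id, locn,
      case when start_date = lag_end_date + interval '1' day then null
        else start_date end as start_date,
      case when end_date = lead_start_date - interval '1' day then null
        else end_date end as end_date,
      row_number() over (partition by id order by start_date)
        - row_number() over (partition by id, locn order by start_date) as chain
    from (
      select id, locn, start_date, end_date,
        lead(start_date) over (partition by id, locn order by start_date) as lead_start_date,
        lag(end_date) over (partition by id, locn order by start_date) as lag_end_date
      from TableA
    )
  )
  group by id, locn, chain
),
bounds as (
  -- every date on which a range starts, or the day after one ends
  select id, start_date as d from TableB
  union
  select id, end_date + interval '1' day from TableB
  union
  select id, start_date from a
  union
  select id, end_date + interval '1' day from a
),
segs as (
  -- consecutive boundaries delimit one segment each
  select id, d as start_date,
    lead(d) over (partition by id order by d) - interval '1' day as end_date
  from bounds
)
select b.id, b.posting as unit_type, b.description, b.other_id,
  s.start_date, s.end_date, a.locn
from segs s
join TableB b on b.id = s.id
  and s.start_date between b.start_date and b.end_date  -- drops segments with no TableB row
left join a on a.id = s.id
  and s.start_date between a.start_date and a.end_date  -- locn stays null where TableA has a gap
where s.end_date is not null                             -- the last boundary opens no segment
order by b.id, s.start_date;
```

Because no segment straddles a boundary, testing only the segment's start date against each range is enough. Adjacent output rows that end up with identical attributes could then be merged with the same gaps-and-islands step, applied to this result instead of to each table separately.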