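Oracle: merging two tables using date gaps

Date: 2012-11-09 17:16:05

Tags: oracle plsql

This question is closely related to another one I posted recently, but I am posting it separately because it adds more complexity to the solution. I am looking for help from some Oracle ninjas and rock stars; I think it is a good challenge and exercise for their expertise.

Basically I have two tables, TableA and TableB.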

-- For TableA
CREATE TABLE TableA
(
  ID          VARCHAR2(10),
  LOCN        VARCHAR2(10),
  START_DATE  DATE,
  END_DATE    DATE
)
STORAGE    (
            BUFFER_POOL      DEFAULT
           )
LOGGING
NOCOMPRESS
NOCACHE
NOPARALLEL
NOMONITORING
/


-- Populate TableA
INSERT INTO TableA(ID, LOCN, START_DATE, END_DATE)
VALUES('1P1',   '01',   TO_DATE('02/04/1996', 'MM/DD/YYYY'),  TO_DATE('02/22/1996', 'MM/DD/YYYY'));


INSERT INTO TableA(ID, LOCN, START_DATE, END_DATE)
VALUES('1P1',   '01',   TO_DATE('02/23/1996', 'MM/DD/YYYY'),  TO_DATE('05/28/2002', 'MM/DD/YYYY'));


INSERT INTO TableA(ID, LOCN, START_DATE, END_DATE)
VALUES('1P1',   '01',   TO_DATE('05/29/2002', 'MM/DD/YYYY'),  TO_DATE('05/03/2005', 'MM/DD/YYYY'));


INSERT INTO TableA(ID, LOCN, START_DATE, END_DATE)
VALUES('1P1',   '01',   TO_DATE('05/04/2005', 'MM/DD/YYYY'),  TO_DATE('05/04/2005', 'MM/DD/YYYY'));


INSERT INTO TableA(ID, LOCN, START_DATE, END_DATE)
VALUES('1P2',   '30',   TO_DATE('01/31/1996', 'MM/DD/YYYY'),  TO_DATE('02/06/1996', 'MM/DD/YYYY'));


INSERT INTO TableA(ID, LOCN, START_DATE, END_DATE)
VALUES('1P2',   '02',   TO_DATE('02/07/1996', 'MM/DD/YYYY'),  TO_DATE('02/13/1996', 'MM/DD/YYYY'));


INSERT INTO TableA(ID, LOCN, START_DATE, END_DATE)
VALUES('1P2',   '02',   TO_DATE('02/14/1996', 'MM/DD/YYYY'),  TO_DATE('01/01/2099', 'MM/DD/YYYY'));


INSERT INTO TableA(ID, LOCN, START_DATE, END_DATE)
VALUES('1P3',   '03',   TO_DATE('02/07/1996', 'MM/DD/YYYY'),  TO_DATE('02/13/1996', 'MM/DD/YYYY'));


INSERT INTO TableA(ID, LOCN, START_DATE, END_DATE)
VALUES('1P3',   '03',   TO_DATE('02/14/1996', 'MM/DD/YYYY'),  TO_DATE('01/01/2099', 'MM/DD/YYYY'));


INSERT INTO TableA(ID, LOCN, START_DATE, END_DATE)
VALUES('1S4',   '42',   TO_DATE('11/06/2001', 'MM/DD/YYYY'),  TO_DATE('01/01/2099', 'MM/DD/YYYY'));


INSERT INTO TableA(ID, LOCN, START_DATE, END_DATE)
VALUES('3S4',   '42',   TO_DATE('11/06/2001', 'MM/DD/YYYY'),  TO_DATE('01/01/2099', 'MM/DD/YYYY'));



-- For TableB
CREATE TABLE TableB
(
  ID           VARCHAR2(10),
  POSTING      VARCHAR2(20),
  DESCRIPTION  VARCHAR2(100),
  OTHER_ID     VARCHAR2(10),
  START_DATE   DATE,
  END_DATE     DATE
)
STORAGE    (
            BUFFER_POOL      DEFAULT
           )
LOGGING
NOCOMPRESS
NOCACHE
NOPARALLEL
NOMONITORING
/

-- Populate TableB
-- The date strings below rely on implicit conversion, so the session
-- NLS_DATE_FORMAT is assumed to be 'MM/DD/YYYY'.


INSERT INTO TableB(ID, POSTING, DESCRIPTION, OTHER_ID, START_DATE, END_DATE)
VALUES('1P1', 'PROFESSOR', 'Sch 1 Quad 1 Area', 'P1', '02/04/1996', '01/01/2099');

INSERT INTO TableB(ID, POSTING, DESCRIPTION, OTHER_ID, START_DATE, END_DATE)
VALUES('1P2', 'PROFESSOR', 'Sch 1 Quad 2 Area', 'P2', '01/31/1996', '01/01/2099');

INSERT INTO TableB(ID, POSTING, DESCRIPTION, OTHER_ID, START_DATE, END_DATE)
VALUES('1P3', 'PROFESSOR', 'Sch 1 Quad 3 Area', 'P3', '02/05/1996', '01/01/2099');

INSERT INTO TableB(ID, POSTING, DESCRIPTION, OTHER_ID, START_DATE, END_DATE)
VALUES('1S4', 'SUPERVISOR', 'Sch 1 CO Supervisor 4', '1S4', '02/05/1996', '03/18/2002');

INSERT INTO TableB(ID, POSTING, DESCRIPTION, OTHER_ID, START_DATE, END_DATE)
VALUES('1S4', 'SUPERINTENDENT', 'Sch 1 CD Superintendent', '1S4', '03/19/2002', '06/09/2009');

INSERT INTO TableB(ID, POSTING, DESCRIPTION, OTHER_ID, START_DATE, END_DATE)
VALUES('1S4', 'SUPERVISOR', 'Sch 1 CO Supervisor 4', '1S4', '06/10/2009', '01/01/2099');

INSERT INTO TableB(ID, POSTING, DESCRIPTION, OTHER_ID, START_DATE, END_DATE)
VALUES('2S5', 'SUPERVISOR', 'Sch 2 CAO Supervisor 5', '2S5', '10/26/2002', '06/09/2009');

INSERT INTO TableB(ID, POSTING, DESCRIPTION, OTHER_ID, START_DATE, END_DATE)
VALUES('2S5', 'SUPERINTENDENT', 'Sch 2 CAO Superintendent 5', '2S5', '06/10/2009', '07/14/2009');

INSERT INTO TableB(ID, POSTING, DESCRIPTION, OTHER_ID, START_DATE, END_DATE)
VALUES('2S5', 'SUPERINTENDENT', 'Sch 2 CAO Superintendent 5', 'S5', '07/15/2009', '01/01/2099');

INSERT INTO TableB(ID, POSTING, DESCRIPTION, OTHER_ID, START_DATE, END_DATE)
VALUES('3S4', 'SUPERVISOR', 'Sch 3 CO Supervisor 4', '3S4', '02/05/1996', '03/18/2002');

INSERT INTO TableB(ID, POSTING, DESCRIPTION, OTHER_ID, START_DATE, END_DATE)
VALUES('3S4', 'SUPERINTENDENT', 'Sch 3 CD Superintendent', '3S4', '03/19/2002', '06/09/2009');

INSERT INTO TableB(ID, POSTING, DESCRIPTION, OTHER_ID, START_DATE, END_DATE)
VALUES('3S4', 'SUPERVISOR', 'Sch 3 CO Supervisor 4', '3S4', '06/10/2009', '01/01/2099');

The process is as follows. In TableA, all records that have the same ID and LOCN and contiguous START_DATE and END_DATE values are to be merged.

ID  LOCN    START_DATE  END_DATE
1P1 01      02/04/1996  05/04/2005
1P2 30      01/31/1996  02/06/1996
1P2 02      02/07/1996  01/01/2099
1P3 03      02/07/1996  01/01/2099
1S4 42      11/06/2001  01/01/2099
3S4 42      11/06/2001  01/01/2099

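In TableB, all records that have the same ID, POSTING and OTHER_ID and contiguous START_DATE and END_DATE values are also to be merged. (I believe no rows in this sample data actually end up being merged.)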

ID  POSTING         DESCRIPTION                 OTHER_ID    START_DATE  END_DATE
1P1 PROFESSOR       Sch 1 Quad 1 Area           P1          02/04/1996  01/01/2099
1P2 PROFESSOR       Sch 1 Quad 2 Area           P2          01/31/1996  01/01/2099
1P3 PROFESSOR       Sch 1 Quad 3 Area           P3          02/05/1996  01/01/2099
1S4 SUPERVISOR      Sch 1 CO Supervisor 4       1S4         02/05/1996  03/18/2002
1S4 SUPERINTENDENT  Sch 1 CD Superintendent     1S4         03/19/2002  06/09/2009
1S4 SUPERVISOR      Sch 1 CO Supervisor 4       1S4         06/10/2009  01/01/2099
2S5 SUPERVISOR      Sch 2 CAO Supervisor 5      2S5         10/26/2002  06/09/2009
2S5 SUPERINTENDENT  Sch 2 CAO Superintendent 5  2S5         06/10/2009  07/14/2009
2S5 SUPERINTENDENT  Sch 2 CAO Superintendent 5  S5          07/15/2009  01/01/2099
3S4 SUPERVISOR      Sch 3 CO Supervisor 4       3S4         02/05/1996  03/18/2002
3S4 SUPERINTENDENT  Sch 3 CD Superintendent     3S4         03/19/2002  06/09/2009
3S4 SUPERVISOR      Sch 3 CO Supervisor 4       3S4         06/10/2009  01/01/2099

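The records from TableA and TableB are then merged on ID. The LOCN column is added to the TableB rows, and it applies only for the date ranges covered by TableA. The resulting data should look like this: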

ID  UNIT_TYPE       DESCRIPTION                 OTHER_ID    START_DATE  END_DATE    LOCN
1P1 PROFESSOR       Sch 1 Quad 1 Area           P1          02/04/1996  05/04/2005  01
1P1 PROFESSOR       Sch 1 Quad 1 Area           P1          05/05/2005  01/01/2099  {NULL}
1P2 PROFESSOR       Sch 1 Quad 2 Area           P2          01/31/1996  02/06/1996  30
1P2 PROFESSOR       Sch 1 Quad 2 Area           P2          02/07/1996  01/01/2099  02
1P3 PROFESSOR       Sch 1 Quad 3 Area           P3          02/05/1996  02/06/1996  {NULL}
1P3 PROFESSOR       Sch 1 Quad 3 Area           P3          02/07/1996  01/01/2099  03
1S4 SUPERVISOR      Sch 1 CO Supervisor 4       1S4         02/05/1996  11/05/2001  {NULL}
1S4 SUPERVISOR      Sch 1 CO Supervisor 4       1S4         11/06/2001  03/18/2002  42
1S4 SUPERINTENDENT  Sch 1 CD Superintendent     1S4         03/19/2002  06/09/2009  42
1S4 SUPERVISOR      Sch 1 CO Supervisor 4       1S4         06/10/2009  01/01/2099  42
2S5 SUPERVISOR      Sch 2 CAO Supervisor 5      2S5         10/26/2002  06/09/2009  {NULL}
2S5 SUPERINTENDENT  Sch 2 CAO Superintendent 5  2S5         06/10/2009  07/14/2009  {NULL}
2S5 SUPERINTENDENT  Sch 2 CAO Superintendent 5  S5          07/15/2009  01/01/2099  {NULL}
3S4 SUPERVISOR      Sch 3 CO Supervisor 4       3S4         02/05/1996  11/05/2001  {NULL}
3S4 SUPERVISOR      Sch 3 CO Supervisor 4       3S4         11/06/2001  03/18/2002  42
3S4 SUPERINTENDENT  Sch 3 CD Superintendent     3S4         03/19/2002  06/09/2009  42
3S4 SUPERVISOR      Sch 3 CO Supervisor 4       3S4         06/10/2009  01/01/2099  42

I would love to hear any approaches to solving this. Thanks a lot.

Added: this is a query I have written so far to collapse the records in TableA

SELECT ID, LOCN, TO_CHAR(MIN(START_DATE), 'MM/DD/YYYY') START_DATE, TO_CHAR(MAX(END_DATE), 'MM/DD/YYYY') END_DATE
        FROM
             (
              SELECT ID, LOCN, START_DATE, END_DATE, MAX(GRP) OVER (ORDER BY ID, START_DATE) GRP
              FROM
                  (
                   SELECT ID, LOCN,
                          CASE WHEN START_DATE - LAG(END_DATE) OVER (PARTITION BY ID, LOCN ORDER BY START_DATE ASC) <= 1 THEN
                            NULL
                          ELSE
                            ROWNUM
                          END GRP,
                          START_DATE,
                          NVL(END_DATE, SYSDATE) END_DATE
                   FROM TableA
                   ORDER BY ID ASC, START_DATE ASC
                  )
             )
        GROUP BY ID, LOCN, GRP
        ORDER BY ID ASC, START_DATE ASC;

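1 answer:

Answer 0 (score: 2)

Since the rock stars are busy leading their decadent (if hard-earned) lifestyles, and the ninjas look like they will be busy for a while, I'll have a go...

You have laid it out so that contiguous records in TableA are collapsed first, and that result is then used against the (possibly collapsed) TableB. I'm not sure doing that as a separate step is ideal for solving the overall problem, but I'll go with it for now. The general approach I find easiest for collapsing rows is: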

select id, locn, max(start_date) as start_date, max(end_date) as end_date
from (
    select id, locn,
        case when start_date = lag_end_date  + interval '1' day then null
            else start_date end as start_date,
        case when end_date = lead_start_date - interval '1' day then null
            else end_date end as end_date,
        row_number() over (partition by id order by start_date)
            - row_number() over (partition by id, locn
                order by start_date) as chain
    from (
        select id, locn, start_date, end_date,
            lead(start_date) over (partition by id, locn
                order by start_date) as lead_start_date,
            lag(end_date) over (partition by id, locn
                order by start_date) as lag_end_date
        from TableA
    )   
)
group by id, locn, chain
order by 1, 3, 2;

ID         LOCN       START_DATE END_DATE
---------- ---------- ---------- ----------
1P1        01         02/04/1996 05/04/2005
1P2        02         02/07/1996 01/01/2099
1P2        30         01/31/1996 02/06/1996
1P3        03         02/07/1996 01/01/2099
1S4        42         11/06/2001 01/01/2099
3S4        42         11/06/2001 01/01/2099

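The innermost select uses lead and lag to peek at the adjacent rows (you hinted at this in your previous question).

The next layer sets contiguous values (i.e. where a row's start date is the day after the previous row's end date) to null; if you run just that part you will see the starts and ends of the contiguous ranges. It also adds a chain pseudo-column, which lets it cope with an id switching back to a locn it used before; say 1P2 going back to locn=30. (I first saw this approach elsewhere, and searching for "gaps and islands" turns up more on it.) Without that, all the "islands" for an id/locn would be treated as one block, and you would end up with overlapping date ranges.

The outer layer aggregates (the query uses max for both columns) to remove the nulls and produce the final result.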
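To see what the middle layer does before the final aggregation, the two inner layers can be run on their own. This is only a sketch: it reuses the query above, minus the chain column and the group by, and the filter on id '1P1' and the order by are additions made here for readability.

```sql
-- Sketch: the two inner layers of the collapse query, without the GROUP BY.
select id, locn,
    -- start_date survives only on the first row of a contiguous run
    case when start_date = lag_end_date + interval '1' day then null
        else start_date end as start_date,
    -- end_date survives only on the last row of a contiguous run
    case when end_date = lead_start_date - interval '1' day then null
        else end_date end as end_date
from (
    select id, locn, start_date, end_date,
        lead(start_date) over (partition by id, locn
            order by start_date) as lead_start_date,
        lag(end_date) over (partition by id, locn
            order by start_date) as lag_end_date
    from TableA
)
where id = '1P1'                      -- illustrative filter, not in the original
order by lag_end_date nulls first;    -- keeps the rows in their original date order
```

For 1P1, whose four rows are all contiguous, only the first row should keep its start_date and only the last row its end_date; the values in between come back null, and those nulls are what the outer max() discards.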

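With that collapsed result you can, if you are on 11gR2, use a recursive CTE to join recursively and get all the combinations. This is only my second real attempt at one of these, so others may be able to point out flaws or improvements, if they can tear themselves away from the M&Ms... but it might give you some pointers.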

with a as (
    select id, locn, max(start_date) as start_date, max(end_date) as end_date
    from (
        select id, locn,
            case when start_date = lag_end_date  + interval '1' day then null
                else start_date end as start_date,
            case when end_date = lead_start_date - interval '1' day then null
                else end_date end as end_date,
            row_number() over (partition by id order by start_date)
                - row_number() over (partition by id, locn
                    order by start_date) as chain
        from (
            select id, locn, start_date, end_date,
                lead(start_date) over (partition by id, locn
                    order by start_date) as lead_start_date,
                lag(end_date) over (partition by id, locn
                    order by start_date) as lag_end_date
            from TableA
        )
    )
    group by id, locn, chain
),
b as (
    select id, posting, description, other_id, start_date, end_date,
        row_number() over (partition by id, posting, description,
            other_id order by start_date, end_date) as rn
    from TableB
),
r (id, posting, description, other_id, rn, start_date, end_date, locn) as (
    select b.id, b.posting, b.description, b.other_id, b.rn,
        b.start_date,
        case
            when not (a.start_date > b.end_date or a.end_date < b.start_date)
                and a.start_date <= b.end_date and a.end_date < b.end_date
                then a.end_date
            when not (a.start_date > b.end_date or a.end_date < b.start_date)
                and a.start_date <= b.end_date and a.start_date > b.start_date
                then a.start_date - interval '1' day
            else b.end_date
        end as end_date,
        case
            when a.start_date <= b.start_date and a.end_date >= b.start_date
                then a.locn
        end
    from b
    left join (
        select id, locn, start_date, end_date,
            row_number() over (partition by id order by start_date) as rn
        from a
    ) a on a.id = b.id
        and a.rn = 1
    union all
    select b.id, b.posting, b.description, b.other_id, b.rn,
        case
            when a.start_date is null then r.end_date + interval '1' day
            else a.start_date
        end as start_date,
        case
            when a.start_date is null then b.end_date
            when not (a.start_date > r.end_date or a.end_date < r.start_date)
                then least(a.end_date, b.end_date)
            when a.end_date < b.end_date then a.start_date - interval '1' day
            else b.end_date
        end as end_date,
        a.locn
    from b
    join r on r.id = b.id
        and r.posting = b.posting
        and r.description = b.description
        and r.other_id = b.other_id
        and r.rn = b.rn
        and r.start_date = b.start_date
        and r.end_date < b.end_date
    left join a on a.id = r.id
        and a.start_date > r.end_date
) 
select id, posting as unit_type, description, other_id,
    start_date, end_date, locn
from r
order by id, start_date;

This gets the desired result:

ID         UNIT_TYPE            DESCRIPTION                    OTHER_ID   START_DATE END_DATE   LOCN
---------- -------------------- ------------------------------ ---------- ---------- ---------- ----------
1P1        PROFESSOR            Sch 1 Quad 1 Area              P1         02/04/1996 05/04/2005 01
1P1        PROFESSOR            Sch 1 Quad 1 Area              P1         05/05/2005 01/01/2099
1P2        PROFESSOR            Sch 1 Quad 2 Area              P2         01/31/1996 02/06/1996 30
1P2        PROFESSOR            Sch 1 Quad 2 Area              P2         02/07/1996 01/01/2099 02
1P3        PROFESSOR            Sch 1 Quad 3 Area              P3         02/05/1996 02/06/1996
1P3        PROFESSOR            Sch 1 Quad 3 Area              P3         02/07/1996 01/01/2099 03
1S4        SUPERVISOR           Sch 1 CO Supervisor 4          1S4        02/05/1996 11/05/2001
1S4        SUPERVISOR           Sch 1 CO Supervisor 4          1S4        11/06/2001 03/18/2002 42
1S4        SUPERINTENDENT       Sch 1 CD Superintendent        1S4        03/19/2002 06/09/2009 42
1S4        SUPERVISOR           Sch 1 CO Supervisor 4          1S4        06/10/2009 01/01/2099 42
2S5        SUPERVISOR           Sch 2 CAO Supervisor 5         2S5        10/26/2002 06/09/2009
2S5        SUPERINTENDENT       Sch 2 CAO Superintendent 5     2S5        06/10/2009 07/14/2009
2S5        SUPERINTENDENT       Sch 2 CAO Superintendent 5     S5         07/15/2009 01/01/2099
3S4        SUPERVISOR           Sch 3 CO Supervisor 4          3S4        02/05/1996 11/05/2001
3S4        SUPERVISOR           Sch 3 CO Supervisor 4          3S4        11/06/2001 03/18/2002 42
3S4        SUPERINTENDENT       Sch 3 CD Superintendent        3S4        03/19/2002 06/09/2009 42
3S4        SUPERVISOR           Sch 3 CO Supervisor 4          3S4        06/10/2009 01/01/2099 42

17 rows selected.

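This uses three CTEs. a is the collapsed version of TableA, as above. b is TableB with a row number column added, which I think is needed to keep track of the records during the recursion later. r is where the fun starts.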
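For anyone who has not used recursive subquery factoring before, its general shape is an anchor query, a union all, and a second query that refers back to the CTE by name. A minimal sketch, unrelated to the tables here (the name n_list and the limit of 5 are purely illustrative):

```sql
-- Illustrative only: counts from 1 to 5 with a recursive CTE (Oracle 11gR2 or later).
with n_list (n) as (
    select 1 from dual     -- anchor member: the starting row
    union all
    select n + 1           -- recursive member: built from the rows produced so far
    from n_list
    where n < 5            -- no more rows qualify once n reaches 5, so recursion stops
)
select n from n_list;
```

In r, the anchor member is the first select (the initial range for each TableB row) and the recursive member is the select after union all; the join condition r.end_date < b.end_date plays the part of the stop condition.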

The first part of r generates the initial data for each TableB entry, using matching values from TableA where appropriate, but only from the first TableA record for that id. The tricky bit here is working out what the end_date should be. If there is no overlapping TableA record at all, it can be the TableB end date; if there is one but it starts after the TableB record does, then this needs to end immediately before the TableA record starts. Otherwise it depends on whether the TableB record ends before or after the TableA one.

Running just that part:

with a as (...),
b as (...)
select b.id, b.posting, b.description, b.other_id, b.rn,
    b.start_date,
    case
        when not (a.start_date > b.end_date or a.end_date < b.start_date)
            and a.start_date <= b.end_date and a.end_date < b.end_date
            then a.end_date
        when not (a.start_date > b.end_date or a.end_date < b.start_date)
            and a.start_date <= b.end_date and a.start_date > b.start_date
            then a.start_date - interval '1' day
        else b.end_date
    end as end_date,
    case
        when a.start_date <= b.start_date and a.end_date >= b.start_date
            then a.locn
    end
from b
left join (
    select id, locn, start_date, end_date,
        row_number() over (partition by id order by start_date) as rn
    from a
) a on a.id = b.id
    and a.rn = 1
order by id, start_date;

...gives this (description suppressed for readability):

ID         UNIT_TYPE            OTHER_ID   START_DATE END_DATE   LOCN
---------- -------------------- ---------- ---------- ---------- ----------
1P1        PROFESSOR            P1         02/04/1996 05/04/2005 01
1P2        PROFESSOR            P2         01/31/1996 02/06/1996 30
1P3        PROFESSOR            P3         02/05/1996 02/06/1996
1S4        SUPERVISOR           1S4        02/05/1996 11/05/2001
1S4        SUPERINTENDENT       1S4        03/19/2002 06/09/2009 42
1S4        SUPERVISOR           1S4        06/10/2009 01/01/2099 42
2S5        SUPERVISOR           2S5        10/26/2002 06/09/2009
2S5        SUPERINTENDENT       2S5        06/10/2009 07/14/2009
2S5        SUPERINTENDENT       S5         07/15/2009 01/01/2099
3S4        SUPERVISOR           3S4        02/05/1996 11/05/2001
3S4        SUPERINTENDENT       3S4        03/19/2002 06/09/2009 42
3S4        SUPERVISOR           3S4        06/10/2009 01/01/2099 42

12 rows selected.

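For 1P3 there is no matching TableA record initially, but notice that the end_date is set to the day before the later match starts.

The second part of r, after the union all, is the recursive part. For each TableB record it joins back to itself, looking for rows where the generated end_date is earlier than the original record's, as is the case for 1P3, which means there is a period of time that still needs to be filled in. It then looks for a suitable TableA record and generates suitable values for start_date and end_date, depending on whether and how the records overlap. It is entirely possible I have missed some edge cases here.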

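You mentioned that TableB may also have contiguous ranges to collapse, and you can process it in much the same way as I have shown for a. I'm not sure doing it this way is necessarily the best or clearest, even for the one table that needs it; I really only did it like that because that is how you described the process.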
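As a sketch of that similar processing, the collapse query can be re-keyed for TableB by swapping locn for the columns that identify a posting. Including description in the key is an assumption made here (it matches the partitioning used in b); the question itself only names ID, POSTING and OTHER_ID.

```sql
-- Sketch: the TableA collapse, re-keyed for TableB.
-- Assumption: description is treated as part of the key.
select id, posting, description, other_id,
    max(start_date) as start_date, max(end_date) as end_date
from (
    select id, posting, description, other_id,
        case when start_date = lag_end_date + interval '1' day then null
            else start_date end as start_date,
        case when end_date = lead_start_date - interval '1' day then null
            else end_date end as end_date,
        -- chain separates runs where an id returns to an earlier posting
        row_number() over (partition by id order by start_date)
            - row_number() over (partition by id, posting, description,
                other_id order by start_date) as chain
    from (
        select id, posting, description, other_id, start_date, end_date,
            lead(start_date) over (partition by id, posting, description,
                other_id order by start_date) as lead_start_date,
            lag(end_date) over (partition by id, posting, description,
                other_id order by start_date) as lag_end_date
        from TableB
    )
)
group by id, posting, description, other_id, chain
order by id, start_date;
```

On the sample data nothing should collapse, as the question anticipated, so this should return the same 12 rows that TableB started with.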

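If you changed the recursive CTE to work against the base tables (possibly simplifying it slightly in the process), you could apply the gaps-and-islands method to that result set instead of to the individual tables, so it would not matter which table the gaps came from.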
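The gaps-and-islands step itself can be written in more than one way. As a sketch only, and not the query used above, here is a variant that flags the first row of each island and numbers the islands with a running sum. It is shown against TableA, but the same shape could be pointed at any result set that has key columns and a start/end pair. The names is_start and island are made up for the illustration.

```sql
-- Sketch: gaps-and-islands via a start-of-island flag and a running sum.
select id, locn, min(start_date) as start_date, max(end_date) as end_date
from (
    select id, locn, start_date, end_date,
        -- running count of island starts = island number within (id, locn)
        sum(is_start) over (partition by id, locn
            order by start_date) as island
    from (
        select id, locn, start_date, end_date,
            -- 0 when this row starts the day after the previous row ended,
            -- 1 otherwise (including the first row, where lag is null)
            case when start_date = lag(end_date) over (partition by id, locn
                    order by start_date) + interval '1' day then 0
                else 1 end as is_start
        from TableA
    )
)
group by id, locn, island
order by id, start_date;
```

Because the island number only increases where a row fails to follow its predecessor by exactly one day, this variant also keeps ranges apart when the same id/locn has a real gap between them.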