Joining complete histories

Time: 2018-03-09 12:05:53

Tags: sql db2

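I am currently trying to work out a join between history tables, where I want to synchronize two timelines. As an example, I have the following two tables: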

A
ID      Value   FROM        TO
1       5       01.01.2018  31.03.2018
1       6       31.03.2018  08.04.2018

B
ID      A_FK    Value   FROM        TO
1       1       50      01.02.2018  01.04.2018
2       1       51      04.04.2018  10.04.2018

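As the baseline, I want to take the timeline of table A and join table B onto it, including NULL values, so that I can see the periods for which there is no matching value. The desired result should look like this: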

C
Value_A    Value_B   FROM        TO
5          NULL      01.01.2018  01.02.2018
5          50        01.02.2018  31.03.2018
6          50        31.03.2018  01.04.2018
6          NULL      01.04.2018  04.04.2018
6          51        04.04.2018  08.04.2018

Can you help me with this? I made a start, but I cannot get the histories to line up correctly. This is my attempt:

with a as (SELECT *
 FROM (VALUES (1,5,'01.01.2018','31.03.2018')
         , (1,6,'31.03.2018','08.04.2018')
   ) A (ID, VALUE, FROM, TO)),
b as (
SELECT *
  FROM (VALUES (1,1,50,'01.02.2018','01.04.2018')
             , (2,1,51,'04.04.2018','10.04.2018')
       ) A (ID,A_FK, VALUE, FROM, TO)
)
select 
a.value as value_a,
b.value as value_b,
max(a.from,b.from) as from,
min(a.to,b.to) as to
from a
left outer join b on 
a.id = b.a_fk and
a.from < b.to and
a.to > b.from;

As you can see, it is aligned, but not in the way I expected.
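To make the gap concrete, here is a minimal sketch that runs the same overlap join against the sample rows. It uses Python's built-in sqlite3 module as a stand-in for Db2 (an assumption), stores the dates as ISO text so that they compare correctly, and renames FROM/TO to the hypothetical column names dtfrom/dtto to avoid reserved words.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for Db2; dates are ISO-8601 text.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER, value INTEGER, dtfrom TEXT, dtto TEXT);
    CREATE TABLE b (id INTEGER, a_fk INTEGER, value INTEGER, dtfrom TEXT, dtto TEXT);
    INSERT INTO a VALUES (1, 5, '2018-01-01', '2018-03-31'),
                         (1, 6, '2018-03-31', '2018-04-08');
    INSERT INTO b VALUES (1, 1, 50, '2018-02-01', '2018-04-01'),
                         (2, 1, 51, '2018-04-04', '2018-04-10');
""")

# Same join condition as the attempt: keep only pairs of overlapping rows and
# clip each pair to its common range (two-argument max/min are scalar in SQLite).
overlap_rows = conn.execute("""
    SELECT a.value AS value_a,
           b.value AS value_b,
           max(a.dtfrom, b.dtfrom) AS dfrom,
           min(a.dtto, b.dtto) AS dto
      FROM a
      LEFT JOIN b ON a.id = b.a_fk
                 AND a.dtfrom < b.dtto
                 AND a.dtto > b.dtfrom
     ORDER BY dfrom
""").fetchall()

for row in overlap_rows:
    print(row)
```

On the sample data this returns only the three overlapping ranges (5/50, 6/50 and 6/51). The two ranges where A has a value and B has none never appear, because every A row overlaps at least one B row and so the outer join never has to emit a NULL.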

Thanks for your help.

3 answers:

Answer 0 (score: 1)

As I mentioned in the comments, you can solve this with a trick from my own answer to another question.

Here is a solution.

Test data:

create table a (
  id integer,
  value integer,
  dtfrom date,
  dtto date
);

create table b(
  id integer,
  a_fk integer,
  value integer,
  dtfrom date,
  dtto date
);

insert into a values 
   (1, 5, '2018-01-01', '2018-03-31'), 
   (1, 6, '2018-03-31', '2018-04-08');
insert into b values 
   (1, 1, 50, '2018-02-01', '2018-04-01'), 
   (2, 1, 51, '2018-04-04', '2018-04-10');

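The tricky part of this solution is generating the date intervals that occur across both tables, such as 01.01.2018-01.02.2018 and 01.02.2018-31.03.2018. To do that, you first need every boundary date from both tables in a single table, so I created a VIEW named timmings to make this easier: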

create or replace view timmings as
  select a.dtfrom dt from a inner join b on a.id=b.a_fk
  union
  select a.dtto from a inner join b on a.id=b.a_fk
  union
  select b.dtfrom from a inner join b on a.id=b.a_fk
  union
  select b.dtto from a inner join b on a.id=b.a_fk;

After that, you need a query that generates all available periods (start and end), which would be:

select t1.dt as start,
      (select min(t2.dt) 
         from timmings t2 
        where t2.dt>t1.dt) as dend
 from timmings t1
order by start;

This results in (with your sample data):

  start          dend
01/01/2018    01/02/2018
01/02/2018    31/03/2018
31/03/2018    01/04/2018
01/04/2018    04/04/2018
04/04/2018    08/04/2018
08/04/2018    10/04/2018
10/04/2018    null
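The two steps above (collect the boundary dates, then pair each date with the next later one) can be tried out with the following minimal sketch. It assumes Python's built-in sqlite3 module as a stand-in for Db2 and stores dates as ISO text; SQLite has no CREATE OR REPLACE VIEW, so a plain CREATE VIEW is used.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for Db2; dates are ISO-8601 text.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER, value INTEGER, dtfrom TEXT, dtto TEXT);
    CREATE TABLE b (id INTEGER, a_fk INTEGER, value INTEGER, dtfrom TEXT, dtto TEXT);
    INSERT INTO a VALUES (1, 5, '2018-01-01', '2018-03-31'),
                         (1, 6, '2018-03-31', '2018-04-08');
    INSERT INTO b VALUES (1, 1, 50, '2018-02-01', '2018-04-01'),
                         (2, 1, 51, '2018-04-04', '2018-04-10');

    -- Every distinct boundary date from both tables (UNION removes duplicates).
    CREATE VIEW timmings AS
      SELECT a.dtfrom AS dt FROM a INNER JOIN b ON a.id = b.a_fk
      UNION
      SELECT a.dtto FROM a INNER JOIN b ON a.id = b.a_fk
      UNION
      SELECT b.dtfrom FROM a INNER JOIN b ON a.id = b.a_fk
      UNION
      SELECT b.dtto FROM a INNER JOIN b ON a.id = b.a_fk;
""")

# Pair each boundary date with the smallest later boundary date.
periods = conn.execute("""
    SELECT t1.dt AS start,
           (SELECT min(t2.dt)
              FROM timmings t2
             WHERE t2.dt > t1.dt) AS dend
      FROM timmings t1
     ORDER BY start
""").fetchall()

for start, dend in periods:
    print(start, dend)
```

The last boundary date has no successor, so its dend is NULL (None in Python). That open-ended row drops out in the later joins, because comparisons against NULL never evaluate to true.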

With that, you can get all the available values from table a for the periods that fall inside one of its rows:

select a.id, a.value, tm.start, tm.dend
  from (select t1.dt as start,
              (select min(t2.dt) 
                 from timmings t2 
                where t2.dt>t1.dt) as dend
         from timmings t1) tm
      left join a on tm.start >= a.dtfrom and tm.dend <= a.dtto 
 where a.id is not null
 order by tm.start;

The result is:

id   value    start         dend
 1     5    01/01/2018   01/02/2018
 1     5    01/02/2018   31/03/2018
 1     6    31/03/2018   01/04/2018
 1     6    01/04/2018   04/04/2018
 1     6    04/04/2018   08/04/2018

Finally, you LEFT JOIN table b:

 select x.value as valueA,
        b.value as valueB,
        x.start as "from",
        x.dend as "to"
   from (select a.id, a.value, tm.start, tm.dend
          from (select t1.dt as start,
                      (select min(t2.dt) 
                         from timmings t2 
                        where t2.dt>t1.dt) as dend
                 from timmings t1) tm
              left join a on tm.start >= a.dtfrom and tm.dend <= a.dtto 
         where a.id is not null
        ) x 
      left join b on b.a_fk = x.id
                 and b.dtfrom <= x.start
                 and b.dtto >= x.dend
   order by x.start;

This gives you the desired result:

valueA   valueB     from        to
 5        null   01/01/2018  01/02/2018
 5        50     01/02/2018  31/03/2018
 6        50     31/03/2018  01/04/2018
 6        null   01/04/2018  04/04/2018
 6        51     04/04/2018  08/04/2018

See the final solution: http://sqlfiddle.com/#!9/36418e/1 It is MySQL, but since it is all ANSI SQL, it should work fine in DB2.
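If creating a view is not an option, the same idea can be written as a single statement with common table expressions. The following is a minimal sketch of that variant, not the query from the fiddle: it assumes Python's built-in sqlite3 module as a stand-in for Db2, ISO text dates, and hypothetical CTE names (boundaries, periods). It also collects the boundary dates directly from each table instead of through the a/b inner joins, which gives the same dates for this sample data.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for Db2; dates are ISO-8601 text.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER, value INTEGER, dtfrom TEXT, dtto TEXT);
    CREATE TABLE b (id INTEGER, a_fk INTEGER, value INTEGER, dtfrom TEXT, dtto TEXT);
    INSERT INTO a VALUES (1, 5, '2018-01-01', '2018-03-31'),
                         (1, 6, '2018-03-31', '2018-04-08');
    INSERT INTO b VALUES (1, 1, 50, '2018-02-01', '2018-04-01'),
                         (2, 1, 51, '2018-04-04', '2018-04-10');
""")

synced = conn.execute("""
    WITH boundaries(dt) AS (          -- hypothetical name: all distinct boundary dates
        SELECT dtfrom FROM a UNION SELECT dtto FROM a
        UNION
        SELECT dtfrom FROM b UNION SELECT dtto FROM b
    ),
    periods AS (                      -- hypothetical name: consecutive date pairs
        SELECT t1.dt AS start,
               (SELECT min(t2.dt) FROM boundaries t2 WHERE t2.dt > t1.dt) AS dend
          FROM boundaries t1
    )
    SELECT a.value AS value_a, b.value AS value_b, p.start, p.dend
      FROM periods p
      JOIN a                          -- keep only periods inside a row of a
        ON p.start >= a.dtfrom AND p.dend <= a.dtto
      LEFT JOIN b                     -- attach b where it covers the period, else NULL
        ON b.a_fk = a.id AND b.dtfrom <= p.start AND b.dtto >= p.dend
     ORDER BY p.start
""").fetchall()

for row in synced:
    print(row)
```

The inner join to a plays the role of the LEFT JOIN plus `where a.id is not null` filter in the query above: periods outside A's timeline, including the open-ended last one, are dropped.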

Answer 1 (score: 0)

There is a great blog post by John Maenpaa, "Fun with Date Ranges".

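Second, if you have a chance to influence the DDL, I recommend taking a close look at Db2 temporal tables. They provide full SQL support (Time Travel SQL); find the details here.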

Answer 2 (score: 0)

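This is actually quite simple if you have what is called a calendar table, that is, a table containing every date, although you can build one on the fly if needed. With it you can turn this more obviously into a gaps-and-islands problem (you want one anyway, because it is among the most useful analytical dimension tables):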

SELECT valueA, valueB, 
       MIN(calendarDate) AS startDate, 
       MAX(calendarDate) + 1 DAY AS endDate
FROM (SELECT A.val AS valueA, B.val AS valueB, Calendar.calendarDate,
             ROW_NUMBER() OVER(ORDER BY Calendar.calendarDate) -
                ROW_NUMBER() OVER(PARTITION BY A.val, B.val ORDER BY Calendar.calendarDate) AS grouping                       
      FROM Calendar
      LEFT JOIN A
             ON A.startDate <= Calendar.calendarDate
                AND A.endDate > Calendar.calendarDate
      LEFT JOIN B
             ON B.startDate <= Calendar.calendarDate
                AND B.endDate > Calendar.calendarDate
      WHERE A.val IS NOT NULL 
            OR B.val IS NOT NULL) Groups
GROUP BY valueA, valueB, grouping
ORDER BY grouping

SQL Fiddle Example (with minor adjustments in the example for SQL Server)

...which yields the following results. Note that the date ranges in table B include a few days that do not appear in table A!

valueA  valueB  startDate   endDate
5       (null)  2018-01-01  2018-02-01
5       50      2018-02-01  2018-03-31
6       50      2018-03-31  2018-04-01
6       (null)  2018-04-01  2018-04-04
6       51      2018-04-04  2018-04-08
(null)  51      2018-04-08  2018-04-10

(This can of course be trivially changed by switching the join to a regular INNER JOIN, but I think this and the other cases are important.)
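For anyone without a calendar table at hand, the following minimal sketch builds one on the fly with a recursive CTE and then applies the same ROW_NUMBER difference trick. It assumes Python's built-in sqlite3 module as a stand-in for Db2 (window functions need SQLite 3.25 or later), ISO text dates, the dtfrom/dtto column names from the first answer, and hypothetical names such as calendar, daily and grp. SQLite's date(d, '+1 day') replaces the Db2 expression `+ 1 DAY`.

```python
import sqlite3

# Assumption: in-memory SQLite (3.25+) stands in for Db2; dates are ISO-8601 text.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER, value INTEGER, dtfrom TEXT, dtto TEXT);
    CREATE TABLE b (id INTEGER, a_fk INTEGER, value INTEGER, dtfrom TEXT, dtto TEXT);
    INSERT INTO a VALUES (1, 5, '2018-01-01', '2018-03-31'),
                         (1, 6, '2018-03-31', '2018-04-08');
    INSERT INTO b VALUES (1, 1, 50, '2018-02-01', '2018-04-01'),
                         (2, 1, 51, '2018-04-04', '2018-04-10');
""")

islands = conn.execute("""
    WITH RECURSIVE calendar(d) AS (   -- one row per day covering the sample data
        SELECT '2018-01-01'
        UNION ALL
        SELECT date(d, '+1 day') FROM calendar WHERE d < '2018-04-10'
    ),
    daily AS (                        -- values valid on each day, plus an island key
        SELECT a.value AS value_a, b.value AS value_b, c.d,
               ROW_NUMBER() OVER (ORDER BY c.d)
               - ROW_NUMBER() OVER (PARTITION BY a.value, b.value ORDER BY c.d) AS grp
          FROM calendar c
          LEFT JOIN a ON a.dtfrom <= c.d AND a.dtto > c.d
          LEFT JOIN b ON b.dtfrom <= c.d AND b.dtto > c.d
         WHERE a.value IS NOT NULL OR b.value IS NOT NULL
    )
    SELECT value_a, value_b,
           min(d) AS start_date,
           date(max(d), '+1 day') AS end_date   -- end date is exclusive
      FROM daily
     GROUP BY value_a, value_b, grp
     ORDER BY start_date
""").fetchall()

for row in islands:
    print(row)
```

Like the query above, this sketch joins A and B to the calendar by date only and does not match B.A_FK to A.ID. That is fine for the single-ID sample data, but a multi-ID table would need the key in the join and in the PARTITION BY.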