backload sql history building logic

Asked: 2017-06-27 22:08:31

Tags: sql database data-warehouse netezza type-2-dimension

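I have two tables: Table A and Table B.

Table A is the master table. I am trying to build the target history table while tracking the movement of item_update_id from Table B.

item_created_id (item_creation_id in the target table) is the item that first created the record. item_update_id (updatedby_item_id in the target table) is the ID of the item associated with each change that happened.

The logic for tagging item_update_id is that, for a given id, max(tableB.created) should be less than tableA.start_dt for that id. However, there are some ids where created > start_dt between Table A and Table B, and those need to be captured as well. I tried, but it gives all duplicates :(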

Table A and Table B (source tables)

Table A: product master table

ID                      start_dt
13104775853270761200183 2017-05-02 13:48:50
13104775853270761200183 2017-05-03 07:07:04
13104775853270761200183 2017-05-02 14:16:44
13172026130960898286609 2014-09-22 10:19:03
13174721621850903974833 2015-06-04 09:36:44
13172026130960898286609 2015-12-15 15:43:40
13174721621850903974833 2016-01-08 13:37:22
13174721621850903974833 2016-01-08 13:43:16
13174721621850903974833 2016-01-08 13:39:52
13172026130960898286609 2015-12-16 14:58:07
13174721621850903974833 2015-10-26 10:30:51
13174721621850903974833 2015-06-04 09:30:53

Table B: order table

ID                      itemid                  CREATED
13104775853270761200183 14937310030583928893513 2017-05-02 14:16:43 
13174721621850903974833 14334066542302849727098 2015-06-04 09:30:54 
13174721621850903974833 14334066542302849727098 2015-06-04 09:30:54 
13172026130960898286609 14501942190273116500804 2015-12-15 15:43:39 
13172026130960898286609 14502778859283118475305 2015-12-16 14:58:05 
13174721621850903974833 14326542782182842847957 2015-05-26 16:31:18 
13174721621850903974833 14326542782182842847957 2015-05-26 16:31:18 
13104775853270761200183 14937916243033929399830 2017-05-03 07:07:04 
13174721621850903974833 14522603924033168585758 2016-01-08 13:39:52 
13174721621850903974833 14522603924033168585758 2016-01-08 13:39:52 
13172026130960898286609 14501941419223116385878 2015-12-15 15:42:22 
13104775853270761200183 14937293304313928893317 2017-05-02 13:48:50 
13104775853270761200183 14937293304313928893317 2017-05-02 13:48:50 
13174721621850903974833 14458554514083057872538 2015-10-26 10:30:51 
13174721621850903974833 14458554514083057872538 2015-10-26 10:30:51 

Target output table:

id                      start_dt            end_dt              item_creation_id        updatedby_item_id
13104775853270761200183 2017-05-02 13:48:50 2017-05-02 14:16:44 14937293304313928893317 14937293304313928893317
13104775853270761200183 2017-05-02 14:16:44 2017-05-03 07:07:04 14937293304313928893317 14937310030583928893513
13104775853270761200183 2017-05-03 07:07:04 9999-09-09 00:00:00 14937293304313928893317 14937916243033929399830
13172026130960898286609 2015-12-16 14:58:07 9999-09-09 00:00:00 14501942190273116500804 14502778859283118475305
13172026130960898286609 2015-12-15 15:43:40 2015-12-16 14:58:07 14501942190273116500804 14501942190273116500804
13174721621850903974833 2016-01-08 13:39:52 2016-01-08 13:43:16 14326542782182842847957 14522603924033168585758
13174721621850903974833 2016-01-08 13:43:16 9999-09-09 00:00:00 14326542782182842847957 14522603924033168585758
13174721621850903974833 2015-06-04 09:30:53 2015-06-04 09:36:44 14326542782182842847957 14334066542302849727098
13174721621850903974833 2016-01-08 13:37:22 2016-01-08 13:39:52 14326542782182842847957 14458554514083057872538
13174721621850903974833 2015-06-04 09:36:44 2015-10-26 10:30:51 14326542782182842847957 14334066542302849727098
13174721621850903974833 2015-10-26 10:30:51 2016-01-08 13:37:22 14326542782182842847957 14458554514083057872538
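
Reading the target rows above, end_dt appears to be the next start_dt for the same id, with 9999-09-09 00:00:00 on the most recent row. A minimal sketch of that derivation follows. It assumes TableA has exactly the id and start_dt columns shown in the sample, and the timestamp cast may need adjusting for Netezza.

```sql
-- Sketch only: derive end_dt as the next start_dt per id.
-- Assumes TableA(id, start_dt) as in the sample data.
select
    id,
    start_dt,
    coalesce(
        lead(start_dt) over (partition by id order by start_dt),
        cast('9999-09-09 00:00:00' as timestamp)   -- open-ended sentinel used in the target
    ) as end_dt
from (
    select distinct id, start_dt   -- guard against repeated master rows
    from TableA
) a;
```

Note that this would also emit a row for id 13172026130960898286609 starting 2014-09-22 10:19:03, which the expected output does not show, so some additional filter is presumably applied.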



select
    start_dt,
    end_dt,
    id,
    nvl(item_creation_id, '?') as item_creation_id,
    nvl(item_update_id, '?')   as updatedby_item_id
from (
    select
        b1.start_dt,
        b1.end_dt,
        b1.id,
        b2.item_creation_id,  -- derived in a previous step and working fine
        case
            when b2.create_dt <= b1.start_dt  -- problem lies here (need to capture the scenario where created > start_dt)
            then b2.item_id
            else null
        end as item_update_id,
        row_number() over (partition by b1.id, b1.row_no_pp order by b2.row_no_ol desc) as rn
    from TableA b1
    left outer join TableB b2
        on b1.id = b2.id
    where item_update_id is not null
       or item_creation_id is null
) a
where a.rn = 1;

row_no_pp: row_number() over (partition by id order by start_dt asc), derived in a previous step.
row_no_ol: row_number() over (partition by id order by create_dt asc), derived in a previous step.
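
One possible reading of the sample data, offered as a sketch rather than a confirmed rule: in the only row where created > start_dt (start_dt 2015-06-04 09:30:53, created 09:30:54), the gap is one second. If that holds in general, the candidate items can be restricted in the join condition to created <= start_dt plus a small tolerance, and the latest candidate picked per (id, start_dt). The one-second tolerance, the interval syntax, and the use of the raw sample columns (id, itemid, created) instead of the derived row_no_pp / row_no_ol are all assumptions.

```sql
-- Sketch only: pick, for each (id, start_dt), the latest item created
-- no later than start_dt plus a hypothetical one-second tolerance.
-- Assumes TableA(id, start_dt) and TableB(id, itemid, created).
select
    id,
    start_dt,
    coalesce(itemid, '?') as updatedby_item_id
from (
    select
        a.id,
        a.start_dt,
        b.itemid,
        row_number() over (
            partition by a.id, a.start_dt
            order by b.created desc          -- latest qualifying item first
        ) as rn
    from (select distinct id, start_dt from TableA) a
    left outer join (
        select distinct id, itemid, created  -- TableB holds exact duplicate rows
        from TableB
    ) b
        on  b.id = a.id
        and b.created <= a.start_dt + interval '1 second'  -- tolerance is an assumption
) ranked
where rn = 1;
```

Putting the date condition in the join, rather than in a case expression, means only qualifying items are ranked, and the distinct subqueries remove the repeated Table B rows that would otherwise multiply the output. Traced by hand against the sample rows, this appears to give the updatedby_item_id values in the target table, plus one extra '?' row for the 2014-09-22 10:19:03 start_dt that the target omits.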

Thanks in advance for your help.


0 answers:

No answers yet