Designing a Slowly Changing Dimension Type 2 script with PostgreSQL

Date: 2019-06-17 00:13:30

Tags: sql postgresql dimensional-modeling

Suppose I have the following target table:

CREATE TABLE DimCustomer (
    CustomerKey serial PRIMARY KEY,
    CustomerNum int NOT NULL,
    CustomerName varchar(25) NOT NULL,
    Planet varchar(25) NOT NULL,
    RowIsCurrent char(1) NOT NULL DEFAULT 'Y',
    RowStartDate date NOT NULL DEFAULT CURRENT_DATE,
    RowEndDate date NOT NULL DEFAULT '9999-12-31'
);

INSERT INTO DimCustomer
(CustomerNum, CustomerName, Planet, RowStartDate)
VALUES (101, 'Anakin Skywalker', 'Tatooine', CURRENT_TIMESTAMP - INTERVAL '101 days'),
       (102, 'Yoda', 'Coruscant', CURRENT_TIMESTAMP - INTERVAL '100 days'),
       (103, 'Obi-Wan Kenobi', 'Coruscant', CURRENT_TIMESTAMP - INTERVAL '100 days');

I also have the following staging table:

CREATE TABLE Staging_DimCustomer
(
    CustomerNum int NOT NULL,
    CustomerName varchar(25) NOT NULL,
    Planet varchar(25) NOT NULL,
    ChangeDate date NOT NULL DEFAULT CURRENT_DATE,
    RankNo int NOT NULL DEFAULT 1
);

INSERT INTO Staging_DimCustomer (CustomerNum, CustomerName, Planet, ChangeDate)
VALUES
(103, 'Ben Kenobi', 'Coruscant', CURRENT_TIMESTAMP - INTERVAL '99 days');

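In the staging table, 'Obi-Wan Kenobi' (CustomerNum 103) has changed his name to 'Ben Kenobi'. I want to create a script that implements SCD Type 2 (slowly changing dimension Type 2) and produces the following result: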

[Image: expected result]

Here is my attempt:

INSERT INTO DimCustomer (
  CustomerNum, CustomerName, Planet, RowIsCurrent, RowStartDate, RowEndDate
)
select CustomerNum, CustomerName, Planet, 'Y', ChangeDate, '12/31/9999'
from Staging_DimCustomer
ON CONFLICT (CustomerNum) and RowIsCurrent = 'Y'
DO UPDATE SET
  CustomerName = EXCLUDED.CustomerName,
  Planet = EXCLUDED.Planet,
  RowIsCurrent = 'N',
  RowEndDate = EXCLUDED.ChangeDate

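I don't know how to find the changed values, update the existing row to retire it, and then insert the new row with the RowIsCurrent = 'Y' flag. I'm trying to model my solution on this SQL Server article: http://www.made2mentor.com/2013/08/how-to-load-slowly-changing-dimensions-using-t-sql-merge/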
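ON CONFLICT is a poor fit for this. DO UPDATE modifies the conflicting row in place rather than adding a second version, and the conflict target has to match a unique index or constraint. DimCustomer has none on CustomerNum, and "and RowIsCurrent = 'Y'" is not valid syntax there. A simpler route is two ordinary statements in one transaction: first retire the current rows whose tracked columns changed, then insert a new current row for every staging customer that no longer has one. This is a minimal sketch, not a translation of the linked MERGE article. It assumes the two tables above, at most one staging row per CustomerNum, and that CustomerName and Planet are the only tracked (Type 2) columns.

```sql
-- Sketch: SCD Type 2 load as two statements in one transaction.
-- Assumes the DimCustomer / Staging_DimCustomer tables from the question,
-- at most one staging row per CustomerNum, and that CustomerName and Planet
-- are the only tracked columns.
BEGIN;

-- 1. Retire the current row of every customer whose tracked columns changed.
UPDATE DimCustomer AS d
SET    RowIsCurrent = 'N',
       RowEndDate   = s.ChangeDate
FROM   Staging_DimCustomer AS s
WHERE  d.CustomerNum  = s.CustomerNum
  AND  d.RowIsCurrent = 'Y'
  AND  (d.CustomerName, d.Planet) IS DISTINCT FROM (s.CustomerName, s.Planet);

-- 2. Insert a new current row for every staging customer that has no current
--    row at this point: changed customers (retired above) and brand-new ones.
--    Unchanged customers still have a current row, so they are skipped.
INSERT INTO DimCustomer
       (CustomerNum, CustomerName, Planet, RowIsCurrent, RowStartDate, RowEndDate)
SELECT s.CustomerNum, s.CustomerName, s.Planet, 'Y', s.ChangeDate, DATE '9999-12-31'
FROM   Staging_DimCustomer AS s
WHERE  NOT EXISTS (
         SELECT 1
         FROM   DimCustomer AS d
         WHERE  d.CustomerNum  = s.CustomerNum
           AND  d.RowIsCurrent = 'Y'
       );

COMMIT;
```

Step 2 depends on step 1 having already run. Keeping both statements in one transaction means other sessions see either the old state or the fully loaded one, never a customer with no current row.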

2 answers:

Answer 0 (score: 1)

Assuming all changes apply to the most recent row, you can update the current row and then insert the new one:

with u as (
      update dimCustomer c
          set RowIsCurrent = 'N',
              RowEndDate = sc.ChangeDate
      from Staging_DimCustomer sc
      where sc.CustomerNum = c.CustomerNum and
            c.RowIsCurrent = 'Y'
     )
insert into dimCustomer (CustomerNum, CustomerName, Planet, RowIsCurrent, RowStartDate, RowEndDate
                         ) 
     select CustomerNum, CustomerName, Planet, 'Y', ChangeDate, '9999-12-31'::date
     from Staging_DimCustomer sc;

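This assumes the changes apply to the most recent record. Handling historical (backdated) changes is trickier, and I assume that isn't needed here.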
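Because this kind of load relies on each customer having a single current row, a useful sanity check afterwards is to look for customers with zero or several rows flagged RowIsCurrent = 'Y'. For example, two staging rows for the same CustomerNum would produce two current rows. The following is a minimal sketch against the DimCustomer table above. An empty result means the invariant holds.

```sql
-- Sanity check for the SCD Type 2 invariant: every CustomerNum should have
-- exactly one row with RowIsCurrent = 'Y'. Any row returned is a violation.
SELECT CustomerNum,
       count(*) FILTER (WHERE RowIsCurrent = 'Y') AS current_rows
FROM   DimCustomer
GROUP  BY CustomerNum
HAVING count(*) FILTER (WHERE RowIsCurrent = 'Y') <> 1;
```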

Note that you may want additional checks to make sure the inserted row actually differs from the current row.
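One such check is to list the staging rows whose tracked columns already match the customer's current row. Loading those rows would only create a redundant new version. This sketch assumes CustomerName and Planet are the tracked columns. It uses IS NOT DISTINCT FROM so that NULLs would also compare as equal if those columns ever became nullable. Against the question's sample data it returns no rows, because 'Ben Kenobi' differs from 'Obi-Wan Kenobi'.

```sql
-- Staging rows identical to the current dimension row,
-- i.e. rows that should NOT produce a new SCD Type 2 version.
SELECT s.CustomerNum, s.CustomerName, s.Planet, s.ChangeDate
FROM   Staging_DimCustomer AS s
JOIN   DimCustomer AS d
       ON  d.CustomerNum  = s.CustomerNum
       AND d.RowIsCurrent = 'Y'
WHERE  (d.CustomerName, d.Planet) IS NOT DISTINCT FROM (s.CustomerName, s.Planet);
```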

Edit:

If you want to skip staging rows that already exist unchanged in the dimension, you can do the following:

with sc as (
      select *
      from Staging_DimCustomer sc
      where not exists (select 1
                        from DimCustomer c
                        where c.CustomerNum = sc.CustomerNum and
                              c.CustomerName = sc.CustomerName and
                              . . .  -- whatever other columns you want to check
                      )
     ),
     u as (
      update dimCustomer c
          set RowIsCurrent = 'N',
              RowEndDate = sc.ChangeDate
      from sc
      where sc.CustomerNum = c.CustomerNum and
            c.RowIsCurrent = 'Y'
     )
insert into dimCustomer (CustomerNum, CustomerName, Planet, RowIsCurrent, RowStartDate, RowEndDate
                         ) 
     select CustomerNum, CustomerName, Planet, 'Y', ChangeDate, '9999-12-31'::date
     from sc;
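The query in the edit leaves the remaining columns as ". . .". Below is one complete version of the same single-statement approach. It assumes Planet is the only other tracked column, as in the question's tables. The staging table is aliased inside the CTE because the NOT EXISTS subquery needs a name to correlate on. The comparison is also restricted to the current row, so a customer who changes back to an earlier name still gets a new version. This is an adaptation, not the answerer's exact code.

```sql
-- Adapted from the edit above: skip staging rows that match the current
-- dimension row, retire the current row of changed customers, then insert
-- new current rows. Assumes CustomerName and Planet are the tracked columns.
WITH sc AS (
    SELECT *
    FROM   Staging_DimCustomer AS s
    WHERE  NOT EXISTS (
             SELECT 1
             FROM   DimCustomer AS c
             WHERE  c.CustomerNum  = s.CustomerNum
               AND  c.RowIsCurrent = 'Y'
               AND  c.CustomerName = s.CustomerName
               AND  c.Planet       = s.Planet
           )
),
u AS (
    -- Not referenced below, but PostgreSQL still runs it to completion.
    UPDATE DimCustomer AS c
    SET    RowIsCurrent = 'N',
           RowEndDate   = sc.ChangeDate
    FROM   sc
    WHERE  sc.CustomerNum = c.CustomerNum
      AND  c.RowIsCurrent = 'Y'
)
INSERT INTO DimCustomer
       (CustomerNum, CustomerName, Planet, RowIsCurrent, RowStartDate, RowEndDate)
SELECT CustomerNum, CustomerName, Planet, 'Y', ChangeDate, DATE '9999-12-31'
FROM   sc;
```

All parts of the statement see the same snapshot. The UPDATE therefore never touches the row that the INSERT adds.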

Answer 1 (score: 0)

I think this should work, and it neither updates nor re-inserts records that haven't changed:

with us as (
  -- retire the current row when any tracked column changed
  update dimCustomer c
     set RowIsCurrent = 'N',
         RowEndDate = sc.ChangeDate
    from Staging_DimCustomer sc
   where sc.CustomerNum = c.CustomerNum
     and c.RowIsCurrent = 'Y'
     and (sc.CustomerName <> c.CustomerName
          or sc.Planet <> c.Planet)
),
u as (
  -- changed customers: staging row differs from the current dimension row
  select stg.CustomerNum, stg.CustomerName, stg.Planet, stg.ChangeDate
    from Staging_DimCustomer stg
    join DimCustomer dim
      on dim.CustomerNum = stg.CustomerNum
     and dim.RowIsCurrent = 'Y'
     and (dim.CustomerName <> stg.CustomerName
          or dim.Planet <> stg.Planet)
  union
  -- customers with no current row yet (e.g. brand-new customers)
  select stg.CustomerNum, stg.CustomerName, stg.Planet, stg.ChangeDate
    from Staging_DimCustomer stg
   where stg.CustomerNum not in (select dim.CustomerNum
                                   from DimCustomer dim
                                  where dim.RowIsCurrent = 'Y')
)
insert into dimCustomer (CustomerNum, CustomerName, Planet, RowIsCurrent, RowStartDate, RowEndDate)
select CustomerNum, CustomerName, Planet, 'Y', ChangeDate, '9999-12-31'::date
  from u;
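To see the Type 2 history the question asks for, you can list all versions of a customer in start-date order after running one of the load scripts above. This is a minimal usage sketch for customer 103. After one load of the sample staging row, it should show the retired 'Obi-Wan Kenobi' row (RowIsCurrent = 'N', RowEndDate set to the change date) followed by the current 'Ben Kenobi' row.

```sql
-- List every version of customer 103, oldest first.
SELECT CustomerKey, CustomerNum, CustomerName, Planet,
       RowIsCurrent, RowStartDate, RowEndDate
FROM   DimCustomer
WHERE  CustomerNum = 103
ORDER  BY RowStartDate, CustomerKey;
```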