Converting an SCD stored as key-value pairs into columns with history using SQL

Time: 2019-03-18 11:49:32

Tags: sql sql-server-2016

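I have a dataset in which slowly changing data is stored in the following format, with key-value pairs stored in rows. The key here is the ID column. Each key has a set of attributes, which are stored in the Dimension column together with the corresponding Value (a key-value pair). The StartDate and EndDate columns give the validity period of a particular attribute value. There is always a StartDate. If EndDate is NULL, the row holds the current value of that attribute for the ID. If EndDate contains a date, the attribute had the corresponding value between that start date and end date.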

As the example below shows, for ID FT96 the attribute 'Group' had the value 'Group2' on 16/01/2019 and the value 'Group22' on 01/02/2019, but as of now Group is 'Group2'. A NULL EndDate indicates the attribute value as of today.

  StartDate  |  EndDate   |   ID   | Dimension |    Value    
 ------------|------------|--------|-----------|------------- 
  02/11/2018 | 19/11/2018 | FTID15 | Name      | Name1       
  02/11/2018 | NULL       | FTID15 | Status    | Active      
  02/11/2018 | NULL       | FTID15 | Group     | Group1      
  02/11/2018 | NULL       | FTID15 | Sub Group | SUB Group1  
  20/11/2018 | 19/12/2018 | FTID15 | Name      | Name2       
  20/12/2018 | 23/01/2019 | FTID15 | Name      | Name3       
  24/01/2019 | 20/02/2019 | FTID15 | Name      | Name4       
  21/02/2019 | 27/02/2019 | FTID15 | Name      | Name5       
  28/02/2019 | NULL       | FTID15 | Sub Group | SUB Group2  
  02/11/2018 | 19/11/2018 | FTID12 | Name      | Namex1      
  02/11/2018 | NULL       | FTID12 | Status    | Active      
  02/11/2018 | NULL       | FTID12 | Group     | Group2      
  02/11/2018 | NULL       | FTID12 | Sub Group | SUB Group13 
  20/11/2018 | NULL       | FTID12 | Name      | Namex2      
  02/11/2018 | NULL       | FT96   | Name      | NameYY      
  02/11/2018 | NULL       | FT96   | Status    | Active      
  02/11/2018 | 27/01/2019 | FT96   | Group     | Group2      
  02/11/2018 | 27/01/2019 | FT96   | Sub Group | SUB Group1  
  28/01/2019 | 05/02/2019 | FT96   | Group     | Group22     
  28/01/2019 | NULL       | FT96   | Sub Group | SUB Group22 
  06/02/2019 | 11/02/2019 | FT96   | Group     | Group1      
  12/02/2019 | NULL       | FT96   | Group     | Group2      
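
The validity rule described above can be illustrated with a point-in-time lookup. This is only a minimal sketch: the table variable `@Sample`, its column types, and the `@AsOf` variable are assumptions made for illustration and hold just the FT96 'Group' rows from the table.

```sql
-- Hypothetical scratch table with the FT96 'Group' rows from the sample;
-- the DATE/VARCHAR column types are assumptions.
DECLARE @Sample TABLE (
    StartDate DATE         NOT NULL,
    EndDate   DATE         NULL,
    ID        VARCHAR(20)  NOT NULL,
    Dimension VARCHAR(50)  NOT NULL,
    Value     VARCHAR(100) NOT NULL
);

INSERT INTO @Sample (StartDate, EndDate, ID, Dimension, Value) VALUES
    ('20181102', '20190127', 'FT96', 'Group', 'Group2'),
    ('20190128', '20190205', 'FT96', 'Group', 'Group22'),
    ('20190206', '20190211', 'FT96', 'Group', 'Group1'),
    ('20190212', NULL,       'FT96', 'Group', 'Group2');

-- The date to look up (01/02/2019).
DECLARE @AsOf DATE = '20190201';

-- A row is valid on @AsOf if it started on or before that date and
-- either has no EndDate (current value) or ends on or after that date.
SELECT Value
FROM @Sample
WHERE ID = 'FT96'
  AND Dimension = 'Group'
  AND StartDate <= @AsOf
  AND (EndDate IS NULL OR EndDate >= @AsOf);
-- returns Group22
```

Setting `@AsOf` to '20190116' returns Group2 instead, which matches the example given for 16/01/2019.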

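I need some help writing SQL that transforms this data into the format below. The resulting dataset should have each Dimension as a separate column, with the corresponding Value as that column's value. There should be one row for every change in any dimension's value, so that each row gives a snapshot of all dimension values between one update and the next.

The resulting output should look like this.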

  StartDate  |  EndDate   |   ID   |  Name  | Status |  Group  |  Sub Group  
 ------------|------------|--------|--------|--------|---------|------------- 
  02/11/2018 | 19/11/2018 | FTID15 | Name1  | Active | Group1  | SUB Group1  
  20/11/2018 | 19/12/2018 | FTID15 | Name2  | Active | Group1  | SUB Group1  
  20/12/2018 | 23/01/2019 | FTID15 | Name3  | Active | Group1  | SUB Group1  
  24/01/2019 | 20/02/2019 | FTID15 | Name4  | Active | Group1  | SUB Group1  
  21/02/2019 | 27/02/2019 | FTID15 | Name5  | Active | Group1  | SUB Group1  
  28/02/2019 | NULL       | FTID15 | Name5  | Active | Group1  | SUB Group2  
  02/11/2018 | 19/11/2018 | FTID12 | Namex1 | Active | Group2  | SUB Group13 
  20/11/2018 | NULL       | FTID12 | Namex2 | Active | Group2  | SUB Group13 
  02/11/2018 | 27/01/2019 | FT96   | NameYY | Active | Group2  | SUB Group1  
  28/01/2019 | 05/02/2019 | FT96   | NameYY | Active | Group22 | SUB Group22 
  06/02/2019 | 11/02/2019 | FT96   | NameYY | Active | Group1  | SUB Group22 
  12/02/2019 | NULL       | FT96   | NameYY | Active | Group2  | SUB Group22 

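The dimensions are not limited to the four shown in the example. The set may vary, and the transformation needs to work automatically regardless of how many dimensions there are.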

1 answer:

Answer 0 (score: 0):

You can try this. PIVOT and a few window functions can solve your problem.

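-- Assumes StartDate and EndDate are DATE (or DATETIME) columns.
SELECT
    StartDate,
    EndDate,
    ID,
    -- Carry the latest non-NULL value forward (FIRST_VALUE would return the
    -- earliest one). Each value is prefixed with its StartDate as yyyymmdd,
    -- the running MAX skips NULLs, and SUBSTRING strips the 8-character
    -- prefix again. SQL Server 2016 has no IGNORE NULLS option.
    SUBSTRING(MAX(CONVERT(CHAR(8), StartDate, 112) + [Name])      OVER (PARTITION BY ID ORDER BY StartDate ROWS UNBOUNDED PRECEDING), 9, 4000) AS [Name],
    SUBSTRING(MAX(CONVERT(CHAR(8), StartDate, 112) + [Status])    OVER (PARTITION BY ID ORDER BY StartDate ROWS UNBOUNDED PRECEDING), 9, 4000) AS [Status],
    SUBSTRING(MAX(CONVERT(CHAR(8), StartDate, 112) + [Group])     OVER (PARTITION BY ID ORDER BY StartDate ROWS UNBOUNDED PRECEDING), 9, 4000) AS [Group],
    SUBSTRING(MAX(CONVERT(CHAR(8), StartDate, 112) + [Sub Group]) OVER (PARTITION BY ID ORDER BY StartDate ROWS UNBOUNDED PRECEDING), 9, 4000) AS [Sub Group]
FROM (
    SELECT StartDate,
           -- Rows of one ID that start on the same date share the non-NULL
           -- EndDate, so PIVOT collapses them into a single row.
           ISNULL(EndDate, MAX(EndDate) OVER (PARTITION BY StartDate, ID)) AS EndDate,
           ID, Dimension, Value
    FROM MyTable
) SRC
PIVOT (MAX(Value) FOR Dimension IN ([Name], [Status], [Group], [Sub Group])) PVT
ORDER BY ID DESC, StartDate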

Dynamic version:

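DECLARE @Columns NVARCHAR(MAX) = N''
DECLARE @PivotIn NVARCHAR(MAX) = N''

-- Build the PIVOT column list and one carry-forward expression per distinct
-- dimension. The running MAX over a yyyymmdd-prefixed value returns the
-- latest non-NULL value (assumes StartDate is a DATE or DATETIME column).
SELECT
    @PivotIn = CONCAT(@PivotIn, N', ', QUOTENAME(Dimension)),
    @Columns = CONCAT(@Columns,
        N', SUBSTRING(MAX(CONVERT(CHAR(8), StartDate, 112) + ', QUOTENAME(Dimension),
        N') OVER (PARTITION BY ID ORDER BY StartDate ROWS UNBOUNDED PRECEDING), 9, 4000) AS ',
        QUOTENAME(Dimension))
FROM (SELECT DISTINCT Dimension FROM MyTable) AS X

-- @Columns starts with ', ' so it can follow ID directly;
-- STUFF removes the leading ', ' from @PivotIn.
DECLARE @SqlQuery NVARCHAR(MAX) = N'SELECT
    StartDate,
    EndDate,
    ID' +
    @Columns
    + N' FROM
    (
        SELECT StartDate,
            ISNULL(EndDate, MAX(EndDate) OVER (PARTITION BY StartDate, ID)) AS EndDate,
            ID, Dimension, Value
        FROM MyTable
    ) SRC
    PIVOT (MAX(Value) FOR Dimension IN (' + STUFF(@PivotIn, 1, 2, N'') + N')) PVT
    ORDER BY ID DESC, StartDate'

EXEC sp_executesql @SqlQuery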
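
As a cross-check, the same snapshot rows can also be derived without relying on the stored EndDate values. The following is a minimal sketch of that alternative, not the approach used in the queries above: it treats every distinct StartDate of an ID as a change point, picks for each dimension the most recently started value at that point, and derives EndDate as the day before the next change point. The table variable `@Sample`, its column types, and the CTE names are assumptions for illustration, and the derived EndDate assumes the periods are contiguous.

```sql
-- Hypothetical scratch table with the FT96 rows from the question;
-- the column types are assumptions.
DECLARE @Sample TABLE (
    StartDate DATE         NOT NULL,
    EndDate   DATE         NULL,
    ID        VARCHAR(20)  NOT NULL,
    Dimension VARCHAR(50)  NOT NULL,
    Value     VARCHAR(100) NOT NULL
);

INSERT INTO @Sample (StartDate, EndDate, ID, Dimension, Value) VALUES
    ('20181102', NULL,       'FT96', 'Name',      'NameYY'),
    ('20181102', NULL,       'FT96', 'Status',    'Active'),
    ('20181102', '20190127', 'FT96', 'Group',     'Group2'),
    ('20181102', '20190127', 'FT96', 'Sub Group', 'SUB Group1'),
    ('20190128', '20190205', 'FT96', 'Group',     'Group22'),
    ('20190128', NULL,       'FT96', 'Sub Group', 'SUB Group22'),
    ('20190206', '20190211', 'FT96', 'Group',     'Group1'),
    ('20190212', NULL,       'FT96', 'Group',     'Group2');

WITH ChangePoints AS (
    -- Every date on which any dimension of an ID takes a new value.
    SELECT DISTINCT ID, StartDate FROM @Sample
),
Ranked AS (
    -- For each change point and dimension, rank all values that had already
    -- started by that date; rn = 1 is the most recently started one.
    SELECT cp.ID, cp.StartDate, s.Dimension, s.Value,
           ROW_NUMBER() OVER (PARTITION BY cp.ID, cp.StartDate, s.Dimension
                              ORDER BY s.StartDate DESC) AS rn
    FROM ChangePoints AS cp
    JOIN @Sample AS s
      ON s.ID = cp.ID
     AND s.StartDate <= cp.StartDate
)
SELECT
    StartDate,
    -- The snapshot ends the day before the next change point; NULL for the last one.
    DATEADD(DAY, -1, LEAD(StartDate) OVER (PARTITION BY ID ORDER BY StartDate)) AS EndDate,
    ID, [Name], [Status], [Group], [Sub Group]
FROM (
    SELECT ID, StartDate, Dimension, Value FROM Ranked WHERE rn = 1
) AS Latest
PIVOT (MAX(Value) FOR Dimension IN ([Name], [Status], [Group], [Sub Group])) AS PVT
ORDER BY ID DESC, StartDate;
```

For the FT96 rows this yields four snapshots starting on 02/11/2018, 28/01/2019, 06/02/2019 and 12/02/2019, in line with the expected output in the question. The dimension list is hard-coded here, so it would still need the dynamic SQL treatment to handle an arbitrary set of dimensions.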