Type 2 slowly changing dimension transformation (Teradata 14.10)

Time: 2017-01-10 21:34:10

Tags: sql relational-database teradata

I have a table like this:

KeyField DeltaField SomeField Row_Ins_Ts
1        a          1         '2016-01-01 00:00:00'
1        a          2         '2016-01-02 00:00:00'
1        b          3         '2016-01-03 00:00:00'         
1        a          4         '2016-01-04 00:00:00'
2        d          5         '2016-01-01 00:00:00'
2        d          6         '2016-01-02 00:00:00'
2        e          7         '2016-01-03 00:00:00'
2        e          8         '2016-01-04 00:00:00'

For each KeyField, I need the interval during which each value of DeltaField was current.

The result set for the data above would be:

KeyField DeltaField Rec_Strt_Ts            Rec_End_Ts
1        a          '2016-01-01 00:00:00'  '2016-01-02 23:59:59'
1        b          '2016-01-03 00:00:00'  '2016-01-03 23:59:59'      
1        a          '2016-01-04 00:00:00'  '9999-12-31 23:59:59' 
2        d          '2016-01-01 00:00:00'  '2016-01-02 23:59:59'
2        e          '2016-01-03 00:00:00'  '9999-12-31 23:59:59' 

2 answers:

Answer 0: (score: 1)

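I'm trying this on Vertica rather than Teradata, but luckily both support the ANSI 99 standard: window functions, also known as OLAP or analytic functions, and the WITH clause with several common table expressions that depend on each other.

The first common table expression just generates the sample data; the second one belongs to the query itself. You generally can't put OLAP functions into a WHERE clause, so the value has to be computed in a subselect first. The LAG() OLAP function is what you need to filter out the rows that have the same deltafield as their predecessor. In the final query I can then filter on deltafield <> prev_deltafield, and use the LEAD() OLAP function, minus one second, to get the end timestamp. I use IFNULL() to cover the cases where LEAD() or LAG() returns no value. Synonyms for IFNULL() can be NVL() or VALUE(). COALESCE() would also work, but it may be slightly slower because it takes a variable number of arguments. See here: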

WITH foo(keyfield,deltafield,somefield,row_ins_ts) AS (
          SELECT 1,'a',1, TIMESTAMP '2016-01-01 00:00:00'
UNION ALL SELECT 1,'a',2, TIMESTAMP '2016-01-02 00:00:00'
UNION ALL SELECT 1,'b',3, TIMESTAMP '2016-01-03 00:00:00'
UNION ALL SELECT 1,'a',4, TIMESTAMP '2016-01-04 00:00:00'
UNION ALL SELECT 2,'d',5, TIMESTAMP '2016-01-01 00:00:00'
UNION ALL SELECT 2,'d',6, TIMESTAMP '2016-01-02 00:00:00'
UNION ALL SELECT 2,'e',7, TIMESTAMP '2016-01-03 00:00:00'
UNION ALL SELECT 2,'e',8, TIMESTAMP '2016-01-04 00:00:00'
)
,    add_previous_deltafield AS (
SELECT
  keyfield
, deltafield
, LAG(deltafield) OVER(PARTITION BY keyfield ORDER BY row_ins_ts) AS prev_deltafield
, row_ins_ts
FROM foo
)
SELECT
  keyfield
, deltafield
, row_ins_ts AS rec_start_ts
, IFNULL(
    LEAD(row_ins_ts) OVER(
      PARTITION BY keyfield ORDER BY row_ins_ts
    ) - INTERVAL '1 SECOND'
  , '9999-12-31 23:59:59'
  ) AS rec_end_ts
FROM add_previous_deltafield
WHERE deltafield <> IFNULL(prev_deltafield,'')
ORDER BY keyfield, row_ins_ts;
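
A note for Teradata 14.10 specifically: the query above was written against Vertica. As far as I know, LAG(), LEAD() and IFNULL() are not available on Teradata 14.10, so here is a sketch of the same logic that uses only windowed MAX()/MIN() with a one-row frame plus COALESCE(). The table name yourtable is a placeholder, and the query has not been run on a real 14.10 system.

```sql
-- Sketch only: same logic as above without LAG()/LEAD()/IFNULL().
-- "yourtable" is a placeholder for the real table name.
WITH add_previous_deltafield AS (
  SELECT
    keyfield
  , deltafield
  , row_ins_ts
    -- LAG() emulation: the frame holds only the previous row of the same key
  , MAX(deltafield) OVER (
      PARTITION BY keyfield ORDER BY row_ins_ts
      ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING
    ) AS prev_deltafield
  FROM yourtable
)
SELECT
  keyfield
, deltafield
, row_ins_ts AS rec_strt_ts
  -- LEAD() emulation over the rows that survive the WHERE filter;
  -- the last interval of each key has no successor and stays open-ended
, COALESCE(
    MIN(row_ins_ts) OVER (
      PARTITION BY keyfield ORDER BY row_ins_ts
      ROWS BETWEEN 1 FOLLOWING AND 1 FOLLOWING
    ) - INTERVAL '1' SECOND
  , TIMESTAMP '9999-12-31 23:59:59'
  ) AS rec_end_ts
FROM add_previous_deltafield
WHERE prev_deltafield IS NULL          -- first row of a key
   OR deltafield <> prev_deltafield    -- value changed versus the previous row
ORDER BY keyfield, rec_strt_ts;
```

The filter has to stay in the WHERE clause of the outer query rather than in a QUALIFY clause: the end timestamp must be computed over the rows that remain after the unchanged rows are removed, and WHERE is applied before the window functions are evaluated.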

Happy playing - Marco the Sane

Answer 1: (score: 1)

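As an alternative to window functions (which are nice because they are portable), you can use Teradata's built-in Period logic along with a few built-in functions to solve this quickly.

Basically this breaks down into three parts:

  1. Convert the timestamp into a Period(Timestamp). A Period type in Teradata has a beginning and an ending bound, takes the form Period(<begindate>, <enddate>), and works with dates or timestamps.
  2. Using the built-in function TD_NORMALIZE_OVERLAP_MEET, squash together multiple records whose periods overlap or meet, based on one or more fields.
  3. Then just take the BEGIN() of the results of that function.

In your example: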

    WITH subtbl(keyfield, deltafield, durations) AS
    (
        SELECT
            keyfield,
            deltafield,
            PERIOD(row_ins_ts, row_ins_ts + INTERVAL '1' DAY )  AS durations
        FROM
            <yourtable>
    ) 
    SELECT keyfield, deltafield, BEGIN(durations) 
    FROM TABLE
        (
            TD_NORMALIZE_OVERLAP_MEET (NEW VARIANT_TYPE(subtbl.keyfield, subtbl.deltafield), subtbl.durations)
            RETURNS (keyfield INTEGER, deltafield CHAR(1), durations PERIOD(TIMESTAMP(0)), numRecords INTEGER) 
            HASH BY keyfield, deltafield
            LOCAL ORDER BY keyfield, deltafield, durations
        ) AS dt(keyfield, deltafield, durations, numRecords)    
    ORDER BY 1, 2;
    

    Which outputs:

    +----------+------------+------------------+
    | keyfield | deltafield | BEGIN(durations) |
    +----------+------------+------------------+
    |        1 | a          | 1/4/2016 0:00    |
    |        1 | a          | 1/1/2016 0:00    |
    |        1 | b          | 1/3/2016 0:00    |
    |        2 | d          | 1/1/2016 0:00    |
    |        2 | e          | 1/3/2016 0:00    |
    +----------+------------+------------------+
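
The query above only returns the start of each interval. A possible extension, sketched under the assumption that every KeyField has exactly one row per day with no gaps (as in the sample data), derives Rec_End_Ts from END(durations): the latest period of each key is treated as still open, and every other period ends one second before its END bound. The table name yourtable is a placeholder, and this sketch is untested.

```sql
-- Sketch only: assumes one row per key per day, so normalized periods never leave gaps.
-- "yourtable" is a placeholder for the real table name.
WITH subtbl(keyfield, deltafield, durations) AS
(
    SELECT
        keyfield,
        deltafield,
        PERIOD(row_ins_ts, row_ins_ts + INTERVAL '1' DAY) AS durations
    FROM
        yourtable
)
SELECT
    keyfield,
    deltafield,
    BEGIN(durations) AS rec_strt_ts,
    CASE
        -- the period that ends last within a key is the current one
        WHEN END(durations) = MAX(END(durations)) OVER (PARTITION BY keyfield)
        THEN TIMESTAMP '9999-12-31 23:59:59'
        -- a closed period ends one second before the next one begins
        ELSE END(durations) - INTERVAL '1' SECOND
    END AS rec_end_ts
FROM TABLE
    (
        TD_NORMALIZE_OVERLAP_MEET (NEW VARIANT_TYPE(subtbl.keyfield, subtbl.deltafield), subtbl.durations)
        RETURNS (keyfield INTEGER, deltafield CHAR(1), durations PERIOD(TIMESTAMP(0)), numRecords INTEGER)
        HASH BY keyfield, deltafield
        LOCAL ORDER BY keyfield, deltafield, durations
    ) AS dt(keyfield, deltafield, durations, numRecords)
ORDER BY keyfield, rec_strt_ts;
```

For the sample data this should reproduce the five intervals requested in the question. If rows can be more than a day apart, the fixed one-day period no longer makes consecutive rows meet, and the window-function approach is the safer choice.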