Fix and shrink erroneous history records

Time: 2018-08-07 02:20:50

Tags: sql teradata

I have to write SQL to shrink erroneous history data such as the following:

      K1   K2   D1  D2  start_date  end_date
       1    2   A   B   04-08-2018  05-08-2018
       1    2   A   B   05-08-2018  06-08-2018
       1    2   A   B   06-08-2018  08-08-2018
       3    4   P   Q   04-08-2018  05-08-2018
       3    4   P   Q   05-08-2018  06-08-2018
       3    4   P   Q   06-08-2018  31-12-2018
       1    2   C   D   04-08-2018  05-08-2018
       1    2   C   D   05-08-2018  06-08-2018
       1    2   C   D   06-08-2018  31-12-2018
       1    2   A   B   08-08-2018  09-08-2018
       1    2   A   B   09-08-2018  10-08-2018
       1    2   A   B   10-08-2018  31-12-2018

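Here K1 and K2 are my key columns. For some reason I have duplicated history data and need to fix it, but I have to preserve the occurrences of the records. That is, consecutive history records must be merged into one, giving the following output: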

      K1    K2  D1  D2  start_date  end_date
       1    2   A   B   04-08-2018  08-08-2018
       3    4   P   Q   04-08-2018  31-12-2018
       1    2   C   D   04-08-2018  31-12-2018
       1    2   A   B   08-08-2018  31-12-2018

(Please ignore the future dates; they are only sample data.)
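To reproduce the sample, the rows above can be loaded into a scratch table. This is a minimal sketch in Teradata syntax, not part of the original question: the table name `t` is borrowed from the first answer, the column types are guesses, and the `seq` column is a hypothetical addition that records the order in which the rows are listed, since a table by itself has no row order.

```sql
-- Hypothetical scratch table; names and types are assumptions.
CREATE MULTISET VOLATILE TABLE t (
    seq        INTEGER,   -- hypothetical: order in which the rows are listed
    k1         INTEGER,
    k2         INTEGER,
    d1         CHAR(1),
    d2         CHAR(1),
    start_date DATE,
    end_date   DATE
) ON COMMIT PRESERVE ROWS;

-- Teradata takes one row per INSERT ... VALUES statement.
INSERT INTO t VALUES ( 1, 1, 2, 'A', 'B', DATE '2018-08-04', DATE '2018-08-05');
INSERT INTO t VALUES ( 2, 1, 2, 'A', 'B', DATE '2018-08-05', DATE '2018-08-06');
INSERT INTO t VALUES ( 3, 1, 2, 'A', 'B', DATE '2018-08-06', DATE '2018-08-08');
INSERT INTO t VALUES ( 4, 3, 4, 'P', 'Q', DATE '2018-08-04', DATE '2018-08-05');
INSERT INTO t VALUES ( 5, 3, 4, 'P', 'Q', DATE '2018-08-05', DATE '2018-08-06');
INSERT INTO t VALUES ( 6, 3, 4, 'P', 'Q', DATE '2018-08-06', DATE '2018-12-31');
INSERT INTO t VALUES ( 7, 1, 2, 'C', 'D', DATE '2018-08-04', DATE '2018-08-05');
INSERT INTO t VALUES ( 8, 1, 2, 'C', 'D', DATE '2018-08-05', DATE '2018-08-06');
INSERT INTO t VALUES ( 9, 1, 2, 'C', 'D', DATE '2018-08-06', DATE '2018-12-31');
INSERT INTO t VALUES (10, 1, 2, 'A', 'B', DATE '2018-08-08', DATE '2018-08-09');
INSERT INTO t VALUES (11, 1, 2, 'A', 'B', DATE '2018-08-09', DATE '2018-08-10');
INSERT INTO t VALUES (12, 1, 2, 'A', 'B', DATE '2018-08-10', DATE '2018-12-31');
```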

2 answers:

Answer 0: (score: 1)

I think the end date can be ignored, so this is a simple gaps-and-islands problem:

select k1, k2, d1, d2,
       min(start_date) as start_date, max(end_date) as end_date
from (select t.*,
             row_number() over (partition by k1, k2 order by start_date) as seqnum,
             row_number() over (partition by k1, k2, d1, d2 order by start_date) as seqnum_2
      from t
     ) t
group by k1, k2, d1, d2, (seqnum - seqnum_2);
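The difference of the two row numbers stays constant within a run of consecutive rows that share the same k1, k2, d1, d2, and changes as soon as a different combination appears in between. In the sample data the A/B and C/D rows for key (1, 2) share start dates, so ordering by start_date interleaves them. The variant below is only a sketch under one assumption: `t` carries a hypothetical column `seq` that records the order in which the rows appear (the question does not show such a column).

```sql
-- Assumption: seq is a hypothetical column giving the order of appearance.
select k1, k2, d1, d2,
       min(start_date) as start_date,
       max(end_date)   as end_date,
       min(seq)        as first_seq
from (select t.*,
             -- position among all rows, in order of appearance
             row_number() over (order by seq) as seqnum,
             -- position among rows with the same key and attribute values
             row_number() over (partition by k1, k2, d1, d2 order by seq) as seqnum_2
      from t
     ) t
group by k1, k2, d1, d2, (seqnum - seqnum_2)
order by first_seq;
```

With the twelve sample rows numbered 1 to 12 in listed order, the difference is 0 for the first A/B run, 3 for P/Q, 6 for C/D and 6 for the second A/B run. The last two still form separate groups because d1 and d2 are part of the GROUP BY, so the four rows of the requested output come back.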

Answer 1: (score: 1)

The following will solve your problem:

SELECT K1,K2,D1,D2,
       -- THIS SPLITS THE PERIOD BACK INTO SEPARATE COLUMNS
       BEGIN(PD) AS START_DT, NULLIF(END(PD), DATE '9999-12-31') AS END_DT
    FROM
    ( 
       SELECT NORMALIZE -- THIS RETURNS YOUR NORMALIZED RESULT AS A PERIOD
          K1,K2,D1,D2,
          PERIOD(START_DT,COALESCE(END_DT, DATE '9999-12-31')) AS PD
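       FROM TEST2
       -- COALESCE KEEPS OPEN-ENDED ROWS (NULL END_DT), WHICH A PLAIN START_DT < END_DT WOULD DROP
       WHERE START_DT < COALESCE(END_DT, DATE '9999-12-31')
    ) AS DT;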
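For comparison, here is the same idea written against the question's own column names. It is a sketch that assumes the data sits in a table called `t` and that end_date is never NULL, so the COALESCE/NULLIF handling is left out. The `ON MEETS OR OVERLAPS` clause spells out NORMALIZE's default behaviour.

```sql
SELECT k1, k2, d1, d2,
       BEGIN(pd) AS start_date,
       END(pd)   AS end_date
FROM (
    -- Merge periods that meet or overlap for identical k1, k2, d1, d2
    SELECT NORMALIZE ON MEETS OR OVERLAPS
        k1, k2, d1, d2,
        PERIOD(start_date, end_date) AS pd
    FROM t
    WHERE start_date < end_date   -- a PERIOD needs begin < end
) AS dt;
```

Note that NORMALIZE merges periods by value, not by where the rows appear. In the sample data the two A/B runs for key (1, 2) meet at 08-08-2018, so they would come back as one period from 04-08-2018 to 31-12-2018 rather than as the two rows shown in the question.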