I need to write SQL to collapse faulty history data that looks like this:
K1  K2  D1  D2  start_date  end_date
1   2   A   B   04-08-2018  05-08-2018
1   2   A   B   05-08-2018  06-08-2018
1   2   A   B   06-08-2018  08-08-2018
3   4   P   Q   04-08-2018  05-08-2018
3   4   P   Q   05-08-2018  06-08-2018
3   4   P   Q   06-08-2018  31-12-2018
1   2   C   D   04-08-2018  05-08-2018
1   2   C   D   05-08-2018  06-08-2018
1   2   C   D   06-08-2018  31-12-2018
1   2   A   B   08-08-2018  09-08-2018
1   2   A   B   09-08-2018  10-08-2018
1   2   A   B   10-08-2018  31-12-2018
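Here K1 and K2 are my key columns. For some reason I have duplicated history data and need to fix it, but each occurrence of a record must be kept. I have to merge consecutive history records into one, so the output should look like this: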
K1  K2  D1  D2  start_date  end_date
1   2   A   B   04-08-2018  08-08-2018
3   4   P   Q   04-08-2018  31-12-2018
1   2   C   D   04-08-2018  31-12-2018
1   2   A   B   08-08-2018  31-12-2018
(Please ignore the future dates; they are only sample data.)
Answer 0 (score: 1)
I think the end date can be ignored, so this is a simple gaps-and-islands problem:
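select k1, k2, d1, d2,
       min(start_date) as start_date,
       max(end_date) as end_date
from (select t.*,
             -- position of the row within its key
             row_number() over (partition by k1, k2 order by start_date) as seqnum,
             -- position of the row within its key and attribute values
             row_number() over (partition by k1, k2, d1, d2 order by start_date) as seqnum_2
      from t
     ) t
-- the difference stays constant within each run of consecutive rows with the same d1, d2
group by k1, k2, d1, d2, (seqnum - seqnum_2);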
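To see the row-number difference at work, here is a minimal sketch that runs the same query shape against the question's sample rows in an in-memory SQLite database from Python. Several things are assumptions and not part of the original: the dates are rewritten in ISO format so they sort correctly as text, the function name `collapse` is made up for the example, and a hypothetical `load_seq` column records the order in which the rows appear in the question. The sketch orders by `load_seq` because, in the sample, the A B and C D rows for key (1, 2) share the same start dates, so ordering by `start_date` alone would leave ties. It also assumes an SQLite build with window functions (3.25 or later).

```python
import sqlite3

# Sample rows from the question, with ISO dates.
# load_seq is a HYPOTHETICAL column (not in the original table) that
# records the order in which the rows appear in the question.
ROWS = [
    (1, 1, 2, "A", "B", "2018-08-04", "2018-08-05"),
    (2, 1, 2, "A", "B", "2018-08-05", "2018-08-06"),
    (3, 1, 2, "A", "B", "2018-08-06", "2018-08-08"),
    (4, 3, 4, "P", "Q", "2018-08-04", "2018-08-05"),
    (5, 3, 4, "P", "Q", "2018-08-05", "2018-08-06"),
    (6, 3, 4, "P", "Q", "2018-08-06", "2018-12-31"),
    (7, 1, 2, "C", "D", "2018-08-04", "2018-08-05"),
    (8, 1, 2, "C", "D", "2018-08-05", "2018-08-06"),
    (9, 1, 2, "C", "D", "2018-08-06", "2018-12-31"),
    (10, 1, 2, "A", "B", "2018-08-08", "2018-08-09"),
    (11, 1, 2, "A", "B", "2018-08-09", "2018-08-10"),
    (12, 1, 2, "A", "B", "2018-08-10", "2018-12-31"),
]

QUERY = """
SELECT k1, k2, d1, d2,
       MIN(start_date) AS start_date,
       MAX(end_date)   AS end_date
FROM (
    SELECT t.*,
           ROW_NUMBER() OVER (PARTITION BY k1, k2
                              ORDER BY load_seq) AS seqnum,
           ROW_NUMBER() OVER (PARTITION BY k1, k2, d1, d2
                              ORDER BY load_seq) AS seqnum_2
    FROM t
) AS s
-- seqnum - seqnum_2 is constant within one run of identical d1, d2
GROUP BY k1, k2, d1, d2, (seqnum - seqnum_2)
ORDER BY MIN(load_seq)
"""


def collapse(rows):
    """Load rows into an in-memory table and merge consecutive runs."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE t (load_seq INTEGER, k1 INTEGER, k2 INTEGER, "
        "d1 TEXT, d2 TEXT, start_date TEXT, end_date TEXT)"
    )
    conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


for row in collapse(ROWS):
    print(row)
```

Under these assumptions the sketch should print four rows, one per run: A B from 2018-08-04 to 2018-08-08, P Q and C D from 2018-08-04 to 2018-12-31, and the second A B run from 2018-08-08 to 2018-12-31. If the real table has no column that captures row order, some other tie-breaker would be needed.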
Answer 1 (score: 1)
The following will solve your problem:
SELECT K1, K2, D1, D2,
       -- THIS SPLITS THE PERIOD BACK INTO SEPARATE COLUMNS
       BEGIN(PD) AS START_DT,
       NULLIF(END(PD), DATE '9999-12-31') AS END_DT
FROM (
    SELECT NORMALIZE  -- THIS RETURNS YOUR NORMALIZED RESULT AS A PERIOD
           K1, K2, D1, D2,
           PERIOD(START_DT, COALESCE(END_DT, DATE '9999-12-31')) AS PD
    FROM TEST2
    WHERE START_DT < END_DT
) AS DT;
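NORMALIZE and PERIOD are Teradata features, so the query above cannot be tried in most other databases. As a rough illustration of the meets-or-overlaps merge it relies on, here is a plain Python sketch. It is not Teradata's implementation; the function name `normalize`, the tuple layout, and the ISO date strings are assumptions made for the example.

```python
OPEN_END = "9999-12-31"  # stand-in for a NULL end date, as in the query above


def normalize(rows):
    """Merge periods that meet or overlap among rows with equal k1, k2, d1, d2.

    rows: iterable of (k1, k2, d1, d2, start_date, end_date) tuples with
    ISO date strings; end_date may be None for an open-ended period.
    """
    # Replace a missing end date with the far-future sentinel (COALESCE).
    cleaned = [
        (k1, k2, d1, d2, start, end if end is not None else OPEN_END)
        for k1, k2, d1, d2, start, end in rows
    ]
    # Mirror WHERE START_DT < END_DT: drop empty or inverted periods.
    cleaned = [r for r in cleaned if r[4] < r[5]]
    # Sort so rows with the same columns are adjacent and ordered by start.
    cleaned.sort()

    merged = []
    for *key, start, end in cleaned:
        last = merged[-1] if merged else None
        if last is not None and last[:4] == key and start <= last[5]:
            # The period meets or overlaps the previous one: extend it.
            last[5] = max(last[5], end)
        else:
            merged.append([*key, start, end])

    # Turn the sentinel back into None (NULLIF) and return tuples.
    return [
        (k1, k2, d1, d2, start, None if end == OPEN_END else end)
        for k1, k2, d1, d2, start, end in merged
    ]


sample = [
    (1, 2, "A", "B", "2018-08-04", "2018-08-05"),
    (1, 2, "A", "B", "2018-08-05", "2018-08-06"),
    (1, 2, "A", "B", "2018-08-06", "2018-08-08"),
    (3, 4, "P", "Q", "2018-08-04", "2018-08-05"),
    (3, 4, "P", "Q", "2018-08-05", "2018-08-06"),
    (3, 4, "P", "Q", "2018-08-06", "2018-12-31"),
    (1, 2, "C", "D", "2018-08-04", "2018-08-05"),
    (1, 2, "C", "D", "2018-08-05", "2018-08-06"),
    (1, 2, "C", "D", "2018-08-06", "2018-12-31"),
    (1, 2, "A", "B", "2018-08-08", "2018-08-09"),
    (1, 2, "A", "B", "2018-08-09", "2018-08-10"),
    (1, 2, "A", "B", "2018-08-10", "2018-12-31"),
]
for row in normalize(sample):
    print(row)
```

On the question's sample, all six A B periods meet end to start, so this sketch yields a single A B row spanning 2018-08-04 to 2018-12-31 rather than the two A B rows in the desired output. Whether that is acceptable depends on how strictly the separate occurrences need to be kept.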