How do I gap-fill and interpolate over a date range in PostgreSQL / Vertica?

Time: 2018-01-25 10:44:32

Tags: sql postgresql vertica

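I have date-wise data (num) for some dimensions (i.e. cnt and cnt_id). I want to fill in the missing dates for each dimension (cnt and cnt_id) and carry cumulative_num forward across them.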

My input set contains data for only 3 dates, and I have a fixed date range over which I want to do gap filling.

Fixed date range = 2017-01-01 to 2017-01-08

For reference, SQL to generate the data:

WITH temp_data AS (
SELECT '2017-01-03'::DATE AS e_date, 'uk'::VARCHAR AS cnt, 1::int AS cnt_id, 10::int AS numbers, 10::int AS cumulative_num
UNION
SELECT '2017-01-05'::DATE AS e_date, 'uk'::VARCHAR AS cnt, 1::int AS cnt_id, 20::int AS numbers, 30::int AS cumulative_num
UNION
SELECT '2017-01-07'::DATE AS e_date, 'uk'::VARCHAR AS cnt, 1::int AS cnt_id, 40::int AS numbers, 70::int AS cumulative_num
UNION
SELECT '2017-01-03'::DATE AS e_date, 'fr'::VARCHAR AS cnt, 2::int AS cnt_id, 100::int AS numbers, 100::int AS cumulative_num
UNION
SELECT '2017-01-05'::DATE AS e_date, 'fr'::VARCHAR AS cnt, 2::int AS cnt_id, 200::int AS numbers, 300::int AS cumulative_num
UNION
SELECT '2017-01-07'::DATE AS e_date, 'fr'::VARCHAR AS cnt, 2::int AS cnt_id, 500::int AS numbers, 800::int AS cumulative_num
)
SELECT * FROM temp_data ORDER BY cnt_id, e_date

My input data looks like this:

e_date     cnt cnt_id numbers cumulative_num 
---------- --- ------ ------- -------------- 
2017-01-03 uk  1      10      10             
2017-01-05 uk  1      20      30             
2017-01-07 uk  1      40      70             
2017-01-03 fr  2      100     100            
2017-01-05 fr  2      200     300            
2017-01-07 fr  2      500     800            
...        ..  ..     ..      ...            

My expected result is as follows:

e_date     cnt cnt_id num cumulative_num 
---------- --- ------ --- -------------- 
2017-01-01 uk  1      0   0              
2017-01-02 uk  1      0   0              
2017-01-03 uk  1      10  10             
2017-01-04 uk  1      0   10             
2017-01-05 uk  1      20  30             
2017-01-06 uk  1      0   30             
2017-01-07 uk  1      40  70             
2017-01-08 uk  1      0   70             
2017-01-01 fr  2      0   0              
2017-01-02 fr  2      0   0              
2017-01-03 fr  2      100 100            
2017-01-04 fr  2      0   100            
2017-01-05 fr  2      200 300            
2017-01-06 fr  2      0   300            
2017-01-07 fr  2      500 800            
2017-01-08 fr  2      0   800     

Note: I am tagging both postgresql and vertica because they follow nearly the same SQL syntax standard. A solution for either database is welcome.

1 Answer:

Answer 0: (score: 0)

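I think this is what you are looking for. It gives exactly the output you asked for, or at least you can use it as a starting point for your own query, since I assume that in practice cumulative_num is computed rather than taken from the temp data: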

WITH temp_data AS (
SELECT '2017-01-03'::DATE AS e_date, 'uk'::VARCHAR AS cnt, 1::int AS cnt_id, 10::int AS numbers, 10::int AS cumulative_num
UNION
SELECT '2017-01-05'::DATE AS e_date, 'uk'::VARCHAR AS cnt, 1::int AS cnt_id, 20::int AS numbers, 30::int AS cumulative_num
UNION
SELECT '2017-01-07'::DATE AS e_date, 'uk'::VARCHAR AS cnt, 1::int AS cnt_id, 40::int AS numbers, 70::int AS cumulative_num
UNION
SELECT '2017-01-03'::DATE AS e_date, 'fr'::VARCHAR AS cnt, 2::int AS cnt_id, 100::int AS numbers, 100::int AS cumulative_num
UNION
SELECT '2017-01-05'::DATE AS e_date, 'fr'::VARCHAR AS cnt, 2::int AS cnt_id, 200::int AS numbers, 300::int AS cumulative_num
UNION
SELECT '2017-01-07'::DATE AS e_date, 'fr'::VARCHAR AS cnt, 2::int AS cnt_id, 500::int AS numbers, 800::int AS cumulative_num
)
-- A running max fills the gaps because cumulative_num never decreases within a cnt_id
SELECT e_date, cnt, cnt_id, numbers AS num,
       max(cumulative_num) OVER (PARTITION BY cnt_id ORDER BY e_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS cumulative_num
FROM (
    -- One row per (date, dimension) pair; dates with no data get 0
    SELECT t.my_date::date AS e_date, c.cnt, c.cnt_id, coalesce(tmp.numbers, 0) AS numbers, coalesce(tmp.cumulative_num, 0) AS cumulative_num
    FROM generate_series('2017-01-01'::date, '2017-01-08'::date, '1 day'::interval) AS t(my_date)
    CROSS JOIN (SELECT DISTINCT cnt, cnt_id FROM temp_data) c
    LEFT JOIN temp_data tmp ON t.my_date = tmp.e_date AND c.cnt_id = tmp.cnt_id
) src
ORDER BY cnt_id, e_date
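
If cumulative_num is not actually stored and has to be derived, as the remark above assumes, one possible variant is to build the running total from numbers with SUM() as a window function. This is only a sketch for PostgreSQL. The CTE names calendar and dims and the aliases cal and g are made up for illustration, and temp_data is reduced here to the four columns the query needs.

```sql
-- Sketch, assuming PostgreSQL; calendar, dims, cal and g are hypothetical names.
WITH temp_data (e_date, cnt, cnt_id, numbers) AS (
    VALUES
        ('2017-01-03'::date, 'uk', 1, 10),
        ('2017-01-05'::date, 'uk', 1, 20),
        ('2017-01-07'::date, 'uk', 1, 40),
        ('2017-01-03'::date, 'fr', 2, 100),
        ('2017-01-05'::date, 'fr', 2, 200),
        ('2017-01-07'::date, 'fr', 2, 500)
),
calendar AS (
    -- Every day in the fixed range 2017-01-01 to 2017-01-08
    SELECT g.d::date AS e_date
    FROM generate_series('2017-01-01'::timestamp, '2017-01-08'::timestamp, interval '1 day') AS g(d)
),
dims AS (
    -- Every dimension combination present in the input
    SELECT DISTINCT cnt, cnt_id FROM temp_data
)
SELECT cal.e_date,
       dims.cnt,
       dims.cnt_id,
       COALESCE(tmp.numbers, 0) AS num,
       -- Running total per cnt_id; filled-in days add 0, so the previous total carries forward
       SUM(COALESCE(tmp.numbers, 0)) OVER (PARTITION BY dims.cnt_id ORDER BY cal.e_date) AS cumulative_num
FROM calendar cal
CROSS JOIN dims
LEFT JOIN temp_data tmp ON tmp.e_date = cal.e_date AND tmp.cnt_id = dims.cnt_id
ORDER BY dims.cnt_id, cal.e_date;
```

Unlike the running max, the running sum does not depend on cumulative_num being present or non-decreasing in the input. Note that SUM() over integers returns bigint in PostgreSQL. generate_series is a PostgreSQL function, so on Vertica the calendar would need to be built another way.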