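I have date-wise data (num) for some dimensions (cnt and cnt_id). I want to produce a row for every date and dimension (cnt and cnt_id), together with the matching cumulative_num.
My input set only contains data for 3 dates, but I have a fixed date range, so I want to fill the gaps.
Fixed date range = 2017-01-01 to 2017-01-08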
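Gap filling starts from a complete grid: every date in the fixed range for every (cnt, cnt_id) pair. The sparse data is then attached to that grid. Below is a minimal Python sketch of the grid step. It assumes the two dimension pairs from the sample data, and `date_range`, `DIMENSIONS` and `GRID` are illustrative names that do not come from the question:

```python
from datetime import date, timedelta
from typing import List


def date_range(start: date, end: date) -> List[date]:
    """Return every calendar day from start to end, inclusive."""
    return [start + timedelta(days=i) for i in range((end - start).days + 1)]


# Fixed range from the question: 2017-01-01 .. 2017-01-08 (8 days).
DATES = date_range(date(2017, 1, 1), date(2017, 1, 8))

# Assumed dimension pairs (cnt, cnt_id), taken from the sample data.
DIMENSIONS = [("uk", 1), ("fr", 2)]

# One (date, cnt, cnt_id) slot per day and dimension: 8 x 2 = 16 rows.
GRID = [(d, cnt, cnt_id) for cnt, cnt_id in DIMENSIONS for d in DATES]

if __name__ == "__main__":
    print(len(DATES), len(GRID))
```

In SQL this grid corresponds to a cross join between a date series and the distinct dimension values.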
For reference, the SQL that generates the data:
WITH temp_data AS (
SELECT '2017-01-03'::DATE AS e_date, 'uk'::VARCHAR AS cnt, 1::int AS cnt_id, 10::int AS numbers, 10::int AS cumulative_num
UNION
SELECT '2017-01-05'::DATE AS e_date, 'uk'::VARCHAR AS cnt, 1::int AS cnt_id, 20::int AS numbers, 30::int AS cumulative_num
UNION
SELECT '2017-01-07'::DATE AS e_date, 'uk'::VARCHAR AS cnt, 1::int AS cnt_id, 40::int AS numbers, 70::int AS cumulative_num
UNION
SELECT '2017-01-03'::DATE AS e_date, 'fr'::VARCHAR AS cnt, 2::int AS cnt_id, 100::int AS numbers, 100::int AS cumulative_num
UNION
SELECT '2017-01-05'::DATE AS e_date, 'fr'::VARCHAR AS cnt, 2::int AS cnt_id, 200::int AS numbers, 300::int AS cumulative_num
UNION
SELECT '2017-01-07'::DATE AS e_date, 'fr'::VARCHAR AS cnt, 2::int AS cnt_id, 500::int AS numbers, 800::int AS cumulative_num
)
SELECT * FROM temp_data ORDER BY cnt_id, e_date
My input data looks like this:
e_date      cnt  cnt_id  numbers  cumulative_num
----------  ---  ------  -------  --------------
2017-01-03  uk   1       10       10
2017-01-05  uk   1       20       30
2017-01-07  uk   1       40       70
2017-01-03  fr   2       100      100
2017-01-05  fr   2       200      300
2017-01-07  fr   2       500      800
...         ..   ..      ..       ...
My expected result is:
e_date      cnt  cnt_id  num  cumulative_num
----------  ---  ------  ---  --------------
2017-01-01  uk   1       0    0
2017-01-02  uk   1       0    0
2017-01-03  uk   1       10   10
2017-01-04  uk   1       0    10
2017-01-05  uk   1       20   30
2017-01-06  uk   1       0    30
2017-01-07  uk   1       40   70
2017-01-08  uk   1       0    70
2017-01-01  fr   2       0    0
2017-01-02  fr   2       0    0
2017-01-03  fr   2       100  100
2017-01-04  fr   2       0    100
2017-01-05  fr   2       200  300
2017-01-06  fr   2       0    300
2017-01-07  fr   2       500  800
2017-01-08  fr   2       0    800
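The expected output follows one rule. Every date in the range appears once for each (cnt, cnt_id) pair. num is 0 on dates with no input row, and cumulative_num repeats the most recent known value (0 before the first one). Below is a minimal Python sketch of that rule, assuming the sample rows above; `gap_fill`, `INPUT` and `Row` are illustrative names only:

```python
from datetime import date, timedelta
from typing import List, Tuple

Row = Tuple[date, str, int, int, int]  # (e_date, cnt, cnt_id, numbers, cumulative_num)

# Sparse sample input copied from the question.
INPUT: List[Row] = [
    (date(2017, 1, 3), "uk", 1, 10, 10),
    (date(2017, 1, 5), "uk", 1, 20, 30),
    (date(2017, 1, 7), "uk", 1, 40, 70),
    (date(2017, 1, 3), "fr", 2, 100, 100),
    (date(2017, 1, 5), "fr", 2, 200, 300),
    (date(2017, 1, 7), "fr", 2, 500, 800),
]


def gap_fill(rows: List[Row], start: date, end: date) -> List[Row]:
    """Emit one row per (dimension, day); missing days get numbers = 0 and
    repeat the last known cumulative_num (0 before the first observation)."""
    known = {(d, cnt_id): (n, cum) for d, _, cnt_id, n, cum in rows}
    dims = sorted({(cnt_id, cnt) for _, cnt, cnt_id, _, _ in rows})  # uk (1) before fr (2)
    out: List[Row] = []
    for cnt_id, cnt in dims:
        running = 0
        day = start
        while day <= end:
            numbers, cum = known.get((day, cnt_id), (0, None))
            if cum is not None:
                running = cum  # a real observation replaces the carried value
            out.append((day, cnt, cnt_id, numbers, running))
            day += timedelta(days=1)
    return out


if __name__ == "__main__":
    for row in gap_fill(INPUT, date(2017, 1, 1), date(2017, 1, 8)):
        print(row[0].isoformat(), *row[1:])
```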
Note: I am tagging postgresql and vertica because they both follow nearly the same SQL syntax. A solution in either database is welcome.
Answer 0 (score: 0)
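I think this is what you are looking for. It gives exactly the output you want, or at least you can use it as a starting point for your query. Note that it does not recompute cumulative_num from numbers; it carries forward the cumulative_num values already present in temp_data: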
WITH temp_data AS (
SELECT '2017-01-03'::DATE AS e_date, 'uk'::VARCHAR AS cnt, 1::int AS cnt_id, 10::int AS numbers, 10::int AS cumulative_num
UNION
SELECT '2017-01-05'::DATE AS e_date, 'uk'::VARCHAR AS cnt, 1::int AS cnt_id, 20::int AS numbers, 30::int AS cumulative_num
UNION
SELECT '2017-01-07'::DATE AS e_date, 'uk'::VARCHAR AS cnt, 1::int AS cnt_id, 40::int AS numbers, 70::int AS cumulative_num
UNION
SELECT '2017-01-03'::DATE AS e_date, 'fr'::VARCHAR AS cnt, 2::int AS cnt_id, 100::int AS numbers, 100::int AS cumulative_num
UNION
SELECT '2017-01-05'::DATE AS e_date, 'fr'::VARCHAR AS cnt, 2::int AS cnt_id, 200::int AS numbers, 300::int AS cumulative_num
UNION
SELECT '2017-01-07'::DATE AS e_date, 'fr'::VARCHAR AS cnt, 2::int AS cnt_id, 500::int AS numbers, 800::int AS cumulative_num
)
-- Running max carries the last known cumulative_num forward over the gap days.
select e_date, cnt, cnt_id, numbers, max(cumulative_num) over (partition by cnt_id order by e_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as cumulative_num
from (
-- One row per day in the fixed range for every (cnt, cnt_id); missing days get 0.
SELECT t.my_date::date as e_date, c.cnt, c.cnt_id, coalesce(tmp.numbers,0) as numbers, coalesce(tmp.cumulative_num, 0) as cumulative_num
FROM generate_series('2017-01-01'::date, '2017-01-08'::date, '1 day'::interval) as t(my_date)
cross join (select distinct cnt, cnt_id from temp_data) c
left join temp_data tmp on t.my_date=tmp.e_date and c.cnt_id=tmp.cnt_id
) src
-- Sort in the outer query: an ORDER BY inside the subquery does not guarantee the final order.
ORDER BY cnt_id, e_date;
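The query above is PostgreSQL-specific because it uses generate_series. To check the logic without a PostgreSQL server, the sketch below replays the same shape in SQLite through Python's built-in `sqlite3` module. It makes several assumptions. A recursive CTE stands in for generate_series. Window functions need SQLite 3.25 or newer. `run`, `ROWS` and `QUERY` are illustrative names, not part of the answer. A second column, `cum_by_sum`, rebuilds the running total from `numbers` alone for comparison:

```python
import sqlite3

# Sample rows from the question: (e_date, cnt, cnt_id, numbers, cumulative_num)
ROWS = [
    ("2017-01-03", "uk", 1, 10, 10),
    ("2017-01-05", "uk", 1, 20, 30),
    ("2017-01-07", "uk", 1, 40, 70),
    ("2017-01-03", "fr", 2, 100, 100),
    ("2017-01-05", "fr", 2, 200, 300),
    ("2017-01-07", "fr", 2, 500, 800),
]

# Same shape as the answer: date spine x distinct dimensions, LEFT JOIN the
# sparse data, then a running window per cnt_id.
QUERY = """
WITH RECURSIVE dates(d) AS (
    SELECT '2017-01-01'
    UNION ALL
    SELECT date(d, '+1 day') FROM dates WHERE d < '2017-01-08'
),
dims AS (SELECT DISTINCT cnt, cnt_id FROM temp_data),
filled AS (
    SELECT dt.d AS e_date, c.cnt, c.cnt_id,
           COALESCE(t.numbers, 0) AS numbers,
           COALESCE(t.cumulative_num, 0) AS cumulative_num
    FROM dates dt
    CROSS JOIN dims c
    LEFT JOIN temp_data t ON t.e_date = dt.d AND t.cnt_id = c.cnt_id
)
SELECT e_date, cnt, cnt_id, numbers,
       -- the answer's approach: carry forward the largest value seen so far
       MAX(cumulative_num) OVER (PARTITION BY cnt_id ORDER BY e_date
           ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS cum_by_max,
       -- alternative: rebuild the running total from numbers alone
       SUM(numbers) OVER (PARTITION BY cnt_id ORDER BY e_date
           ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS cum_by_sum
FROM filled
ORDER BY cnt_id, e_date
"""


def run():
    """Load the sample into an in-memory SQLite DB and run the gap-fill query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE temp_data (e_date TEXT, cnt TEXT, cnt_id INTEGER,"
        " numbers INTEGER, cumulative_num INTEGER)"
    )
    conn.executemany("INSERT INTO temp_data VALUES (?, ?, ?, ?, ?)", ROWS)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in run():
        print(row)
```

Both columns agree on this sample. The MAX() version works only because cumulative_num never decreases within a cnt_id, which holds when numbers is non-negative. SUM(numbers) ignores the stored cumulative_num entirely. It is correct only if cumulative_num really is the running total of numbers, as it is in the sample.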