SQL: add missing dates from the latest date up to the current date

Time: 2020-04-25 06:01:33

Tags: sql snowflake-cloud-data-platform

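I have a currency table that is populated once a month: on the 3rd of each month, the previous month's data is loaded.

What I want is to have entries from the latest date up to the current date. If the last entry found for a particular combination is dated 1 February 2020, I need to fill the currency table so that the combination has entries from 1 February through the current date. A combination is essentially: currency type, from-currency, to-currency.

I know the LEAD() function is the key to this, but I cannot work out how to use it.

Here is a small example:

Existing entries: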

timestamp type fcurr tcurr Conv_ratio
20200101   M    EUR   USD    0.9
20200201   M    EUR   USD    0.85
20200229   C    INR   EUR    0.4

Expected output:

timestamp type fcurr tcurr Conv_ratio
20200101   M    EUR   USD    0.9
20200201   M    EUR   USD    0.85
20200202   M    EUR   USD    0.85
20200203   M    EUR   USD    0.85
.
.
.
20200424   M    EUR   USD    0.85
20200229   C    INR   EUR    0.4
20200301   C    INR   EUR    0.4
20200302   C    INR   EUR    0.4
.
.
20200424   C    INR   EUR    0.4

The ideal output I am looking for is shown below, but if I cannot get that, the output above would also be fine.

timestamp type fcurr tcurr Conv_ratio
20200101   M    EUR   USD    0.9
20200201   M    EUR   USD    0.85
20200301   M    EUR   USD    0.85
20200401   M    EUR   USD    0.85
20200229   C    INR   EUR    0.4
20200331   C    INR   EUR    0.4

Basically, the first day of each month for type M, and the last day of each month for type C.

2 answers:

Answer 0 (score: 1)

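I initially assumed there were only two currencies, but it seems there can be more, so I updated the query to handle all currencies. Here is my sample table:

create or replace table currencies (ctimestamp varchar, ctype varchar, 
fcurr varchar, tcurr varchar, conv_ration number(3,2) )
as select * from values  
('20200101' ,  'M' , 'EUR' , 'USD'  ,  0.9),
('20200201' ,  'M' , 'EUR' ,  'USD' ,   0.85),
('20200229' ,  'C' , 'INR' ,  'EUR' ,   0.4);

Here is my solution. It uses plain SQL only, with no temporary tables or stored procedures. First, I generate the dates and keep only the first and last day of each month. I group the existing data by currency pair and ctype, join those groups with the available dates, and union the result with the existing values. Finally, I use the LAG function to fill in the missing values: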

with generated_days as (
    -- one row per day, starting from the earliest date in the table
    select to_varchar(
               dateadd(days, row_number() over (order by 1) - 1,
                       (select min(to_date(ctimestamp, 'YYYYMMDD')) from currencies)),
               'YYYYMMDD') gtimestamp
    from table(generator(rowcount => 10000))
),
first_last_days as (
    -- keep only month starts (tagged 'M') and month ends (tagged 'C') up to today
    select gtimestamp, iff(substring(gtimestamp, -2) = '01', 'M', 'C') mc
    from generated_days
    where (substring(gtimestamp, -2) = '01'
           or last_day(to_date(gtimestamp, 'YYYYMMDD')) = to_date(gtimestamp, 'YYYYMMDD'))
      and to_date(gtimestamp, 'YYYYMMDD') <= current_date
),
curtypes as (
    -- latest existing date per (ctype, fcurr, tcurr) combination
    select ctype || fcurr || tcurr pkey, ctype, max(ctimestamp) max_date
    from currencies
    group by ctype || fcurr || tcurr, ctype
),
combination as (
    -- placeholder rows for the missing dates, plus the existing rows
    select pkey, gtimestamp, null ctype, null fcurr, null tcurr, null conv_ration
    from first_last_days
    join curtypes on gtimestamp > max_date and ctype = mc
    union all
    select ctype || fcurr || tcurr pkey, * from currencies
    order by pkey, gtimestamp
)
-- fill each placeholder from the most recent non-null value of its combination
select
    gtimestamp,
    left(pkey, 1) ctype,
    nvl(fcurr, lag(fcurr, 1, null) ignore nulls over (partition by pkey order by gtimestamp)) fcurr,
    nvl(tcurr, lag(tcurr, 1, null) ignore nulls over (partition by pkey order by gtimestamp)) tcurr,
    nvl(conv_ration, lag(conv_ration, 1, null) ignore nulls over (partition by pkey order by gtimestamp)) conv_ration
from combination
order by ctype, fcurr, tcurr, gtimestamp;
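
For the simpler daily output in the question (one row per day from each combination's latest entry through the current date), a minimal sketch along the same lines might look like the following. It assumes the `currencies` sample table from this answer; the CTE names `latest` and `offsets` are illustrative only, and `rowcount => 10000` is an arbitrary upper bound on the number of days that can be filled.

```sql
-- Sketch only: daily fill from each combination's latest entry through today.
with latest as (
    -- the most recent row of every (ctype, fcurr, tcurr) combination
    select ctimestamp, ctype, fcurr, tcurr, conv_ration
    from currencies
    qualify row_number() over (partition by ctype, fcurr, tcurr
                               order by ctimestamp desc) = 1
),
offsets as (
    -- gap-free day offsets 1, 2, 3, ... (upper bound is an assumption)
    select row_number() over (order by seq4()) as n
    from table(generator(rowcount => 10000))
)
-- carry the latest rate forward, one row per day after the latest entry
select to_varchar(dateadd(day, o.n, to_date(l.ctimestamp, 'YYYYMMDD')), 'YYYYMMDD') as ctimestamp,
       l.ctype, l.fcurr, l.tcurr, l.conv_ration
from latest l
join offsets o
  on dateadd(day, o.n, to_date(l.ctimestamp, 'YYYYMMDD')) <= current_date
union all
-- keep the rows that already exist
select ctimestamp, ctype, fcurr, tcurr, conv_ration
from currencies
order by ctype, fcurr, tcurr, ctimestamp;
```

Because only the latest row of each combination is carried forward, no LAG is needed in this variant; the trade-off is that it produces every day rather than just month starts and month ends.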

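Answer 1 (score: 0)

Assuming your table has an id column and its rows are ordered (so that the highest id in each combination is the latest entry), this solution works in MySQL with a single query.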

https://www.db-fiddle.com/f/owwJs6APbzncw9Vq9RzzTM/0

WITH  just_a_big_table AS (
SELECT t1.*,@x:=-1
FROM
    (
        SELECT 1 AS Number 
        UNION ALL SELECT 2
        UNION ALL SELECT 3
        UNION ALL SELECT 4
        UNION ALL SELECT 5
        UNION ALL SELECT 6
        UNION ALL SELECT 7
        UNION ALL SELECT 8
        UNION ALL SELECT 9
        UNION ALL SELECT 10
    ) AS t1
    CROSS JOIN (
        SELECT 1 AS Number 
        UNION ALL SELECT 2
        UNION ALL SELECT 3
        UNION ALL SELECT 4
        UNION ALL SELECT 5
        UNION ALL SELECT 6
        UNION ALL SELECT 7
        UNION ALL SELECT 8
        UNION ALL SELECT 9
        UNION ALL SELECT 10
    ) AS t2
    CROSS JOIN (
        SELECT 1 AS Number 
        UNION ALL SELECT 2
        UNION ALL SELECT 3
        UNION ALL SELECT 4
        UNION ALL SELECT 5
        UNION ALL SELECT 6
        UNION ALL SELECT 7
        UNION ALL SELECT 8
        UNION ALL SELECT 9
        UNION ALL SELECT 10
    ) AS t3

),
all_dates AS (SELECT date_format(DATE_SUB(NOW(), INTERVAL @x:=(@x+1) DAY),'%Y%m%d') dt FROM just_a_big_table),
min_max_dates AS(
     SELECT MIN(dt) AS min_date,MAX(dt) AS max_date FROM (  
     SELECT *,LEFT(dt,6) AS month FROM all_dates
     ) t GROUP BY month )
SELECT * FROM (
SELECT IF(type='M',m.min_date,m.max_date)AS timestamp,d.type,d.fcurr,d.tcurr,d.Conv_ratio FROM `min_max_dates` m INNER JOIN `data` d 
ON (m.min_date=d.timestamp AND d.type='M')
OR (m.max_date=d.timestamp AND d.type='C')
OR (m.min_date>d.timestamp AND d.type='M' AND d.id IN (SELECT MAX(id) FROM `data` GROUP BY type,fcurr,tcurr)) 
OR (m.max_date>d.timestamp AND d.type='C' AND d.id IN (SELECT MAX(id) FROM `data` GROUP BY type,fcurr,tcurr)) 
ORDER BY Conv_ratio DESC,timestamp ASC ) t WHERE timestamp!=date_format(NOW(),'%Y%m%d');

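I first create a dummy table of 1,000 rows (10 × 10 × 10 from the three cross joins).

Then I fill it with dates in the same format as the table, starting from the current date and subtracting one day each time.

Then I keep only the first and last day of each month.

Finally, I join this table with the data table, removing every row generated for the current date, since the current date is always the maximum date of its month.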
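
The date-generation step relies on assigning a user variable inside a SELECT, which the MySQL 8.0 manual describes as deprecated. As a hedged alternative, the same list of dates and the first and last day of each month could be sketched with a recursive CTE. The 365-day window is an assumption chosen for illustration (it stays well below the default `cte_max_recursion_depth` of 1000); the names `all_dates`, `dt`, `min_date`, and `max_date` follow the query in this answer.

```sql
-- Sketch only: generate the last 365 days, then keep each month's first and last day.
WITH RECURSIVE all_dates (dt) AS (
    SELECT CURDATE()                          -- anchor: today
    UNION ALL
    SELECT dt - INTERVAL 1 DAY                -- step back one day at a time
    FROM all_dates
    WHERE dt > CURDATE() - INTERVAL 365 DAY   -- assumed window size
)
SELECT DATE_FORMAT(MIN(dt), '%Y%m%d') AS min_date,
       DATE_FORMAT(MAX(dt), '%Y%m%d') AS max_date
FROM all_dates
GROUP BY DATE_FORMAT(dt, '%Y%m')
ORDER BY min_date;
```

As with the original query, the oldest month in the window may be partial, so its `min_date` is not necessarily the 1st, and the current month's `max_date` is today rather than the true month end.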