Snowflake window/analytic function to set up grouping sets

Asked: 2019-11-21 15:52:07

Tags: snowflake-data-warehouse snowflake-schema

I have the following dataset in a data lake that serves as the source for a dimension, and I want to migrate the historical records into the dimension.

For example:

Primarykey       Checksum     DateFrom     Dateto      ActiveFlag 
  1                  11         01:00       03:00         False
  1                  22         03:00       05:00         False 
  1                  22         05:00       07:00         False
  1                  11         07:00       09:00         False
  1                  11         09:00    12/31/999         True

Note that the data lake table has many columns that are not part of the dimension, so the recomputed checksum can show the same value while datefrom/dateto differ.

with base as (
    select
        Primary_key,
        checksum,
        first_value(datefrom) over (partition by Primary_key, checksum order by datefrom) as Datefrom,
        last_value(dateto)    over (partition by Primary_key, checksum order by datefrom) as Dateto,
        row_number()          over (partition by Primary_key, checksum order by datefrom) as latest_record
    from Datalake.user
)
select * from base where latest_record = 1

The data comes out as:

Primarykey       Checksum     DateFrom     Dateto 
   1              11           01:00         12/31/999 
   1              22           03:00         07:00

But the expected result is:

Primarykey       Checksum     DateFrom     Dateto 
   1              11           01:00         03:00 
   1              22           03:00         07:00
   1              11           07:00         12/31/999 

I have tried several approaches in a single query, but does anyone have a good suggestion?
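The required transformation is a classic gaps-and-islands collapse: merge only *consecutive* rows with the same key and checksum, keeping the first DateFrom and the last Dateto of each run. A minimal Python sketch of that logic, using the sample rows from the table above (row tuples and ordering are assumptions taken from the example):

```python
from itertools import groupby

# Sample history rows, already ordered by date_from:
# (primary_key, checksum, date_from, date_to)
rows = [
    (1, 11, "01:00", "03:00"),
    (1, 22, "03:00", "05:00"),
    (1, 22, "05:00", "07:00"),
    (1, 11, "07:00", "09:00"),
    (1, 11, "09:00", "12/31/999"),
]

# Collapse consecutive rows sharing the same (primary_key, checksum):
# each run keeps the first date_from and the last date_to.
collapsed = []
for (pk, cs), run in groupby(rows, key=lambda r: (r[0], r[1])):
    run = list(run)
    collapsed.append((pk, cs, run[0][2], run[-1][3]))

print(collapsed)
# prints [(1, 11, '01:00', '03:00'), (1, 22, '03:00', '07:00'), (1, 11, '07:00', '12/31/999')]
```

This yields the three expected rows. A plain `PARTITION BY Primary_key, checksum` cannot, because it merges the two non-adjacent (1, 11) runs into one partition.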

3 Answers:

Answer 0 (score: 0)

The reason you only get two rows is that the partition has two columns, Primarykey and checksum, and they have only two distinct combinations. The extra row you want in the expected output has the same Primarykey and checksum (1, 11) as the first row, so it lands in the same partition.

If you include ActiveFlag in the partition, then, from what I can see in the data, that would produce the result.

WITH base AS (
    SELECT
        primary_key,
        checksum,
        FIRST_VALUE(datefrom) OVER (PARTITION BY primary_key, checksum, active_flag ORDER BY datefrom) AS datefrom,
        LAST_VALUE(dateto)    OVER (PARTITION BY primary_key, checksum, active_flag ORDER BY datefrom) AS dateto,
        ROW_NUMBER()          OVER (PARTITION BY primary_key, checksum, active_flag ORDER BY datefrom) AS latest_record
    FROM Datalake.user
)
SELECT * FROM base WHERE latest_record = 1

Answer 1 (score: 0)

Try this code. It should work in both Snowflake and Oracle: it creates a separate group whenever the checksum changes in date order.

**SNOWFLAKE**:
WITH base AS (
    SELECT
        Primarykey,
        checksum,
        FIRST_VALUE(datefrom) OVER (PARTITION BY Primarykey, checksum, checksum_group ORDER BY datefrom) AS Datefrom,
        LAST_VALUE(dateto)    OVER (PARTITION BY Primarykey, checksum, checksum_group ORDER BY datefrom) AS Dateto,
        ROW_NUMBER()          OVER (PARTITION BY Primarykey, checksum, checksum_group ORDER BY datefrom) AS latest_record
    FROM (
        SELECT
            Primarykey,
            checksum,
            checksum_prev,
            datefrom,
            dateto,
            -- carry the group tag forward until the checksum changes again
            LAST_VALUE(CASE WHEN checksum <> checksum_prev THEN group1 END) IGNORE NULLS
                OVER (ORDER BY group1 ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS checksum_group
        FROM (
            SELECT
                Primarykey,
                checksum,
                datefrom,
                dateto,
                LAG(checksum, 1, 0) OVER (ORDER BY datefrom) AS checksum_prev,
                LPAD(1000 + ROW_NUMBER() OVER (ORDER BY (SELECT NULL)), 4, 0) AS group1
            FROM Datalake.user
        )
    )
)
SELECT * FROM base WHERE latest_record = 1

**Oracle**:
WITH base AS (
    SELECT
        Primarykey,
        checksum,
        FIRST_VALUE(datefrom) OVER (PARTITION BY Primarykey, checksum, checksum_group ORDER BY datefrom) AS Datefrom,
        LAST_VALUE(dateto)    OVER (PARTITION BY Primarykey, checksum, checksum_group ORDER BY datefrom) AS Dateto,
        ROW_NUMBER()          OVER (PARTITION BY Primarykey, checksum, checksum_group ORDER BY datefrom) AS latest_record
    FROM (
        SELECT
            Primarykey,
            checksum,
            checksum_prev,
            datefrom,
            dateto,
            LAST_VALUE(CASE WHEN checksum <> checksum_prev THEN group1 END) IGNORE NULLS
                OVER (ORDER BY group1 ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS checksum_group
        FROM (
            SELECT
                Primarykey,
                checksum,
                datefrom,
                dateto,
                LAG(checksum, 1, 0) OVER (ORDER BY datefrom) AS checksum_prev,
                LPAD(1000 + ROWNUM, 4, 0) AS group1
            FROM Datalake.user
        )
    )
)
SELECT * FROM base WHERE latest_record = 1
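Both versions rely on the same group-numbering step: tag each row where the checksum differs from the previous row (in date order), then carry the tag forward so that every row in a consecutive run shares one checksum_group. A small Python sketch of just that step, using the sample rows from the question:

```python
# Mimic the SQL pipeline above in Python:
# LAG(checksum) to detect a change, then carry the group tag forward
# (the LAST_VALUE ... IGNORE NULLS step).
rows = [
    (1, 11, "01:00", "03:00"),
    (1, 22, "03:00", "05:00"),
    (1, 22, "05:00", "07:00"),
    (1, 11, "07:00", "09:00"),
    (1, 11, "09:00", "12/31/999"),
]

groups = []           # checksum_group per row
prev_checksum = None  # LAG(checksum) equivalent
current_group = 0
for i, (pk, cs, dfrom, dto) in enumerate(rows):
    if cs != prev_checksum:   # a new island starts here
        current_group = i + 1  # any monotonically increasing tag works
    groups.append(current_group)
    prev_checksum = cs

print(groups)  # prints [1, 2, 2, 4, 4]
```

With this group id added to the partition, the two non-adjacent (1, 11) runs fall into different partitions and no longer get merged.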

Answer 2 (score: 0)

I adjusted the query so that it works on the entire dataset; the full data was failing because of a missing primary key. The modified working query was posted as an image.