Snowflake: unsure about a query comparing two time periods

Posted: 2021-03-02 19:39:41

Tags: time snowflake-cloud-data-platform

I want to calculate churn against the selected previous month. In this case, I want to find the clients lost between 2020-12-01 and 2021-01-01.

-- DISTINCT applies to the whole selected row, not only to "Client Ref"
SELECT  DISTINCT "Client Ref","Zone","Market Place","Report Period"
FROM    "REPORT_DB"."PBI"."Revenue" d
WHERE   "Report Period" ='2020-12-01' AND "Market Place" ='UK' AND "Client Ref" IS NOT NULL 
AND
NOT EXISTS
        (
        SELECT  "Client Ref"
        FROM    "REPORT_DB"."PBI"."Revenue" t
        WHERE   "Report Period" ='2021-01-01' AND "Market Place" ='UK' AND "Client Ref" IS NOT NULL AND d."Client Ref"=t."Client Ref"
        )

Is this the correct way to retrieve them?

Regards.

1 answer:

Answer 0 (score: 1)

So, adding a CTE with some dummy data, and renaming the columns to safe unquoted identifiers:

WITH data AS (
    SELECT * FROM VALUES 
    (1,'a','UK','2020-12-01'),
    (1,'a','UK','2021-01-01'),
    (2,'a','UK','2020-12-01'),
    (3,'a','UK','2021-01-01')
    v( Client_Ref, zone, Market_Place, Report_Period)
)
SELECT DISTINCT d.Client_Ref,d.zone,d.Market_Place,d.Report_Period
FROM data AS d
WHERE d.Report_Period ='2020-12-01' AND d.Market_Place ='UK' AND d.Client_Ref IS NOT NULL 
AND
NOT EXISTS
        (
        SELECT  t.Client_Ref
        FROM    data t
        WHERE   t.Report_Period ='2021-01-01' AND t.Market_Place ='UK' AND t.Client_Ref IS NOT NULL AND d.Client_Ref=t.Client_Ref
        );

the basic form of your SQL works and returns:

CLIENT_REF  ZONE    MARKET_PLACE    REPORT_PERIOD
2           a       UK              2020-12-01

which is the expected result.

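This query uses a correlated subquery, and Snowflake's support for correlated subqueries is limited. So although this version works, as you change the query you may run into the "Unsupported subquery type cannot be evaluated" error; see the SO correlated sub-query question.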

The basic query can be written in an uncorrelated form using the LEFT JOIN / WHERE x IS NULL pattern:

WITH data AS (
    SELECT * FROM VALUES 
    (1,'a','UK','2020-12-01'),
    (1,'a','UK','2021-01-01'),
    (2,'a','UK','2020-12-01'),
    (3,'a','UK','2021-01-01')
    v( Client_Ref, zone, Market_Place, Report_Period)
)
SELECT DISTINCT d.Client_Ref,d.zone,d.Market_Place,d.Report_Period
FROM data AS d
LEFT JOIN data AS t
    ON t.Report_Period ='2021-01-01' AND t.Market_Place ='UK' AND d.Client_Ref=t.Client_Ref
WHERE d.Report_Period ='2020-12-01' AND d.Market_Place ='UK' AND d.Client_Ref IS NOT NULL 
AND t.Client_Ref IS NULL;
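If only the list of lost client refs is needed, Snowflake's MINUS set operator (EXCEPT is a synonym) is another uncorrelated way to express the same anti-join. This is a different technique from the LEFT JOIN above, sketched on the same dummy data: MINUS compares whole rows and returns distinct rows, so only Client_Ref is projected here, and columns such as zone would have to be joined back on if you need them.

```sql
-- Sketch: anti-join via the MINUS set operator (EXCEPT is a synonym in Snowflake).
-- Only Client_Ref is compared; MINUS works on whole rows and removes duplicates.
WITH data AS (
    SELECT * FROM VALUES
    (1,'a','UK','2020-12-01'),
    (1,'a','UK','2021-01-01'),
    (2,'a','UK','2020-12-01'),
    (3,'a','UK','2021-01-01')
    v( Client_Ref, zone, Market_Place, Report_Period)
)
-- clients present in the base month ...
SELECT Client_Ref
FROM data
WHERE Report_Period = '2020-12-01' AND Market_Place = 'UK' AND Client_Ref IS NOT NULL
MINUS
-- ... minus the clients still present in the following month
SELECT Client_Ref
FROM data
WHERE Report_Period = '2021-01-01' AND Market_Place = 'UK';
```

On the dummy data this should return the single Client_Ref 2, matching the queries above.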

If your data source has many rows outside the range you care about, you can rewrite it to filter first, like so:

WITH data AS (
    SELECT * FROM VALUES 
    (1,'a','UK','2020-12-01'),
    (1,'a','UK','2021-01-01'),
    (2,'a','UK','2020-12-01'),
    (3,'a','UK','2021-01-01')
    v( Client_Ref, zone, Market_Place, Report_Period)
), wanted_data AS (
    SELECT DISTINCT Client_Ref, zone, Market_Place, Report_Period
    FROM data
    WHERE Report_Period BETWEEN '2020-12-01' AND '2021-01-01'
    AND Market_Place ='UK' AND Client_Ref IS NOT NULL
)
SELECT DISTINCT d.Client_Ref,d.zone,d.Market_Place,d.Report_Period
FROM wanted_data AS d
LEFT JOIN wanted_data AS t
    ON t.Report_Period ='2021-01-01' AND d.Client_Ref=t.Client_Ref
WHERE d.Report_Period ='2020-12-01' 
AND t.Client_Ref IS NULL;
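Both dates above are hard-coded. To compare any chosen month with the month after it, one option is to state the base month once and derive the next month with DATEADD. The params CTE and its columns base_month and next_month are hypothetical names for illustration, and the sketch assumes Report_Period can be cast to DATE:

```sql
-- Sketch: choose the base month once; DATEADD derives the following month.
-- params, base_month and next_month are illustrative names, not from the question.
WITH data AS (
    SELECT * FROM VALUES
    (1,'a','UK','2020-12-01'),
    (1,'a','UK','2021-01-01'),
    (2,'a','UK','2020-12-01'),
    (3,'a','UK','2021-01-01')
    v( Client_Ref, zone, Market_Place, Report_Period)
), params AS (
    SELECT b.base_month, DATEADD(month, 1, b.base_month) AS next_month
    FROM (SELECT '2020-12-01'::DATE AS base_month) AS b
), wanted_data AS (
    -- keep only the two months being compared, with Report_Period as a DATE
    SELECT DISTINCT d.Client_Ref, d.zone, d.Market_Place, d.Report_Period::DATE AS Report_Period
    FROM data AS d
    CROSS JOIN params AS p
    WHERE d.Report_Period::DATE IN (p.base_month, p.next_month)
    AND d.Market_Place = 'UK' AND d.Client_Ref IS NOT NULL
)
-- base-month clients with no row in the next month
SELECT d.Client_Ref, d.zone, d.Market_Place, d.Report_Period
FROM wanted_data AS d
CROSS JOIN params AS p
LEFT JOIN wanted_data AS t
    ON t.Report_Period = p.next_month AND d.Client_Ref = t.Client_Ref
WHERE d.Report_Period = p.base_month
AND t.Client_Ref IS NULL;
```

Changing the single literal in params moves the whole comparison; on the dummy data it should again return client 2 for 2020-12-01.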

But for the life of me, if I name the columns "Client Ref" the way you did, my SQL won't work, so I can't answer that part; this is how you would structure the SQL, though.
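For reference, Snowflake identifiers enclosed in double quotes may contain spaces; they are then case-sensitive and must be written with exactly the same quoting and capitalization every time they are referenced. A sketch of the LEFT JOIN version keeping the question's column names, assuming the real columns are spelled exactly "Client Ref", "Zone", "Market Place" and "Report Period" (against the real data, the CTE would be replaced by "REPORT_DB"."PBI"."Revenue"):

```sql
-- Sketch: same anti-join with quoted, case-sensitive column names containing spaces.
-- Every reference repeats the exact quoted spelling.
WITH data AS (
    SELECT * FROM VALUES
    (1,'a','UK','2020-12-01'),
    (1,'a','UK','2021-01-01'),
    (2,'a','UK','2020-12-01'),
    (3,'a','UK','2021-01-01')
    v("Client Ref", "Zone", "Market Place", "Report Period")
)
SELECT DISTINCT d."Client Ref", d."Zone", d."Market Place", d."Report Period"
FROM data AS d
LEFT JOIN data AS t
    ON t."Report Period" = '2021-01-01' AND t."Market Place" = 'UK' AND d."Client Ref" = t."Client Ref"
WHERE d."Report Period" = '2020-12-01' AND d."Market Place" = 'UK' AND d."Client Ref" IS NOT NULL
AND t."Client Ref" IS NULL;
```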