雪花临时表排序

时间:2020-10-16 16:20:41

标签: sql snowflake-cloud-data-platform

我正在尝试在CTE(WITH)和临时表之间进行性能比较。我已将查询转换为已利用的Temp表,但看到一种情况,由于早期临时表执行之一被取消,因此整体运行时执行失败。 范例:

// Produces large result set
CREATE temporary table tt1 as
SELECT DISTINCT
    t1.c1
  FROM t1
  //ADDITIONAL LOGIC
  ;

CREATE temporary table tt2 as
SELECT t2.c1
    t2.c2
FROM t2;

CREATE temporary table tt3 as
SELECT t1.c1
FROM t1
INNER JOIN tt1
 on t1.c1 = tt1.c1;

当tt3开始运行时,执行似乎被取消,从tt3返回到tt1有一个依赖项。 我已经尝试过CTAS以及使用标准CREATE TABLE定义表,然后执行INSERT INTO,但结果是执行被取消。

是否有一种方法可以通过标准sql语法更好地对这些序列进行排序,而不必走存储过程或任务的路线?

编辑[2] 这是使用行为相同的COVID数据集的更简单方案。两种情况下的终止时间似乎都是35秒左右。因此,出于复制目的,如果您可以生成一个运行2分钟的查询,然后创建一个简单的后续临时表,该表将从您应该能够复制的第一个表中进行选择。

CREATE TEMPORARY TABLE PLATFORM_EVALUATIONS.PUBLIC.TT1 AS
select a.COUNTRY_REGION
    , a.PROVINCE_STATE
    , a.COUNTY
    , current_date() c_date
    , count(*) counts
from PUBLIC.JHU_COVID_19_TIMESERIES a
inner join public.jhu_dashboard_covid_19_global b
 on a.country_region = b.country_region
where 1=1
and a.COUNTRY_REGION = 'United States'
group by a.COUNTRY_REGION
    , a.PROVINCE_STATE
    , a.COUNTY;
    
CREATE TEMPORARY TABLE PLATFORM_EVALUATIONS.PUBLIC.TT2 AS
SELECT COUNTRY_REGION
    , COUNTS
FROM PLATFORM_EVALUATIONS.PUBLIC.TT1;

编辑1Work History Screenshot 您可以在屏幕快照中看到_valid_barcodes(tt1)的持续时间缩短了(通常独立地为2m 14s),并且在完成/终止操作之前就开始执行其他步骤。

// this is tt1 in my example it produces 3,657,599 rows in 2m 14s but is cancelled short when run as a whole.
CREATE temporary table business_vault_dev.dnr._valid_barcodes as
  SELECT DISTINCT
    e.Barcode
  FROM BUSINESS_VAULT_DEV.DNR.DIM_EVENT_VW e
  INNER JOIN BUSINESS_VAULT_DEV.DNR.DIM_PACKAGE_VW p
    on e.BARCODE = COALESCE(p.Barcode, p.Barcode)
  INNER JOIN BUSINESS_VAULT_DEV.DNR.DIM_ORDER_VW o
    on p.order_key = o.order_key
  INNER JOIN BUSINESS_VAULT_DEV.SCORECARD.DIM_CUSTOMER_VW cl
    on o.CUSTOMER_ID=cl.CUSTOMER_ID
  WHERE e.Event_Type='Delivered'
    and o.service_code not in ('XD', 'PU')
    and lower(cl.NAME) IN ('a', 'b', 'c')
  GROUP BY e.Barcode
  HAVING date(min(e.EST_Event_DateTime)) between '2020-09-05' and '2020-10-02';

// this is tt2 
CREATE temporary table business_vault_dev.dnr._driver_merge as 
    select 
//redacted
    from data_lake.driver.driver_table1
    union
    select 
//redacted
    from data_lake.driver.driver_table2
    union
    select 
//redacted
    from data_lake.driver.driver_table3
    UNION
    select 
//redacted
    from data_lake.driver.driver_table4;

// this is unreferenced in my example but completes successfully and references the previous temp table    
CREATE temporary table business_vault_dev.dnr._driver as 
    select 
// redacted
    from business_vault_dev.dnr._driver_merge dm
    group by dm.contractor_number;

// this is tt3 in my example
CREATE temporary table business_vault_dev.dnr._slef_dedup as 
    SELECT
          e.Barcode
    // redacting columns here
    FROM BUSINESS_VAULT_DEV.DNR.DIM_EVENT_VW e
    INNER JOIN business_vault_dev.dnr._valid_barcodes vb
     ON e.Barcode = vb.Barcode
    LEFT JOIN data_lake.driver.driver_table1 cn
     ON //redacted
    LEFT JOIN data_lake.driver.driver_table2 pcp
     ON //redacted
;

1 个答案:

答案 0 :(得分:0)

Daniel,我可以使用带有COVID数据库的简单方案在测试中重现该问题。新版UI似乎有问题。它开始运行第二个CREATE TABLE命令,而没有完成第一个命令。请提交门票到雪花支持。

提交此票证时生成并共享HAR文件,并包含要复制的SQL,将很有用。

https://community.snowflake.com/s/article/How-to-generate-a-HAR-file-for-Web-Interface-Troubleshooting