Snowflake query/task canceled because of time limit

Time: 2020-04-01 09:53:49

Tags: performance timeout snowflake-cloud-data-platform

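I set up code that loads Google Analytics data from the raw GA table into a modified table that provides more insights. The task cannot finish within the defined limit of 3600 seconds, so it is canceled and no data is loaded.

"Statement reached its statement or warehouse timeout of 3,600 second(s) and was canceled." I then loaded the data manually using a fixed clause: WHERE gae."DAY"='2020-03-31' instead of WHERE gae."DAY">=CuRRENT_DATE-1. It still took a long time, but it eventually worked.

How can I make this query faster, or otherwise solve my problem?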

CREATE OR REPLACE TASK dm."Website"."x009_002_all_GA_events"
        WAREHOUSE = marketing_wh
        SCHEDULE = 'USING CRON 26 5 * * * Europe/Berlin'
        AS
    merge into DM."Website".ALL_GA_EVENTS_CLEAN target --DM."Website".ALL_GA_EVENTS target
    using (
    SELECT
    GAEVENTACTION AS GAEVENTACTION,
    GAEVENTCATEGORY AS GAEVENTCATEGORY,
    "DAY" AS Datum,
    DEVICE_TYPE AS Device,
    EVENT_COUNT AS EventCount,
    GAUNIQUEEVENTS AS Uniqueevents,
    EVENT_VALUE AS eventvalue,
    LABELS AS labelz,
    URL AS urlz,
    --split_part(LABELS,'/_',2) AS "HITSTAMP",
    CASE WHEN split_part(LABELS,'/_',2) IS NOT NULL THEN TRY_CAST(split_part(LABELS,'/_',2) AS timestamp) ELSE NULL END AS "HITSTAMP",
    split_part(LABELS,'/_',3) AS EVENT_INFO,
    split_part(LABELS,'/_',1) AS "SESSIONID",
    CASE
        WHEN CONTAINS (URL, '/checkout/')=TRUE THEN split_part(URL,'/',3)
        WHEN CONTAINS (URL, '/auto/')=TRUE THEN split_part(split_part(URL,'/',3),'?',1)
        WHEN CONTAINS (URL, '/angebote/')=TRUE THEN split_part(URL,'/',3)
        ELSE 'no vehicle'
    END AS vehicleID,
    rank() over (partition BY "SESSIONID" order by "HITSTAMP") as "RANK",
    CASE 
        WHEN (GAEVENTACTION= ('pdp_flash_offer_request' )) THEN 
            CASE WHEN split_part(LABELS,'/_',2)=(SELECT min(split_part(t2.LABELS,'/_',2))
                FROM "DL_Datatap"."PUBLIC"."GA_all_events" t2
                WHERE split_part(t2.LABELS,'/_',1)=split_part(gae.LABELS,'/_',1) AND gae.GAEVENTACTION=t2.GAEVENTACTION)
            THEN TRUE
            ELSE FALSE
            END
        WHEN (GAEVENTACTION= ('chat_started')) THEN 
            CASE WHEN split_part(LABELS,'/_',2)=(SELECT min(split_part(t2.LABELS,'/_',2))
                FROM "DL_Datatap"."PUBLIC"."GA_all_events" t2
                WHERE split_part(t2.LABELS,'/_',1)=split_part(gae.LABELS,'/_',1) AND gae.GAEVENTACTION=t2.GAEVENTACTION)
            THEN TRUE
            ELSE FALSE  
            END
        WHEN (GAEVENTACTION= ('Direct_checkout_send')) THEN 
            CASE WHEN split_part(LABELS,'/_',2)=(SELECT min(split_part(t2.LABELS,'/_',2))
                FROM "DL_Datatap"."PUBLIC"."GA_all_events" t2
                WHERE split_part(t2.LABELS,'/_',1)=split_part(gae.LABELS,'/_',1) AND gae.GAEVENTACTION=t2.GAEVENTACTION)
            THEN TRUE
            ELSE FALSE
            END
        WHEN (GAEVENTACTION= ('pdp_offer_request')) THEN 
            CASE WHEN split_part(LABELS,'/_',2)=(SELECT min(split_part(t2.LABELS,'/_',2))
                FROM "DL_Datatap"."PUBLIC"."GA_all_events" t2
                WHERE split_part(t2.LABELS,'/_',1)=split_part(gae.LABELS,'/_',1) AND gae.GAEVENTACTION=t2.GAEVENTACTION)
            THEN TRUE
            ELSE FALSE
            END
        WHEN (GAEVENTACTION= ('agent-requested')) THEN 
            CASE WHEN split_part(LABELS,'/_',2)=(SELECT min(split_part(t2.LABELS,'/_',2))
                FROM "DL_Datatap"."PUBLIC"."GA_all_events" t2
                WHERE split_part(t2.LABELS,'/_',1)=split_part(gae.LABELS,'/_',1) AND gae.GAEVENTACTION=t2.GAEVENTACTION)
            THEN TRUE
            ELSE FALSE 
            END
        WHEN (GAEVENTACTION= ('SERP_softlead_send')) THEN 
            CASE WHEN split_part(LABELS,'/_',2)=(SELECT min(split_part(t2.LABELS,'/_',2))
                FROM "DL_Datatap"."PUBLIC"."GA_all_events" t2
                WHERE split_part(t2.LABELS,'/_',1)=split_part(gae.LABELS,'/_',1) AND gae.GAEVENTACTION=t2.GAEVENTACTION)
            THEN TRUE
            ELSE FALSE
            END
    ELSE False
    END AS "GOAL_EVENT"
    FROM "DL_Datatap"."PUBLIC"."GA_all_events" gae
    ---WHERE gae."DAY"='2020-03-31'
    WHERE gae."DAY">=CuRRENT_DATE-1 
    ) SOURCE
    ON target.SESSIONID=SOURCE."SESSIONID" AND target.HITSTAMP=SOURCE."HITSTAMP" AND target.EVENT_ACTION=SOURCE.GAEVENTACTION AND target."Date"=SOURCE.Datum
    when NOT matched then INSERT (EVENT_ACTION, EVENT_CATEGORY, "Date", DEVICE, TOTAL_EVENTS, UNIQUE_EVENTS, EVENT_VALUE, EVENT_LABEL, URL, HITSTAMP, EVENT_INFO, SESSIONID, VEHICLEID, EVENT_SEQUENCE, GOAL_EVENT)
    VALUES (SOURCE.GAEVENTACTION, SOURCE.GAEVENTCATEGORY, SOURCE.Datum, SOURCE.Device, SOURCE.eventcount, SOURCE.Uniqueevents, SOURCE. eventvalue, SOURCE.labelz, SOURCE.urlz, SOURCE."HITSTAMP", SOURCE.EVENT_INFO, SOURCE."SESSIONID", SOURCE.vehicleid, SOURCE."RANK", SOURCE."GOAL_EVENT" 
    )

1 Answer:

Answer 0: (score: 1)

You can increase the task timeout limit as follows:

CREATE OR REPLACE TASK dm."Website"."x009_002_all_GA_events"
WAREHOUSE = marketing_wh
SCHEDULE = 'USING CRON 26 5 * * * Europe/Berlin'
USER_TASK_TIMEOUT_MS = 86400000
AS
...

https://docs.snowflake.com/en/sql-reference/sql/create-task.html#optional-parameters
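
As a side note, the timeout can also be changed on the existing task instead of recreating it. The statements below are a minimal sketch that reuses the task name from the question; 86400000 ms corresponds to 24 hours. Suspending before the change is a cautious assumption on my part, and the RESUME step matters in either case, because a task created with CREATE OR REPLACE TASK starts out suspended and does not run until it is resumed.

```sql
-- Suspend the task before changing its parameters (cautious assumption).
ALTER TASK dm."Website"."x009_002_all_GA_events" SUSPEND;

-- Raise the per-run timeout to 24 hours (value is in milliseconds).
ALTER TASK dm."Website"."x009_002_all_GA_events"
    SET USER_TASK_TIMEOUT_MS = 86400000;

-- Resume the task so the schedule fires again.
ALTER TASK dm."Website"."x009_002_all_GA_events" RESUME;
```

Raising the limit only buys time. If the data volume keeps growing, the query itself is still worth tuning.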

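If you need help tuning the SQL, open a case with Snowflake support. They can see the table metadata and the execution plans of previous runs, so they can guide you on tuning the query, clustering the target table, and so on.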
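
One thing that may be worth trying before opening a case: the GOAL_EVENT column repeats the same correlated subquery against "GA_all_events" six times, once per event action. A possible rewrite, shown here only as an untested sketch, computes the first hit per session and action once and joins it back. The CTE names (recent, first_hits) and the aliases session_id, hit_raw and first_hit_raw are hypothetical names introduced for illustration; only the GOAL_EVENT logic is shown, and the remaining columns of the original SELECT would have to be added back.

```sql
WITH recent AS (
    -- Rows to be merged, same filter as in the original task.
    SELECT gae.*,
           split_part(gae.LABELS, '/_', 1) AS session_id,
           split_part(gae.LABELS, '/_', 2) AS hit_raw
    FROM "DL_Datatap"."PUBLIC"."GA_all_events" gae
    WHERE gae."DAY" >= CURRENT_DATE - 1
),
first_hits AS (
    -- Earliest hit per session and goal action over the whole table,
    -- matching the scope of the original correlated subqueries.
    SELECT split_part(t2.LABELS, '/_', 1)      AS session_id,
           t2.GAEVENTACTION,
           MIN(split_part(t2.LABELS, '/_', 2)) AS first_hit_raw
    FROM "DL_Datatap"."PUBLIC"."GA_all_events" t2
    WHERE t2.GAEVENTACTION IN ('pdp_flash_offer_request', 'chat_started',
                               'Direct_checkout_send', 'pdp_offer_request',
                               'agent-requested', 'SERP_softlead_send')
    GROUP BY 1, 2
)
SELECT r.GAEVENTACTION,
       r.session_id,
       r.hit_raw,
       -- TRUE only when this row is the first hit of a goal action in its
       -- session; non-goal actions find no match and fall back to FALSE.
       COALESCE(r.hit_raw = f.first_hit_raw, FALSE) AS GOAL_EVENT
FROM recent r
LEFT JOIN first_hits f
       ON f.session_id = r.session_id
      AND f.GAEVENTACTION = r.GAEVENTACTION;
```

Whether this actually runs faster depends on the table size and on how the optimizer already handles the subqueries, so compare the query profiles of both versions on a single day of data before changing the task.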