Funnel analysis in Snowflake using SQL

Posted: 2021-05-20 19:16:07

Tags: sql snowflake-cloud-data-platform match-recognize

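I'm building a query that tracks each user's lifecycle on the platform through events. The table EVENTS has 3 columns: USER_ID, DATE_TIME and EVENT_NAME. Below is a snapshot of the table: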

[Image: snapshot of the EVENTS table]

Here is my query:

SELECT * FROM EVENTS
MATCH_RECOGNIZE
(   PARTITION BY USER_ID
    ORDER BY DATE_TIME
    MEASURES MIN(IFF(EVENT_NAME = 'registration new', DATE_TIME, NULL)) AS REGISTRATION_NEW_TIMESTAMP,
             MIN(IFF(EVENT_NAME = 'registration pending confirm', DATE_TIME, NULL)) AS REGISTRATION_PENDING_CONFIRM_TIMESTAMP,
             MIN(IFF(EVENT_NAME = 'your business information', DATE_TIME, NULL)) AS YOUR_BUSINESS_INFORMATION_TIMESTAMP,
             MIN(IFF(EVENT_NAME = 'your personal information', DATE_TIME, NULL)) AS YOUR_PERSONAL_INFORMATION_TIMESTAMP,
             MIN(IFF(EVENT_NAME = 'qualified', DATE_TIME, NULL)) AS QUALIFIED_TIMESTAMP
  ONE ROW PER MATCH
  PATTERN(STEP_1 ANYTHING* STEP_5)
  DEFINE
        STEP_1 AS EVENT_NAME = 'registration new',
        STEP_2 AS EVENT_NAME = 'registration pending confirm',
        STEP_3 AS EVENT_NAME = 'your business information',
        STEP_4 AS EVENT_NAME = 'your personal information',
        STEP_5 AS EVENT_NAME = 'qualified'
)

My expected result:

[Images (3): expected result]

What I currently get:

[Images (3): current result]

Here are my requirements/caveats:

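  • Each event's timestamp must be greater than or equal to the previous event's timestamp. Whichever occurrence comes first is used, so the timestamps along the funnel are equal or increasing. The difference between the current and expected results shows this logic well, specifically the values in the REGISTRATION_PENDING_CONFIRM_TIMESTAMP and QUALIFIED_TIMESTAMP columns.
  • Not all users have all 5 events. For example, if USER_ID 54321 is missing (or skips) the event 'your personal information', the result must still contain data for the remaining steps. Right now, if a user is missing or skips any event in the funnel, the query returns no data for that user. I think this happens because the pattern search fails when an event defined as a measure is absent from the user's event stream.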

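The order of events in the table is inconsistent, so I defined the events in the MEASURES section in order, following the business/funnel logic.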
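The two requirements above can be pinned down with a minimal per-user Python sketch. This is an illustration only and is not part of the original question. It walks the funnel steps in business order and, for each step, takes the earliest event at or after the last step that was actually found. A missing step stays None and does not block later steps. The sketch assumes timestamps are ISO-8601 strings like 'YYYY-MM-DD HH:MM:SS', so plain string comparison follows time order. The names FUNNEL, funnel_timestamps and SAMPLE_USER_1 are hypothetical. The sample rows are the ones from the answer's temp table.

```python
from typing import Optional

# Funnel steps in business order, as listed in the question's MEASURES clause.
FUNNEL = [
    "registration new",
    "registration pending confirm",
    "your business information",
    "your personal information",
    "qualified",
]


def funnel_timestamps(events: list[tuple[str, str]]) -> dict[str, Optional[str]]:
    """Return the funnel timestamp of each step for ONE user (hypothetical helper).

    events: (date_time, event_name) pairs in any order; date_time is assumed to be
    an ISO-8601 string, so string comparison matches chronological order.
    """
    result: dict[str, Optional[str]] = {}
    floor: Optional[str] = None  # timestamp of the last step actually found
    for step in FUNNEL:
        # Earliest event of this step that is not earlier than the last found step.
        candidates = [t for t, name in events
                      if name == step and (floor is None or t >= floor)]
        ts = min(candidates) if candidates else None
        result[step] = ts
        if ts is not None:
            floor = ts  # a skipped step leaves the floor unchanged
    return result


# Rows for user 1, copied from the answer's sample data.
SAMPLE_USER_1 = [
    ("2020-11-26 15:24:00", "registration new"),
    ("2021-04-12 18:00:00", "registration new"),
    ("2020-11-26 15:24:00", "registration pending confirm"),
    ("2021-04-12 18:11:00", "registration pending confirm"),
    ("2021-04-18 15:04:00", "your personal information"),
    ("2021-04-22 13:13:00", "your personal information"),
    ("2021-04-13 10:22:00", "qualified"),
    ("2021-04-22 13:13:00", "qualified"),
]

for step, ts in funnel_timestamps(SAMPLE_USER_1).items():
    print(f"{step:30} {ts}")
```

Here is how the rule plays out for this user:

- 'registration pending confirm' ties with 'registration new' at 2020-11-26 15:24:00, which the >= rule allows.
- 'your business information' is absent, so it stays None.
- The 2021-04-13 'qualified' event is ignored because it precedes the 2021-04-18 'your personal information' step.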

1 answer:

Answer 0 (score: 0)

This isn't a complete answer, but at least it helps by defining the sample data here (better than screenshots), and it introduces the use of CLASSIFIER:

create or replace temp table events as
select $1 user_id, $2 date_time, $3 event_name
from values(1,'2020-11-26 15:24:00','registration new')
, (1,'2021-04-12 18:00:00','registration new')
, (1,'2020-11-26 15:24:00','registration pending confirm')
, (1,'2021-04-12 18:11:00','registration pending confirm')
, (1,'2021-04-18 15:04:00','your personal information')
, (1,'2021-04-22 13:13:00','your personal information')
, (1,'2021-04-13 10:22:00','qualified')
, (1,'2021-04-22 13:13:00','qualified')
;


SELECT * FROM EVENTS
MATCH_RECOGNIZE
(   PARTITION BY USER_ID
    ORDER BY DATE_TIME
 
    MEASURES  classifier as class, MIN(IFF(CLASSIFIER = 'STEP_1', DATE_TIME, NULL)) AS REGISTRATION_NEW_TIMESTAMP,
             MIN(IFF(CLASSIFIER = 'STEP_2', DATE_TIME, NULL)) AS REGISTRATION_PENDING_CONFIRM_TIMESTAMP,
             MIN(IFF(CLASSIFIER = 'STEP_3', DATE_TIME, NULL)) AS YOUR_BUSINESS_INFORMATION_TIMESTAMP,
             MIN(IFF(CLASSIFIER = 'STEP_4', DATE_TIME, NULL)) AS YOUR_PERSONAL_INFORMATION_TIMESTAMP,
             MIN(IFF(CLASSIFIER = 'STEP_5', DATE_TIME, NULL)) AS QUALIFIED_TIMESTAMP
 
  ONE ROW PER MATCH
 -- all rows per match
  PATTERN((step_1 | step_2 | step_3 | step_4 | step_5 | coincidence)*)--(STEP_2 | XX)* (STEP_3 | XXX)* (STEP_4 | XX)* (STEP_5 | XX)*)
  DEFINE
        STEP_1 AS EVENT_NAME = 'registration new',
        -- STEP_2..STEP_4 only match when the previous row's DATE_TIME is strictly earlier
        STEP_2 AS LAG(DATE_TIME) < DATE_TIME AND EVENT_NAME = 'registration pending confirm' ,
        STEP_3 AS LAG(DATE_TIME) < DATE_TIME AND EVENT_NAME = 'your business information',
        STEP_4 AS LAG(DATE_TIME) < DATE_TIME AND EVENT_NAME = 'your personal information',
        STEP_5 AS EVENT_NAME = 'qualified'
        -- fallback for a row whose DATE_TIME equals the previous row's and that no earlier alternative matched
        , COINCIDENCE AS LAG(DATE_TIME) = DATE_TIME
);
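One way to meet both requirements without MATCH_RECOGNIZE chains one LEFT JOIN per funnel step. This is a swapped-in technique, not the CLASSIFIER approach above. Each step takes the earliest matching event whose timestamp is greater than or equal to the last step actually found, computed as COALESCE over the earlier steps. A skipped step therefore yields NULL instead of dropping the user.

The sketch below runs the query against an in-memory SQLite copy of the sample data so that it can be checked. It uses only standard SQL (CTEs, LEFT JOIN, MIN, COALESCE), so it should carry over to Snowflake with little or no change, but it has not been tested there. The CTE names s1..s4, the columns t1..t4, and the helpers FUNNEL_SQL and run_funnel are hypothetical. Users with no 'registration new' row are not returned, because the funnel is anchored on that step.

```python
import sqlite3

# Portable funnel query (hypothetical): one LEFT JOIN per step. Each step's lower
# bound is the timestamp of the last step actually found (COALESCE over earlier
# steps), so ">=" keeps timestamps non-decreasing and a skipped step yields NULL.
FUNNEL_SQL = """
WITH s1 AS (
    SELECT user_id, MIN(date_time) AS t1
    FROM events
    WHERE event_name = 'registration new'
    GROUP BY user_id
),
s2 AS (
    SELECT s1.user_id, s1.t1, MIN(e.date_time) AS t2
    FROM s1
    LEFT JOIN events e
      ON e.user_id = s1.user_id
     AND e.event_name = 'registration pending confirm'
     AND e.date_time >= s1.t1
    GROUP BY s1.user_id, s1.t1
),
s3 AS (
    SELECT s2.user_id, s2.t1, s2.t2, MIN(e.date_time) AS t3
    FROM s2
    LEFT JOIN events e
      ON e.user_id = s2.user_id
     AND e.event_name = 'your business information'
     AND e.date_time >= COALESCE(s2.t2, s2.t1)
    GROUP BY s2.user_id, s2.t1, s2.t2
),
s4 AS (
    SELECT s3.user_id, s3.t1, s3.t2, s3.t3, MIN(e.date_time) AS t4
    FROM s3
    LEFT JOIN events e
      ON e.user_id = s3.user_id
     AND e.event_name = 'your personal information'
     AND e.date_time >= COALESCE(s3.t3, s3.t2, s3.t1)
    GROUP BY s3.user_id, s3.t1, s3.t2, s3.t3
)
SELECT s4.user_id,
       s4.t1 AS registration_new_timestamp,
       s4.t2 AS registration_pending_confirm_timestamp,
       s4.t3 AS your_business_information_timestamp,
       s4.t4 AS your_personal_information_timestamp,
       MIN(e.date_time) AS qualified_timestamp
FROM s4
LEFT JOIN events e
  ON e.user_id = s4.user_id
 AND e.event_name = 'qualified'
 AND e.date_time >= COALESCE(s4.t4, s4.t3, s4.t2, s4.t1)
GROUP BY s4.user_id, s4.t1, s4.t2, s4.t3, s4.t4
ORDER BY s4.user_id
"""


def run_funnel(rows):
    """Load (user_id, date_time, event_name) rows into SQLite and run FUNNEL_SQL."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE events (user_id INTEGER, date_time TEXT, event_name TEXT)")
        con.executemany("INSERT INTO events VALUES (?, ?, ?)", rows)
        return con.execute(FUNNEL_SQL).fetchall()
    finally:
        con.close()


# The sample rows from the temp table above.
SAMPLE_ROWS = [
    (1, "2020-11-26 15:24:00", "registration new"),
    (1, "2021-04-12 18:00:00", "registration new"),
    (1, "2020-11-26 15:24:00", "registration pending confirm"),
    (1, "2021-04-12 18:11:00", "registration pending confirm"),
    (1, "2021-04-18 15:04:00", "your personal information"),
    (1, "2021-04-22 13:13:00", "your personal information"),
    (1, "2021-04-13 10:22:00", "qualified"),
    (1, "2021-04-22 13:13:00", "qualified"),
]

print(run_funnel(SAMPLE_ROWS))
# [(1, '2020-11-26 15:24:00', '2020-11-26 15:24:00', None, '2021-04-18 15:04:00', '2021-04-22 13:13:00')]
```

Because every join is a LEFT JOIN, each user row survives regardless of which middle steps are missing, which is the behaviour the question asks for. The trade-off is one join and one GROUP BY per funnel step.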