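I am building a query to track a user's lifecycle on the platform through events. The EVENTS table has 3 columns: USER_ID, DATE_TIME, and EVENT_NAME. Below is a snapshot of the table,
Below is my query,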
SELECT * FROM EVENTS
MATCH_RECOGNIZE
( PARTITION BY USER_ID
ORDER BY DATE_TIME
MEASURES MIN(IFF(EVENT_NAME = 'registration new', DATE_TIME, NULL)) AS REGISTRATION_NEW_TIMESTAMP,
MIN(IFF(EVENT_NAME = 'registration pending confirm', DATE_TIME, NULL)) AS REGISTRATION_PENDING_CONFIRM_TIMESTAMP,
MIN(IFF(EVENT_NAME = 'your business information', DATE_TIME, NULL)) AS YOUR_BUSINESS_INFORMATION_TIMESTAMP,
MIN(IFF(EVENT_NAME = 'your personal information', DATE_TIME, NULL)) AS YOUR_PERSONAL_INFORMATION_TIMESTAMP,
MIN(IFF(EVENT_NAME = 'qualified', DATE_TIME, NULL)) AS QUALIFIED_TIMESTAMP
ONE ROW PER MATCH
PATTERN(STEP_1 ANYTHING* STEP_5)
DEFINE
STEP_1 AS EVENT_NAME = 'registration new',
STEP_2 AS EVENT_NAME = 'registration pending confirm',
STEP_3 AS EVENT_NAME = 'your business information',
STEP_4 AS EVENT_NAME = 'your personal information',
STEP_5 AS EVENT_NAME = 'qualified'
)
My expected result,
What I am getting now,
Below are my requirements/caveats,
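the values in the REGISTRATION_PENDING_CONFIRM_TIMESTAMP and QUALIFIED_TIMESTAMP columns.
If USER_ID 54321 does not have, or skipped, the event 'your personal information', the result must still contain the data for the remaining steps (right now, if a user is missing or skips any event in the funnel, the query returns no data). I think this is because the pattern search fails when an event defined as a measure is missing from the user's flow. The order of events in the table is inconsistent, so I defined the events in the MEASURES section in the order given by the business/funnel logic.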
Answer 0 (score: 0)
This is not a complete answer, but at least I can help here by defining sample data (better than screenshots) and by introducing the use of CLASSIFIER:
create or replace temp table events as
select $1 user_id, $2 date_time, $3 event_name
from values(1,'2020-11-26 15:24:00','registration new')
, (1,'2021-04-12 18:00:00','registration new')
, (1,'2020-11-26 15:24:00','registration pending confirm')
, (1,'2021-04-12 18:11:00','registration pending confirm')
, (1,'2021-04-18 15:04:00','your personal information')
, (1,'2021-04-22 13:13:00','your personal information')
, (1,'2021-04-13 10:22:00','qualified')
, (1,'2021-04-22 13:13:00','qualified')
;
SELECT * FROM EVENTS
MATCH_RECOGNIZE
( PARTITION BY USER_ID
ORDER BY DATE_TIME
MEASURES classifier as class, MIN(IFF(CLASSIFIER = 'STEP_1', DATE_TIME, NULL)) AS REGISTRATION_NEW_TIMESTAMP,
MIN(IFF(CLASSIFIER = 'STEP_2', DATE_TIME, NULL)) AS REGISTRATION_PENDING_CONFIRM_TIMESTAMP,
MIN(IFF(CLASSIFIER = 'STEP_3', DATE_TIME, NULL)) AS YOUR_BUSINESS_INFORMATION_TIMESTAMP,
MIN(IFF(CLASSIFIER = 'STEP_4', DATE_TIME, NULL)) AS YOUR_PERSONAL_INFORMATION_TIMESTAMP,
MIN(IFF(CLASSIFIER = 'STEP_5', DATE_TIME, NULL)) AS QUALIFIED_TIMESTAMP
ONE ROW PER MATCH
-- all rows per match
PATTERN((step_1 | step_2 | step_3 | step_4 | step_5 | coincidence)*)--(STEP_2 | XX)* (STEP_3 | XXX)* (STEP_4 | XX)* (STEP_5 | XX)*)
DEFINE
STEP_1 AS EVENT_NAME = 'registration new',
STEP_2 AS LAG(DATE_TIME) < DATE_TIME AND EVENT_NAME = 'registration pending confirm' ,
STEP_3 AS LAG(DATE_TIME) < DATE_TIME AND EVENT_NAME = 'your business information',
STEP_4 AS LAG(DATE_TIME) < DATE_TIME AND EVENT_NAME = 'your personal information',
STEP_5 AS EVENT_NAME = 'qualified'
, COINCIDENCE AS LAG(DATE_TIME) = DATE_TIME
);
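If the goal is only the earliest timestamp of each funnel event per user, with NULL for any step the user skipped, a plain conditional aggregation may be enough and avoids pattern matching altogether. The following is a minimal sketch, not a drop-in replacement for the query in the question. It assumes the events table defined above, and it assumes that the order of events within a user does not need to be enforced, which is something MATCH_RECOGNIZE would otherwise check.

```sql
-- Earliest timestamp of each funnel event, one row per user.
-- A skipped event yields NULL in its column instead of dropping the user.
-- Assumption: event order is NOT validated here, unlike with MATCH_RECOGNIZE.
SELECT user_id,
       MIN(IFF(event_name = 'registration new', date_time, NULL))
           AS registration_new_timestamp,
       MIN(IFF(event_name = 'registration pending confirm', date_time, NULL))
           AS registration_pending_confirm_timestamp,
       MIN(IFF(event_name = 'your business information', date_time, NULL))
           AS your_business_information_timestamp,
       MIN(IFF(event_name = 'your personal information', date_time, NULL))
           AS your_personal_information_timestamp,
       MIN(IFF(event_name = 'qualified', date_time, NULL))
           AS qualified_timestamp
FROM events
GROUP BY user_id;
```

With the sample data above, user 1 gets NULL for your_business_information_timestamp, since that event never occurs, while the other four columns are still populated. The sample stores date_time as strings, so MIN compares them as text. That works here only because the values use a sortable year-month-day format; casting to a timestamp type would be safer on real data.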