My table is full of events from an external source, for example:
SELECT target_type, timer_type, event_happened_at
FROM tracked_events
ORDER BY event_happened_at ASC;
TARGET_TYPE TIMER_TYPE EVENT_HAPPENED_AT
"JOB", "START", "2018-11-06 06:00:00+00"
"JOB", "STOP", "2018-11-06 10:30:00+00"
"PAUSE", "START", "2018-11-06 10:30:00+00"
"PAUSE", "STOP", "2018-11-06 11:00:00+00"
"JOB", "START", "2018-11-06 11:00:00+00"
"JOB", "STOP", "2018-11-06 15:00:00+00"
Logically, these can be grouped into three rows:
TYPE START END
JOB, 2018-11-06 06:00:00+00, 2018-11-06 10:30:00+00
PAUSE, 2018-11-06 10:30:00+00, 2018-11-06 11:00:00+00
JOB, 2018-11-06 11:00:00+00, 2018-11-06 15:00:00+00
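I am trying to figure out a good way to do this grouping in SQL. The event types are predefined and are guaranteed to be sent in a "logical manner" (i.e., a PAUSE / START will not occur before the JOB / STOP, and every event is guaranteed to have both a start and a stop).
So basically, whenever I see a JOB / START, I need to find the next JOB / STOP, and likewise for PAUSE / START to PAUSE / STOP.
I can see a query where I select only the START events and run a subquery to find the corresponding STOP for each: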
WITH starts AS (
    SELECT session_id, target_type, timer_type, event_happened_at
    FROM received_session_events
    WHERE session_id = 266
      AND timer_type = 'START'
    ORDER BY event_happened_at ASC
)
SELECT target_type, event_happened_at AS started_at,
    (
        SELECT end_event.event_happened_at
        FROM received_session_events end_event
        WHERE end_event.session_id = starts.session_id
          AND end_event.timer_type = 'STOP'
          AND end_event.target_type = starts.target_type
          AND end_event.event_happened_at > starts.event_happened_at
        ORDER BY end_event.event_happened_at ASC
        LIMIT 1
    ) AS ended_at
FROM starts;
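This works and gives the correct result, but it seems somewhat inefficient. Adding indexes is an option I have not explored (beyond the obvious ones on the columns that appear in the WHERE clause, I would not know where to start).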
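On the index question, one candidate worth trying is a composite index that lists the equality-filtered columns first and the timestamp last, so the "next STOP" lookup can be answered by a single ordered index probe. The sketch below checks that idea with Python's built-in sqlite3 module as a stand-in database. The index name `idx_session_events_lookup` and the column types are assumptions made for the demo. Whether the planner in the actual database picks such an index depends on data volume and statistics, so its own EXPLAIN output is the real test.

```python
import sqlite3

# Assumption: same columns as the query above; the types are guesses for the demo.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE received_session_events ("
    " session_id INTEGER, target_type TEXT, timer_type TEXT, event_happened_at TEXT)"
)

# Hypothetical index name. Equality-filtered columns come first; the column
# used for the range test and the ORDER BY comes last.
conn.execute(
    "CREATE INDEX idx_session_events_lookup"
    " ON received_session_events"
    " (session_id, target_type, timer_type, event_happened_at)"
)

# Same shape as the correlated subquery that finds the matching STOP.
LOOKUP_SQL = """
SELECT event_happened_at
FROM received_session_events
WHERE session_id = ?
  AND target_type = ?
  AND timer_type = 'STOP'
  AND event_happened_at > ?
ORDER BY event_happened_at ASC
LIMIT 1
"""

# Ask SQLite how it would execute the lookup; the last column of each
# plan row is a human-readable description of that step.
plan = conn.execute(
    "EXPLAIN QUERY PLAN " + LOOKUP_SQL,
    (266, "JOB", "2018-11-06 06:00:00+00"),
).fetchall()
plan_details = [row[3] for row in plan]
print(plan_details)
```

If the printed plan names the index, the lookup no longer needs to scan and sort the table for every START row.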
Answer 0 (score: 1)
Assuming the data follows the "logical order" of event targets and times, consider adding a row number and running a shifted self-join:
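WITH s2 AS
  (SELECT *,
          -- Number rows in event-time order. On a timestamp tie, STOP sorts
          -- before START (timer_type DESC), so each STOP stays next to its START.
          ROW_NUMBER() OVER (ORDER BY event_happened_at, timer_type DESC) AS ROW_NUM
   FROM received_session_events)
SELECT CASE
         WHEN s1.TARGET_TYPE = 'DEAL'
         THEN 'JOB'
         ELSE s1.TARGET_TYPE
       END AS "TYPE", s1.EVENT_HAPPENED_AT AS "START", s2.EVENT_HAPPENED_AT AS "END"
FROM s2 AS s1
JOIN
   s2
 ON s1.TARGET_TYPE = s2.TARGET_TYPE AND s1.ROW_NUM = s2.ROW_NUM - 1
 -- Pair only a START with the STOP that immediately follows it.
 AND s1.TIMER_TYPE = 'START' AND s2.TIMER_TYPE = 'STOP'
ORDER BY s1.EVENT_HAPPENED_AT;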
-- TYPE START END
-- JOB 2018-11-06 06:00:00+00 2018-11-06 10:30:00+00
-- PAUSE 2018-11-06 10:30:00+00 2018-11-06 11:00:00+00
-- JOB 2018-11-06 11:00:00+00 2018-11-06 15:00:00+00
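A window function is another way to express the same pairing without a self-join: `LEAD` can fetch the timestamp of the next event of the same target type, and keeping only the START rows leaves one row per interval. The following is a minimal sketch, not a drop-in solution. It runs against an in-memory SQLite database through Python's built-in sqlite3 module (window functions need SQLite 3.25 or newer), seeded with the six sample rows from the question. The column types and the storage of timestamps as text are assumptions made for the demo.

```python
import sqlite3

# In-memory SQLite database seeded with the sample rows from the question.
# Assumption: timestamps are stored as text in one fixed format, so string
# order matches chronological order.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE received_session_events ("
    " session_id INTEGER, target_type TEXT, timer_type TEXT, event_happened_at TEXT)"
)
rows = [
    (266, "JOB", "START", "2018-11-06 06:00:00+00"),
    (266, "JOB", "STOP", "2018-11-06 10:30:00+00"),
    (266, "PAUSE", "START", "2018-11-06 10:30:00+00"),
    (266, "PAUSE", "STOP", "2018-11-06 11:00:00+00"),
    (266, "JOB", "START", "2018-11-06 11:00:00+00"),
    (266, "JOB", "STOP", "2018-11-06 15:00:00+00"),
]
conn.executemany("INSERT INTO received_session_events VALUES (?, ?, ?, ?)", rows)

PAIR_SQL = """
WITH ordered AS (
    SELECT target_type,
           timer_type,
           event_happened_at AS started_at,
           -- Timestamp of the next event with the same session and target type.
           LEAD(event_happened_at) OVER (
               PARTITION BY session_id, target_type
               ORDER BY event_happened_at
           ) AS ended_at
    FROM received_session_events
    WHERE session_id = ?
)
SELECT target_type, started_at, ended_at
FROM ordered
WHERE timer_type = 'START'   -- keep only the rows that open an interval
ORDER BY started_at
"""

intervals = conn.execute(PAIR_SQL, (266,)).fetchall()
for interval in intervals:
    print(interval)
```

Partitioning by target type keeps the JOB / STOP and PAUSE / START rows that share the 10:30 timestamp in separate partitions, so the result does not depend on how that tie is ordered. The sketch still relies on the guarantee that every START is followed by its own STOP.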