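I'm trying to pull records for UserIDs that meet a specific sequence of events. If a user has a JOIN, followed by a later CANCEL, followed by a later JOIN, I want to return that user's records in the result set. I need to be able to run this query for one day at a time or for several days at a time, as needed.
The table below shows examples of UserIDs that do and do not meet the sequence.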
+--------+--------+---------------------+------------+------------------+
| rownum | UserID | Timestamp | ActionType | Return in query? |
+--------+--------+---------------------+------------+------------------+
| 1 | 12345 | 2016-11-01 08:25:39 | JOIN | yes |
| 2 | 12345 | 2016-11-01 08:27:00 | NULL | yes |
| 3 | 12345 | 2016-11-01 08:28:20 | DOWNGRADE | yes |
| 4 | 12345 | 2016-11-01 08:31:34 | NULL | yes |
| 5 | 12345 | 2016-11-01 08:32:44 | CANCEL | yes |
| 6 | 12345 | 2016-11-01 08:45:51 | NULL | yes |
| 7 | 12345 | 2016-11-01 08:50:57 | JOIN | yes |
| 1 | 9876 | 2016-11-01 16:05:42 | JOIN | yes |
| 2 | 9876 | 2016-11-01 16:07:33 | CANCEL | yes |
| 3 | 9876 | 2016-11-01 16:09:09 | JOIN | yes |
| 1 | 56565 | 2016-11-01 18:15:16 | JOIN | no |
| 2 | 56565 | 2016-11-01 19:22:25 | CANCEL | no |
| 3 | 56565 | 2016-11-01 20:05:05 | CANCEL | no |
| 1 | 34343 | 2016-11-01 05:32:56 | JOIN | no |
+--------+--------+---------------------+------------+------------------+
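I've read up on gaps and islands and looked through various complicated forum posts that circle around what I'm trying to achieve.
So far, all I've managed is to look at one day's records, without any of the sequence logic I need: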
SELECT
    ROW_NUMBER() OVER (PARTITION BY t1.UserID ORDER BY t1.tmstmp) AS rownum
    ,t1.UserID
    ,t1.tmstmp
    ,t1.ActionType
FROM
    t AS t1
    INNER JOIN (
        -- users with at least two records in the chosen window
        SELECT UserID
        FROM t
        WHERE tmstmp BETWEEN '2016-11-20 00:00:01' AND '2016-11-20 11:59:59'
        GROUP BY UserID
        HAVING COUNT(*) >= 2
    ) AS sub ON t1.UserID = sub.UserID
Thanks for your input!
Answer 0 (score: 4)
You can use LEAD() and LAG():
SELECT * FROM (
    SELECT t.*,
           LAG(t.ActionType, 1)  OVER (PARTITION BY t.UserID ORDER BY t.tmstmp) AS LAST_ACTION,
           LAG(t.ActionType, 2)  OVER (PARTITION BY t.UserID ORDER BY t.tmstmp) AS LAST_ACTION2,
           LEAD(t.ActionType, 1) OVER (PARTITION BY t.UserID ORDER BY t.tmstmp) AS NEXT_ACTION,
           LEAD(t.ActionType, 2) OVER (PARTITION BY t.UserID ORDER BY t.tmstmp) AS NEXT_ACTION2
    FROM YourTable t
    WHERE t.tmstmp BETWEEN <Start> AND <End>
) t
-- current row is the first JOIN of the pattern
WHERE (t.ActionType = 'JOIN' AND
       t.NEXT_ACTION = 'CANCEL' AND
       t.NEXT_ACTION2 = 'JOIN')
-- current row is the CANCEL in the middle
   OR (t.LAST_ACTION = 'JOIN' AND
       t.ActionType = 'CANCEL' AND
       t.NEXT_ACTION = 'JOIN')
-- current row is the second JOIN
   OR (t.LAST_ACTION2 = 'JOIN' AND
       t.LAST_ACTION = 'CANCEL' AND
       t.ActionType = 'JOIN')
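To see how this adjacent-row approach behaves, here is a minimal sketch that runs the same kind of query against the sample rows from the question. It assumes an in-memory SQLite database (3.25 or later, for window functions) as a stand-in for the real engine, a table named t, and a timestamp column named tmstmp; none of those are confirmed by the question.

```python
import sqlite3

# Sample rows copied from the question's table (NULL action types become None).
ROWS = [
    (12345, "2016-11-01 08:25:39", "JOIN"),
    (12345, "2016-11-01 08:27:00", None),
    (12345, "2016-11-01 08:28:20", "DOWNGRADE"),
    (12345, "2016-11-01 08:31:34", None),
    (12345, "2016-11-01 08:32:44", "CANCEL"),
    (12345, "2016-11-01 08:45:51", None),
    (12345, "2016-11-01 08:50:57", "JOIN"),
    (9876, "2016-11-01 16:05:42", "JOIN"),
    (9876, "2016-11-01 16:07:33", "CANCEL"),
    (9876, "2016-11-01 16:09:09", "JOIN"),
    (56565, "2016-11-01 18:15:16", "JOIN"),
    (56565, "2016-11-01 19:22:25", "CANCEL"),
    (56565, "2016-11-01 20:05:05", "CANCEL"),
    (34343, "2016-11-01 05:32:56", "JOIN"),
]

conn = sqlite3.connect(":memory:")
# Assumed schema: table and column names are guesses based on the question's query.
conn.execute("CREATE TABLE t (UserID INTEGER, tmstmp TEXT, ActionType TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", ROWS)

QUERY = """
SELECT s.UserID, s.tmstmp, s.ActionType
FROM (
    SELECT t.*,
           LAG(ActionType, 1)  OVER (PARTITION BY UserID ORDER BY tmstmp) AS last_action,
           LAG(ActionType, 2)  OVER (PARTITION BY UserID ORDER BY tmstmp) AS last_action2,
           LEAD(ActionType, 1) OVER (PARTITION BY UserID ORDER BY tmstmp) AS next_action,
           LEAD(ActionType, 2) OVER (PARTITION BY UserID ORDER BY tmstmp) AS next_action2
    FROM t
    WHERE tmstmp BETWEEN ? AND ?
) AS s
WHERE (s.ActionType = 'JOIN' AND s.next_action = 'CANCEL' AND s.next_action2 = 'JOIN')
   OR (s.last_action = 'JOIN' AND s.ActionType = 'CANCEL' AND s.next_action = 'JOIN')
   OR (s.last_action2 = 'JOIN' AND s.last_action = 'CANCEL' AND s.ActionType = 'JOIN')
ORDER BY s.UserID, s.tmstmp
"""

# The date window is passed as parameters, so one day or several days use the same query.
rows = conn.execute(QUERY, ("2016-11-01 00:00:00", "2016-11-01 23:59:59")).fetchall()
for row in rows:
    print(row)
```

On this data only the three rows for user 9876 come back. LAG and LEAD look at immediately neighbouring rows, so user 12345, whose JOIN, CANCEL and JOIN are separated by NULL and DOWNGRADE rows, does not match with this query.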
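Answer 1 (score: 1)
Assuming you mean that the records are consecutive with no gaps in the sequence, just use lag(), lead(), or a combination of the two: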
select distinct userId
from (select t.*,
             lag(ActionType) over (partition by userId order by tmstmp) as prev_at,
             lead(ActionType) over (partition by userId order by tmstmp) as next_at
      from t
     ) t
where ActionType = 'CANCEL' and prev_at = 'JOIN' and next_at = 'JOIN';
If gaps are allowed, you can take a different approach:
select distinct userId
from t
where ActionType = 'CANCEL' and
      -- a JOIN somewhere before this CANCEL
      exists (select 1
              from t t2
              where t2.userId = t.userId and
                    t2.ActionType = 'JOIN' and
                    t2.tmstmp < t.tmstmp
             ) and
      -- and a JOIN somewhere after it
      exists (select 1
              from t t2
              where t2.userId = t.userId and
                    t2.ActionType = 'JOIN' and
                    t2.tmstmp > t.tmstmp
             );
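As a rough check of the gaps-allowed version, the sketch below runs an EXISTS query of the same shape over the question's sample rows. SQLite (through Python's sqlite3 module), the table name t, and the column names tmstmp and ActionType are assumptions made for illustration, not details confirmed in the thread.

```python
import sqlite3

# Sample rows copied from the question's table (NULL action types become None).
ROWS = [
    (12345, "2016-11-01 08:25:39", "JOIN"),
    (12345, "2016-11-01 08:27:00", None),
    (12345, "2016-11-01 08:28:20", "DOWNGRADE"),
    (12345, "2016-11-01 08:31:34", None),
    (12345, "2016-11-01 08:32:44", "CANCEL"),
    (12345, "2016-11-01 08:45:51", None),
    (12345, "2016-11-01 08:50:57", "JOIN"),
    (9876, "2016-11-01 16:05:42", "JOIN"),
    (9876, "2016-11-01 16:07:33", "CANCEL"),
    (9876, "2016-11-01 16:09:09", "JOIN"),
    (56565, "2016-11-01 18:15:16", "JOIN"),
    (56565, "2016-11-01 19:22:25", "CANCEL"),
    (56565, "2016-11-01 20:05:05", "CANCEL"),
    (34343, "2016-11-01 05:32:56", "JOIN"),
]

conn = sqlite3.connect(":memory:")
# Assumed schema: table and column names are guesses based on the question's query.
conn.execute("CREATE TABLE t (UserID INTEGER, tmstmp TEXT, ActionType TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", ROWS)

QUERY = """
SELECT DISTINCT c.UserID
FROM t AS c
WHERE c.ActionType = 'CANCEL'
  -- some JOIN strictly before this CANCEL
  AND EXISTS (SELECT 1 FROM t AS j1
              WHERE j1.UserID = c.UserID
                AND j1.ActionType = 'JOIN'
                AND j1.tmstmp < c.tmstmp)
  -- some JOIN strictly after this CANCEL
  AND EXISTS (SELECT 1 FROM t AS j2
              WHERE j2.UserID = c.UserID
                AND j2.ActionType = 'JOIN'
                AND j2.tmstmp > c.tmstmp)
ORDER BY c.UserID
"""

users = [row[0] for row in conn.execute(QUERY)]
print(users)
```

With the sample data this prints [9876, 12345], which lines up with the "Return in query?" column: user 12345 now qualifies even though NULL and DOWNGRADE rows sit between the events, while 56565 (no JOIN after a CANCEL) and 34343 (a single JOIN) are left out.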
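Answer 2 (score: 1)
In my example queries I'll use the information you provided as far as I can, but you're a bit unclear about what your source table looks like. You show one table above (without a name), but your example query then refers to two different tables... so it's a bit hard to see what's going on.
So I'll assume a single table named t, which you can adjust as needed...
Here's how I would approach this. First, identify the users: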
select distinct first_join.userid
from t first_join
inner join t cancel
   on first_join.tmstmp < cancel.tmstmp
  and first_join.userid = cancel.userid
inner join t second_join
   on second_join.tmstmp > cancel.tmstmp
  and second_join.userid = cancel.userid
where first_join.actiontype = 'JOIN'
  and cancel.actiontype = 'CANCEL'
  and second_join.actiontype = 'JOIN'
So now you can get all the records for those users:
SELECT *
FROM t
WHERE userid IN (
    -- users with a JOIN, then a later CANCEL, then a later JOIN
    select distinct first_join.userid
    from t first_join
    inner join t cancel
       on first_join.tmstmp < cancel.tmstmp
      and first_join.userid = cancel.userid
    inner join t second_join
       on second_join.tmstmp > cancel.tmstmp
      and second_join.userid = cancel.userid
    where first_join.actiontype = 'JOIN'
      and cancel.actiontype = 'CANCEL'
      and second_join.actiontype = 'JOIN'
)
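Since the question also asks to run this for one day or several days at a time, here is one possible way to wrap the self-join in a date window. This is only a sketch: matching_records is a hypothetical helper, SQLite stands in for the real database, and the choice to require the whole JOIN, CANCEL, JOIN sequence to fall inside the window is my assumption about what is wanted.

```python
import sqlite3

# Sample rows copied from the question's table (NULL action types become None).
ROWS = [
    (12345, "2016-11-01 08:25:39", "JOIN"),
    (12345, "2016-11-01 08:27:00", None),
    (12345, "2016-11-01 08:28:20", "DOWNGRADE"),
    (12345, "2016-11-01 08:31:34", None),
    (12345, "2016-11-01 08:32:44", "CANCEL"),
    (12345, "2016-11-01 08:45:51", None),
    (12345, "2016-11-01 08:50:57", "JOIN"),
    (9876, "2016-11-01 16:05:42", "JOIN"),
    (9876, "2016-11-01 16:07:33", "CANCEL"),
    (9876, "2016-11-01 16:09:09", "JOIN"),
    (56565, "2016-11-01 18:15:16", "JOIN"),
    (56565, "2016-11-01 19:22:25", "CANCEL"),
    (56565, "2016-11-01 20:05:05", "CANCEL"),
    (34343, "2016-11-01 05:32:56", "JOIN"),
]

conn = sqlite3.connect(":memory:")
# Assumed schema: table and column names are guesses based on the question's query.
conn.execute("CREATE TABLE t (UserID INTEGER, tmstmp TEXT, ActionType TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", ROWS)


def matching_records(conn, start, end):
    """Return all rows in [start, end] for users whose JOIN, CANCEL, JOIN
    sequence falls entirely inside that window (hypothetical helper)."""
    sql = """
    SELECT UserID, tmstmp, ActionType
    FROM t
    WHERE tmstmp BETWEEN :start AND :end
      AND UserID IN (
        SELECT first_join.UserID
        FROM t AS first_join
        JOIN t AS cancel
          ON cancel.UserID = first_join.UserID
         AND cancel.tmstmp > first_join.tmstmp
        JOIN t AS second_join
          ON second_join.UserID = cancel.UserID
         AND second_join.tmstmp > cancel.tmstmp
        WHERE first_join.ActionType = 'JOIN'
          AND cancel.ActionType = 'CANCEL'
          AND second_join.ActionType = 'JOIN'
          -- bounding the first and last event bounds the CANCEL as well
          AND first_join.tmstmp >= :start
          AND second_join.tmstmp <= :end
      )
    ORDER BY UserID, tmstmp
    """
    return conn.execute(sql, {"start": start, "end": end}).fetchall()


full_day = matching_records(conn, "2016-11-01 00:00:00", "2016-11-01 23:59:59")
morning = matching_records(conn, "2016-11-01 00:00:00", "2016-11-01 11:59:59")
print(len(full_day), len(morning))
```

For the full day this returns the ten rows belonging to users 12345 and 9876; restricting the window to the morning leaves only the seven rows for 12345. Running it over several days is just a matter of passing a wider start and end.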