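How do I return rows that satisfy a specific sequence of events?

Time: 2016-11-23 14:19:27

Tags: sql postgresql

I'm trying to pull records for UserIDs that satisfy a specific sequence of events. If a user has a JOIN, followed by a subsequent CANCEL, followed by a subsequent JOIN, I want to return them in the result set. I need to be able to run this query for one day at a time or for several days at a time, as needed.

The table below shows examples of UserIDs that do and do not satisfy the sequence.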

+--------+--------+---------------------+------------+------------------+
| rownum | UserID |      Timestamp      | ActionType | Return in query? |
+--------+--------+---------------------+------------+------------------+
|      1 |  12345 | 2016-11-01 08:25:39 | JOIN       | yes              |
|      2 |  12345 | 2016-11-01 08:27:00 | NULL       | yes              |
|      3 |  12345 | 2016-11-01 08:28:20 | DOWNGRADE  | yes              |
|      4 |  12345 | 2016-11-01 08:31:34 | NULL       | yes              |
|      5 |  12345 | 2016-11-01 08:32:44 | CANCEL     | yes              |
|      6 |  12345 | 2016-11-01 08:45:51 | NULL       | yes              |
|      7 |  12345 | 2016-11-01 08:50:57 | JOIN       | yes              |
|      1 |   9876 | 2016-11-01 16:05:42 | JOIN       | yes              |
|      2 |   9876 | 2016-11-01 16:07:33 | CANCEL     | yes              |
|      3 |   9876 | 2016-11-01 16:09:09 | JOIN       | yes              |
|      1 |  56565 | 2016-11-01 18:15:16 | JOIN       | no               |
|      2 |  56565 | 2016-11-01 19:22:25 | CANCEL     | no               |
|      3 |  56565 | 2016-11-01 20:05:05 | CANCEL     | no               |
|      1 |  34343 | 2016-11-01 05:32:56 | JOIN       | no               |
+--------+--------+---------------------+------------+------------------+
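
To try the queries in this thread against the rows above, the sample can be loaded into a scratch table. This is only a sketch: the table name t and the column name tmstmp are taken from the query in the question, while the column types are assumptions, and the NULL entries are assumed to be SQL NULLs rather than the string 'NULL'.

```sql
-- Assumed schema; adjust the types to match the real table.
CREATE TABLE t (
    UserID     integer   NOT NULL,
    tmstmp     timestamp NOT NULL,
    ActionType text                -- NULL for rows with no action type
);

-- Sample rows copied from the table in the question.
INSERT INTO t (UserID, tmstmp, ActionType) VALUES
    (12345, '2016-11-01 08:25:39', 'JOIN'),
    (12345, '2016-11-01 08:27:00', NULL),
    (12345, '2016-11-01 08:28:20', 'DOWNGRADE'),
    (12345, '2016-11-01 08:31:34', NULL),
    (12345, '2016-11-01 08:32:44', 'CANCEL'),
    (12345, '2016-11-01 08:45:51', NULL),
    (12345, '2016-11-01 08:50:57', 'JOIN'),
    ( 9876, '2016-11-01 16:05:42', 'JOIN'),
    ( 9876, '2016-11-01 16:07:33', 'CANCEL'),
    ( 9876, '2016-11-01 16:09:09', 'JOIN'),
    (56565, '2016-11-01 18:15:16', 'JOIN'),
    (56565, '2016-11-01 19:22:25', 'CANCEL'),
    (56565, '2016-11-01 20:05:05', 'CANCEL'),
    (34343, '2016-11-01 05:32:56', 'JOIN');
```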

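I've read up on gaps and islands and looked at various complex forum posts that circle around what I'm trying to achieve.

So far, all I've managed is to look at one day's records, without the sequence logic I need to restrict them: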

SELECT
    ROW_NUMBER() OVER (PARTITION BY UserID ORDER BY tmstmp) rownum
    ,UserID
    ,tmstmp
    ,ActionType
FROM
    t
    INNER JOIN  (
                SELECT UserID
                FROM t
                WHERE tmstmp BETWEEN '2016-11-20 00:00:01' AND '2016-11-20 11:59:59'
                GROUP BY UserID
                HAVING COUNT(*) >= 2
                ) AS sub ON t1.UserID = sub.UserID

Thanks for your input!

3 Answers:

Answer 0: (score: 4)

You can use LEAD() and LAG():

SELECT *
FROM (
    SELECT t.*,
           LAG(t.ActionType, 1)  OVER (PARTITION BY t.UserID ORDER BY t.tmstmp) AS last_action,
           LAG(t.ActionType, 2)  OVER (PARTITION BY t.UserID ORDER BY t.tmstmp) AS last_action2,
           LEAD(t.ActionType, 1) OVER (PARTITION BY t.UserID ORDER BY t.tmstmp) AS next_action,
           LEAD(t.ActionType, 2) OVER (PARTITION BY t.UserID ORDER BY t.tmstmp) AS next_action2
    FROM YourTable t
    WHERE t.tmstmp BETWEEN <Start> AND <End>
) t  -- PostgreSQL requires an alias on a subquery in FROM
-- this row is the first JOIN of the sequence
WHERE (t.ActionType = 'JOIN' AND
       t.next_action = 'CANCEL' AND
       t.next_action2 = 'JOIN')
-- this row is the CANCEL in the middle
   OR (t.last_action = 'JOIN' AND
       t.ActionType = 'CANCEL' AND
       t.next_action = 'JOIN')
-- this row is the second JOIN
   OR (t.last_action2 = 'JOIN' AND
       t.last_action = 'CANCEL' AND
       t.ActionType = 'JOIN')

Answer 1: (score: 1)

Assuming you mean that the records are consecutive with no gaps, just use lag(), lead(), or a combination of the two:

select distinct userId
from (select t.*,
             lag(ActionType) over (partition by userId order by tmstmp) as prev_at,
             lead(ActionType) over (partition by userId order by tmstmp) as next_at
      from t
     ) t
-- string comparison is case-sensitive, so match the stored values exactly
where ActionType = 'CANCEL' and prev_at = 'JOIN' and next_at = 'JOIN';

If gaps are allowed, you can take a different approach:

select distinct userId
from t
where ActionType = 'CANCEL' and
      -- some JOIN happened before this CANCEL
      exists (select 1
              from t t2
              where t2.userId = t.userId and
                    t2.ActionType = 'JOIN' and
                    t2.tmstmp < t.tmstmp
             ) and
      -- some JOIN happened after this CANCEL
      exists (select 1
              from t t2
              where t2.userId = t.userId and
                    t2.ActionType = 'JOIN' and
                    t2.tmstmp > t.tmstmp
             );
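
The same "a JOIN somewhere before and a JOIN somewhere after the CANCEL" condition can also be checked with window aggregates instead of two EXISTS subqueries. The following is a sketch of that alternative, not part of the original answer; it assumes the table t with columns UserID, tmstmp and ActionType from the question, and that action types are stored in upper case as in the sample data.

```sql
-- Sketch only: a CANCEL qualifies when it lies strictly between
-- the user's earliest and latest JOIN.
SELECT DISTINCT s.UserID
FROM (
    SELECT t.UserID,
           t.tmstmp,
           t.ActionType,
           -- earliest and latest JOIN per user; NULL if the user never joined
           MIN(CASE WHEN t.ActionType = 'JOIN' THEN t.tmstmp END)
               OVER (PARTITION BY t.UserID) AS first_join,
           MAX(CASE WHEN t.ActionType = 'JOIN' THEN t.tmstmp END)
               OVER (PARTITION BY t.UserID) AS last_join
    FROM t
) s
WHERE s.ActionType = 'CANCEL'
  AND s.first_join < s.tmstmp   -- some JOIN precedes this CANCEL
  AND s.tmstmp < s.last_join;   -- some JOIN follows this CANCEL
```

A user with a single JOIN has first_join equal to last_join, so no CANCEL can fall strictly between them and the user is excluded, as is a user with no JOIN at all (both values are NULL).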

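Answer 2: (score: 1)

In my example queries I'll use the information you provided as far as I can, but you're a bit unclear about what the source table looks like. You show one table above (without a name), but then reference two different tables in your sample query... it's a bit hard to see what's going on.

So I'm going to assume a table named t, and you can adjust as needed...

Here is how I would approach this. First, identify the users: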

select distinct cancel.userid
  from            t first_join
       inner join t cancel
               on first_join.tmstmp < cancel.tmstmp
              and first_join.userid = cancel.userid
       inner join t second_join
               on second_join.tmstmp > cancel.tmstmp
              and second_join.userid = cancel.userid
 where first_join.actiontype = 'JOIN'
   and cancel.actiontype = 'CANCEL'
   and second_join.actiontype = 'JOIN'

So now you can get all the records for those users:

SELECT *
  FROM T
 WHERE USERID IN (
    select distinct cancel.userid
      from            t first_join
           inner join t cancel
                   on first_join.tmstmp < cancel.tmstmp
                  and first_join.userid = cancel.userid
           inner join t second_join
                   on second_join.tmstmp > cancel.tmstmp
                  and second_join.userid = cancel.userid
     where first_join.actiontype = 'JOIN'
       and cancel.actiontype = 'CANCEL'
       and second_join.actiontype = 'JOIN'
     )
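
The question also asks to run this for one day or for several days at a time. One way to do that, sketched below, is to restrict the rows to a half-open time range once in a common table expression and run the self-join against that subset. The CTE name in_range and the two boundary timestamps are example values chosen for illustration, and the table t with columns UserID, tmstmp and ActionType is assumed from the question.

```sql
-- Sketch only: widen the two boundaries to cover several days.
WITH in_range AS (
    SELECT *
    FROM t
    WHERE tmstmp >= TIMESTAMP '2016-11-01 00:00:00'   -- inclusive start
      AND tmstmp <  TIMESTAMP '2016-11-02 00:00:00'   -- exclusive end
)
SELECT r.*
FROM in_range r
WHERE r.UserID IN (
    SELECT cancel.UserID
    FROM in_range first_join
    INNER JOIN in_range cancel
            ON cancel.UserID = first_join.UserID
           AND cancel.tmstmp > first_join.tmstmp
    INNER JOIN in_range second_join
            ON second_join.UserID = cancel.UserID
           AND second_join.tmstmp > cancel.tmstmp
    WHERE first_join.ActionType  = 'JOIN'
      AND cancel.ActionType      = 'CANCEL'
      AND second_join.ActionType = 'JOIN'
)
ORDER BY r.UserID, r.tmstmp;
```

Note the design choice: because every alias reads from in_range, the whole JOIN, CANCEL, JOIN sequence has to occur inside the chosen range. A half-open range (>= start, < end) avoids missing rows in the last second of the day, which a BETWEEN with an end time like 23:59:59 can do when timestamps carry fractional seconds.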