Efficiently finding the next event in SQL - including all columns

Asked: 2013-01-10 20:17:47

Tags: mysql sql vertica

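I have found solutions for finding the date of the next event, but none that also return all of that event's data. By cheating I can get it done, but that only works in MySQL and fails in Vertica.

Here is the problem I am trying to solve:

I want to show every event a together with the data from the first event X that follows a and is not of type a. Here is a cut-and-paste example, so you can play with it and see what actually works: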

CREATE TABLE events (user_id int ,created_at int, event varchar(20));
INSERT INTO events values (0,0, 'a');
INSERT INTO events values (0,1, 'b');
INSERT INTO events values (0,2, 'c');
INSERT INTO events values (0,3, 'a');
INSERT INTO events values (0,4, 'c');
INSERT INTO events values (0,5, 'b');
INSERT INTO events values (0,6, 'a');
INSERT INTO events values (0,7, 'a');
INSERT INTO events values (0,8, 'd');

SELECT * FROM events;
+---------+------------+-------+
| user_id | created_at | event |
+---------+------------+-------+
|       0 |          0 | a     |
|       0 |          1 | b     |
|       0 |          2 | c     |
|       0 |          3 | a     |
|       0 |          4 | c     |
|       0 |          5 | b     |
|       0 |          6 | a     |
|       0 |          7 | a     |
|       0 |          8 | d     |
+---------+------------+-------+
9 rows in set (0.00 sec)

Here is the result I know how to get in both databases, but I can't seem to get the event information into it:

SELECT user_id, MAX(purchased) AS purchased, spent 
FROM ( 
    SELECT
        e1.user_id AS user_id, e1.created_at AS purchased, 
        MIN(e2.created_at) AS spent
    FROM events e1, events e2
    WHERE
        e1.user_id = e2.user_id AND e1.created_at <= e2.created_at AND
        e1.Event = 'a' AND e2.Event != 'a'
    GROUP BY e1.user_id, e1.created_at
) e3 GROUP BY user_id, spent;

 user_id | purchased | spent 
---------+-----------+-------
       0 |         0 |     1
       0 |         3 |     4
       0 |         7 |     8

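Now, if I also want the event type in there, the query above no longer works, because you either have to put the event field in the GROUP BY (not what we want) or wrap it in an aggregate (not what we want either). Funnily enough it works in MySQL, but I consider that cheating, and since I have to use Vertica it doesn't help me: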

SELECT user_id, MAX(purchased) as purchased, spent, event FROM (
    SELECT 
        e1.User_ID AS user_id, 
        e1.created_at AS purchased, 
        MIN(e2.created_at) AS spent, 
        e2.event AS event 
    FROM events e1, events e2 
    WHERE 
        e1.user_id = e2.user_id AND e1.created_at <= e2.created_at AND 
        e1.Event = 'a' AND e2.Event != 'a' 
    GROUP BY
        e1.user_id,e1.created_at
 ) e3 GROUP BY user_id, spent;


+---------+-----------+-------+-------+
| user_id | purchased | spent | event |
+---------+-----------+-------+-------+
|       0 |         0 |     1 | b     |
|       0 |         3 |     4 | c     |
|       0 |         7 |     8 | d     |
+---------+-----------+-------+-------+
3 rows in set (0.00 sec)

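In Vertica, the same query throws an error: ERROR 2640: Column "e2.event" must appear in the GROUP BY clause or be used in an aggregate function

What is an elegant solution that pairs up the two events with all columns and without cheating, so that I get the same result as shown above when running in Vertica or any other database that doesn't allow the cheat? In the sample data there is only one extra column I want, the event type, but in the real world it would be two or three columns.

Please try it out with the posted sample data before answering :)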

3 Answers:

Answer 0: (score: 0)

OK, while I'm not 100% sure I understand what you're trying to do, see if this works:

SELECT e3.user_id, MAX(e3.purchased) AS purchased, e3.spent, e.event
FROM ( 
    SELECT
        e1.user_id AS user_id, e1.created_at AS purchased, 
        MIN(e2.created_at) AS spent
    FROM events e1, events e2
    WHERE
        e1.user_id = e2.user_id AND e1.created_at <= e2.created_at AND
        e1.Event = 'a' AND e2.Event != 'a'
    GROUP BY e1.user_id, e1.created_at
) e3 
 JOIN events e on e3.user_id = e.user_id and e3.spent = e.created_at
GROUP BY e3.user_id, e3.spent, e.event

Basically I'm just joining to the events table again, assuming user_id and created_at together are your primary key.

Here is the SQL Fiddle
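For anyone who wants to try the join-back idea against the posted sample data without a MySQL or Vertica instance, here is a minimal sketch that runs the same query in an in-memory SQLite database from Python. SQLite is only a stand-in here (an assumption, not one of the asker's databases), and the ORDER BY was added just to make the output order deterministic.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL/Vertica (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id int, created_at int, event varchar(20))")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [(0, 0, "a"), (0, 1, "b"), (0, 2, "c"), (0, 3, "a"), (0, 4, "c"),
     (0, 5, "b"), (0, 6, "a"), (0, 7, "a"), (0, 8, "d")],
)

# Step 1 (subquery e3): for every 'a' event, find the timestamp of the next non-'a' event.
# Step 2: join back to events on (user_id, created_at) to pick up the other columns.
# Every non-aggregated column in the outer SELECT is listed in the GROUP BY.
query = """
SELECT e3.user_id, MAX(e3.purchased) AS purchased, e3.spent, e.event
FROM (
    SELECT e1.user_id AS user_id, e1.created_at AS purchased,
           MIN(e2.created_at) AS spent
    FROM events e1, events e2
    WHERE e1.user_id = e2.user_id AND e1.created_at <= e2.created_at
      AND e1.event = 'a' AND e2.event != 'a'
    GROUP BY e1.user_id, e1.created_at
) e3
JOIN events e ON e3.user_id = e.user_id AND e3.spent = e.created_at
GROUP BY e3.user_id, e3.spent, e.event
ORDER BY e3.user_id, e3.spent
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

On the sample data this should print the same three rows as the MySQL result in the question. The join-back only returns one row per pair as long as (user_id, created_at) really is unique.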

Answer 1: (score: 0)

Try this...

With    Cte As
(
        Select  Row_Number() Over (Partition By [user_id] Order By [created_at]) As row_num,
                [user_id],
                [created_at],
                [event]
        From    [events]
)
Select  c1.[user_id],
        c1.[created_at] As purchased,
        c2.[created_at] As spent,
        c2.[event]
From    Cte c1
Left    Join Cte c2
        On  c1.[user_id] = c2.[user_id]   -- row_num restarts per user, so match the user too
        And c1.row_num = c2.row_num - 1
Where   c1.event = 'a'
And     c2.event <> 'a'

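Answer 2: (score: 0)

I usually do "next" calculations with a correlated subquery and then join back to the original table. In this case, I am assuming that (user_id, created_at) uniquely defines a row.

Here is the query: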

SELECT user_id, MAX(purchased) as purchased, spent, event
FROM (
    SELECT e.User_ID, e.created_at AS purchased, 
           MIN(enext.created_at) AS spent,
           min(enext.event) AS event 
    FROM (select e.*,
                 (select MIN(e2.created_at)
                  from events e2
                  where e2.user_id = e.user_id and e2.created_at > e.created_at and e2.event <> 'a'
                 ) nextcreatedat
          from events e
          where e.event = 'a'
         ) e left outer join
         events enext
         on e.user_id = enext.user_id and
            e.nextcreatedat = enext.created_at
    GROUP BY e.user_id, e.created_at
    ) e3
 GROUP BY user_id, spent;

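The aggregation GROUP BY e.user_id, e.created_at is not necessary, but I kept it so the query stays similar to the original.

Because Vertica supports cumulative sums, there is a way to do this more efficiently, but it would not work in MySQL.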
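The answer does not show the cumulative-sum query it mentions, so the following is only a guess at what such a query might look like, not the answerer's actual method. The idea sketched here: count the non-'a' events at or after each row with a running SUM() OVER (...), so that every 'a' event ends up in the same group as the next non-'a' event, then collapse each group with conditional aggregates. The names grp, t and next_event are made up for illustration, the sketch assumes (user_id, created_at) is unique, and it is run in SQLite (3.25 or later, for window functions) from Python rather than in Vertica.

```python
import sqlite3

# In-memory SQLite database as a stand-in for Vertica (assumption); needs SQLite >= 3.25.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id int, created_at int, event varchar(20))")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [(0, 0, "a"), (0, 1, "b"), (0, 2, "c"), (0, 3, "a"), (0, 4, "c"),
     (0, 5, "b"), (0, 6, "a"), (0, 7, "a"), (0, 8, "d")],
)

# grp = number of non-'a' events at or after this row (running sum, newest first).
# Each non-'a' event starts a new grp value, and the 'a' events just before it share it,
# so every group holds exactly one non-'a' row plus the 'a' rows it "closes".
# grp = 0 means no non-'a' event follows; groups without any 'a' row are dropped by HAVING.
query = """
SELECT user_id,
       MAX(CASE WHEN event = 'a'  THEN created_at END) AS purchased,
       MAX(CASE WHEN event <> 'a' THEN created_at END) AS spent,
       MAX(CASE WHEN event <> 'a' THEN event END)      AS next_event
FROM (
    SELECT user_id, created_at, event,
           SUM(CASE WHEN event <> 'a' THEN 1 ELSE 0 END)
               OVER (PARTITION BY user_id ORDER BY created_at DESC) AS grp
    FROM events
) t
WHERE grp > 0
GROUP BY user_id, grp
HAVING MAX(CASE WHEN event = 'a' THEN created_at END) IS NOT NULL
ORDER BY user_id, spent
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

This reads the table once instead of self-joining it. The MAX(CASE ...) wrappers are safe only because each group contains a single non-'a' row; with two or three extra columns you would add one such expression per column.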