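I have found solutions for finding the date of the next event, but none that return all the data of that event. By cheating I can get it done, but that only works in MySQL and fails in Vertica.
Here is the problem I am trying to solve:
I want to show every event a together with the data of the first event X that follows it and is not of type a. Here is a cut-and-paste example, so you can play with it and see what actually works: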
CREATE TABLE events (user_id int ,created_at int, event varchar(20));
INSERT INTO events values (0,0, 'a');
INSERT INTO events values (0,1, 'b');
INSERT INTO events values (0,2, 'c');
INSERT INTO events values (0,3, 'a');
INSERT INTO events values (0,4, 'c');
INSERT INTO events values (0,5, 'b');
INSERT INTO events values (0,6, 'a');
INSERT INTO events values (0,7, 'a');
INSERT INTO events values (0,8, 'd');
SELECT * FROM events;
+---------+------------+-------+
| user_id | created_at | event |
+---------+------------+-------+
| 0 | 0 | a |
| 0 | 1 | b |
| 0 | 2 | c |
| 0 | 3 | a |
| 0 | 4 | c |
| 0 | 5 | b |
| 0 | 6 | a |
| 0 | 7 | a |
| 0 | 8 | d |
+---------+------------+-------+
9 rows in set (0.00 sec)
Here is how I know to get both timestamps into one result, but I can't seem to get the event info in there as well:
SELECT user_id, MAX(purchased) AS purchased, spent
FROM (
    SELECT
        e1.user_id AS user_id, e1.created_at AS purchased,
        MIN(e2.created_at) AS spent
    FROM events e1, events e2
    WHERE
        e1.user_id = e2.user_id AND e1.created_at <= e2.created_at AND
        e1.Event = 'a' AND e2.Event != 'a'
    GROUP BY e1.user_id, e1.created_at
) e3 GROUP BY user_id, spent;
user_id | purchased | spent
---------+-----------+-------
0 | 0 | 1
0 | 3 | 4
0 | 7 | 8
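Now, if I want the event type in there as well, the query above no longer works, because the event field has to go either into the GROUP BY (not what we want) or into an aggregate (not what we want either). Funnily enough, it works in MySQL, but I consider that cheating, and since I have to use Vertica it does not help me: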
SELECT user_id, MAX(purchased) as purchased, spent, event FROM (
    SELECT
        e1.User_ID AS user_id,
        e1.created_at AS purchased,
        MIN(e2.created_at) AS spent,
        e2.event AS event
    FROM events e1, events e2
    WHERE
        e1.user_id = e2.user_id AND e1.created_at <= e2.created_at AND
        e1.Event = 'a' AND e2.Event != 'a'
    GROUP BY
        e1.user_id, e1.created_at
) e3 GROUP BY user_id, spent;
+---------+-----------+-------+-------+
| user_id | purchased | spent | event |
+---------+-----------+-------+-------+
| 0 | 0 | 1 | b |
| 0 | 3 | 4 | c |
| 0 | 7 | 8 | d |
+---------+-----------+-------+-------+
3 rows in set (0.00 sec)
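In Vertica, the same query throws an error: ERROR 2640: Column "e2.event" must appear in the GROUP BY clause or be used in an aggregate function
Is there an elegant solution that pairs up the two events with all their columns, without cheating, so that I get the same result as shown above when running it in Vertica or any other database that does not allow the cheat? In the sample data there is only one extra column I want, the event type, but in the real world it would be two or three columns.
Please try it out with the posted sample data before answering :)
Answer 0 (score: 0)
OK, while I'm not 100% sure I understand what you are trying to do, see if this works: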
SELECT e3.user_id, MAX(e3.purchased) AS purchased, e3.spent, e.event
FROM (
    SELECT
        e1.user_id AS user_id, e1.created_at AS purchased,
        MIN(e2.created_at) AS spent
    FROM events e1, events e2
    WHERE
        e1.user_id = e2.user_id AND e1.created_at <= e2.created_at AND
        e1.Event = 'a' AND e2.Event != 'a'
    GROUP BY e1.user_id, e1.created_at
) e3
JOIN events e ON e3.user_id = e.user_id AND e3.spent = e.created_at
GROUP BY e3.user_id, e3.spent, e.event
Basically, I'm just joining back to the events table, assuming user_id and created_at together are your primary key.
Here is the SQL Fiddle.
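The join back to events only returns one row per pair if (user_id, created_at) really is unique. A quick way to check that assumption against your own data might look like the sketch below; the alias n_rows is just illustrative.

```sql
-- Lists every (user_id, created_at) pair that occurs more than once.
-- An empty result means the pair can safely be used as the join key.
SELECT user_id, created_at, COUNT(*) AS n_rows
FROM events
GROUP BY user_id, created_at
HAVING COUNT(*) > 1;
```

For the sample data in the question this returns no rows.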
Answer 1 (score: 0)
Try this...
With Cte As
(
    Select Row_Number() Over (Partition By [user_id] Order By [created_at]) As row_num,
           [user_id],
           [created_at],
           [event]
    From [events]
)
Select c1.[user_id],
       c1.[created_at] As purchased,
       c2.[created_at] As spent,
       c2.[event]
From Cte c1
Left Join Cte c2
    On c1.[user_id] = c2.[user_id]
   And c1.row_num = c2.row_num - 1
Where c1.event = 'a'
  And c2.event <> 'a'
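The same adjacent-row pairing can probably be written without the self-join by using LEAD, which reads values from the following row. This is only a sketch: it assumes the database supports window functions and that (user_id, created_at) is unique, and the names t, this_event and next_event are made up for illustration.

```sql
-- For each row, fetch the timestamp and type of the next row of the same user,
-- then keep only the 'a' rows whose immediate successor is not an 'a'.
SELECT user_id,
       purchased,
       spent,
       next_event AS event
FROM (
    SELECT user_id,
           event      AS this_event,
           created_at AS purchased,
           LEAD(created_at) OVER (PARTITION BY user_id ORDER BY created_at) AS spent,
           LEAD(event)      OVER (PARTITION BY user_id ORDER BY created_at) AS next_event
    FROM events
) t
WHERE this_event = 'a'
  AND next_event <> 'a'
ORDER BY user_id, purchased;
```

Like the Row_Number version, this pairs an a only with the row directly after it, so an a followed by another a (created_at = 6 in the sample data) is dropped. For the sample data it should return (0, 0, 1, b), (0, 3, 4, c) and (0, 7, 8, d).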
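Answer 2 (score: 0)
I usually do "next" calculations with a correlated subquery and then join back to the original table. In this case, I am assuming that user_id, created_at uniquely defines a row.
Here is the query: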
SELECT user_id, MAX(purchased) as purchased, spent, event
FROM (
    SELECT e.User_ID, e.created_at AS purchased,
           MIN(enext.created_at) AS spent,
           min(enext.event) AS event
    FROM (select e.*,
                 (select MIN(e2.created_at)
                  from events e2
                  where e2.user_id = e.user_id and e2.created_at > e.created_at and e2.event <> 'a'
                 ) nextcreatedat
          from events e
          where e.event = 'a'
         ) e left outer join
         events enext
         on e.user_id = enext.user_id and
            e.nextcreatedat = enext.created_at
    GROUP BY e.user_id, e.created_at
) e3
GROUP BY user_id, spent, event;
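The aggregation GROUP BY e.user_id, e.created_at is not strictly necessary, but I left it in to stay close to the original query.
Since Vertica supports cumulative sums, there is a way to do this more efficiently, but it would not work in MySQL.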
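The cumulative-sum query itself is not shown here, so the following is only a guess at what it might look like. It counts non-a events from the end of each user's timeline, which puts every non-a event into one group together with the a rows directly before it. It assumes window function support (in MySQL that means 8.0 or later) and unique (user_id, created_at); the names t, ev and grp are invented for this sketch.

```sql
-- grp increases by one at every non-'a' event when walking backwards in time,
-- so each group holds exactly one non-'a' event plus the 'a' rows leading up to it.
SELECT user_id,
       MAX(CASE WHEN ev =  'a' THEN created_at END) AS purchased,
       MAX(CASE WHEN ev <> 'a' THEN created_at END) AS spent,
       MAX(CASE WHEN ev <> 'a' THEN ev END)         AS event
FROM (
    SELECT user_id,
           created_at,
           event AS ev,
           SUM(CASE WHEN event <> 'a' THEN 1 ELSE 0 END)
               OVER (PARTITION BY user_id ORDER BY created_at DESC) AS grp
    FROM events
) t
GROUP BY user_id, grp
-- Drop groups without an 'a' row and trailing 'a' rows that have no later event.
HAVING COUNT(CASE WHEN ev =  'a' THEN 1 END) > 0
   AND COUNT(CASE WHEN ev <> 'a' THEN 1 END) > 0
ORDER BY user_id, purchased;
```

Because every group contains a single non-a row, the MAX over ev and created_at simply picks that row's values, and any further columns can be pulled the same way. For the sample data this should give (0, 0, 1, b), (0, 3, 4, c) and (0, 7, 8, d), with only one pass over the table instead of a self-join.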