Using Postgres 8.4. I have a user activity table that looks like this:
userid | timestamp          | action
-------+--------------------+-------
0001   | 11/11/2015 9:00:02 | X
0001   | 11/11/2015 9:00:22 | Y
0002   | 11/11/2015 9:01:02 | Z
0002   | 11/11/2015 9:03:02 | W
0003   | 11/11/2015 9:04:02 | X
0004   | 11/11/2015 9:05:02 | Y
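What I need to do is find how many users performed the sequence of actions X then Y, or X then Y then Z, and count how many users remain at each next step.
So I supply an ordered set of actions, and I want to count how many users went through those actions (step 1: action 1, step 2: action 2, step 3, ...).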
I'm trying to get a result like this:
step | action | count
=====================
1    | X      | 100   <---- 100 users did X
2    | Y      | 55    <---- 55 did X and then Y (45 dropped away)
3    | Z      | 12    <---- 12 did X, then Y, then Z (43 more dropped)
As you can see, the count always decreases: 100 users did X; of the users who did X, 55 did Y, and of those, 12 did Z.
How can I do this?
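Before comparing the SQL answers, it helps to pin down what "did X and then Y" means. The sketch below is a plain Python reference implementation (not from the question) that treats the step list as a subsequence of each user's time-ordered history, so other actions may occur in between. The name `funnel_counts` and the ISO-8601 timestamps in the sample are assumptions for illustration; its output can be used to sanity-check whichever query you choose.

```python
from collections import defaultdict

# The question's sample data; ISO-8601 timestamps are an assumption so
# that the strings sort chronologically.
SAMPLE_ROWS = [
    ("0001", "2015-11-11 09:00:02", "X"),
    ("0001", "2015-11-11 09:00:22", "Y"),
    ("0002", "2015-11-11 09:01:02", "Z"),
    ("0002", "2015-11-11 09:03:02", "W"),
    ("0003", "2015-11-11 09:04:02", "X"),
    ("0004", "2015-11-11 09:05:02", "Y"),
]


def funnel_counts(rows, steps):
    """Return (step, action, users) for each prefix of `steps`.

    A user counts for step k if steps[0..k-1] occur in their history in
    that order; other actions may appear in between.
    """
    history = defaultdict(list)
    for userid, ts, action in rows:
        history[userid].append((ts, action))

    counts = [0] * len(steps)
    for events in history.values():
        events.sort()  # chronological order (ties broken by action name)
        reached = 0    # how many leading steps this user has completed
        for _, action in events:
            if reached < len(steps) and action == steps[reached]:
                reached += 1
        # Greedy earliest matching finds the longest completed prefix.
        for i in range(reached):
            counts[i] += 1
    return [(i + 1, steps[i], counts[i]) for i in range(len(steps))]


print(funnel_counts(SAMPLE_ROWS, ["X", "Y", "Z"]))
# [(1, 'X', 2), (2, 'Y', 1), (3, 'Z', 0)]
```

Greedy matching is enough here: taking the earliest X, then the earliest Y after it, and so on never completes fewer steps than any other choice would.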
Answer 0 (score: 1)
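Here is one rather brute-force approach: build each user's action sequence as a string, and then look for the patterns in it. The usual tool for the first part, listagg(), is Oracle syntax; Postgres 8.4 has neither listagg() nor string_agg() (added in 9.0), so this version uses array_to_string(array_agg(...)) over a pre-sorted subquery. The % between actions lets other actions occur in between, so 'X%Y' means "X and, at some later point, Y":
select p.step, p.pattern, count(t.actions)
from (select 1 as step, 'X' as pattern
      union all select 2, 'X%Y'
      union all select 3, 'X%Y%Z'
     ) p left join
     (select userid, array_to_string(array_agg(action), '') as actions
      -- 8.4 has no ORDER BY inside aggregates, so sort in a subquery;
      -- Postgres aggregates in this order in practice, but SQL does not guarantee it
      from (select userid, action
            from user_activity
            order by userid, timestamp
           ) ordered
      group by userid
     ) t
     -- assumes every action code is a single character
     on t.actions like '%' || p.pattern || '%'
group by p.step, p.pattern
order by p.step;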
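To see how the choice of LIKE pattern changes what gets counted, here is a small Python sketch that mimics the string-concatenation idea outside the database. It is an illustration only: `fnmatch.fnmatchcase` with `*` stands in for LIKE's `%`, and the helper names `sequence_strings` and `count_matching` are hypothetical.

```python
from collections import defaultdict
from fnmatch import fnmatchcase

SAMPLE_ROWS = [
    ("0001", "2015-11-11 09:00:02", "X"),
    ("0001", "2015-11-11 09:00:22", "Y"),
    ("0002", "2015-11-11 09:01:02", "Z"),
    ("0002", "2015-11-11 09:03:02", "W"),
    ("0003", "2015-11-11 09:04:02", "X"),
    ("0004", "2015-11-11 09:05:02", "Y"),
]


def sequence_strings(rows):
    """Concatenate each user's actions in timestamp order (like listagg).

    Only unambiguous when every action code is a single character.
    """
    history = defaultdict(list)
    for userid, ts, action in rows:
        history[userid].append((ts, action))
    return {u: "".join(a for _, a in sorted(ev)) for u, ev in history.items()}


def count_matching(seqs, pattern):
    """Count users whose sequence matches `pattern` ('*' plays LIKE's '%')."""
    return sum(1 for s in seqs.values() if fnmatchcase(s, pattern))


seqs = sequence_strings(SAMPLE_ROWS)
# {'0001': 'XY', '0002': 'ZW', '0003': 'X', '0004': 'Y'}
for pattern in ["*X*", "*X*Y*", "*X*Y*Z*"]:
    print(pattern, count_matching(seqs, pattern))  # 2, 1, 0

# Consecutive vs. "eventually": an intervening W breaks '*XY*' but not '*X*Y*'.
gap = sequence_strings([("u1", 1, "X"), ("u1", 2, "W"), ("u1", 3, "Y")])
print(count_matching(gap, "*XY*"), count_matching(gap, "*X*Y*"))  # 0 1
```

Which pattern is right depends on whether the funnel should tolerate unrelated actions between steps; the other two answers compare timestamps with `<`, which corresponds to the gapped form.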
Answer 1 (score: 1)
The simplest solution is probably to LEFT JOIN the table to itself once per step:
WITH actions(step, action) AS (
    VALUES (1, 'X'), (2, 'Y'), (3, 'Z'))
SELECT d.step, d.action, COUNT(DISTINCT a.userid)
FROM table1 AS a
LEFT JOIN table1 AS b
    ON a.userid = b.userid AND b.action = 'Y' AND a.timestamp < b.timestamp
LEFT JOIN table1 AS c
    ON a.userid = c.userid AND c.action = 'Z' AND b.timestamp < c.timestamp
JOIN actions AS d
    ON d.action IN (a.action, b.action, c.action)
WHERE a.action = 'X'
GROUP BY d.step, d.action
ORDER BY d.step;
Because actions is joined with an inner JOIN, a step that nobody reaches produces no row at all rather than a count of 0.
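To try the self-join approach without a Postgres server, here is a sketch that runs essentially the same query against an in-memory SQLite database through Python's standard `sqlite3` module. The table name `user_activity`, the helper `make_db`, and the ISO-8601 timestamp strings are assumptions (ISO strings sort chronologically as text; the question's `11/11/2015 9:00:02` format would not).

```python
import sqlite3

SAMPLE_ROWS = [
    ("0001", "2015-11-11 09:00:02", "X"),
    ("0001", "2015-11-11 09:00:22", "Y"),
    ("0002", "2015-11-11 09:01:02", "Z"),
    ("0002", "2015-11-11 09:03:02", "W"),
    ("0003", "2015-11-11 09:04:02", "X"),
    ("0004", "2015-11-11 09:05:02", "Y"),
]

# One LEFT JOIN per later step; each step must happen after the previous one.
SELF_JOIN_QUERY = """
WITH actions(step, action) AS (VALUES (1, 'X'), (2, 'Y'), (3, 'Z'))
SELECT d.step, d.action, COUNT(DISTINCT a.userid)
FROM user_activity AS a
LEFT JOIN user_activity AS b
    ON a.userid = b.userid AND b.action = 'Y' AND a.timestamp < b.timestamp
LEFT JOIN user_activity AS c
    ON a.userid = c.userid AND c.action = 'Z' AND b.timestamp < c.timestamp
JOIN actions AS d
    ON d.action IN (a.action, b.action, c.action)
WHERE a.action = 'X'
GROUP BY d.step, d.action
ORDER BY d.step
"""


def make_db(rows):
    """Create an in-memory table shaped like the question's (hypothetical name)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE user_activity (userid TEXT, timestamp TEXT, action TEXT)")
    conn.executemany("INSERT INTO user_activity VALUES (?, ?, ?)", rows)
    return conn


conn = make_db(SAMPLE_ROWS)
print(conn.execute(SELF_JOIN_QUERY).fetchall())
# [(1, 'X', 2), (2, 'Y', 1)]  (no row for Z: nobody reached it)
```

The missing Z row illustrates the inner-join behavior; starting from the step list and LEFT JOINing the user data onto it would be one way to get explicit zeros.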
Answer 2 (score: 0)
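I'm sure there must be a better way using some other SQL features.
But a simple option is the query pasted below.
It counts the users who did X-Y-Z; for X-Y, anchor the outer query on 'Y' and drop one level of EXISTS.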
select count(distinct u1.userid)
from user_activity u1
where u1.action = 'Z'
  and exists (
      select 1 from user_activity u2
      where u2.userid = u1.userid
        and u2.timestamp < u1.timestamp
        and u2.action = 'Y'
        and exists (
            select 1 from user_activity u3
            where u3.userid = u2.userid
              and u3.timestamp < u2.timestamp
              and u3.action = 'X'
        )
  );
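The nested EXISTS pattern generalizes mechanically to any number of steps. Below is a hedged Python sketch, not part of the original answer, that generates the query for each prefix of a step list and runs it on an in-memory SQLite copy of the sample data. The helpers `nested_exists_sql`, `funnel`, and `make_db`, the `user_activity` table name, and the ISO timestamps are assumptions. Action values go through `?` placeholders; only the generated aliases are interpolated into the SQL text.

```python
import sqlite3

SAMPLE_ROWS = [
    ("0001", "2015-11-11 09:00:02", "X"),
    ("0001", "2015-11-11 09:00:22", "Y"),
    ("0002", "2015-11-11 09:01:02", "Z"),
    ("0002", "2015-11-11 09:03:02", "W"),
    ("0003", "2015-11-11 09:04:02", "X"),
    ("0004", "2015-11-11 09:05:02", "Y"),
]


def make_db(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE user_activity (userid TEXT, timestamp TEXT, action TEXT)")
    conn.executemany("INSERT INTO user_activity VALUES (?, ?, ?)", rows)
    return conn


def nested_exists_sql(steps):
    """Build a count query requiring `steps` to occur in order.

    The outer query anchors on the last step; each nested level requires
    the previous step to have happened strictly earlier for the same user.
    """
    sql = ("SELECT COUNT(DISTINCT u1.userid) FROM user_activity u1"
           " WHERE u1.action = ?")
    closing = ""
    for level in range(2, len(steps) + 1):
        cur, prev = f"u{level}", f"u{level - 1}"
        sql += (f" AND EXISTS (SELECT 1 FROM user_activity {cur}"
                f" WHERE {cur}.userid = {prev}.userid"
                f" AND {cur}.timestamp < {prev}.timestamp"
                f" AND {cur}.action = ?")
        closing += ")"
    # Placeholders appear outermost-first, i.e. last step first.
    return sql + closing, list(reversed(steps))


def funnel(conn, steps):
    """Return (step, action, users) by running one query per prefix."""
    result = []
    for k in range(1, len(steps) + 1):
        sql, params = nested_exists_sql(steps[:k])
        (count,) = conn.execute(sql, params).fetchone()
        result.append((k, steps[k - 1], count))
    return result


print(funnel(make_db(SAMPLE_ROWS), ["X", "Y", "Z"]))
# [(1, 'X', 2), (2, 'Y', 1), (3, 'Z', 0)]
```

Running one query per step keeps each query simple and yields explicit zeros, at the cost of re-scanning the table for every prefix.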