SQL query over time series data to find trending counts of user activity

Time: 2015-09-01 20:05:11

Tags: sql amazon-redshift

I'm using Postgres 8.4. I have a user activity table that looks like this:

userid    |  timestamp           |  action
---------------------------------------------
0001      |  11/11/2015 9:00:02  |  X
0001      |  11/11/2015 9:00:22  |  Y
0002      |  11/11/2015 9:01:02  |  Z
0002      |  11/11/2015 9:03:02  |  W 
0003      |  11/11/2015 9:04:02  |  X
0004      |  11/11/2015 9:05:02  |  Y

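What I need to do is find the number of users who performed a sequence of actions, X then Y, or X then Y then Z, and count how many users made it to each successive step.

So I supply an ordered set of actions, and I want to count how many users went through them in that order (step 1: action 1, step 2: action 2, step 3).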

I am trying to get a result like this:
step | action |  count
=======================
 1    |  X     | 100       <---- 100 users did X
 2    |  Y     |  55       <-----55 did X and then Y (45 dropped away)
 3    |  Z     |  12       <-----12 did X and then Y and then Z (43 more dropped)

As you can see, the count always decreases: 100 users did X, 55 of those who did X went on to do Y, and 12 of those went on to do Z.

How can I do this?

3 Answers:

Answer 0: (score: 1)

Here is one rather brute force approach. Use listagg() to create the sequences, and then look for them:

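-- user_activity is a placeholder table name; substitute your own.
-- LIKE matches only when the pattern's actions are adjacent in a user's history.
-- || is used for concatenation because Redshift's concat() takes two arguments.
select p.pattern, count(t.actions) as user_count
from (select 'X' as pattern union all select 'XY' union all select 'XYZ'
     ) p left join
     (select userid, listagg(action, '') within group (order by timestamp) as actions
      from user_activity t
      group by userid
     ) t
     on t.actions like '%' || p.pattern || '%'
group by p.pattern;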
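
One thing to watch with matching on the concatenated string: a pattern such as 'XY' only matches when Y is the very next action after X. The Python sketch below is not from the answer; it uses invented sample strings and hypothetical helper names, and assumes every action is a single character. It contrasts that substring test with an "in order, gaps allowed" test.

```python
# Hypothetical per-user action strings, shaped like the listagg() output
# (actions concatenated in timestamp order). The data is invented.
sequences = {
    "0001": "XY",   # X immediately followed by Y
    "0002": "XWY",  # X, then another action, then Y
    "0003": "YX",   # Y happens before X
}


def matches_adjacent(actions: str, pattern: str) -> bool:
    """Same test as LIKE '%pattern%': the pattern must appear contiguously."""
    return pattern in actions


def matches_in_order(actions: str, pattern: str) -> bool:
    """True if the pattern's actions occur in order, with gaps allowed."""
    remaining = iter(actions)
    # `step in remaining` consumes the iterator up to the first match,
    # so each later step can only match later in the string.
    return all(step in remaining for step in pattern)


adjacent = sorted(u for u, s in sequences.items() if matches_adjacent(s, "XY"))
in_order = sorted(u for u, s in sequences.items() if matches_in_order(s, "XY"))
print("adjacent:", adjacent)
print("in order:", in_order)
```

User 0002 is counted only by the in-order test. Which behaviour is wanted depends on whether other actions between the funnel steps should break the sequence.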

Answer 1: (score: 1)

The simplest solution is probably to use LEFT JOIN to join the table with itself:

WITH actions(action) AS(
  VALUES ('X'),('Y'),('Z'))
SELECT d.action
       ,Count(DISTINCT a.userid)
FROM table1 as a
  LEFT JOIN table1 AS b
    ON a.userid = b.userid AND b.action = 'Y' AND a.timestamp < b.timestamp
  LEFT JOIN table1 AS c
    ON a.userid = c.userid AND c.action = 'Z' AND b.timestamp < c.timestamp
  JOIN actions AS d
    ON d.action IN (a.action, b.action, c.action)
WHERE a.action = 'X'
GROUP BY d.action
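
To try the self-join query without a database server, here is a minimal sketch that runs it against an in-memory SQLite database from Python. The sample rows are invented, SQLite is only standing in for Postgres/Redshift (an assumption; the dialects differ in other places), and the ORDER BY is added just to make the output order deterministic.

```python
import sqlite3

# Invented sample rows: (userid, timestamp, action).
events = [
    ("0001", "2015-11-11 09:00:02", "X"),
    ("0001", "2015-11-11 09:01:02", "Y"),
    ("0001", "2015-11-11 09:02:02", "Z"),
    ("0002", "2015-11-11 09:00:02", "X"),
    ("0002", "2015-11-11 09:05:02", "Y"),
    ("0003", "2015-11-11 09:00:02", "X"),
    ("0004", "2015-11-11 09:00:02", "Y"),  # Y happens before X
    ("0004", "2015-11-11 09:01:02", "X"),
    ("0005", "2015-11-11 09:00:02", "X"),  # skips Y entirely
    ("0005", "2015-11-11 09:03:02", "Z"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (userid TEXT, timestamp TEXT, action TEXT)")
conn.executemany("INSERT INTO table1 VALUES (?, ?, ?)", events)

# Same joins as the query above; b and c are NULL when a step is missing,
# so the IN (...) test only credits the steps a user actually reached.
query = """
WITH actions(action) AS (VALUES ('X'), ('Y'), ('Z'))
SELECT d.action, COUNT(DISTINCT a.userid) AS users
FROM table1 AS a
  LEFT JOIN table1 AS b
    ON a.userid = b.userid AND b.action = 'Y' AND a.timestamp < b.timestamp
  LEFT JOIN table1 AS c
    ON a.userid = c.userid AND c.action = 'Z' AND b.timestamp < c.timestamp
  JOIN actions AS d
    ON d.action IN (a.action, b.action, c.action)
WHERE a.action = 'X'
GROUP BY d.action
ORDER BY d.action
"""

rows = conn.execute(query).fetchall()
for action, users in rows:
    print(action, users)
```

With this sample data all five users did X, two went on to Y, and one reached Z. Users 0004 and 0005 are counted only at the X step.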


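Answer 2: (score: 0)

I'm sure there must be a better way to do this using some other SQL feature.

But a simple example would be the query I've pasted below.

That one gets the users who did X-Y-Z; X-Y can be done easily by modifying this query.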

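-- Counts distinct users who did X, later did Y, and later still did Z.
-- The column is named timestamp here to match the table in the question.
select count(distinct u1.userid)
from user_activity u1
where u1.action = 'Z'
and exists
  (select userid from user_activity u2
   where u2.userid = u1.userid
   and u2.timestamp < u1.timestamp
   and u2.action = 'Y'
   and exists (
     select userid from user_activity u3
     where u3.userid = u2.userid
     and u3.timestamp < u2.timestamp
     and u3.action = 'X'
     )
   );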
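
The same nested-EXISTS idea can be generated for any ordered list of actions, which gives one count per funnel step. The following is only a sketch under some assumptions: it runs on an in-memory SQLite database from Python rather than on Postgres/Redshift, the sample rows are invented, the column is called timestamp as in the question's table, and `funnel_step_sql` and `funnel` are hypothetical helper names.

```python
import sqlite3


def funnel_step_sql(depth: int) -> str:
    """Build a nested-EXISTS query for a sequence of `depth` actions.

    Alias u1 is the last action of the sequence; each nested level looks
    for the previous action at an earlier timestamp for the same user.
    Placeholders are therefore bound last-action-first.
    """
    sql = ("SELECT COUNT(DISTINCT u1.userid) FROM user_activity u1"
           " WHERE u1.action = ?")
    for level in range(2, depth + 1):
        prev = level - 1
        sql += (
            f" AND EXISTS (SELECT 1 FROM user_activity u{level}"
            f" WHERE u{level}.userid = u{prev}.userid"
            f" AND u{level}.timestamp < u{prev}.timestamp"
            f" AND u{level}.action = ?"
        )
    return sql + ")" * (depth - 1)  # close every EXISTS that was opened


def funnel(conn, actions):
    """Return (step, action, user_count) for each prefix of `actions`."""
    rows = []
    for step in range(1, len(actions) + 1):
        params = list(reversed(actions[:step]))  # u1 is the latest action
        (count,) = conn.execute(funnel_step_sql(step), params).fetchone()
        rows.append((step, actions[step - 1], count))
    return rows


# Invented sample rows: (userid, timestamp, action).
events = [
    ("0001", "2015-11-11 09:00:02", "X"),
    ("0001", "2015-11-11 09:01:02", "Y"),
    ("0001", "2015-11-11 09:02:02", "Z"),
    ("0002", "2015-11-11 09:00:02", "X"),
    ("0002", "2015-11-11 09:05:02", "Y"),
    ("0003", "2015-11-11 09:00:02", "X"),
    ("0004", "2015-11-11 09:00:02", "Y"),  # Y happens before X
    ("0004", "2015-11-11 09:01:02", "X"),
    ("0005", "2015-11-11 09:00:02", "X"),  # skips Y entirely
    ("0005", "2015-11-11 09:03:02", "Z"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE user_activity (userid TEXT, timestamp TEXT, action TEXT)"
)
conn.executemany("INSERT INTO user_activity VALUES (?, ?, ?)", events)

result = funnel(conn, ["X", "Y", "Z"])
for step, action, count in result:
    print(step, action, count)
```

The action names are passed as bound parameters rather than pasted into the SQL text, so only the nesting depth affects the generated query string. Each step runs as a separate query, which is simple but may be slow on a large table.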