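Grouping data in PostgreSQL

Date: 2015-04-09 11:12:16

Tags: sql postgresql amazon-redshift

I have a PostgreSQL table containing events logged by date/time. The table has the columns id, event, and timestamp.

My output needs to look like this: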

'Day', '1st Timers', '2nd Timers', '3rd Timers', '3+ Timers'

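1st Timers are all IDs that completed this event for the first time on that day. 2nd Timers are all IDs that completed it for the second time, and so on.

Is this possible with a single SQL query?

Edit: sample data and output, as requested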

user_id  date            event
1        09/03/15 14:08  opened
2        10/03/15 14:08  opened
1        11/03/15 14:08  opened
4        14/03/15 14:08  opened
1        15/03/15 14:08  opened
5        16/03/15 14:08  opened
1        17/03/15 14:08  opened
4        17/03/15 14:08  opened
6        18/03/15 14:08  opened
1        18/03/15 14:08  opened
6        18/03/15 14:08  other


Output (for event=opened)
date        1time   2times  3times  4times  5times
09/03/15    1       0       0       0       0
10/03/15    1       0       0       0       0
11/03/15    0       1       0       0       0
14/03/15    1       0       0       0       0
15/03/15    0       0       1       0       0
16/03/15    1       0       0       0       0
17/03/15    0       1       0       1       0
18/03/15    1       0       0       0       1

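2 answers:

Answer 0 (score: 4)

For each date, you seem to want to count the number of users who reached their 1st, 2nd, etc. occurrence of the event. I would approach this with row_number() followed by conditional aggregation: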

select thedate,
       sum(case when seqnum = 1 then 1 else 0 end) as time_1,
       sum(case when seqnum = 2 then 1 else 0 end) as time_2,
       sum(case when seqnum = 3 then 1 else 0 end) as time_3,
       sum(case when seqnum = 4 then 1 else 0 end) as time_4,
       sum(case when seqnum = 5 then 1 else 0 end) as time_5
from (select t.*, date_trunc('day', date) as thedate,
             -- seqnum = 1 for a user's first 'opened' event, 2 for the second, ...
             row_number() over (partition by user_id order by date_trunc('day', date)) as seqnum
      from tbl t  -- tbl is a placeholder; "table" itself is a reserved word
      where event = 'opened'
     ) t
group by thedate
order by thedate;
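
To try the approach without creating a table, here is a minimal sketch that inlines the question's sample rows as a CTE. The names events and ts are assumptions made for this illustration only (the question does not give the real table name), and the dates are read as day/month/year. It casts to date instead of calling date_trunc, which groups by the same days.

```sql
-- Hypothetical CTE "events" holding the sample rows from the question.
WITH events (user_id, ts, event) AS (
   VALUES
     (1, timestamp '2015-03-09 14:08', 'opened'),
     (2, timestamp '2015-03-10 14:08', 'opened'),
     (1, timestamp '2015-03-11 14:08', 'opened'),
     (4, timestamp '2015-03-14 14:08', 'opened'),
     (1, timestamp '2015-03-15 14:08', 'opened'),
     (5, timestamp '2015-03-16 14:08', 'opened'),
     (1, timestamp '2015-03-17 14:08', 'opened'),
     (4, timestamp '2015-03-17 14:08', 'opened'),
     (6, timestamp '2015-03-18 14:08', 'opened'),
     (1, timestamp '2015-03-18 14:08', 'opened'),
     (6, timestamp '2015-03-18 14:08', 'other')
)
SELECT thedate,
       sum(CASE WHEN seqnum = 1 THEN 1 ELSE 0 END) AS time_1,
       sum(CASE WHEN seqnum = 2 THEN 1 ELSE 0 END) AS time_2,
       sum(CASE WHEN seqnum = 3 THEN 1 ELSE 0 END) AS time_3,
       sum(CASE WHEN seqnum = 4 THEN 1 ELSE 0 END) AS time_4,
       sum(CASE WHEN seqnum = 5 THEN 1 ELSE 0 END) AS time_5
FROM (SELECT ts::date AS thedate,
             -- Number each user's 'opened' events in chronological order.
             row_number() OVER (PARTITION BY user_id ORDER BY ts) AS seqnum
      FROM events
      WHERE event = 'opened'   -- drops the single 'other' row
     ) e
GROUP BY thedate
ORDER BY thedate;
```

With these rows the query returns eight days whose counts match the output table in the question: for example, 2015-03-17 has time_2 = 1 (user 4) and time_4 = 1 (user 1).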

Answer 1 (score: 2)

Aggregate FILTER

Since Postgres 9.4, you can use the new aggregate FILTER clause:

SELECT event_time::date
     , count(*) FILTER (WHERE rn = 1) AS times_1
     , count(*) FILTER (WHERE rn = 2) AS times_2
     , count(*) FILTER (WHERE rn = 3) AS times_3
    -- etc.
FROM  (
   SELECT *, row_number() OVER (PARTITION BY user_id ORDER BY event_time) AS rn
   FROM   tbl
   ) t
GROUP  BY 1
ORDER  BY 1;
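
The question also asks for a '3+ Timers' column and for output restricted to event = 'opened'. A minimal sketch of both follows, assuming '3+' means more than three occurrences (rn > 3) and that the event column is called event as in the sample data. The CTE named tbl stands in for the real table so the query runs on its own, and the alias times_over_3 is made up for this example.

```sql
-- Stand-in for the real table: the question's sample rows.
WITH tbl (user_id, event_time, event) AS (
   VALUES
     (1, timestamp '2015-03-09 14:08', 'opened')
   , (2, timestamp '2015-03-10 14:08', 'opened')
   , (1, timestamp '2015-03-11 14:08', 'opened')
   , (4, timestamp '2015-03-14 14:08', 'opened')
   , (1, timestamp '2015-03-15 14:08', 'opened')
   , (5, timestamp '2015-03-16 14:08', 'opened')
   , (1, timestamp '2015-03-17 14:08', 'opened')
   , (4, timestamp '2015-03-17 14:08', 'opened')
   , (6, timestamp '2015-03-18 14:08', 'opened')
   , (1, timestamp '2015-03-18 14:08', 'opened')
   , (6, timestamp '2015-03-18 14:08', 'other')
)
SELECT event_time::date AS day
     , count(*) FILTER (WHERE rn = 1) AS times_1
     , count(*) FILTER (WHERE rn = 2) AS times_2
     , count(*) FILTER (WHERE rn = 3) AS times_3
     , count(*) FILTER (WHERE rn > 3) AS times_over_3  -- assumed meaning of "3+"
FROM  (
   SELECT *, row_number() OVER (PARTITION BY user_id ORDER BY event_time) AS rn
   FROM   tbl
   WHERE  event = 'opened'  -- filter before numbering, so other events do not advance rn
   ) t
GROUP  BY 1
ORDER  BY 1;
```

On the sample rows, 2015-03-17 and 2015-03-18 each get times_over_3 = 1, from user 1's fourth and fifth 'opened' events. The event filter has to sit in the subquery: applied after row_number(), it would let rows for other events consume sequence numbers.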

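Related:

About the cast event_time::date

Crosstab

Or use an actual crosstab query (faster). This works in any modern Postgres version, provided the additional module tablefunc, which supplies crosstab(), is installed: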

SELECT * FROM crosstab(
       'SELECT event_time::date, rn, count(*)::int AS ct
        FROM  (
           SELECT *, row_number() OVER (PARTITION BY user_id ORDER BY event_time) AS rn
           FROM   tbl
           ) t
        GROUP  BY 1, 2
        ORDER  BY 1'

      ,$$SELECT * FROM unnest ('{1,2,3}'::int[])$$
   ) AS ct (day date, times_1 int, times_2 int, times_3 int);
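
crosstab() is not built into core Postgres; it ships with the tablefunc module. As a minimal sketch, assuming the module's files are available on the server and your role is allowed to create extensions, it can be enabled once per database like this:

```sql
-- Run once per database before calling crosstab().
CREATE EXTENSION IF NOT EXISTS tablefunc;
```

To report more than three occurrence counts, extend the category array ('{1,2,3}') and the column definition list (times_1 int, ...) together. In this two-parameter form, source rows whose rn is not in the category list are ignored.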