PostgreSQL: get daily, weekly and monthly averages of event occurrences in one query

Time: 2016-07-06 14:38:02

Tags: sql postgresql query-optimization aggregate analytics

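Currently I have this fairly large query, which works as follows:

  1. Aggregate the daily, weekly and monthly counts into intermediate tables, using count() over the events grouped by event name and date.
  2. Select the average count from each intermediate table with avg() grouped by event, and take the union of the results. Because I want separate daily, weekly and monthly columns, each branch fills the columns it does not compute with 0.
  3. Sum all the columns per event. The 0s act as no-ops, so this yields a single row per event.

But the query is very large and I feel like I'm doing a lot of repetitive work. Is there a better way to write this query, or a way to make it shorter? I haven't written a query like this before, so I'm not sure.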

    WITH monthly_counts as (
      SELECT
        event,
        count(*) as count
      FROM tracking_stuff
      WHERE
        event = 'thing'
        OR event = 'thing2'
        OR event = 'thing3'
      GROUP BY event, date_trunc('month', created_at)
    ),
    weekly_counts as (
      SELECT
        event,
        count(*) as count
      FROM tracking_stuff
      WHERE
        event = 'thing'
        OR event = 'thing2'
        OR event = 'thing3'
      GROUP BY event, date_trunc('week', created_at)
    ),
    daily_counts as (
      SELECT
        event,
        count(*) as count
      FROM tracking_stuff
      WHERE
        event = 'thing'
        OR event = 'thing2'
        OR event = 'thing3'
      GROUP BY event, date_trunc('day', created_at)
    ),
    query as (
      SELECT
        event,
        0 as daily_avg,
        0 as weekly_avg,
        avg(count) as monthly_avg
      FROM monthly_counts
      GROUP BY event
      UNION
      SELECT
        event,
        0 as daily_avg,
        avg(count) as weekly_avg,
        0 as monthly_avg
      FROM weekly_counts
      GROUP BY event
      UNION
      SELECT
        event,
        avg(count) as daily_avg,
        0 as weekly_avg,
        0 as monthly_avg
      FROM daily_counts
      GROUP BY event
    )
    SELECT
      event,
      sum(daily_avg) as daily_avg,
      sum(weekly_avg) as weekly_avg,
      sum(monthly_avg) as monthly_avg
    FROM query
    GROUP BY event;
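
One property of this approach is easier to see on concrete data. The intermediate tables only contain buckets in which the event actually occurred, so a day with zero events does not pull the daily average down. Below is a minimal sketch on a hypothetical temp table demo_events, with name and rows made up for illustration. In it, 'thing' fires three times on 2016-01-01, never on 2016-01-02, and once on 2016-01-03:

```sql
-- Hypothetical demo table and rows, for illustration only.
create temp table demo_events (event text, created_at timestamp);
insert into demo_events values
    ('thing', '2016-01-01 09:00'),
    ('thing', '2016-01-01 12:00'),
    ('thing', '2016-01-01 18:00'),
    ('thing', '2016-01-03 10:00');  -- nothing at all on 2016-01-02

-- Same shape as the question's daily_counts CTE plus the averaging step.
with daily_counts as (
    select event, count(*) as count
    from demo_events
    group by event, date_trunc('day', created_at)
)
select event, avg(count) as daily_avg  -- (3 + 1) / 2 = 2, not 4 / 3
from daily_counts
group by event;
```

If empty days should count as zeros, the buckets have to be generated explicitly. One way to do that is to run generate_series() over the date range and left-join it to the counts.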
    

2 answers:

Answer 0 (score: 4)

For PostgreSQL 9.5+, use grouping sets:

  

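From the PostgreSQL documentation: "The data selected by the FROM and WHERE clauses is grouped separately by each specified grouping set, aggregates computed for each group just as for simple GROUP BY clauses, and then the results returned."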
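
To see what that means on concrete data, here is a small sketch on a hypothetical temp table gs_demo, with name and rows made up for illustration. It groups the same rows two ways in one query:

```sql
-- Hypothetical demo table and rows, for illustration only.
create temp table gs_demo (event text, created_at timestamp);
insert into gs_demo values
    ('thing', '2016-01-01 09:00'),
    ('thing', '2016-01-01 18:00'),
    ('thing', '2016-02-10 12:00');

-- One query, grouped per (event, day) and per (event, month).
-- In each output row, the column that is not part of that row's
-- grouping set is reported as NULL.
create temp view gs_demo_result as
select
    event,
    date_trunc('day', created_at) as day,
    date_trunc('month', created_at) as month,
    count(*) as total
from gs_demo
group by grouping sets (
    (event, date_trunc('day', created_at)),
    (event, date_trunc('month', created_at))
);

select * from gs_demo_result order by month nulls first, day nulls first;
```

This returns two rows with month NULL: the day groups, with totals 2 and 1. It also returns two rows with day NULL: the month groups, with totals 2 and 1. The NULLs are what let an outer query tell the groupings apart.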

    select event,
        avg(total) filter (where day is not null) as avg_day,
        avg(total) filter (where week is not null) as avg_week,
        avg(total) filter (where month is not null) as avg_month
    from (
        select
            event,
            date_trunc('day', created_at) as day,
            date_trunc('week', created_at) as week,
            date_trunc('month', created_at) as month,
            count(*) as total
        from tracking_stuff
        where event in ('thing','thing2','thing3')
        -- 2, 3, 4 are the ordinal positions of day, week and month;
        -- a column outside the current grouping set comes back as NULL
        group by grouping sets ((event, 2), (event, 3), (event, 4))
    ) s
    group by event;
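
The filter (where day is not null) trick relies on one fact: in each inner row, only the column belonging to that row's grouping set is non-NULL. That breaks down only if created_at itself can be NULL. In that case, rows with a NULL timestamp form groups whose day, week and month are all NULL, so they drop out of all three averages. The question's CTE query would instead average them in as a bucket of their own.

Below is a sketch of a variant, not part of the original answer, that tells the sets apart with the built-in grouping() function instead. It runs on a hypothetical temp table gs_check:

```sql
-- Hypothetical demo table; the NULL timestamp is there on purpose.
create temp table gs_check (event text, created_at timestamp);
insert into gs_check values
    ('thing', '2016-01-04 09:00'),  -- a Monday
    ('thing', '2016-01-04 18:00'),
    ('thing', '2016-01-06 12:00'),  -- same ISO week, different day
    ('thing', null);

create temp view gs_check_result as
select event,
    avg(total) filter (where g = 'day')   as avg_day,
    avg(total) filter (where g = 'week')  as avg_week,
    avg(total) filter (where g = 'month') as avg_month
from (
    select
        event,
        -- grouping(x) is 0 when x belongs to the current row's grouping set;
        -- its argument must match a GROUP BY expression exactly
        case
            when grouping(date_trunc('day', created_at)) = 0 then 'day'
            when grouping(date_trunc('week', created_at)) = 0 then 'week'
            else 'month'
        end as g,
        count(*) as total
    from gs_check
    group by grouping sets (
        (event, date_trunc('day', created_at)),
        (event, date_trunc('week', created_at)),
        (event, date_trunc('month', created_at))
    )
) s
group by event;

select * from gs_check_result;
```

On this data the variant gives the following results:

- avg_day = 4/3, from buckets 2, 1 and 1.
- avg_week = 2 and avg_month = 2, from buckets 3 and 1 in each case.

The is not null filters would give 1.5, 3 and 3. If created_at is declared NOT NULL, both versions agree.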

Answer 1 (score: 2)

I would write the query like this:

    select event, daily_avg, weekly_avg, monthly_avg
    from (
        select event, avg(count) monthly_avg
        from (
            select event, count(*)
            from tracking_stuff
            where event in ('thing1', 'thing2', 'thing3')
            group by event, date_trunc('month', created_at)
        ) s
        group by 1
    ) monthly
    join (
        select event, avg(count) weekly_avg
        from (
            select event, count(*)
            from tracking_stuff
            where event in ('thing1', 'thing2', 'thing3')
            group by event, date_trunc('week', created_at)
        ) s
        group by 1
    ) weekly using(event)
    join (
        select event, avg(count) daily_avg
        from (
            select event, count(*)
            from tracking_stuff
            where event in ('thing1', 'thing2', 'thing3')
            group by event, date_trunc('day', created_at)
        ) s
        group by 1
    ) daily using(event)
    order by 1;
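
Every event that has a monthly bucket also has at least one weekly and one daily bucket. That is why the inner joins on event never drop an event. Below is a minimal check on a hypothetical table ts_demo, with name and rows made up for illustration. It runs the same query with the table renamed:

```sql
-- Hypothetical demo table standing in for tracking_stuff.
create temp table ts_demo (event text, created_at timestamp);
insert into ts_demo values
    ('thing1', '2016-01-04 09:00'),  -- a Monday
    ('thing1', '2016-01-04 17:00'),
    ('thing1', '2016-01-05 11:00'),  -- same week and month, new day
    ('thing1', '2016-02-01 08:00'),  -- Monday of a new week, new month
    ('thing2', '2016-03-15 12:00');

-- The answer's query with tracking_stuff renamed to ts_demo,
-- wrapped in a view so the results are easy to inspect.
create temp view ts_demo_result as
select event, daily_avg, weekly_avg, monthly_avg
from (
    select event, avg(count) monthly_avg
    from (
        select event, count(*)
        from ts_demo
        where event in ('thing1', 'thing2', 'thing3')
        group by event, date_trunc('month', created_at)
    ) s
    group by 1
) monthly
join (
    select event, avg(count) weekly_avg
    from (
        select event, count(*)
        from ts_demo
        where event in ('thing1', 'thing2', 'thing3')
        group by event, date_trunc('week', created_at)
    ) s
    group by 1
) weekly using(event)
join (
    select event, avg(count) daily_avg
    from (
        select event, count(*)
        from ts_demo
        where event in ('thing1', 'thing2', 'thing3')
        group by event, date_trunc('day', created_at)
    ) s
    group by 1
) daily using(event);

select * from ts_demo_result order by event;
```

For thing1 the daily buckets are 2, 1 and 1, so daily_avg is 4/3. The weekly and monthly buckets are 3 and 1, so both of those averages are 2. thing2 gets 1 in all three columns.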

If the WHERE condition eliminates most of the data (say, more than half of the rows), using a CTE may make the query run slightly faster:

    with the_data as (
        select event, created_at
        from tracking_stuff
        where event in ('thing1', 'thing2', 'thing3')
        )

    select event, daily_avg, weekly_avg, monthly_avg
    from (
        select event, avg(count) monthly_avg
        from (
            select event, count(*)
            from the_data
            group by event, date_trunc('month', created_at)
        ) s
        group by 1
    ) monthly
    --  etc ...

Out of curiosity, I tested the queries on this data:

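    create table tracking_stuff (event text, created_at timestamp);
    -- random_int() is not a built-in PostgreSQL function; a minimal
    -- stand-in, assumed to return an integer from 1 to n:
    create function random_int(n int) returns int language sql
        as $$ select 1 + floor(random() * n)::int $$;
    insert into tracking_stuff
        select 'thing' || random_int(9), '2016-01-01'::date + random_int(365)
        from generate_series(1, 1000000);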

In every query I replaced 'thing' with 'thing1', so the WHERE clause eliminates about 2/3 of the rows.
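
The "about 2/3" figure follows from the generator. The event name is uniform over nine values and the WHERE clause keeps three of them, so roughly 3/9 = 1/3 of the rows survive. Below is a quick sketch to check this on a smaller hypothetical table gen_demo, with 90,000 rows instead of 1,000,000. It assumes random_int(9) yields 1 to 9 and uses the built-in random() directly:

```sql
-- Smaller stand-in for the answer's test table: event names drawn
-- uniformly from thing1 .. thing9 (assumed behaviour of random_int(9)).
create temp table gen_demo as
select 'thing' || (1 + floor(random() * 9)::int) as event
from generate_series(1, 90000);

-- Keeping 3 of 9 equally likely names keeps about 1/3 of the rows,
-- i.e. the filter eliminates about 2/3 of them.
select avg((event in ('thing1', 'thing2', 'thing3'))::int) as kept_fraction
from gen_demo;
```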

Average execution time over 10 runs:

Original query          1106 ms
My query without cte    1077 ms
My query with cte        902 ms
Clodoaldo's query       5187 ms