Optimizing a Postgres query for counting events by date

Date: 2017-11-22 23:36:42

Tags: sql postgresql

I have a table like this:
  CREATE TABLE events (
    id serial primary key,
    name character varying(255),
    created_at timestamp(6)
  )

with millions of rows spread across different days.

I want to count the number of events X days from a date Y. Date Y is not known in advance.

So given data like this:
id    name      created_at
1     event1    2017-01-02 12:00:00
2     event1    2017-01-03 12:00:00
3     event1    2017-01-03 12:00:00
4     event1    2017-01-04 12:00:00
5     event1    2017-01-04 12:00:00
6     event1    2017-01-04 12:00:00
7     event1    2017-01-05 12:00:00
8     event1    2017-01-05 12:00:00

I'd like to get this result for the date 2017-01-01:
d           count
1 days      1
2 days      2
3 days      3
4 days      2

The best query I've come up with is:

SELECT date_trunc('day', age(timestamp '2018-01-01', created_at)) AS "d", count(*)
FROM events
GROUP BY "d"
ORDER BY "d"

EXPLAIN ANALYZE gives the following output:
GroupAggregate  (cost=134702.34..157202.34 rows=1000000 width=8) (actual time=2112.554..2457.594 rows=12 loops=1)
  Group Key: (date_trunc('day'::text, age('2018-01-01 00:00:00'::timestamp without time zone, created_at)))
  ->  Sort  (cost=134702.34..137202.34 rows=1000000 width=8) (actual time=2081.727..2277.930 rows=1000000 loops=1)
        Sort Key: (date_trunc('day'::text, age('2018-01-01 00:00:00'::timestamp without time zone, created_at)))
        Sort Method: external sort  Disk: 25424kB
        ->  Seq Scan on events  (cost=0.00..21370.00 rows=1000000 width=8) (actual time=0.021..836.849 rows=1000000 loops=1)
Planning time: 0.075 ms
Execution time: 2468.842 ms
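One thing the plan above shows is `Sort Method: external sort  Disk: 25424kB`: the sort spills to disk. A generic knob to try (not from the original question; the value is an illustrative assumption to tune against your data) is raising `work_mem` for the session so the sort can stay in memory:

```sql
-- Session-level setting; 64MB is an illustrative value, not a recommendation.
-- A sort that fits in work_mem avoids the "external sort ... Disk" path.
SET work_mem = '64MB';
```

This does not change the plan shape, only whether the sort step touches disk.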

Is this efficient?

How would you optimize this query?

What strategies or Postgres features can I use?
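One Postgres feature worth trying (a sketch, untested against this data; the index name is made up) is an expression index on the truncated day. Unlike the full `age(...)` expression, `date_trunc('day', created_at)` does not depend on the anchor date Y, so it is a stable, immutable expression that can be indexed. A query that groups on that same expression can then read pre-sorted rows from the index instead of sorting a million rows each time:

```sql
-- date_trunc('day', ...) on timestamp WITHOUT time zone is IMMUTABLE,
-- so it is allowed in an expression index.
CREATE INDEX events_created_day_idx ON events (date_trunc('day', created_at));
```

This only helps if the query groups on exactly `date_trunc('day', created_at)` (shifting by Y afterwards), not on an expression that embeds Y.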

Update

If I perform the date shift after grouping by day, I can almost halve the execution time:

SELECT AGE(TIMESTAMP '2018-01-01', "d"), "count"
FROM (SELECT date_trunc('day', created_at) AS "d", COUNT(*)
      FROM events
      GROUP BY "d"
      ORDER BY "d") AS "date_groups"

Here is the ANALYZE output:

Subquery Scan on date_groups  (cost=132202.34..164702.34 rows=1000000 width=16) (actual time=1329.102..1854.696 rows=12 loops=1)
  ->  GroupAggregate  (cost=132202.34..152202.34 rows=1000000 width=8) (actual time=1329.089..1854.635 rows=12 loops=1)
        Group Key: (date_trunc('day'::text, events.created_at))
        ->  Sort  (cost=132202.34..134702.34 rows=1000000 width=8) (actual time=1297.415..1680.793 rows=1000000 loops=1)
              Sort Key: (date_trunc('day'::text, events.created_at))
              Sort Method: external merge  Disk: 17512kB
              ->  Seq Scan on events  (cost=0.00..18870.00 rows=1000000 width=8) (actual time=0.022..606.144 rows=1000000 loops=1)
Planning time: 0.151 ms
Execution time: 1861.552 ms

In the test data the events are spread over only a few days, so I suspect the performance gain would be less significant over a wider date range.
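If the table is mostly append-only, a further strategy (a sketch; the view and column names are made up) is to pre-aggregate the per-day counts into a materialized view once, and apply the shift by Y at query time against the tiny aggregated result:

```sql
-- One row per day instead of one row per event.
CREATE MATERIALIZED VIEW events_per_day AS
  SELECT date_trunc('day', created_at) AS day, count(*) AS n
  FROM events
  GROUP BY 1;

-- Re-run periodically as new events arrive:
-- REFRESH MATERIALIZED VIEW events_per_day;

-- For a given anchor date Y (here the 2018-01-01 used above):
SELECT age(timestamp '2018-01-01', day) AS d, n
FROM events_per_day
ORDER BY d;
```

The expensive scan and group happen at refresh time, not per query, at the cost of slightly stale counts between refreshes.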

0 answers