I have a table like this:
CREATE TABLE events (
    id serial primary key,
    name character varying(255),
    created_at timestamp(6)
)
It has millions of rows spread across different dates.
I want to count the number of events X days from a date Y. The date Y is not known in advance.
So, given data like this:
id  name    created_at
1   event1  2017-01-02 12:00:00
2   event1  2017-01-03 12:00:00
3   event1  2017-01-03 12:00:00
4   event1  2017-01-04 12:00:00
5   event1  2017-01-04 12:00:00
6   event1  2017-01-04 12:00:00
7   event1  2017-01-05 12:00:00
8   event1  2017-01-05 12:00:00
For the date 2017-01-01, I want to get this result:
d       count
1 days  1
2 days  2
3 days  3
4 days  2
The best query I have come up with is:
select date_trunc('day', age(timestamp '2018-01-01', created_at)) as "d", count(*)
FROM events
GROUP BY "d"
ORDER BY "d"
Output of EXPLAIN ANALYZE:
GroupAggregate (cost=134702.34..157202.34 rows=1000000 width=8) (actual time=2112.554..2457.594 rows=12 loops=1)
  Group Key: (date_trunc('day'::text, age('2018-01-01 00:00:00'::timestamp without time zone, created_at)))
  -> Sort (cost=134702.34..137202.34 rows=1000000 width=8) (actual time=2081.727..2277.930 rows=1000000 loops=1)
       Sort Key: (date_trunc('day'::text, age('2018-01-01 00:00:00'::timestamp without time zone, created_at)))
       Sort Method: external sort Disk: 25424kB
       -> Seq Scan on events (cost=0.00..21370.00 rows=1000000 width=8) (actual time=0.021..836.849 rows=1000000 loops=1)
Planning time: 0.075 ms
Execution time: 2468.842 ms
Is this efficient?
How would you optimize this query?
What strategies or Postgres features could I use?
Update
If I do the date shift after grouping by date, the execution time drops by about a quarter (from 2468.842 ms to 1861.552 ms):
SELECT AGE(TIMESTAMP '2018-01-01', "d"), "count"
FROM (SELECT date_trunc('day', created_at) AS "d", COUNT(*)
FROM events
GROUP BY "d"
ORDER BY "d") as "date_groups"
Here is the EXPLAIN ANALYZE output:
Subquery Scan on date_groups (cost=132202.34..164702.34 rows=1000000 width=16) (actual time=1329.102..1854.696 rows=12 loops=1)
  -> GroupAggregate (cost=132202.34..152202.34 rows=1000000 width=8) (actual time=1329.089..1854.635 rows=12 loops=1)
       Group Key: (date_trunc('day'::text, events.created_at))
       -> Sort (cost=132202.34..134702.34 rows=1000000 width=8) (actual time=1297.415..1680.793 rows=1000000 loops=1)
            Sort Key: (date_trunc('day'::text, events.created_at))
            Sort Method: external merge Disk: 17512kB
            -> Seq Scan on events (cost=0.00..18870.00 rows=1000000 width=8) (actual time=0.022..606.144 rows=1000000 loops=1)
Planning time: 0.151 ms
Execution time: 1861.552 ms
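In the test data, the events are spread over only a few days, so I suspect the performance gain would be less significant over a wider date range.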
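One way to check that suspicion would be to rerun both queries against data spread over a wider range. The following is only a sketch under stated assumptions: the scratch table name events_wide is hypothetical, and the one-year range, the even distribution across days, and the row count of 1,000,000 (taken from the plans above) are arbitrary choices for illustration rather than properties of the real data.

```sql
-- Hypothetical scratch table with the same columns as events,
-- so the real table is left untouched.
CREATE TABLE events_wide (
    id serial PRIMARY KEY,
    name character varying(255),
    created_at timestamp(6)
);

-- Assumption: 1,000,000 rows spread evenly over the 365 days of 2017.
INSERT INTO events_wide (name, created_at)
SELECT 'event1',
       timestamp '2017-01-01 12:00:00' + (n % 365) * interval '1 day'
FROM generate_series(1, 1000000) AS s(n);

-- Refresh planner statistics before timing anything.
ANALYZE events_wide;

-- Variant 1: shift every row with age(), then group.
EXPLAIN ANALYZE
SELECT date_trunc('day', age(timestamp '2018-01-01', created_at)) AS "d", count(*)
FROM events_wide
GROUP BY "d"
ORDER BY "d";

-- Variant 2: group by day first, then shift only the grouped rows.
EXPLAIN ANALYZE
SELECT age(timestamp '2018-01-01', "d"), "count"
FROM (SELECT date_trunc('day', created_at) AS "d", count(*)
      FROM events_wide
      GROUP BY "d"
      ORDER BY "d") AS "date_groups";
```

Comparing the two execution times on this table against the figures above should show whether the gap narrows as the number of distinct days grows.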