Postgres: how to efficiently group by range?

Asked: 2018-02-26 06:48:31

Tags: sql postgresql performance group-by range

Suppose my schema has a table that stores every interval during which an item was active:

item_active
- item_id    -- id, foreign_key to item.id
- date_from  -- timestamp
- date_to    -- timestamp

I want to count the number of active items for each day between date1 and date2. I can do this by joining against a subquery of dates:

with sq as (
    select generate_series(date1, date2, '1 day'::interval)::date dt
)
select sq.dt, count(distinct item_id)
from sq
join item_active 
     on item_active.date_from::date <= sq.dt
        and item_active.date_to::date >= sq.dt
group by sq.dt;

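This works, but execution time grows linearly with the number of days in (date2 - date1), that is, O(N). The more days I group over, the slower the query gets. EXPLAIN ANALYZE output for a 5-day range: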

GroupAggregate  (cost=213338137.82..216968937.32 rows=200 width=8) (actual time=7220.689..8938.530 rows=5 loops=1)
  Group Key: sq.dt
  CTE sq
    ->  Result  (cost=0.00..5.01 rows=1000 width=0) (actual time=0.011..0.029 rows=5 loops=1)
  ->  Sort  (cost=213338132.81..214548398.65 rows=484106333 width=8) (actual time=6745.165..7054.655 rows=4623322 loops=1)
    Sort Key: sq.dt
    Sort Method: external sort  Disk: 81352kB
    ->  Nested Loop  (cost=0.00..123648051.46 rows=484106333 width=8) (actual time=0.035..5994.225 rows=4623322 loops=1)
          Join Filter: (((item_active.date_from)::date <= sq.dt) AND ((item_active.date_to)::date >= sq.dt))
          Rows Removed by Join Filter: 17161463
          ->  CTE Scan on sq  (cost=0.00..20.00 rows=1000 width=4) (actual time=0.014..0.039 rows=5 loops=1)
          ->  Materialize  (cost=0.00..122921.36 rows=4356957 width=20) (actual time=0.005..415.443 rows=4356957 loops=5)
                ->  Seq Scan on item_active  (cost=0.00..75606.57 rows=4356957 width=20) (actual time=0.011..382.122 rows=4356957 loops=1)
Planning time: 0.165 ms
Execution time: 8963.670 ms
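
One way to read the plan above: the nested loop compares every generated day with every row of item_active, so the number of comparisons is days × rows. The figures below are copied from the plan; the column aliases are made up for illustration. Both expressions come out to 21784785, which is why the cost scales with the number of days.

```sql
-- Numbers taken from the EXPLAIN ANALYZE output above.
-- pairs_compared:    5 days (loops=5) times 4356957 materialized rows
-- kept_plus_removed: rows that passed the join filter plus rows it removed
select 5 * 4356957        as pairs_compared,
       4623322 + 17161463 as kept_plus_removed;
```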

Is there perhaps a more efficient way to get the same result?

1 Answer:

Answer 0: (score: 0)

Try narrowing item_active before the join:

with sq as (
    select generate_series(date1, date2, '1 day'::interval)::date dt
)
select sq.dt, count(distinct item_id)
from sq
join (select * from item_active
        where item_active.date_from::date <= (select max (sq.dt) from sq)
        and item_active.date_to::date >= (select min (sq.dt) from sq)) as item_active  -- a subquery in FROM needs an alias
     on item_active.date_from::date <= sq.dt
        and item_active.date_to::date >= sq.dt
group by sq.dt;
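
A possible refinement of the pre-filter, offered only as a sketch: the ::date casts prevent a plain index on the timestamp columns from being used, so the filter can be written against the raw timestamps instead. The index name and the literal dates (standing in for date1 and date2) are assumptions, and whether the planner uses the index depends on how selective the range is.

```sql
-- Hypothetical index name; the columns come from the schema in the question.
create index if not exists item_active_date_from_date_to_idx
    on item_active (date_from, date_to);

-- Placeholder dates: 2018-02-01 stands in for date1, 2018-02-05 for date2.
select item_id, date_from, date_to
from item_active
where date_from < date '2018-02-05' + 1   -- selects the same rows as date_from::date <= date2
  and date_to  >= date '2018-02-01';      -- selects the same rows as date_to::date >= date1
```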

Second, try disabling nested loops:

set local enable_nestloop to false;

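The nested loop should then be replaced by a hash or merge join, which can be faster in some cases. Keep in mind that both of those need an equality condition in the join, so with the purely range-based condition here the planner may still have to use a nested loop.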
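
A minimal sketch of how the setting could be tried out. SET LOCAL only lasts until the end of the current transaction, and outside a transaction block it has no effect, so the statements are wrapped in begin/rollback. The literal dates are placeholders for date1 and date2. Comparing this plan with the original one shows whether the planner actually switched join strategy.

```sql
begin;

-- Applies only inside this transaction; reverts at commit or rollback.
set local enable_nestloop to false;

-- Placeholder range: 2018-02-01 .. 2018-02-05 stands in for date1 .. date2.
explain (analyze, buffers)
with sq as (
    select generate_series(timestamp '2018-02-01',
                           timestamp '2018-02-05',
                           '1 day'::interval)::date dt
)
select sq.dt, count(distinct item_id)
from sq
join item_active
     on item_active.date_from::date <= sq.dt
        and item_active.date_to::date >= sq.dt
group by sq.dt;

rollback;
```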

If that does not help, consider materializing this query as a view with create materialized view.
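
A sketch of what that could look like, under some assumptions: the view name item_active_daily, the column alias active_items, the index name, and the fixed 2018 date range are all invented for illustration. The expensive join runs once when the view is built and again on each refresh; day-range queries then read the precomputed rows.

```sql
-- Hypothetical view: one row per day with the number of active items.
-- Days on which no item was active are absent because of the inner join.
create materialized view item_active_daily as
with sq as (
    select generate_series(timestamp '2018-01-01',
                           timestamp '2018-12-31',
                           '1 day'::interval)::date dt
)
select sq.dt, count(distinct item_id) as active_items
from sq
join item_active
     on item_active.date_from::date <= sq.dt
        and item_active.date_to::date >= sq.dt
group by sq.dt;

-- Speeds up lookups by day (hypothetical index name).
create unique index item_active_daily_dt_idx on item_active_daily (dt);

-- Re-run the underlying query after item_active has changed.
refresh materialized view item_active_daily;

-- Reading a range is now a lookup on the precomputed rows.
select dt, active_items
from item_active_daily
where dt between date '2018-02-01' and date '2018-02-05'
order by dt;
```

The trade-off is staleness: the counts reflect item_active as of the last refresh, so this fits best when the data changes rarely or slightly outdated counts are acceptable.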