通过“ order by”,“ limit”和“ union”加快查询速度

时间:2019-08-06 12:49:22

标签: postgresql indexing union

我的视图无法用作其他表中信息的“目录”。当使用unionsleft join查询这样的视图时,我无法让PostgreSQL(11.4)使用索引。

首先,这是没有left join的简化版本:

create table t1 (
  id uuid,
  n int,
  updated timestamp
);
create index i1 on t1 (updated);

create table t2 (
  id uuid,
  n int,
  updated timestamp
);
create index i2 on t2 (updated);

create or replace view directory as (
    select * from t1
  union all
    select * from t2
);

在以下测试中,向每个表添加了10,000行。

在使用order bylimit查询此视图时,一切工作正常,并且使用了i1i2索引:

select *
from directory
order by updated desc
limit 10;

Plan visualized by PEV

计划:https://lpaste.com/KWcGXlAoEy

但是,如果在视图中使用了left join,则将不再使用i2索引:

create table aux2 (
  id uuid,
  x text
);
create index auxi2 on aux2 (id);

drop view directory;
create view directory as (
    select t1.id, t1.n, t1.updated, null from t1
  union all
    select t2.id, t2.n, t2.updated, a.x from t2
    left join aux2 a on a.id = t2.id              -- ✱
);

在这种情况下,将对t2进行顺序扫描:

select *
from directory
order by updated desc
limit 10;

Plan visualized by PEV

计划:

Limit  (cost=916.89..917.19 rows=10 width=44) (actual time=6.128..6.135 rows=10 loops=1)
  Buffers: shared hit=141
  ->  Merge Append  (cost=916.89..1516.89 rows=20000 width=44) (actual time=6.127..6.132 rows=10 loops=1)
        Sort Key: t1.updated DESC
        Buffers: shared hit=141
        ->  Index Scan Backward using i1 on t1  (cost=0.29..375.29 rows=10000 width=60) (actual time=0.003..0.007 rows=10 loops=1)
              Buffers: shared hit=3
        ->  Sort  (cost=916.60..941.60 rows=10000 width=29) (actual time=6.122..6.122 rows=1 loops=1)
              Sort Key: t2.updated DESC
              Sort Method: top-N heapsort  Memory: 25kB
              Buffers: shared hit=138
              ->  Hash Left Join  (cost=289.00..600.50 rows=10000 width=29) (actual time=2.240..4.956 rows=10000 loops=1)
                    Hash Cond: (t2.id = a.id)
                    Buffers: shared hit=138
                    ->  Seq Scan on t2  (cost=0.00..174.00 rows=10000 width=28) (actual time=0.005..0.807 rows=10000 loops=1)
                          Buffers: shared hit=74
                    ->  Hash  (cost=164.00..164.00 rows=10000 width=17) (actual time=2.227..2.227 rows=10000 loops=1)
                          Buckets: 16384  Batches: 1  Memory Usage: 607kB
                          Buffers: shared hit=64
                          ->  Seq Scan on aux2 a  (cost=0.00..164.00 rows=10000 width=17) (actual time=0.004..1.092 rows=10000 loops=1)
                                Buffers: shared hit=64
Planning Time: 0.295 ms
Execution Time: 6.161 ms

不是唯一的left join妨碍PostgreSQL在这里使用i2。如果视图不包含union(即我们仅从selectleft join开始执行t2 + aux2),则索引i2使用。

有什么更好的方法?

令我感到惊讶的是,计划者未使用i2索引,而且还没有将limit + order by传递给最终完成该操作的节点不使用索引时在t2上进行顺序扫描。 (在实际系统上,返回了“ t2”中的所有行,只是最后却被limit扔掉了。)

1 个答案:

答案 0 :(得分:2)

很明显,如果涉及到UNION ALL,优化器还不够聪明,无法做到最好。

一种解决方案是以更复杂的方式编写查询,以帮助优化器找出最佳方法。您必须在没有视图的情况下这样做:

SELECT *
FROM ((SELECT t1.id, t1.n, t1.updated, null
       FROM t1
       ORDER BY t1.updated DESC
       LIMIT 10)
      UNION ALL
      (SELECT t2.id, t2.n, t2.updated, a.x
       FROM t2
          LEFT JOIN aux2 a ON a.id = t2.id
       ORDER BY t2.updated DESC
       LIMIT 10)
     ) AS q
ORDER BY updated DESC                                                 
LIMIT 10;