Question

我的视图无法用作其他表中信息的“目录”。当使用unions和left join查询这样的视图时，我无法让PostgreSQL（11.4）使用索引。

首先，这是没有left join的简化版本：

create table t1 (
  id uuid,
  n int,
  updated timestamp
);
create index i1 on t1 (updated);

create table t2 (
  id uuid,
  n int,
  updated timestamp
);
create index i2 on t2 (updated);

create or replace view directory as (
    select * from t1
  union all
    select * from t2
);

在以下测试中，向每个表添加了10,000行。

在使用order by和limit查询此视图时，一切工作正常，并且使用了i1和i2索引：

select *
from directory
order by updated desc
limit 10;

Plan visualized by PEV

计划：https://lpaste.com/KWcGXlAoEy

但是，如果在视图中使用了left join，则将不再使用i2索引：

create table aux2 (
  id uuid,
  x text
);
create index auxi2 on aux2 (id);

drop view directory;
create view directory as (
    select t1.id, t1.n, t1.updated, null from t1
  union all
    select t2.id, t2.n, t2.updated, a.x from t2
    left join aux2 a on a.id = t2.id              -- ✱
);

在这种情况下，将对t2进行顺序扫描：

select *
from directory
order by updated desc
limit 10;

Plan visualized by PEV

计划：

Limit  (cost=916.89..917.19 rows=10 width=44) (actual time=6.128..6.135 rows=10 loops=1)
  Buffers: shared hit=141
  ->  Merge Append  (cost=916.89..1516.89 rows=20000 width=44) (actual time=6.127..6.132 rows=10 loops=1)
        Sort Key: t1.updated DESC
        Buffers: shared hit=141
        ->  Index Scan Backward using i1 on t1  (cost=0.29..375.29 rows=10000 width=60) (actual time=0.003..0.007 rows=10 loops=1)
              Buffers: shared hit=3
        ->  Sort  (cost=916.60..941.60 rows=10000 width=29) (actual time=6.122..6.122 rows=1 loops=1)
              Sort Key: t2.updated DESC
              Sort Method: top-N heapsort  Memory: 25kB
              Buffers: shared hit=138
              ->  Hash Left Join  (cost=289.00..600.50 rows=10000 width=29) (actual time=2.240..4.956 rows=10000 loops=1)
                    Hash Cond: (t2.id = a.id)
                    Buffers: shared hit=138
                    ->  Seq Scan on t2  (cost=0.00..174.00 rows=10000 width=28) (actual time=0.005..0.807 rows=10000 loops=1)
                          Buffers: shared hit=74
                    ->  Hash  (cost=164.00..164.00 rows=10000 width=17) (actual time=2.227..2.227 rows=10000 loops=1)
                          Buckets: 16384  Batches: 1  Memory Usage: 607kB
                          Buffers: shared hit=64
                          ->  Seq Scan on aux2 a  (cost=0.00..164.00 rows=10000 width=17) (actual time=0.004..1.092 rows=10000 loops=1)
                                Buffers: shared hit=64
Planning Time: 0.295 ms
Execution Time: 6.161 ms

不是唯一的left join妨碍PostgreSQL在这里使用i2。如果视图不包含union（即我们仅从select和left join开始执行t2 + aux2），则索引i2使用。

有什么更好的方法？

令我感到惊讶的是，计划者未使用i2索引，而且还没有将limit + order by传递给最终完成该操作的节点不使用索引时在t2上进行顺序扫描。（在实际系统上，返回了“ t2”中的所有行，只是最后却被limit扔掉了。）

Answer 1

很明显，如果涉及到UNION ALL，优化器还不够聪明，无法做到最好。

一种解决方案是以更复杂的方式编写查询，以帮助优化器找出最佳方法。您必须在没有视图的情况下这样做：

SELECT *
FROM ((SELECT t1.id, t1.n, t1.updated, null
       FROM t1
       ORDER BY t1.updated DESC
       LIMIT 10)
      UNION ALL
      (SELECT t2.id, t2.n, t2.updated, a.x
       FROM t2
          LEFT JOIN aux2 a ON a.id = t2.id
       ORDER BY t2.updated DESC
       LIMIT 10)
     ) AS q
ORDER BY updated DESC                                                 
LIMIT 10;

通过“ order by”，“ limit”和“ union”加快查询速度

1 个答案: