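Adding certain conditions to a multi-table join over many rows slows the query down by orders of magnitude. I have tried many things to speed it up, including every type of table join, reordering the joins, reordering the WHERE clause, using subqueries, using CASE statements in the WHERE clause, and so on.
The SQL details are below.
(Note: I am trying to write a generic SQL builder for an API that lets callers specify arbitrary conditions anywhere in the graph. The problem is that some of these calls are very fast and others are not, because of the way Postgres plans the execution. Optimizations hand-crafted for this particular query will not help with the larger goal of a generic SQL builder.)
I have a schema in Postgres that stores vertices and edges (a simple graph database):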
CREATE TABLE IF NOT EXISTS vertex (type text, id serial, name text, data jsonb, UNIQUE (id))
CREATE INDEX vertex_data_idx ON vertex USING gin (data jsonb_path_ops)
CREATE INDEX vertex_type_idx ON vertex (type)
CREATE INDEX vertex_name_idx ON vertex (name)
CREATE TABLE IF NOT EXISTS edge (src integer REFERENCES vertex (id), dst integer REFERENCES vertex (id))
CREATE INDEX edge_src_idx ON edge (src)
CREATE INDEX edge_dst_idx ON edge (dst)
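The schema stores graphs, one of which looks like this: PLANET -> CONTINENT -> COUNTRY -> REGION
My sample database has 447554 total vertices and 3155047 total edges, but the relevant data is here:
This query, which finds planets that have Spanish speakers in any given region, is fast: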
SELECT DISTINCT
v1.name as name, v1.id as id
FROM vertex v1
LEFT JOIN edge e1 ON (v1.id = e1.src)
LEFT JOIN vertex v2 ON (v2.id = e1.dst)
LEFT JOIN edge e2 ON (v2.id = e2.src)
LEFT JOIN vertex v3 ON (v3.id = e2.dst)
LEFT JOIN edge e3 ON (v3.id = e3.src)
LEFT JOIN vertex v4 ON (v4.id = e3.dst)
WHERE
v4.type = 'REGION' AND
v4.data @> '{"languages":["spanish"]}'::jsonb
Planning time: 6.289 ms Execution time: 0.744 ms
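When I add a condition on an indexed column of the first table in the graph (v1), one that has no effect on the result, the query becomes roughly 120,000 times slower (89010.096 ms versus 0.744 ms):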
SELECT DISTINCT
v1.name as name, v1.id as id
FROM vertex v1
LEFT JOIN edge e1 ON (v1.id = e1.src)
LEFT JOIN vertex v2 ON (v2.id = e1.dst)
LEFT JOIN edge e2 ON (v2.id = e2.src)
LEFT JOIN vertex v3 ON (v3.id = e2.dst)
LEFT JOIN edge e3 ON (v3.id = e3.src)
LEFT JOIN vertex v4 ON (v4.id = e3.dst)
WHERE
v1.type = 'PLANET' AND
v4.type = 'REGION' AND
v4.data @> '{"languages":["spanish"]}'::jsonb
Planning time: 7.664 ms Execution time: 89010.096 ms
Here is the EXPLAIN (ANALYZE, BUFFERS) output for the first, fast query:
Unique (cost=154592.03..155453.96 rows=114925 width=28) (actual time=0.585..0.616 rows=4 loops=1)
Buffers: shared hit=92
-> Sort (cost=154592.03..154879.34 rows=114925 width=28) (actual time=0.579..0.588 rows=4 loops=1)
Sort Key: v1.name, v1.id
Sort Method: quicksort Memory: 17kB
Buffers: shared hit=92
-> Nested Loop (cost=37.96..142377.39 rows=114925 width=28) (actual time=0.155..0.549 rows=4 loops=1)
Buffers: shared hit=92
-> Nested Loop (cost=37.53..80131.76 rows=114925 width=4) (actual time=0.141..0.468 rows=4 loops=1)
Join Filter: (v2.id = e1.dst)
Buffers: shared hit=76
-> Nested Loop (cost=37.10..49179.08 rows=14270 width=8) (actual time=0.126..0.386 rows=4 loops=1)
Buffers: shared hit=60
-> Nested Loop (cost=36.68..41450.17 rows=14270 width=4) (actual time=0.112..0.304 rows=4 loops=1)
Join Filter: (v3.id = e2.dst)
Buffers: shared hit=44
-> Nested Loop (cost=36.25..37606.57 rows=1772 width=8) (actual time=0.092..0.209 rows=4 loops=1)
Buffers: shared hit=28
-> Nested Loop (cost=35.83..36646.82 rows=1772 width=4) (actual time=0.074..0.116 rows=4 loops=1)
Buffers: shared hit=12
-> Bitmap Heap Scan on vertex v4 (cost=30.99..1514.00 rows=220 width=4) (actual time=0.039..0.042 rows=1 loops=1)
Recheck Cond: (data @> '{"languages":["spanish"]}'::jsonb)
Filter: (type = 'REGION'::text)
Heap Blocks: exact=1
Buffers: shared hit=5
-> Bitmap Index Scan on vertex_data_idx (cost=0.00..30.94 rows=392 width=0) (actual time=0.020..0.020 rows=1 loops=1)
Index Cond: (data @> '{"languages":["spanish"]}'::jsonb)
Buffers: shared hit=4
-> Bitmap Heap Scan on edge e3 (cost=4.84..159.12 rows=57 width=8) (actual time=0.023..0.037 rows=4 loops=1)
Recheck Cond: (dst = v4.id)
Heap Blocks: exact=4
Buffers: shared hit=7
-> Bitmap Index Scan on edge_dst_idx (cost=0.00..4.82 rows=57 width=0) (actual time=0.013..0.013 rows=4 loops=1)
Index Cond: (dst = v4.id)
Buffers: shared hit=3
-> Index Only Scan using vertex_id_key on vertex v3 (cost=0.42..0.53 rows=1 width=4) (actual time=0.008..0.011 rows=1 loops=4)
Index Cond: (id = e3.src)
Heap Fetches: 4
Buffers: shared hit=16
-> Index Scan using edge_dst_idx on edge e2 (cost=0.43..1.46 rows=57 width=8) (actual time=0.008..0.011 rows=1 loops=4)
Index Cond: (dst = e3.src)
Buffers: shared hit=16
-> Index Only Scan using vertex_id_key on vertex v2 (cost=0.42..0.53 rows=1 width=4) (actual time=0.006..0.009 rows=1 loops=4)
Index Cond: (id = e2.src)
Heap Fetches: 4
Buffers: shared hit=16
-> Index Scan using edge_dst_idx on edge e1 (cost=0.43..1.46 rows=57 width=8) (actual time=0.005..0.008 rows=1 loops=4)
Index Cond: (dst = e2.src)
Buffers: shared hit=16
-> Index Scan using vertex_id_key on vertex v1 (cost=0.42..0.53 rows=1 width=28) (actual time=0.006..0.009 rows=1 loops=4)
Index Cond: (id = e1.src)
Buffers: shared hit=16
Planning time: 6.940 ms
Execution time: 0.714 ms
And for the second, slow query:
HashAggregate (cost=592.23..592.24 rows=1 width=28) (actual time=89009.873..89009.885 rows=4 loops=1)
Group Key: v1.name, v1.id
Buffers: shared hit=11644657 read=1240045
-> Nested Loop (cost=2.98..592.22 rows=1 width=28) (actual time=9098.961..89009.833 rows=4 loops=1)
Buffers: shared hit=11644657 read=1240045
-> Nested Loop (cost=2.56..306.89 rows=522 width=32) (actual time=0.424..30066.007 rows=3092522 loops=1)
Buffers: shared hit=454795 read=46267
-> Nested Loop (cost=2.13..86.31 rows=65 width=36) (actual time=0.306..2120.293 rows=62500 loops=1)
Buffers: shared hit=239162 read=12162
-> Nested Loop (cost=1.70..51.10 rows=65 width=32) (actual time=0.261..574.490 rows=62500 loops=1)
Buffers: shared hit=488 read=562
-> Nested Loop (cost=1.27..23.95 rows=8 width=36) (actual time=0.205..1.206 rows=25 loops=1)
Buffers: shared hit=109 read=17
-> Nested Loop (cost=0.85..19.62 rows=8 width=32) (actual time=0.173..0.547 rows=25 loops=1)
Buffers: shared hit=12 read=14
-> Index Scan using vertex_type_idx on vertex v1 (cost=0.42..8.44 rows=1 width=28) (actual time=0.123..0.153 rows=5 loops=1)
Index Cond: (type = 'PLANET'::text)
Buffers: shared hit=2 read=4
-> Index Scan using edge_src_idx on edge e1 (cost=0.43..10.18 rows=100 width=8) (actual time=0.021..0.039 rows=5 loops=5)
Index Cond: (src = v1.id)
Buffers: shared hit=10 read=10
-> Index Only Scan using vertex_id_key on vertex v2 (cost=0.42..0.53 rows=1 width=4) (actual time=0.009..0.013 rows=1 loops=25)
Index Cond: (id = e1.dst)
Heap Fetches: 25
Buffers: shared hit=97 read=3
-> Index Scan using edge_src_idx on edge e2 (cost=0.43..2.39 rows=100 width=8) (actual time=0.031..8.504 rows=2500 loops=25)
Index Cond: (src = v2.id)
Buffers: shared hit=379 read=545
-> Index Only Scan using vertex_id_key on vertex v3 (cost=0.42..0.53 rows=1 width=4) (actual time=0.010..0.013 rows=1 loops=62500)
Index Cond: (id = e2.dst)
Heap Fetches: 62500
Buffers: shared hit=238674 read=11600
-> Index Scan using edge_src_idx on edge e3 (cost=0.43..2.39 rows=100 width=8) (actual time=0.013..0.163 rows=49 loops=62500)
Index Cond: (src = v3.id)
Buffers: shared hit=215633 read=34105
-> Index Scan using vertex_id_key on vertex v4 (cost=0.42..0.54 rows=1 width=4) (actual time=0.013..0.013 rows=0 loops=3092522)
Index Cond: (id = e3.dst)
Filter: ((data @> '{"languages":["spanish"]}'::jsonb) AND (type = 'REGION'::text))
Rows Removed by Filter: 1
Buffers: shared hit=11189862 read=1193778
Planning time: 7.664 ms
Execution time: 89010.096 ms
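Answer 0 (score: 1)
[Posted as an answer because I need the formatting]
The edge table definitely needs a primary key (which implies NOT NULL for {src, dst}, and that is fine):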
CREATE TABLE IF NOT EXISTS edge
( src integer NOT NULL REFERENCES vertex (id)
, dst integer NOT NULL REFERENCES vertex (id)
, PRIMARY KEY (src,dst)
);
CREATE UNIQUE INDEX edge_dst_src_idx ON edge (dst, src);
-- the estimates in the question seem to be off, statistics may be absent.
VACUUM ANALYZE edge; -- refresh the statistics
VACUUM ANALYZE vertex;
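To check whether the statistics really are the problem, the planner's view of the columns can be compared with the actual distribution. In the slow plan from the question, the scan on vertex_type_idx was estimated at 1 row but returned 5, and the join feeding the v4 lookup was estimated at 522 rows but produced 3092522. A minimal sketch, assuming the table and column names from the question; pg_stats is the standard statistics view that ANALYZE fills:

```sql
-- What the planner believes about the filtered and joined columns.
-- An empty result means the tables have not been analyzed yet.
SELECT tablename, attname, n_distinct, most_common_vals, most_common_freqs
FROM pg_stats
WHERE (tablename = 'vertex' AND attname = 'type')
   OR (tablename = 'edge' AND attname IN ('src', 'dst'));

-- The real distribution of vertex types, for comparison.
SELECT type, count(*) AS vertex_count
FROM vertex
GROUP BY type
ORDER BY vertex_count DESC;
```

If the two disagree badly even after VACUUM ANALYZE, the row estimates in the plan will probably stay off as well.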
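I would also combine {type, name} into one index (type appears to have very low cardinality). Maybe even make it UNIQUE and NOT NULL, but I don't know your data.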
CREATE INDEX vertex_type_name_idx ON vertex (type, name);
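Answer 1 (score: 0)
I think using a subquery will prevent PostgreSQL from using the index. So try the following query to see whether performance improves when the index is not used: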
select * from (
SELECT DISTINCT
v1.name as name, v1.id as id, v1.type as v1_type
FROM vertex v1
LEFT JOIN edge e1 ON (v1.id = e1.src)
LEFT JOIN vertex v2 ON (v2.id = e1.dst)
LEFT JOIN edge e2 ON (v2.id = e2.src)
LEFT JOIN vertex v3 ON (v3.id = e2.dst)
LEFT JOIN edge e3 ON (v3.id = e3.src)
LEFT JOIN vertex v4 ON (v4.id = e3.dst)
WHERE
v4.type = 'REGION' AND
v4.data @> '{"languages":["spanish"]}'::jsonb
) t1
where v1_type = 'PLANET'
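As far as I know, PostgreSQL can still push the outer v1_type filter down into a plain DISTINCT subquery, so the query above may end up with the same slow plan. A minimal sketch of a commonly used workaround, assuming the aim is to keep that filter outside the subquery: OFFSET 0 does not change the result, but the planner does not move outer conditions into a subquery that has an OFFSET clause. Whether this actually restores the fast plan needs to be checked with EXPLAIN (ANALYZE, BUFFERS).

```sql
-- Sketch only: the same query as above, with OFFSET 0 added as an optimization barrier.
SELECT * FROM (
    SELECT DISTINCT
        v1.name AS name, v1.id AS id, v1.type AS v1_type
    FROM vertex v1
    LEFT JOIN edge e1 ON (v1.id = e1.src)
    LEFT JOIN vertex v2 ON (v2.id = e1.dst)
    LEFT JOIN edge e2 ON (v2.id = e2.src)
    LEFT JOIN vertex v3 ON (v3.id = e2.dst)
    LEFT JOIN edge e3 ON (v3.id = e3.src)
    LEFT JOIN vertex v4 ON (v4.id = e3.dst)
    WHERE
        v4.type = 'REGION' AND
        v4.data @> '{"languages":["spanish"]}'::jsonb
    OFFSET 0  -- skips no rows; only keeps the outer filter from being pushed down
) t1
WHERE v1_type = 'PLANET';
```

On PostgreSQL 12 and later, writing the inner query as WITH t1 AS MATERIALIZED (...) should act as a similar barrier.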