我有这个postgres查询
explain SELECT "facilities".* FROM "facilities" INNER JOIN
resource_indices ON resource_indices.resource_id = facilities.uuid WHERE
(client_id IS NULL OR (client_tag=NULL AND client_id=7))
AND (ARRAY['country:india']::varchar[] && resource_indices.tags)
AND "facilities"."is_listed" = 't'
ORDER BY resource_indices.name LIMIT 11 OFFSET 100;
观察偏移量。当偏移量小于200时,它使用索引并且工作正常。 该查询计划如下
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=23416.57..24704.45 rows=11 width=1457) (actual time=41.951..43.035 rows=11 loops=1)
-> Nested Loop (cost=0.71..213202.15 rows=1821 width=1457) (actual time=2.107..43.007 rows=211 loops=1)
-> Index Scan using index_resource_indices_on_name on resource_indices (cost=0.42..190226.95 rows=12460 width=28) (actual time=2.096..40.790 rows=408 loops=1)
Filter: ('{country:india}'::character varying[] && tags)
Rows Removed by Filter: 4495
-> Index Scan using index_facilities_on_uuid on facilities (cost=0.29..1.83 rows=1 width=1445) (actual time=0.005..0.005 rows=1 loops=408)
Index Cond: (uuid = resource_indices.resource_id)
Filter: ((client_id IS NULL) AND is_listed)
Planning time: 1.259 ms
Execution time: 43.121 ms
(10 rows)
使用散列连接增加四百个偏移量,并且性能更差。增加偏移会使性能更差。
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=34508.62..34508.65 rows=11 width=1457) (actual time=136.288..136.291 rows=11 loops=1)
-> Sort (cost=34507.62..34512.18 rows=1821 width=1457) (actual time=136.224..136.268 rows=411 loops=1)
Sort Key: resource_indices.name
Sort Method: top-N heapsort Memory: 638kB
-> Hash Join (cost=29104.96..34419.46 rows=1821 width=1457) (actual time=23.885..95.099 rows=6518 loops=1)
Hash Cond: (facilities.uuid = resource_indices.resource_id)
-> Seq Scan on facilities (cost=0.00..4958.39 rows=33790 width=1445) (actual time=0.010..48.732 rows=33711 loops=1)
Filter: ((client_id IS NULL) AND is_listed)
Rows Removed by Filter: 848
-> Hash (cost=28949.21..28949.21 rows=12460 width=28) (actual time=23.311..23.311 rows=12601 loops=1)
Buckets: 2048 Batches: 1 Memory Usage: 814kB
-> Bitmap Heap Scan on resource_indices (cost=1048.56..28949.21 rows=12460 width=28) (actual time=9.369..18.710 rows=12601 loops=1)
Recheck Cond: ('{country:india}'::character varying[] && tags)
Heap Blocks: exact=7334
-> Bitmap Index Scan on index_resource_indices_on_tags (cost=0.00..1045.45 rows=12460 width=0) (actual time=7.680..7.680 rows=13889 loops=1)
Index Cond: ('{country:india}'::character varying[] && tags)
Planning time: 1.408 ms
Execution time: 136.465 ms
(18 rows)
我该如何解决这个问题?谢谢
答案 0 :(得分:1)
这是不可避免的,因为没有其他方法可以实现LIMIT 10 OFFSET 10000
但是要获取前10010行并丢弃除了最后10个之外的所有行。随着偏移量的增加,这必然会越来越糟糕。< / p>
PostgreSQL切换到不同的计划,因为它必须检索更多的结果行,并且快速检索前几行并且通常涉及嵌套循环连接的“快速启动”计划在执行更多结果行时将比其他计划表现更差需要的。
OFFSET
是邪恶的,你应该在大多数情况下避免使用它。阅读what Markus Winand has to say about this topic,特别是如何在没有OFFSET
的情况下对结果集进行分页。