I am unable to optimize the following SQL query (using PostgreSQL 9.1):
WITH regions AS (
    SELECT r1.region_id
    FROM region r1,
         (SELECT *
          FROM region
          WHERE region_id = 1) r2
    WHERE (r1.region_country = r2.region_country
           OR r2.region_country = 0)
      AND (r1.region_province = r2.region_province
           OR r2.region_province = 0)
      AND (r1.region_area = r2.region_area
           OR r2.region_area = 0))
SELECT id
FROM users
WHERE user_region IN (SELECT region_id
                      FROM regions);
EXPLAIN produces the following output:
Nested Loop (cost=85.02..42405.93 rows=13217 width=4) (actual time=0.447..970.132 rows=527444 loops=1)
Buffers: shared hit=464136
CTE regions
-> Nested Loop (cost=0.00..32.11 rows=5 width=4) (actual time=0.029..0.237 rows=135 loops=1)
Join Filter: (((r1.region_country = region.region_country) OR (region.region_country = 0)) AND ((r1.region_province = region.region_province) OR (region.region_province = 0)) AND ((r1.region_area = region.region_area) OR (region.region_area = 0)))
Buffers: shared hit=7
-> Index Scan using region_pkey on region (cost=0.00..8.27 rows=1 width=6) (actual time=0.015..0.016 rows=1 loops=1)
Index Cond: (re_nr = 1)
Buffers: shared hit=3
-> Seq Scan on region r1 (cost=0.00..9.67 rows=567 width=10) (actual time=0.007..0.072 rows=567 loops=1)
Buffers: shared hit=4
-> HashAggregate (cost=0.11..0.16 rows=5 width=4) (actual time=0.326..0.449 rows=135 loops=1)
Buffers: shared hit=7
-> CTE Scan on regions (cost=0.00..0.10 rows=5 width=4) (actual time=0.032..0.278 rows=135 loops=1)
Buffers: shared hit=7
-> Bitmap Heap Scan on users (cost=52.79..8441.69 rows=2643 width=8) (actual time=1.442..6.459 rows=3907 loops=135)
Recheck Cond: (user_region = regions.region_id)
Buffers: shared hit=464129
-> Bitmap Index Scan on user_region (cost=0.00..52.13 rows=2643 width=0) (actual time=0.675..0.675 rows=3909 loops=135)
Index Cond: (user_region = regions.region_id)
Buffers: shared hit=1847
Total runtime: 1003.867 ms
If I simply paste in the output of the regions query as a literal list, everything is as fast as expected:
SELECT id
FROM users
WHERE user_region in (1 , 2 , 3 , 4 , 5 , 6 , 7 , 8 , 9 , 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110)
EXPLAIN produces the following output:
Bitmap Heap Scan on users (cost=5643.57..135774.21 rows=322812 width=4) (actual time=138.339..365.676 rows=527444 loops=1)
Recheck Cond: (user_region = ANY ('{1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110}'::integer[]))
Buffers: shared hit=72973 read=1302
-> Bitmap Index Scan on user_region (cost=0.00..5562.86 rows=322812 width=0) (actual time=114.446..114.446 rows=527752 loops=1)
Index Cond: (user_region = ANY ('{1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110}'::integer[]))
Buffers: shared hit=546 read=1301
Total runtime: 397.975 ms
Running the regions query on its own is also very fast:
Nested Loop (cost=0.00..32.11 rows=5 width=4) (actual time=0.059..12.323 rows=135 loops=1)
Join Filter: (((r1.region_country = region.region_country) OR (region.region_country = 0)) AND ((r1.region_province = region.region_province) OR (region.region_province = 0)) AND ((r1.region_area = region.region_area) OR (region.region_area = 0)))
Buffers: shared hit=1 read=6
-> Index Scan using region_pkey on region (cost=0.00..8.27 rows=1 width=6) (actual time=0.044..0.046 rows=1 loops=1)
Index Cond: (re_nr = 1)
Buffers: shared read=3
-> Seq Scan on region r1 (cost=0.00..9.67 rows=567 width=10) (actual time=0.005..12.122 rows=567 loops=1)
Buffers: shared hit=1 read=3
Total runtime: 12.379 ms
If I add more columns to the select from users, the time difference between the two approaches becomes even larger.
Is there a way to compute everything in one fast query?
Any help or pointers to a solution are much appreciated.
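[edit] Adding a sample of the region table, as requested in the comments.
A user can choose a region (user_region), which can be a country, a province, or a city / part of a city. The regions query tries to find all region_ids within that country, province, or city. If the user chooses Austria (region_id = 1), all other region_ids from Austria should be returned. If the user chooses "Lower Austria" (region_id = 26), all regions from the province of Lower Austria should be returned (27, 28, 29, 30 in the sample data).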
select * from region limit 30;
region_country | region_province | region_area | region_name | region_id
----------------+-----------------+-------------+---------------------+-----------
1 | 0 | 0 | Austria | 1
1 | 1 | 0 | Vienna | 2
1 | 1 | 1 | Vienna 1 | 3
1 | 1 | 2 | Vienna 2 | 4
1 | 1 | 3 | Vienna 3 | 5
1 | 1 | 4 | Vienna 4 | 6
1 | 1 | 5 | Vienna 5 | 7
1 | 1 | 6 | Vienna 6 | 8
1 | 1 | 7 | Vienna 7 | 9
1 | 1 | 8 | Vienna 8 | 10
1 | 1 | 9 | Vienna 9 | 11
1 | 1 | 10 | Vienna 10 | 12
1 | 1 | 11 | Vienna 11 | 13
1 | 1 | 12 | Vienna 12 | 14
1 | 1 | 13 | Vienna 13 | 15
1 | 1 | 14 | Vienna 14 | 16
1 | 1 | 15 | Vienna 15 | 17
1 | 1 | 16 | Vienna 16 | 18
1 | 1 | 17 | Vienna 17 | 19
1 | 1 | 18 | Vienna 18 | 20
1 | 1 | 19 | Vienna 19 | 21
1 | 1 | 20 | Vienna 20 | 22
1 | 1 | 21 | Vienna 21 | 23
1 | 1 | 22 | Vienna 22 | 24
1 | 1 | 23 | Vienna 23 | 25
1 | 2 | 0 | Lower Austria | 26
1 | 2 | 1 | St.Pölten | 27
1 | 2 | 2 | Amstetten | 28
1 | 2 | 3 | Baden | 29
1 | 2 | 4 | Bruck an der Leitha | 30
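To make the matching rule concrete, here is a sketch of the same lookup for "Lower Austria" instead of Austria. It only reuses the table and column names shown above; the explicit JOIN syntax and the ORDER BY are additions for readability, not part of the original query. A 0 in the selected row acts as a wildcard for that level, so on the sample rows this should match every row with region_country = 1 and region_province = 2. Note that the selected row also matches itself, so region_id 26 is part of the result.

```sql
-- Sketch: same matching rule as the regions CTE, for region_id = 26.
-- r2 is the single selected row; r1 ranges over all regions.
SELECT r1.region_id, r1.region_name
FROM region r1
JOIN region r2 ON r2.region_id = 26
WHERE (r1.region_country  = r2.region_country  OR r2.region_country  = 0)
  AND (r1.region_province = r2.region_province OR r2.region_province = 0)
  AND (r1.region_area     = r2.region_area     OR r2.region_area     = 0)
ORDER BY r1.region_id;
```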
Answer 0: (score: 1)
A join is usually more efficient than an in clause:
.
.
.
SELECT id FROM users
INNER JOIN regions ON user_region = region_id;
Assuming each user matches only one region (which appears to be the case from your query), this will give you the same results.
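Written out in full, the suggestion might look like the sketch below. It assumes the elided part is the regions CTE from the question, unchanged; the aliases u and r are added here only to make the column references unambiguous. The join returns the same rows as the IN version only if region_id is unique in regions, which the region_pkey index in the posted plan suggests but does not prove.

```sql
-- Sketch: the question's CTE combined with the join from this answer.
WITH regions AS (
    SELECT r1.region_id
    FROM region r1,
         (SELECT *
          FROM region
          WHERE region_id = 1) r2
    WHERE (r1.region_country  = r2.region_country  OR r2.region_country  = 0)
      AND (r1.region_province = r2.region_province OR r2.region_province = 0)
      AND (r1.region_area     = r2.region_area     OR r2.region_area     = 0)
)
SELECT u.id
FROM users u
INNER JOIN regions r ON u.user_region = r.region_id;
```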
Answer 1: (score: 0)
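Have you tried running ANALYZE on your tables?
From the EXPLAIN output you posted, I can see that Postgres expected about 39 times fewer rows than were actually returned (rows=13217 estimated versus rows=527444 actual).
When Postgres's estimates differ that much from the actual result set, it can choose a suboptimal plan, and the query then takes longer to complete.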
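A minimal sketch of that check, using the table names from the question: refresh the planner statistics, then re-run the query under EXPLAIN (ANALYZE, BUFFERS) and compare the estimated rows= figures with the actual ones. Whether the estimates improve is not guaranteed; this only rules out stale statistics as the cause.

```sql
-- Refresh planner statistics for both tables involved.
ANALYZE region;
ANALYZE users;

-- Re-run the original query and compare estimated vs. actual row counts.
EXPLAIN (ANALYZE, BUFFERS)
WITH regions AS (
    SELECT r1.region_id
    FROM region r1,
         (SELECT *
          FROM region
          WHERE region_id = 1) r2
    WHERE (r1.region_country  = r2.region_country  OR r2.region_country  = 0)
      AND (r1.region_province = r2.region_province OR r2.region_province = 0)
      AND (r1.region_area     = r2.region_area     OR r2.region_area     = 0)
)
SELECT id
FROM users
WHERE user_region IN (SELECT region_id FROM regions);
```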