The LEFT JOIN part of my PostgreSQL query runs slowly, and I can't figure out why.
Full query:
SELECT t.id FROM tests t
LEFT JOIN tests c ON c.parent_id IN (t.id, t.parent_id)
INNER JOIN responses r ON (
r.test_id IN (t.id, t.parent_id, c.id)
) WHERE r.user_id = 333
There are indexes on tests.id and tests.parent_id.
The tests table has 28876 rows (1282 of which have parent_id IS NOT NULL).
The LEFT JOIN part of the query produces 32098 rows and takes about 700 ms:
SELECT t.id FROM tests t
LEFT JOIN tests c ON c.parent_id IN (t.id, t.parent_id)
The rest of the query takes a negligible amount of time.
Why might it be slow, or is there a better way to achieve the same thing?
Thanks!
SELECT VERSION()
PostgreSQL 9.1.9 on x86_64-unknown-linux-gnu, compiled by gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3, 64-bit
EXPLAIN ANALYZE
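(Note: this uses the real table name usability_tests, which I simplified to tests in the examples above. Likewise, the responses column is usability_test_id rather than test_id, and this plan was run for user_id 3649.)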
Nested Loop (cost=5.18..158692.45 rows=80 width=4) (actual time=107.873..5718.295 rows=103 loops=1)
  Join Filter: ((r.usability_test_id = t.id) OR (r.usability_test_id = t.parent_id) OR (r.usability_test_id = c.id))
  -> Nested Loop Left Join (cost=0.56..136015.63 rows=28876 width=12) (actual time=0.091..486.496 rows=32098 loops=1)
        Join Filter: ((c.parent_id = t.id) OR (c.parent_id = t.parent_id))
        -> Seq Scan on usability_tests t (cost=0.00..1455.76 rows=28876 width=8) (actual time=0.042..39.558 rows=28876 loops=1)
        -> Bitmap Heap Scan on usability_tests c (cost=0.56..4.60 rows=4 width=8) (actual time=0.010..0.011 rows=0 loops=28876)
              Recheck Cond: ((parent_id = t.id) OR (parent_id = t.parent_id))
              -> BitmapOr (cost=0.56..0.56 rows=4 width=0) (actual time=0.008..0.008 rows=0 loops=28876)
                    -> Bitmap Index Scan on index_usability_tests_on_parent_id (cost=0.00..0.28 rows=2 width=0) (actual time=0.003..0.003 rows=0 loops=28876)
                          Index Cond: (parent_id = t.id)
                    -> Bitmap Index Scan on index_usability_tests_on_parent_id (cost=0.00..0.28 rows=2 width=0) (actual time=0.001..0.001 rows=0 loops=28876)
                          Index Cond: (parent_id = t.parent_id)
  -> Materialize (cost=4.62..153.63 rows=39 width=4) (actual time=0.001..0.076 rows=70 loops=32098)
        -> Bitmap Heap Scan on responses r (cost=4.62..153.44 rows=39 width=4) (actual time=0.053..0.187 rows=70 loops=1)
              Recheck Cond: (user_id = 3649)
              -> Bitmap Index Scan on index_responses_on_user_id (cost=0.00..4.61 rows=39 width=0) (actual time=0.040..0.040 rows=70 loops=1)
                    Index Cond: (user_id = 3649)
Total runtime: 5718.592 ms
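The figures in this plan allow a rough breakdown with simple arithmetic. In EXPLAIN ANALYZE, a node's actual time includes the time of its children, so the top-level Nested Loop's own work (its three-way OR Join Filter plus rescanning the Materialize node) is roughly its total time minus the Nested Loop Left Join's time. That filter is evaluated once for every pair of the 32098 left-join rows and the 70 materialized responses rows. A minimal sketch; the numbers are copied from the plan above and the variable names are only illustrative:

```python
# Figures copied from the EXPLAIN ANALYZE output above.
left_join_rows = 32098   # rows from "Nested Loop Left Join" (loops=1)
responses_rows = 70      # rows per loop from "Materialize" (loops=32098)
left_join_ms = 486.496   # end of actual time for "Nested Loop Left Join"
total_ms = 5718.295      # end of actual time for the top-level "Nested Loop"

# The top-level Join Filter is checked once per (left-join row, response) pair.
filter_checks = left_join_rows * responses_rows

# Time attributable to the top-level join and its Materialize input,
# since a parent node's actual time includes its children's time.
outer_join_ms = total_ms - left_join_ms

print(f"Join Filter evaluations: {filter_checks:,}")  # 2,246,860
print(f"Time above the LEFT JOIN: {outer_join_ms:.0f} ms of {total_ms:.0f} ms")  # 5232 ms of 5718 ms
```

In this particular run, then, most of the 5.7 s appears to be spent in the top-level join against responses rather than in the LEFT JOIN itself.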
Answer 0 (score: 2)
Update: it looks like your query basically boils down to this:
with cte as (
    select r.test_id
    from responses as r
    where r.user_id = 333
    union all
    select c.parent_id
    from tests as c
        inner join responses as r on r.test_id = c.id
    where r.user_id = 333
)
select
    t.id
from tests as t
where
    t.id in (select c.test_id from cte as c) or
    t.parent_id in (select c.test_id from cte as c)
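One way to sanity-check this rewrite is to run both queries against the same throwaway data and compare the sets of ids they return. A minimal sketch, assuming Python's built-in sqlite3 module as a stand-in for PostgreSQL 9.1 (so it says nothing about plans or timings) and hypothetical synthetic data; the index name tests_parent_id_idx is made up for the example:

```python
import random
import sqlite3

# SQLite stands in for PostgreSQL here: only result sets are compared.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tests (id INTEGER PRIMARY KEY, parent_id INTEGER);
    CREATE INDEX tests_parent_id_idx ON tests (parent_id);
    CREATE TABLE responses (test_id INTEGER, user_id INTEGER);
""")

# Hypothetical data: tests 1-50 are top-level; tests 51-200 sometimes
# have one of them as parent (two levels, like the question's data).
rng = random.Random(42)
for test_id in range(1, 201):
    parent_id = None
    if test_id > 50 and rng.random() < 0.3:
        parent_id = rng.randint(1, 50)
    conn.execute("INSERT INTO tests VALUES (?, ?)", (test_id, parent_id))
for _ in range(400):
    conn.execute("INSERT INTO responses VALUES (?, ?)",
                 (rng.randint(1, 200), rng.randint(330, 336)))

# The query from the question.
ORIGINAL = """
    SELECT t.id FROM tests t
    LEFT JOIN tests c ON c.parent_id IN (t.id, t.parent_id)
    INNER JOIN responses r ON r.test_id IN (t.id, t.parent_id, c.id)
    WHERE r.user_id = 333
"""

# The CTE rewrite: answered test ids plus the parents of answered tests.
CTE_REWRITE = """
    WITH cte AS (
        SELECT r.test_id FROM responses AS r WHERE r.user_id = 333
        UNION ALL
        SELECT c.parent_id FROM tests AS c
            INNER JOIN responses AS r ON r.test_id = c.id
        WHERE r.user_id = 333
    )
    SELECT t.id FROM tests AS t
    WHERE t.id IN (SELECT c.test_id FROM cte AS c)
       OR t.parent_id IN (SELECT c.test_id FROM cte AS c)
"""

original_ids = {row[0] for row in conn.execute(ORIGINAL)}
cte_ids = {row[0] for row in conn.execute(CTE_REWRITE)}
print(len(original_ids), len(cte_ids), original_ids == cte_ids)
```

Sets are compared because the original query can repeat a t.id, while the CTE version returns each test at most once. The NULL parent_id values that end up in cte are harmless: x IN (..., NULL) is never true for a non-matching x, so they cannot add rows.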
Old: try rewriting the LEFT JOIN part as the following query and see whether it is faster:
select t.id
from tests t
inner join tests c on c.parent_id = t.id
union all
select t.id
from tests t
inner join tests c on c.parent_id = t.parent_id
How long does it take to run each of these queries?
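The UNION ALL version is not a drop-in replacement for the LEFT JOIN: tests with no child or sibling match, which the LEFT JOIN keeps with c.id NULL, are dropped, and a pair satisfying both conditions would be counted twice. For pairs that do match, it should return the same rows as the OR join. A minimal sketch to check this, assuming sqlite3 as a stand-in for PostgreSQL and hypothetical synthetic data in which no test is its own parent:

```python
import random
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tests (id INTEGER PRIMARY KEY, parent_id INTEGER);
    CREATE INDEX tests_parent_id_idx ON tests (parent_id);
""")

# Hypothetical data: tests 1-50 are top-level, some of 51-200 have a parent,
# so t.id = t.parent_id never holds and no pair can match both conditions.
rng = random.Random(7)
for test_id in range(1, 201):
    parent_id = rng.randint(1, 50) if test_id > 50 and rng.random() < 0.3 else None
    conn.execute("INSERT INTO tests VALUES (?, ?)", (test_id, parent_id))

# Inner-join form of the question's OR condition.
OR_JOIN = """
    SELECT t.id FROM tests t
    INNER JOIN tests c ON c.parent_id IN (t.id, t.parent_id)
"""
# The UNION ALL rewrite: children, then siblings.
UNION_ALL = """
    SELECT t.id FROM tests t INNER JOIN tests c ON c.parent_id = t.id
    UNION ALL
    SELECT t.id FROM tests t INNER JOIN tests c ON c.parent_id = t.parent_id
"""
# The LEFT JOIN from the question, which also keeps unmatched tests.
LEFT_JOIN = """
    SELECT t.id FROM tests t
    LEFT JOIN tests c ON c.parent_id IN (t.id, t.parent_id)
"""

# Sorted lists compare the rows as multisets (duplicates included).
or_rows = sorted(row[0] for row in conn.execute(OR_JOIN))
union_rows = sorted(row[0] for row in conn.execute(UNION_ALL))
left_rows = [row[0] for row in conn.execute(LEFT_JOIN)]
print(len(or_rows), len(union_rows), len(left_rows))
```

The difference between the LEFT JOIN row count and the UNION ALL row count is the number of tests with neither children nor siblings.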
Answer 1 (score: 1)
I think the query can be simplified to:
SELECT t.id FROM tests t
WHERE EXISTS (
    SELECT * FROM responses r
    WHERE (r.test_id = t.id OR r.test_id = t.parent_id)
    AND r.user_id = 333
    )
OR EXISTS (
    SELECT * FROM responses r
    JOIN tests c ON r.test_id = c.id
    -- Note: the ... OR sibling makes no sense to me
    WHERE (c.parent_id = t.id OR c.parent_id = t.parent_id)
    AND r.user_id = 333
    );
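Note: the query in the question can produce duplicate values for t.id. This one reports only distinct values.
Update: I just tested it (on synthetic data), and the query above returns exactly the same results as the original, minus the duplicates.
UPDATE2: added the sibling match.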
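The synthetic-data check described above can be sketched as follows. This is an illustration, not the answerer's actual test: it assumes Python's sqlite3 as a stand-in for PostgreSQL 9.1 and hypothetical generated data, and compares the distinct ids from the original query with the rows returned by the EXISTS version:

```python
import random
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tests (id INTEGER PRIMARY KEY, parent_id INTEGER);
    CREATE TABLE responses (test_id INTEGER, user_id INTEGER);
""")

# Hypothetical data: tests 1-50 are top-level, some of 51-200 have a parent.
rng = random.Random(42)
for test_id in range(1, 201):
    parent_id = rng.randint(1, 50) if test_id > 50 and rng.random() < 0.3 else None
    conn.execute("INSERT INTO tests VALUES (?, ?)", (test_id, parent_id))
for _ in range(400):
    conn.execute("INSERT INTO responses VALUES (?, ?)",
                 (rng.randint(1, 200), rng.randint(330, 336)))

# The query from the question (may repeat a t.id).
ORIGINAL = """
    SELECT t.id FROM tests t
    LEFT JOIN tests c ON c.parent_id IN (t.id, t.parent_id)
    INNER JOIN responses r ON r.test_id IN (t.id, t.parent_id, c.id)
    WHERE r.user_id = 333
"""

# The EXISTS rewrite, including the sibling match from UPDATE2.
EXISTS_REWRITE = """
    SELECT t.id FROM tests t
    WHERE EXISTS (
        SELECT * FROM responses r
        WHERE (r.test_id = t.id OR r.test_id = t.parent_id)
        AND r.user_id = 333
    )
    OR EXISTS (
        SELECT * FROM responses r
        JOIN tests c ON r.test_id = c.id
        WHERE (c.parent_id = t.id OR c.parent_id = t.parent_id)
        AND r.user_id = 333
    )
"""

original_rows = [row[0] for row in conn.execute(ORIGINAL)]
exists_rows = [row[0] for row in conn.execute(EXISTS_REWRITE)]
print(len(original_rows), len(set(original_rows)), len(exists_rows))
```

Because EXISTS only tests for the presence of a matching row, each test appears at most once in the rewrite, whereas the join-based original emits one row per matching (child or sibling, response) combination.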