In my database I have a table with ~3,500 rows. As part of a more complex query, I am trying to inner join this table to itself using a CASE condition, as shown below:
SELECT *
FROM some_table AS t1
JOIN some_table AS t2 ON t1.type = t2.type
AND CASE
WHEN t1.type = 'ab' THEN t1.first = t2.first
WHEN t1.type = 'cd' THEN t1.second = t2.second
-- Column type contains only one of 2 possible varchar values
END;
The problem is that this query takes 3.2 to 4.5 seconds to run, while the following one runs in 40 to 50 ms:
SELECT *
FROM some_table AS t1
JOIN some_table AS t2 ON t1.type = t2.type
AND (t1.first = t2.first OR t1.second = t2.second)
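Also, according to the execution plan, in the first case the database processes ~580 million rows, even though the table contains only ~3,500. The table has the following indexes: (id), (type), (type, first), (type, second).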
We are using this version: PostgreSQL 9.4.5 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-16), 64-bit
Why does PostgreSQL behave so strangely in this case?
Answer (score: 2)
Try this:
select *
from
some_table as t1
join
some_table as t2 on
t1.type = t2.type
and
(
t1.type = 'ab' and t1.first = t2.first
or
t1.type = 'cd' and t1.second = t2.second
)
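You can check that this rewrite returns the same rows as the CASE version by running both on a tiny data set. The sketch below uses Python's built-in sqlite3 module as a stand-in for PostgreSQL 9.4. It therefore only compares results and says nothing about planning or speed. The sample rows and the join_pairs helper are hypothetical. The sketch also runs the question's fast OR query, which does not have the same meaning: it matches on either column regardless of type.

```python
import sqlite3

# Hypothetical sample data: (id, type, first, second). Only 'ab' and 'cd' occur,
# matching the question's note that type holds one of two values.
ROWS = [
    (1, "ab", 1, 10),
    (2, "ab", 1, 20),
    (3, "ab", 2, 10),
    (4, "cd", 1, 10),
    (5, "cd", 2, 10),
    (6, "cd", 3, 30),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE some_table (id INTEGER PRIMARY KEY, type TEXT, "
    "first INTEGER, second INTEGER)"
)
conn.executemany("INSERT INTO some_table VALUES (?, ?, ?, ?)", ROWS)


def join_pairs(condition):
    """Return the set of (t1.id, t2.id) pairs produced by the self-join."""
    sql = (
        "SELECT t1.id, t2.id FROM some_table AS t1 "
        "JOIN some_table AS t2 ON t1.type = t2.type AND " + condition
    )
    return set(conn.execute(sql).fetchall())


# The question's CASE condition.
case_pairs = join_pairs(
    "CASE WHEN t1.type = 'ab' THEN t1.first = t2.first "
    "WHEN t1.type = 'cd' THEN t1.second = t2.second END"
)
# The plain boolean rewrite from this answer.
boolean_pairs = join_pairs(
    "(t1.type = 'ab' AND t1.first = t2.first "
    "OR t1.type = 'cd' AND t1.second = t2.second)"
)
# The question's fast OR query.
or_pairs = join_pairs("(t1.first = t2.first OR t1.second = t2.second)")

print("CASE == rewrite:", case_pairs == boolean_pairs)
print("extra rows from the OR query:", sorted(or_pairs - case_pairs))
```

On this data the CASE form and the boolean form return the same 10 pairs. The OR form additionally returns (1, 3) and (3, 1): two 'ab' rows whose second values match but whose first values differ. This means the two timings in the question were measured on queries that are not equivalent.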
For better performance, create an expression index based on a function:
create or replace function f (_type text, _first int, _second int)
returns integer as $$
select case _type when 'ab' then _first else _second end;
$$ language sql immutable;
create index i on some_table(f(type, first, second));
Then use the same function in the query so that the index can be used:
select *
from
some_table as t1
join
some_table as t2 on
t1.type = t2.type
and
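-- call f on t2's own columns (t2.type, not t1.type) so the expression matches
-- the index definition; this is equivalent because t1.type = t2.type
f(t1.type, t1.first, t1.second) = f(t2.type, t2.first, t2.second)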
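You can try the expression-index approach the same way. This is a minimal sketch that again uses Python's sqlite3 as a stand-in for PostgreSQL. The function f is re-implemented as a hypothetical Python function and registered with deterministic=True, which SQLite requires before a function can appear in an index expression. This needs Python 3.8+ and a SQLite build with expression-index support (3.9+).

The sketch checks that the f-based join returns the same pairs as the CASE join, and it prints SQLite's EXPLAIN QUERY PLAN. Whether SQLite actually picks index i depends on its planner and version, so the sketch does not assert it.

```python
import sqlite3


def f(type_, first, second):
    # Python stand-in for the SQL function f above (hypothetical re-implementation):
    # 'ab' rows are keyed by first, every other type by second.
    return first if type_ == "ab" else second


# Hypothetical sample data: (id, type, first, second).
ROWS = [
    (1, "ab", 1, 10),
    (2, "ab", 1, 20),
    (3, "ab", 2, 10),
    (4, "cd", 1, 10),
    (5, "cd", 2, 10),
    (6, "cd", 3, 30),
]

conn = sqlite3.connect(":memory:")
# SQLite only accepts deterministic functions inside index expressions.
conn.create_function("f", 3, f, deterministic=True)
conn.execute(
    "CREATE TABLE some_table (id INTEGER PRIMARY KEY, type TEXT, "
    "first INTEGER, second INTEGER)"
)
conn.executemany("INSERT INTO some_table VALUES (?, ?, ?, ?)", ROWS)
conn.execute("CREATE INDEX i ON some_table (f(type, first, second))")

F_JOIN = """
SELECT t1.id, t2.id
FROM some_table AS t1
JOIN some_table AS t2
  ON t1.type = t2.type
 AND f(t1.type, t1.first, t1.second) = f(t2.type, t2.first, t2.second)
"""

CASE_JOIN = """
SELECT t1.id, t2.id
FROM some_table AS t1
JOIN some_table AS t2
  ON t1.type = t2.type
 AND CASE WHEN t1.type = 'ab' THEN t1.first = t2.first
          WHEN t1.type = 'cd' THEN t1.second = t2.second END
"""

f_pairs = set(conn.execute(F_JOIN).fetchall())
case_pairs = set(conn.execute(CASE_JOIN).fetchall())
print("same rows:", f_pairs == case_pairs)

# Show the access paths SQLite chose (exact text varies by SQLite version).
for row in conn.execute("EXPLAIN QUERY PLAN " + F_JOIN):
    print(row[-1])
```

In PostgreSQL, the analogous check is to run EXPLAIN ANALYZE on the query above and see whether index i appears in the plan.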