Question

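In my database I have a table with ~3500 records. As part of a more complex query, I am trying to inner-join this table with itself using a "CASE" condition, as shown below: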

SELECT *
FROM some_table AS t1
JOIN some_table AS t2 ON t1.type = t2.type
    AND CASE
       WHEN t1.type = 'ab' THEN t1.first = t2.first
       WHEN t1.type = 'cd' THEN t1.second = t2.second
       -- Column type contains only one of 2 possible varchar values
    END;

The problem is that this query takes 3.2 to 4.5 seconds to run, while the following query runs in 40 to 50 ms:

SELECT *
FROM some_table AS t1
JOIN some_table AS t2 ON t1.type = t2.type
    AND (t1.first = t2.first OR t1.second = t2.second)

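Also, according to the execution plan, in the first case the database processes ~580 million rows, even though the table contains only ~3500. The table has the following indexes: (id), (type), (type, first), (type, second).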
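A minimal reproduction sketch for comparing the two plans. It assumes integer columns for first and second and uses made-up data (the question gives neither); only the table name, column names and the four indexes come from the question. It creates a temporary some_table and runs EXPLAIN ANALYZE on both queries:

```sql
-- Hypothetical reproduction: column types and data are assumptions,
-- only the table/column names and the indexes come from the question.
CREATE TEMP TABLE some_table (
    id     serial PRIMARY KEY,          -- provides the index on (id)
    type   varchar NOT NULL,
    first  integer,
    second integer
);

-- ~3500 rows split evenly between the two type values.
INSERT INTO some_table (type, first, second)
SELECT CASE WHEN g % 2 = 0 THEN 'ab' ELSE 'cd' END,
       g % 500,
       g % 700
FROM generate_series(1, 3500) AS g;

CREATE INDEX ON some_table (type);
CREATE INDEX ON some_table (type, first);
CREATE INDEX ON some_table (type, second);
ANALYZE some_table;

-- Plan for the slow CASE join.
EXPLAIN ANALYZE
SELECT *
FROM some_table AS t1
JOIN some_table AS t2 ON t1.type = t2.type
    AND CASE
       WHEN t1.type = 'ab' THEN t1.first = t2.first
       WHEN t1.type = 'cd' THEN t1.second = t2.second
    END;

-- Plan for the fast OR join.
EXPLAIN ANALYZE
SELECT *
FROM some_table AS t1
JOIN some_table AS t2 ON t1.type = t2.type
    AND (t1.first = t2.first OR t1.second = t2.second);
```

In the CASE plan, the CASE expression will most likely show up as a "Join Filter" that is evaluated for every pair of rows sharing a type (with a large "Rows Removed by Join Filter" count), because it cannot be used as an index or hash condition; the OR form can instead be matched against the (type, first) and (type, second) indexes. The exact plan shapes depend on the data and the PostgreSQL version.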

We are using the following version: PostgreSQL 9.4.5 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-16), 64-bit

Why does PostgreSQL behave so strangely in this case?

Answer 1

Try this:

select *
from
    some_table as t1
    join
    some_table as t2 on
        t1.type = t2.type
        and
        (
            t1.type = 'ab' and t1.first = t2.first
            or
            t1.type = 'cd' and t1.second = t2.second
        )
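A sketch for checking that this rewrite keeps the meaning of the CASE join, on a hypothetical temporary some_table with made-up integer data (the real column types are not given). It also shows why the type guards matter: the question's fast query compares first or second regardless of type, so it can return pairs that the CASE join rejects.

```sql
-- Hypothetical test data; column types and values are assumptions.
create temp table some_table (
    id     serial primary key,
    type   varchar not null,
    first  integer,
    second integer
);

insert into some_table (type, first, second)
select case when g % 2 = 0 then 'ab' else 'cd' end, g % 500, g % 700
from generate_series(1, 3500) as g;

-- Pairs returned by the CASE join from the question.
create temp table case_pairs as
select t1.id as id1, t2.id as id2
from some_table as t1
join some_table as t2 on t1.type = t2.type
    and case
        when t1.type = 'ab' then t1.first = t2.first
        when t1.type = 'cd' then t1.second = t2.second
    end;

-- Pairs returned by the AND/OR rewrite.
create temp table rewrite_pairs as
select t1.id as id1, t2.id as id2
from some_table as t1
join some_table as t2 on t1.type = t2.type
    and (
        t1.type = 'ab' and t1.first = t2.first
        or
        t1.type = 'cd' and t1.second = t2.second
    );

-- Pairs returned by the question's fast query (no type guards).
create temp table or_pairs as
select t1.id as id1, t2.id as id2
from some_table as t1
join some_table as t2 on t1.type = t2.type
    and (t1.first = t2.first or t1.second = t2.second);

-- No rows here means the CASE join and the rewrite agree.
select 'missing from rewrite' as problem, * from (
    select * from case_pairs except select * from rewrite_pairs
) as d
union all
select 'extra in rewrite', * from (
    select * from rewrite_pairs except select * from case_pairs
) as d;

-- The unguarded OR query returns more pairs on this data: the 'ab' rows
-- generated from g = 2 and g = 702 share second = 2 but have first 2 vs 202.
select (select count(*) from case_pairs) as case_count,
       (select count(*) from or_pairs)   as or_count;
```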

For better performance, create an index on a function expression:

create or replace function f (_type text, _first int, _second int)
returns integer as $$
    select case _type when 'ab' then _first else _second end;
$$ language sql immutable;

create index i on some_table(f(type, first, second));

Then use that index in the query:

select *
from
    some_table as t1
    join
    some_table as t2 on
        t1.type = t2.type
        and
        f(t1.type, t1.first, t1.second) = f(t2.type, t2.first, t2.second)
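A sketch of the expression-index approach end to end, again on a hypothetical temporary some_table with assumed integer columns and made-up data. The t2 side of the comparison is built only from t2 columns, f(t2.type, t2.first, t2.second), so it has the same form as the index expression. Presumably the speedup comes from the condition now being an equality between an expression of t1 and an expression of t2, which the planner can use as a hash or merge join key or as an index condition on i, while the CASE form is a single boolean over both tables that can only be checked as a filter.

```sql
-- Hypothetical test data; column types and values are assumptions.
create temp table some_table (
    id     serial primary key,
    type   varchar not null,
    first  integer,
    second integer
);

insert into some_table (type, first, second)
select case when g % 2 = 0 then 'ab' else 'cd' end, g % 500, g % 700
from generate_series(1, 3500) as g;

-- Function and expression index as defined in this answer.
create or replace function f (_type text, _first int, _second int)
returns integer as $$
    select case _type when 'ab' then _first else _second end;
$$ language sql immutable;

create index i on some_table(f(type, first, second));

-- Collect statistics, including those for the indexed expression.
analyze some_table;

-- Each side of the equality uses only one table's columns.
explain analyze
select *
from
    some_table as t1
    join
    some_table as t2 on
        t1.type = t2.type
        and
        f(t1.type, t1.first, t1.second) = f(t2.type, t2.first, t2.second);
```

Because type only takes the values 'ab' and 'cd', this join should return the same rows as the CASE join from the question; the EXPLAIN ANALYZE output shows which join method and index the planner actually chose on a given data set.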
