Question

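I was wondering if someone could explain why this takes so much longer to run with a CTE than with a temp table. I'm basically removing duplicate rows from a customer table (why the duplicates exist is beyond the scope of this post).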

This is Postgres 9.5.

The CTE version is:

with targets as
    (
        select
            id,
            row_number() over(partition by uuid order by created_date desc) as rn
        from
            customer
    )
delete from
    customer
where
    id in
        (
            select
                id
            from
                targets
            where
                rn > 1
        );

I killed that version this morning after it had been running for more than an hour.

The temp table version is:

create temp table
    targets
as select
    id,
    row_number() over(partition by uuid order by created_date desc) as rn
from
    customer;

delete from
    customer
where
    id in
        (
            select
                id
            from
                targets
            where
                rn > 1
        );

This version finishes in about 7 seconds.

Any idea what could be causing this?

Answer 1

The CTE is slower because it has to be executed unaltered (via a CTE scan).

TFM (section 7.8.2) states: "Data-modifying statements in WITH are executed exactly once, and always to completion, independently of whether the primary query reads all (or indeed any) of their output. Notice that this is different from the rule for SELECT in WITH: as stated in the previous section, execution of a SELECT is carried only as far as the primary query demands its output."

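So the CTE acts as an optimization barrier: the optimizer is not allowed to dismantle it, even when doing so would yield a smarter plan with the same results.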
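One way to check this on your own data is to ask the planner for the plan without running the delete. The sketch below reuses the table and column names from the question and assumes a 9.5-era server; the plan should then show a CTE Scan node on targets, though the exact plan shape depends on your data and settings.

```sql
-- EXPLAIN without ANALYZE only plans the statement; no rows are deleted.
EXPLAIN
WITH targets AS (
    SELECT
        id,
        row_number() OVER (PARTITION BY uuid ORDER BY created_date DESC) AS rn
    FROM customer
)
DELETE FROM customer
WHERE id IN (SELECT id FROM targets WHERE rn > 1);
```

Avoid EXPLAIN ANALYZE here unless you wrap it in a transaction you roll back, because ANALYZE actually executes the DELETE.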

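The CTE solution can be refactored into a joined subquery (similar to the temp table in the question). In Postgres, a joined subquery is usually faster than the EXISTS() variant these days.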

DELETE FROM customer del
USING ( SELECT id
        , row_number() over(partition by uuid order by created_date desc)
                 as rn
        FROM customer
        ) sub
WHERE sub.id = del.id
AND sub.rn > 1
        ;

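Another way is to use a TEMP VIEW. This is syntactically equivalent to the temp table case, but semantically equivalent to the joined subquery form (they yield exactly the same query plan, at least in this case). This is because the Postgres optimizer dismantles the view and combines it with the main query (pull-up). You can think of a view in PG as a kind of macro.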

CREATE TEMP VIEW targets
AS SELECT id
        , row_number() over(partition by uuid ORDER BY created_date DESC) AS rn
FROM customer;

EXPLAIN
DELETE FROM customer
WHERE id IN ( SELECT id
            FROM targets
            WHERE rn > 1
        );
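To compare the two plans yourself, a minimal sketch is to run EXPLAIN on the joined subquery form from earlier in this answer in the same session and put its output next to the output for the view. Whether the plans match on your data is something to verify, not something this snippet guarantees.

```sql
-- Plan only: EXPLAIN without ANALYZE does not delete anything.
EXPLAIN
DELETE FROM customer del
USING (
    SELECT
        id,
        row_number() OVER (PARTITION BY uuid ORDER BY created_date DESC) AS rn
    FROM customer
) sub
WHERE sub.id = del.id
  AND sub.rn > 1;

-- A temp view disappears when the session ends; drop it earlier if you like.
DROP VIEW targets;
```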

[UPDATE: I was wrong about the CTE always needing to be executed to completion; that is only the case for data-modifying CTEs.]

Answer 2

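Using a CTE is likely to cause different bottlenecks than using a temporary table. I'm not familiar with how PostgreSQL implements CTEs, but the result is probably held in memory, so if your server is short on memory and the CTE's result set is very large, you could run into problems there. I would monitor the server while the query runs and try to find out where the bottleneck is.

Another way to perform the delete, which may be faster than both of your methods: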

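-- Delete every row for which a newer row with the same uuid exists.
-- Postgres puts the alias after the table name (no "DELETE C FROM ...").
DELETE FROM customer c
WHERE
    EXISTS (SELECT 1 FROM customer c2 WHERE c2.uuid = c.uuid AND c2.created_date > c.created_date);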

This does not handle rows whose created_date matches exactly, but that can be solved by also adding id to the subquery.
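A minimal sketch of that tie-breaker, assuming id is unique and created_date is never NULL (neither is stated in the question): when two rows share a uuid and a created_date, the row with the larger id is kept. Which row to keep on a tie is an arbitrary choice made here for illustration.

```sql
-- Delete a row when another row with the same uuid is newer,
-- or equally new but with a larger id (the tie-breaker).
DELETE FROM customer c
WHERE EXISTS (
    SELECT 1
    FROM customer c2
    WHERE c2.uuid = c.uuid
      AND (
            c2.created_date > c.created_date
         OR (c2.created_date = c.created_date AND c2.id > c.id)
      )
);
```

Exactly one row per uuid survives under those assumptions, because the ordering on (created_date, id) is then total within each uuid.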

Deleting with a CTE is slower than using a temp table in Postgres
