Why does adding a window function make this query so slow?

Date: 2014-09-06 21:25:39

Tags: sql performance postgresql window-functions

Query A executes in microseconds:

SELECT t1.id
 FROM (SELECT t0.id AS id FROM t0) AS t1
 WHERE NOT (EXISTS (SELECT 1
        FROM t2
        WHERE t2.ph_id = t1.id AND t2.me_id = 1 AND t2.rt_id = 4))
 LIMIT 20 OFFSET 0

But query B takes about 25 seconds:

SELECT t1.id, count(*) OVER () AS count
 FROM (SELECT t0.id AS id FROM t0) AS t1
 WHERE NOT (EXISTS (SELECT 1
        FROM t2
        WHERE t2.ph_id = t1.id AND t2.me_id = 1 AND t2.rt_id = 4))
 LIMIT 20 OFFSET 0

(The only difference is one item in the SELECT list: the window aggregate.)

For A:

The EXPLAIN output looks like this:

 Limit  (cost=0.00..1.20 rows=20 width=4)
   ->  Nested Loop Anti Join  (cost=0.00..3449.22 rows=57287 width=4)
         Join Filter: (t2.ph_id = t0.id)
         ->  Seq Scan on t0  (cost=0.00..1323.88 rows=57288 width=4)
         ->  Materialize  (cost=0.00..1266.02 rows=1 width=4)
               ->  Seq Scan on t2  (cost=0.00..1266.01 rows=1 width=4)
                     Filter: ((me_id = 1) AND (rt_id = 4))

For B:

 Limit  (cost=0.00..1.45 rows=20 width=4)
   ->  WindowAgg  (cost=0.00..4165.31 rows=57287 width=4)
         ->  Nested Loop Anti Join  (cost=0.00..3449.22 rows=57287 width=4)
               Join Filter: (t2.ph_id = t0.id)
               ->  Seq Scan on t0  (cost=0.00..1323.88 rows=57288 width=4)
               ->  Materialize  (cost=0.00..1266.02 rows=1 width=4)
                     ->  Seq Scan on t2  (cost=0.00..1266.01 rows=1 width=4)
                           Filter: ((me_id = 1) AND (rt_id = 4))

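I am adding the window aggregate to get the total number of rows before the LIMIT is applied, so that I can build a pagination UI.

3 answers:

Answer 0 (score: 3)

Your original query can be written like this: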

SELECT t0.id
FROM t0
WHERE NOT EXISTS (SELECT 1
                  FROM t2
                  WHERE t2.ph_id = t0.id AND t2.me_id = 1 AND t2.rt_id = 4
                 )
LIMIT 20 OFFSET 0;

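You have no ORDER BY, so the query can return rows as soon as it finds them and stop after the first 20. When you add the window function:

SELECT t0.id, count(*) OVER ()

the query now has to count the rows in the result set, so it must generate the entire result set before LIMIT can return anything, rather than fetching only the first 20 rows. That takes more time.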
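One way to observe this difference is to run both statements under EXPLAIN ANALYZE and compare the actual row counts reported for the anti join node. The following is only a sketch using the table and column names from the question; note that EXPLAIN ANALYZE really executes the statement, so the second one will take as long as query B itself.

```sql
-- Query A: with no window function, LIMIT can stop the anti join
-- as soon as 20 qualifying rows have been produced.
EXPLAIN ANALYZE
SELECT t0.id
FROM t0
WHERE NOT EXISTS (SELECT 1
                  FROM t2
                  WHERE t2.ph_id = t0.id AND t2.me_id = 1 AND t2.rt_id = 4)
LIMIT 20 OFFSET 0;

-- Query B: the WindowAgg node sits below the Limit and needs every
-- qualifying row before it can emit the first one.
EXPLAIN ANALYZE
SELECT t0.id, count(*) OVER () AS count
FROM t0
WHERE NOT EXISTS (SELECT 1
                  FROM t2
                  WHERE t2.ph_id = t0.id AND t2.me_id = 1 AND t2.rt_id = 4)
LIMIT 20 OFFSET 0;
```

In the output, the "actual ... rows=" figure on the Nested Loop Anti Join line should be small for A and close to the full result size for B.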

Answer 1 (score: 2)

You can check how long count(*) takes on its own and what its execution plan looks like:

SELECT count(*)
FROM (SELECT t0.id AS id FROM t0) AS t1
WHERE NOT (EXISTS (SELECT 1
        FROM t2
        WHERE t2.ph_id = t1.id AND t2.me_id = 1 AND t2.rt_id = 4))

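That may give you an idea of why it takes longer.

Essentially, the first query reads t0 only until it has found the first 20 rows that satisfy the condition, whereas the second query must produce the complete set of qualifying rows in order to count them.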
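As a sketch of that check, the count query above can be wrapped in EXPLAIN ANALYZE, which reports both the chosen plan and the measured execution time in one step. The table names are those from the question; the statement is executed for real, so expect it to run about as long as the slow query.

```sql
-- Shows the plan together with actual row counts and total runtime
-- for counting every row of t0 that has no matching row in t2.
EXPLAIN ANALYZE
SELECT count(*)
FROM (SELECT t0.id AS id FROM t0) AS t1
WHERE NOT (EXISTS (SELECT 1
        FROM t2
        WHERE t2.ph_id = t1.id AND t2.me_id = 1 AND t2.rt_id = 4));
```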

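Answer 2 (score: 0)

Thanks for the other answers. They are correct that the count has to do more work, but I found the solution from another source: the statistics were not up to date.

After running the command...: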

ANALYZE;

... PostgreSQL was able to choose a more suitable query plan, and both queries now run very quickly.
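A bare ANALYZE refreshes statistics for every table in the current database. As a narrower sketch, assuming only the two tables from the question matter, the statistics can be refreshed per table and their age checked through the pg_stat_user_tables view:

```sql
-- Refresh planner statistics for just the tables used in the query.
ANALYZE t0;
ANALYZE t2;

-- See when statistics were last gathered, either manually (last_analyze)
-- or by the autovacuum daemon (last_autoanalyze).
SELECT relname, last_analyze, last_autoanalyze, n_live_tup
FROM pg_stat_user_tables
WHERE relname IN ('t0', 't2');
```

If both timestamp columns are NULL or very old for a table that changes often, stale statistics are a plausible reason for a poor plan.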