Question

I have a query (run with psql against a Postgres 9.6.10 database) that returns 10 rows. When I select 30 columns instead of 1, the query runs 20 times slower.

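I think I know why this happens (see the EXPLAIN output below), and I am guessing the workaround is to select only the IDs and then join the rest of the data back in. Does this indicate a bug in the query planner? Are there any other workarounds?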

Query 1 (executes in 20 seconds)

EXPLAIN ANALYZE SELECT fundraisers.*
FROM fundraisers 
INNER JOIN audit_logs ON audit_logs.fundraiser_id = fundraisers.id 
LEFT OUTER JOIN accounts ON accounts.id = fundraisers.account_id 
GROUP BY accounts.id, fundraisers.id  
LIMIT 10

Query 2 (executes in 1 second)

Differs only in the selected columns

EXPLAIN ANALYZE SELECT fundraisers.id
FROM fundraisers 
INNER JOIN audit_logs ON audit_logs.fundraiser_id = fundraisers.id 
LEFT OUTER JOIN accounts ON accounts.id = fundraisers.account_id 
GROUP BY accounts.id, fundraisers.id  
LIMIT 10

EXPLAIN output

One thing I noticed in the EXPLAIN output is that the hash joins have different costs because of the width of the data being joined, i.e.:

->  Hash Join  (cost=25967.06..109216.83 rows=1359646 width=1634) (actual time=322.987..1971.464 rows=1356192 loops=1)

vs

->  Hash Join  (cost=14500.06..74422.83 rows=1359646 width=8) (actual time=111.710..730.736 rows=1356192 loops=1)

More details

database=# EXPLAIN ANALYZE SELECT fundraisers.*
database-# FROM fundraisers 
database-# INNER JOIN audit_logs ON audit_logs.fundraiser_id = fundraisers.id 
database-# LEFT OUTER JOIN accounts ON accounts.id = fundraisers.account_id 
database-# GROUP BY accounts.id, fundraisers.id  
database-# LIMIT 10;
                                                                       QUERY PLAN                                                                        
---------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=3147608.91..3147608.98 rows=10 width=1634) (actual time=20437.137..20437.190 rows=10 loops=1)
   ->  Group  (cost=3147608.91..3157806.25 rows=1359646 width=1634) (actual time=20437.136..20437.186 rows=10 loops=1)
         Group Key: accounts.id, fundraisers.id
         ->  Sort  (cost=3147608.91..3151008.02 rows=1359646 width=1634) (actual time=20437.133..20437.165 rows=120 loops=1)
               Sort Key: accounts.id, fundraisers.id
               Sort Method: external merge  Disk: 1976192kB
               ->  Hash Join  (cost=25967.06..109216.83 rows=1359646 width=1634) (actual time=322.987..1971.464 rows=1356192 loops=1)
                     Hash Cond: (audit_logs.fundraiser_id = fundraisers.id)
                     ->  Seq Scan on audit_logs  (cost=0.00..40634.14 rows=1517914 width=4) (actual time=0.078..324.638 rows=1517915 loops=1)
                     ->  Hash  (cost=13794.41..13794.41 rows=56452 width=1634) (actual time=321.869..321.869 rows=56452 loops=1)
                           Buckets: 4096  Batches: 32  Memory Usage: 2786kB
                           ->  Hash Left Join  (cost=1548.76..13794.41 rows=56452 width=1634) (actual time=16.465..122.406 rows=56452 loops=1)
                                 Hash Cond: (fundraisers.account_id = accounts.id)
                                 ->  Seq Scan on fundraisers  (cost=0.00..11546.52 rows=56452 width=1630) (actual time=0.068..54.434 rows=56452 loops=1)
                                 ->  Hash  (cost=965.56..965.56 rows=46656 width=4) (actual time=16.337..16.337 rows=46656 loops=1)
                                       Buckets: 65536  Batches: 1  Memory Usage: 2153kB
                                       ->  Seq Scan on accounts  (cost=0.00..965.56 rows=46656 width=4) (actual time=0.020..8.268 rows=46656 loops=1)

 Planning time: 0.748 ms
 Execution time: 21013.427 ms
(19 rows)

database=# EXPLAIN ANALYZE SELECT fundraisers.id
database-# FROM fundraisers 
database-# INNER JOIN audit_logs ON audit_logs.fundraiser_id = fundraisers.id 
database-# LEFT OUTER JOIN accounts ON accounts.id = fundraisers.account_id 
database-# GROUP BY accounts.id, fundraisers.id  
database-# LIMIT 10;
                                                                      QUERY PLAN                                                                      
------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=231527.41..231527.48 rows=10 width=8) (actual time=1314.884..1314.917 rows=10 loops=1)
   ->  Group  (cost=231527.41..241724.75 rows=1359646 width=8) (actual time=1314.884..1314.914 rows=10 loops=1)
         Group Key: accounts.id, fundraisers.id
         ->  Sort  (cost=231527.41..234926.52 rows=1359646 width=8) (actual time=1314.883..1314.901 rows=120 loops=1)
               Sort Key: accounts.id, fundraisers.id
               Sort Method: external merge  Disk: 23840kB
               ->  Hash Join  (cost=14500.06..74422.83 rows=1359646 width=8) (actual time=111.710..730.736 rows=1356192 loops=1)
                     Hash Cond: (audit_logs.fundraiser_id = fundraisers.id)
                     ->  Seq Scan on audit_logs  (cost=0.00..40634.14 rows=1517914 width=4) (actual time=0.062..224.307 rows=1517915 loops=1)
                     ->  Hash  (cost=13794.41..13794.41 rows=56452 width=8) (actual time=111.566..111.566 rows=56452 loops=1)
                           Buckets: 65536  Batches: 1  Memory Usage: 2687kB
                           ->  Hash Left Join  (cost=1548.76..13794.41 rows=56452 width=8) (actual time=17.362..98.257 rows=56452 loops=1)
                                 Hash Cond: (fundraisers.account_id = accounts.id)
                                 ->  Seq Scan on fundraisers  (cost=0.00..11546.52 rows=56452 width=8) (actual time=0.067..54.676 rows=56452 loops=1)
                                 ->  Hash  (cost=965.56..965.56 rows=46656 width=4) (actual time=16.524..16.524 rows=46656 loops=1)
                                       Buckets: 65536  Batches: 1  Memory Usage: 2153kB
                                       ->  Seq Scan on accounts  (cost=0.00..965.56 rows=46656 width=4) (actual time=0.032..7.804 rows=46656 loops=1)
 Planning time: 0.469 ms
 Execution time: 1323.349 ms

Answer 1

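First:

A table without keys has no meaning (this is a consequence of second normal form)
The results of queries on such tables have no meaning either
Without any structure (PK, FK, secondary indexes), the optimizer has only two options: nested loops (over seqscans) or hash joins
Hash joins are always a good choice, given enough memory
A final ORDER BY or GROUP BY needs to sort the complete result set (just to find the first 10 results), because the result of a hash join has no implied order
If the hash table is too large (larger than work_mem), it spills to disk
More columns also need more space in the hash table, so they exceed work_mem sooner and spill to disk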
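
The last two points can be checked with some rough arithmetic. The sketch below uses only the row count and the estimated widths from the EXPLAIN ANALYZE output in the question; the `'256MB'` value is an arbitrary illustration, not a recommendation, and whether it is enough to keep the narrow sort in memory is an assumption to verify by re-running the query.

```sql
-- Current per-operation memory budget for sorts and hash tables.
SHOW work_mem;

-- Rough volume fed into the Sort node: rows x estimated row width (bytes), in kB.
-- 1356192 rows; width=1634 for fundraisers.*, width=8 for fundraisers.id.
SELECT 1356192::bigint * 1634 / 1024 AS wide_kb,
       1356192::bigint * 8    / 1024 AS narrow_kb;

-- Hypothetical experiment: raise the budget for this session only,
-- re-run the narrow query, then restore the default.
SET work_mem = '256MB';

EXPLAIN ANALYZE SELECT fundraisers.id
FROM fundraisers
INNER JOIN audit_logs ON audit_logs.fundraiser_id = fundraisers.id
LEFT OUTER JOIN accounts ON accounts.id = fundraisers.account_id
GROUP BY accounts.id, fundraisers.id
LIMIT 10;

RESET work_mem;
```

The wide estimate comes out at roughly 2.1 million kB, the same order of magnitude as the `Disk: 1976192kB` reported for the first plan, while the narrow estimate is only about ten thousand kB. If the "Sort Method" line changes from `external merge` to an in-memory method after raising work_mem, that supports the spill explanation.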

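Finally: benchmarking and comparing nonsensical queries makes no sense. The whole optimizer machinery assumes a sane data model. Without one, it will merely produce something that is valid.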

Answer 2

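What is missing from your analysis is the cost of the sort.

The sequence of what happens is:

Select the data from the tables + JOIN (expensive)
Sort the data, in preparation for the GROUP BY.
GROUP BY (helped greatly by the sort) + LIMIT, as requested.

I could not find documentation on that sort, so I assume it works the same way as in another DBMS I know: Oracle.

As described here, the server sometimes needs to use the hard drive to do this.
This is a very slow operation.

This is most likely what is happening with your query, the difference being that PostgreSQL has to write either 1 field (= 1-second execution) or many fields (= 20-second execution).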
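
One way to act on this, along the lines the question already suggests, is to sort and group only the narrow key and fetch the wide rows afterwards. This is a minimal sketch, assuming `fundraisers.id` is the table's primary key; the aliases `f` and `picked` are made up for illustration.

```sql
-- Inner query: the narrow version from the question (sorts 8-byte rows).
-- Outer query: joins the 10 chosen ids back to fetch the full rows.
SELECT f.*
FROM fundraisers AS f
JOIN (
    SELECT fundraisers.id
    FROM fundraisers
    INNER JOIN audit_logs ON audit_logs.fundraiser_id = fundraisers.id
    LEFT OUTER JOIN accounts ON accounts.id = fundraisers.account_id
    GROUP BY accounts.id, fundraisers.id
    LIMIT 10
) AS picked ON picked.id = f.id;
```

Because there is no ORDER BY, which 10 rows come back is not guaranteed to match the original query; both forms return an arbitrary set of 10 groups.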

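That said, keep in mind that you are only using a test query, which is probably equivalent to SELECT * FROM fundraisers LIMIT 10 (judging by the field names; I am not sure of the table definitions).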
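
If that reading is right, the test query could be written with no sort or GROUP BY at all. The sketch below assumes `id` is unique in both `fundraisers` and `accounts`, so that the LEFT JOIN neither adds nor removes rows and the inner join to `audit_logs` only filters out fundraisers that have no audit log; the aliases `f` and `a` are illustrative.

```sql
-- Any 10 fundraisers that have at least one audit_logs row.
-- EXISTS keeps the filtering effect of the inner join without
-- multiplying each fundraiser by its number of audit log rows.
SELECT f.*
FROM fundraisers AS f
WHERE EXISTS (
    SELECT 1
    FROM audit_logs AS a
    WHERE a.fundraiser_id = f.id
)
LIMIT 10;
```

The EXISTS clause is what separates this from a plain SELECT * FROM fundraisers LIMIT 10, which would also return fundraisers without any audit log.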

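Given the gap between what you want (= the production query) and what you typed (= the test query), I am not shocked that the database behaves oddly.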
