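I have a query (run with psql against a Postgres 9.6.10 database) that returns 10 rows. When it selects 30 columns instead of 1, it runs 20x slower.
I think I know why this happens (see the EXPLAIN output below), and I am guessing the workaround is to select only the IDs and then join the rest of the data back in. Does this indicate a bug in the query planner? Are there any other workarounds?
Query 1 (executes in 20 seconds)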
EXPLAIN ANALYZE SELECT fundraisers.*
FROM fundraisers
INNER JOIN audit_logs ON audit_logs.fundraiser_id = fundraisers.id
LEFT OUTER JOIN accounts ON accounts.id = fundraisers.account_id
GROUP BY accounts.id, fundraisers.id
LIMIT 10
Query 2 (executes in 1 second)
Differs only in the selected columns
EXPLAIN ANALYZE SELECT fundraisers.id
FROM fundraisers
INNER JOIN audit_logs ON audit_logs.fundraiser_id = fundraisers.id
LEFT OUTER JOIN accounts ON accounts.id = fundraisers.account_id
GROUP BY accounts.id, fundraisers.id
LIMIT 10
EXPLAIN output
One thing I noticed in the EXPLAIN output is that the hash join has a different cost depending on the width of the rows being joined, i.e.
-> Hash Join (cost=25967.06..109216.83 rows=1359646 width=1634) (actual time=322.987..1971.464 rows=1356192 loops=1)
vs
-> Hash Join (cost=14500.06..74422.83 rows=1359646 width=8) (actual time=111.710..730.736 rows=1356192 loops=1)
More details
database=# EXPLAIN ANALYZE SELECT fundraisers.*
database-# FROM fundraisers
database-# INNER JOIN audit_logs ON audit_logs.fundraiser_id = fundraisers.id
database-# LEFT OUTER JOIN accounts ON accounts.id = fundraisers.account_id
database-# GROUP BY accounts.id, fundraisers.id
database-# LIMIT 10;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=3147608.91..3147608.98 rows=10 width=1634) (actual time=20437.137..20437.190 rows=10 loops=1)
-> Group (cost=3147608.91..3157806.25 rows=1359646 width=1634) (actual time=20437.136..20437.186 rows=10 loops=1)
Group Key: accounts.id, fundraisers.id
-> Sort (cost=3147608.91..3151008.02 rows=1359646 width=1634) (actual time=20437.133..20437.165 rows=120 loops=1)
Sort Key: accounts.id, fundraisers.id
Sort Method: external merge Disk: 1976192kB
-> Hash Join (cost=25967.06..109216.83 rows=1359646 width=1634) (actual time=322.987..1971.464 rows=1356192 loops=1)
Hash Cond: (audit_logs.fundraiser_id = fundraisers.id)
-> Seq Scan on audit_logs (cost=0.00..40634.14 rows=1517914 width=4) (actual time=0.078..324.638 rows=1517915 loops=1)
-> Hash (cost=13794.41..13794.41 rows=56452 width=1634) (actual time=321.869..321.869 rows=56452 loops=1)
Buckets: 4096 Batches: 32 Memory Usage: 2786kB
-> Hash Left Join (cost=1548.76..13794.41 rows=56452 width=1634) (actual time=16.465..122.406 rows=56452 loops=1)
Hash Cond: (fundraisers.account_id = accounts.id)
-> Seq Scan on fundraisers (cost=0.00..11546.52 rows=56452 width=1630) (actual time=0.068..54.434 rows=56452 loops=1)
-> Hash (cost=965.56..965.56 rows=46656 width=4) (actual time=16.337..16.337 rows=46656 loops=1)
Buckets: 65536 Batches: 1 Memory Usage: 2153kB
-> Seq Scan on accounts (cost=0.00..965.56 rows=46656 width=4) (actual time=0.020..8.268 rows=46656 loops=1)
Planning time: 0.748 ms
Execution time: 21013.427 ms
(19 rows)
database=# EXPLAIN ANALYZE SELECT fundraisers.id
database-# FROM fundraisers
database-# INNER JOIN audit_logs ON audit_logs.fundraiser_id = fundraisers.id
database-# LEFT OUTER JOIN accounts ON accounts.id = fundraisers.account_id
database-# GROUP BY accounts.id, fundraisers.id
database-# LIMIT 10;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=231527.41..231527.48 rows=10 width=8) (actual time=1314.884..1314.917 rows=10 loops=1)
-> Group (cost=231527.41..241724.75 rows=1359646 width=8) (actual time=1314.884..1314.914 rows=10 loops=1)
Group Key: accounts.id, fundraisers.id
-> Sort (cost=231527.41..234926.52 rows=1359646 width=8) (actual time=1314.883..1314.901 rows=120 loops=1)
Sort Key: accounts.id, fundraisers.id
Sort Method: external merge Disk: 23840kB
-> Hash Join (cost=14500.06..74422.83 rows=1359646 width=8) (actual time=111.710..730.736 rows=1356192 loops=1)
Hash Cond: (audit_logs.fundraiser_id = fundraisers.id)
-> Seq Scan on audit_logs (cost=0.00..40634.14 rows=1517914 width=4) (actual time=0.062..224.307 rows=1517915 loops=1)
-> Hash (cost=13794.41..13794.41 rows=56452 width=8) (actual time=111.566..111.566 rows=56452 loops=1)
Buckets: 65536 Batches: 1 Memory Usage: 2687kB
-> Hash Left Join (cost=1548.76..13794.41 rows=56452 width=8) (actual time=17.362..98.257 rows=56452 loops=1)
Hash Cond: (fundraisers.account_id = accounts.id)
-> Seq Scan on fundraisers (cost=0.00..11546.52 rows=56452 width=8) (actual time=0.067..54.676 rows=56452 loops=1)
-> Hash (cost=965.56..965.56 rows=46656 width=4) (actual time=16.524..16.524 rows=46656 loops=1)
Buckets: 65536 Batches: 1 Memory Usage: 2153kB
-> Seq Scan on accounts (cost=0.00..965.56 rows=46656 width=4) (actual time=0.032..7.804 rows=46656 loops=1)
Planning time: 0.469 ms
Execution time: 1323.349 ms
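Answer 0 (score: 1)
First:
ORDER BY or GROUP BY needs to sort the complete result set (just to find the first 10 results), because the result of a hash join has no implied order.
If the data to be sorted does not fit in WORK_MEM, it spills to disk.
Finally: benchmarking and comparing nonsensical queries makes no sense. The whole optimizer machinery assumes a sane data model. Without one, it will only produce valid results.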
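To see whether the disk spill is what dominates the runtime, one option is to raise work_mem for the current session and re-run the narrow query from the question. This is only a sketch: the 256MB value is an assumption and should be sized to the memory actually available, and the planner may pick a different plan (for example a hash aggregate) once more memory is allowed.

```sql
-- Show the current per-operation memory budget for sorts and hashes.
SHOW work_mem;

-- Raise it for this session only. '256MB' is an assumed value, not a recommendation.
SET work_mem = '256MB';

-- Re-run the narrow query from the question and inspect the Sort node:
-- "Sort Method: external merge  Disk: ..." means the sort still spilled to disk.
EXPLAIN (ANALYZE, BUFFERS)
SELECT fundraisers.id
FROM fundraisers
INNER JOIN audit_logs ON audit_logs.fundraiser_id = fundraisers.id
LEFT OUTER JOIN accounts ON accounts.id = fundraisers.account_id
GROUP BY accounts.id, fundraisers.id
LIMIT 10;

-- Restore the server default for the rest of the session.
RESET work_mem;
```

For the wide query, the sort input is close to 2 GB according to the plan in the question, so raising work_mem alone is unlikely to be a practical fix there.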
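Answer 1 (score: 0)
What is missing from the analysis is the cost of the sort.
The sequence of what is happening is:
1. Select data from the JOIN (expensive).
2. Sort the result in preparation for the GROUP BY.
3. Apply the GROUP BY (which benefits greatly from the sort) + LIMIT, as requested.
I could not find documentation about that sort, so I am assuming it works the way it does in another DBMS I know: Oracle.
As described here, the server sometimes needs to use the hard drive to do this.
This is a very slow operation.
That is most likely what is happening with your query; the difference is whether postgresql has to write 1 field (= 1 second execution) or many (= 20 seconds execution).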
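A rough back-of-the-envelope check of this can be done with the planner's own estimates from the two Sort nodes in the question. The sketch below multiplies the estimated row count by the estimated row width; it ignores per-tuple overhead, so it is only an approximation of what the sort has to write.

```sql
-- Planner estimates copied from the two Sort nodes in the question:
--   rows=1359646, width=1634  (SELECT fundraisers.*)
--   rows=1359646, width=8     (SELECT fundraisers.id)
-- rows * width approximates the raw payload the sort has to handle.
SELECT pg_size_pretty(1359646::bigint * 1634) AS wide_sort_payload,
       pg_size_pretty(1359646::bigint * 8)    AS narrow_sort_payload;
```

That works out to roughly 2 GB against roughly 10 MB, which is in the same range as the 1976192kB and 23840kB that EXPLAIN ANALYZE reports under "Sort Method: external merge".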
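That being said, keep in mind that you are only working with a test query, which is probably equivalent to SELECT * FROM fundraisers LIMIT 10 (judging by the field names; I am not sure of the table definitions).
Given the gap between what you want (= the production query) and what you typed (= the test query), I am not shocked that the database behaves oddly.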
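The workaround raised in the question (pick the IDs first, then fetch the wide rows) could look like the sketch below. It assumes that fundraisers.id is the primary key and that accounts.id is unique, neither of which is confirmed by the question. Because there is no ORDER BY, which 10 rows come back is arbitrary in every variant, including the original query.

```sql
-- Variant 1: sort and group only the narrow key pair, then join the 10 ids
-- back to fundraisers to fetch the wide columns.
SELECT f.*
FROM fundraisers AS f
JOIN (
    SELECT fundraisers.id
    FROM fundraisers
    INNER JOIN audit_logs ON audit_logs.fundraiser_id = fundraisers.id
    LEFT OUTER JOIN accounts ON accounts.id = fundraisers.account_id
    GROUP BY accounts.id, fundraisers.id
    LIMIT 10
) AS picked ON picked.id = f.id;

-- Variant 2: if the intent is simply "10 fundraisers that have at least one
-- audit log", a semi-join avoids multiplying rows and then grouping them away.
SELECT f.*
FROM fundraisers AS f
WHERE EXISTS (
    SELECT 1
    FROM audit_logs AS a
    WHERE a.fundraiser_id = f.id
)
LIMIT 10;
```

In variant 1 only the narrow key columns go through the sort, and the wide rows are read for the 10 selected ids alone. Variant 2 drops the accounts join entirely, which is safe only if the production query does not need anything from accounts.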