我必须通过调整基本的PostgreSQL服务器配置参数来优化查询。在文档中,我遇到了work_mem
参数。然后我检查了如何更改此参数会影响我的查询的性能(使用sort)。我用各种work_mem
设置测量了查询执行时间,非常失望。
我执行查询的表包含10,000,000行,有430 MB的数据要排序。 (Sort Method: external merge Disk: 430112kB
)。
使用work_mem = 1MB
,EXPLAIN
输出为:
Total runtime: 29950.571 ms (sort takes about 19300 ms).
Sort (cost=4032588.78..4082588.66 rows=19999954 width=8)
(actual time=22577.149..26424.951 rows=20000000 loops=1)
Sort Key: "*SELECT* 1".n
Sort Method: external merge Disk: 430104kB
使用work_mem = 5MB
:
Total runtime: 36282.729 ms (sort: 25400 ms).
Sort (cost=3485713.78..3535713.66 rows=19999954 width=8)
(actual time=25062.383..33246.561 rows=20000000 loops=1)
Sort Key: "*SELECT* 1".n
Sort Method: external merge Disk: 430104kB
使用work_mem = 64MB
:
Total runtime: 42566.538 ms (sort: 31000 ms).
Sort (cost=3212276.28..3262276.16 rows=19999954 width=8)
(actual time=28599.611..39454.279 rows=20000000 loops=1)
Sort Key: "*SELECT* 1".n
Sort Method: external merge Disk: 430104kB
任何人都能解释为什么性能会变差吗?或者建议任何其他方法通过更改服务器参数来更快地执行查询?
我的查询(我知道它不是最优的,但我必须对此类查询进行基准测试):
SELECT n
FROM (
SELECT n + 1 AS n FROM table_name
EXCEPT
SELECT n FROM table_name) AS q1
ORDER BY n DESC;
完整的执行计划:
Sort (cost=5805421.81..5830421.75 rows=9999977 width=8) (actual time=30405.682..30405.682 rows=1 loops=1)
Sort Key: q1.n
Sort Method: quicksort Memory: 25kB
-> Subquery Scan q1 (cost=4032588.78..4232588.32 rows=9999977 width=8) (actual time=30405.636..30405.637 rows=1 loops=1)
-> SetOp Except (cost=4032588.78..4132588.55 rows=9999977 width=8) (actual time=30405.634..30405.634 rows=1 loops=1)
-> Sort (cost=4032588.78..4082588.66 rows=19999954 width=8) (actual time=23046.478..27733.020 rows=20000000 loops=1)
Sort Key: "*SELECT* 1".n
Sort Method: external merge Disk: 430104kB
-> Append (cost=0.00..513495.02 rows=19999954 width=8) (actual time=0.040..8191.185 rows=20000000 loops=1)
-> Subquery Scan "*SELECT* 1" (cost=0.00..269247.48 rows=9999977 width=8) (actual time=0.039..3651.506 rows=10000000 loops=1)
-> Seq Scan on table_name (cost=0.00..169247.71 rows=9999977 width=8) (actual time=0.038..2258.323 rows=10000000 loops=1)
-> Subquery Scan "*SELECT* 2" (cost=0.00..244247.54 rows=9999977 width=8) (actual time=0.008..2697.546 rows=10000000 loops=1)
-> Seq Scan on table_name (cost=0.00..144247.77 rows=9999977 width=8) (actual time=0.006..1079.561 rows=10000000 loops=1)
Total runtime: 30496.100 ms
答案 0 :(得分:13)
我在explain.depesz.com, have a look上发布了您的查询计划。
查询计划程序的估计在某些地方非常错误。
你最近跑过ANALYZE
吗?
阅读Statistics Used by the Planner和Planner Cost Constants手册中的章节。请特别注意random_page_cost
和default_statistics_target
上的章节
你可以试试:
ALTER TABLE diplomas ALTER COLUMN number SET STATISTICS 1000;
ANALYZE diplomas;
或者对于具有10M行的表来说甚至更高。这取决于数据分布和实际查询。实验。默认值为100,最大值为10000。
对于该大小的数据库,通常只有1或5 MB的work_mem
是不够的。阅读@aleroot链接到的Postgres Wiki page on Tuning Postgres。
根据EXPLAIN
输出,您的查询需要 430104kB的磁盘内存,您必须将work_mem
设置为 500MB 或更多允许内存中排序。内存中的数据表示比磁盘上的表示需要更多的空间。您可能对Tom Lane posted on that matter recently。
将work_mem
增加一点,就像你尝试过一样,不会有太大帮助,甚至可能会减慢速度。将其设置为全局甚至可能会受到影响,尤其是在并发访问时。多个会话可能会互相饿死资源。如果资源有限,为一个目的分配更多内容会从另一个目的中分离内存。最佳设置取决于完整的情况。
为避免副作用,只在会话中本地设置足够高,暂时为查询设置:
SET work_mem = '500MB';
之后将其重置为默认值:
RESET work_mem;
或者使用SET LOCAL
仅为当前事务设置它。
答案 1 :(得分:1)
SET search_path='tmp';
-- Generate some data ...
-- DROP table tmp.table_name ;
-- CREATE table tmp.table_name ( n INTEGER NOT NULL PRIMARY KEY);
-- INSERT INTO tmp.table_name(n) SELECT generate_series(1,1000);
-- DELETE FROM tmp.table_name WHERE random() < 0.05 ;
除了查询等同于以下 NOT EXISTS 形式,它在这里生成不同的查询计划(但结果相同)(9.0.1beta something)
-- EXPLAIN ANALYZE
WITH q1 AS (
SELECT 1+tn.n AS n
FROM table_name tn
WHERE NOT EXISTS (
SELECT * FROM table_name nx
WHERE nx.n = tn.n+1
)
)
SELECT q1.n
FROM q1
ORDER BY q1.n DESC;
(也可能是具有递归CTE的版本: - )
编辑:查询计划。全部为100K记录,删除率为0.2%
原始查询:
------------------------------------------------------------------------------------------------------------------------------------------
Sort (cost=36461.76..36711.20 rows=99778 width=4) (actual time=2682.600..2682.917 rows=222 loops=1)
Sort Key: q1.n
Sort Method: quicksort Memory: 22kB
-> Subquery Scan q1 (cost=24984.41..26979.97 rows=99778 width=4) (actual time=2003.047..2682.036 rows=222 loops=1)
-> SetOp Except (cost=24984.41..25982.19 rows=99778 width=4) (actual time=2003.042..2681.389 rows=222 loops=1)
-> Sort (cost=24984.41..25483.30 rows=199556 width=4) (actual time=2002.584..2368.963 rows=199556 loops=1)
Sort Key: "*SELECT* 1".n
Sort Method: external merge Disk: 3512kB
-> Append (cost=0.00..5026.57 rows=199556 width=4) (actual time=0.071..1452.838 rows=199556 loops=1)
-> Subquery Scan "*SELECT* 1" (cost=0.00..2638.01 rows=99778 width=4) (actual time=0.067..470.652 rows=99778 loops=1)
-> Seq Scan on table_name (cost=0.00..1640.22 rows=99778 width=4) (actual time=0.063..178.365 rows=99778 loops=1)
-> Subquery Scan "*SELECT* 2" (cost=0.00..2388.56 rows=99778 width=4) (actual time=0.014..429.224 rows=99778 loops=1)
-> Seq Scan on table_name (cost=0.00..1390.78 rows=99778 width=4) (actual time=0.011..143.320 rows=99778 loops=1)
Total runtime: 2684.840 ms
(14 rows)
NOT EXISTS-with CTE:
----------------------------------------------------------------------------------------------------------------------
Sort (cost=6394.60..6394.60 rows=1 width=4) (actual time=699.190..699.498 rows=222 loops=1)
Sort Key: q1.n
Sort Method: quicksort Memory: 22kB
CTE q1
-> Hash Anti Join (cost=2980.01..6394.57 rows=1 width=4) (actual time=312.262..697.985 rows=222 loops=1)
Hash Cond: ((tn.n + 1) = nx.n)
-> Seq Scan on table_name tn (cost=0.00..1390.78 rows=99778 width=4) (actual time=0.013..143.210 rows=99778 loops=1)
-> Hash (cost=1390.78..1390.78 rows=99778 width=4) (actual time=309.923..309.923 rows=99778 loops=1)
-> Seq Scan on table_name nx (cost=0.00..1390.78 rows=99778 width=4) (actual time=0.007..144.102 rows=99778 loops=1)
-> CTE Scan on q1 (cost=0.00..0.02 rows=1 width=4) (actual time=312.270..698.742 rows=222 loops=1)
Total runtime: 700.040 ms
(11 rows)
NOT EXISTS-without without CTE
--------------------------------------------------------------------------------------------------------------------------------------
Sort (cost=6394.58..6394.58 rows=1 width=4) (actual time=692.313..692.625 rows=222 loops=1)
Sort Key: ((1 + tn.n))
Sort Method: quicksort Memory: 22kB
-> Hash Anti Join (cost=2980.01..6394.57 rows=1 width=4) (actual time=308.046..691.849 rows=222 loops=1)
Hash Cond: ((tn.n + 1) = nx.n)
-> Seq Scan on table_name tn (cost=0.00..1390.78 rows=99778 width=4) (actual time=0.014..142.781 rows=99778 loops=1)
-> Hash (cost=1390.78..1390.78 rows=99778 width=4) (actual time=305.732..305.732 rows=99778 loops=1)
-> Seq Scan on table_name nx (cost=0.00..1390.78 rows=99778 width=4) (actual time=0.007..143.783 rows=99778 loops=1)
Total runtime: 693.139 ms
(9 rows)
我的结论是“NOT EXISTS”版本会导致postgres产生更好的计划。