我有一个表,我只插入行,永远不会删除。 在每个循环中,我插入大约36k行。
我需要从此表中获取行以执行操作。 问题在于每个循环,查询性能真的很差。
例如,在循环31上:
explain analyze select exp(least(709,a.value)), a.from, a.to,a.material,a.transport from resultTable a where a.loop=31;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on resultTable a (cost=36.58..4431.79 rows=2425 width=48) (actual time=7.351..33894.217 rows=34640 loops=1)
Recheck Cond: (loop = 31)
-> Bitmap Index Scan on "resultTable_idx_mo" (cost=0.00..35.97 rows=2425 width=0) (actual time=4.880..4.880 rows=34640 loops=1)
Index Cond: (loop = 31)
Total runtime: 33897.070 ms
(5 rows)
对于循环43:
explain analyze select exp(least(709,a.value)), a.from, a.to,a.material,a.transport from resultTable a where a.loop=43;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on resultTable a (cost=36.58..4431.79 rows=2425 width=48) (actual time=10.129..125460.445 rows=34640 loops=1)
Recheck Cond: (loop = 43)
-> Bitmap Index Scan on "resultTable_idx_mo" (cost=0.00..35.97 rows=2425 width=0) (actual time=4.618..4.618 rows=34640 loops=1)
Index Cond: (loop 43)
Total runtime: 125463.516 ms
(5 rows)
时间正以指数增长。 我在每个循环中都在做VACUUM和REINDEX(我也没试过,但结果是一样的)。
知道如何改善时间吗?
提前致谢。
分区后:
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Result (cost=14.47..2686.29 rows=1649 width=48) (actual time=18.562..220124.597 rows=34640 loops=1)
-> Append (cost=14.47..2682.17 rows=1649 width=48) (actual time=5.189..32.743 rows=34640 loops=1)
-> Bitmap Heap Scan on resultTable a (cost=14.47..1655.44 rows=866 width=48) (actual time=0.008..0.008 rows=0 loops=1)
Recheck Cond: (loop = 60)
-> Bitmap Index Scan on "resultTable_idx_mo" (cost=0.00..14.26 rows=866 width=0) (actual time=0.006..0.006 rows=0 loops=1)
Index Cond: (loop = 60)
-> Bitmap Heap Scan on result_table_child_70 a (cost=8.82..1026.73 rows=783 width=48) (actual time=5.181..29.068 rows=34640 loops=1)
Recheck Cond: (loop = 60)
-> Bitmap Index Scan on resultTable_child_70_idx (cost=0.00..8.63 rows=783 width=0) (actual time=4.843..4.843 rows=34640 loops=1)
Index Cond: (loop = 60)
Total runtime: 220128.290 ms
分析表并设置enable_bitmapscan = off(仍然使用分区):
Result (cost=0.00..2761.06 rows=33652 width=389) (actual time=9.714..378389.177 rows=34640 loops=1)
-> Append (cost=0.00..2676.93 rows=33652 width=389) (actual time=0.119..34.065 rows=34640 loops=1)
-> Index Scan using "resultTable_idx_mo" on resultTable a (cost=0.00..12.84 rows=5 width=48) (actual time=0.058..0.058 rows=0 loops=1)
Index Cond: (loop= 79)
-> Index Scan using resultTable_child_80_idx on resultTable_child_80 a (cost=0.00..2664.10 rows=33647 width=389) (actual time=0.061..30.303 rows=34640 loops=1)
Index Cond: (loop = 79)
Total runtime: 378393.671 ms
(7 rows)
答案 0 :(得分:4)
如果群集和分区无效,我开始怀疑您的存储系统存在严重问题。堆扫描35K行的10秒钟是几个数量级太慢。您使用的是什么版本的Postgres?你的存储是什么样的?检查你的iostats。
我在一个小型虚拟机(1个分数cpu,1GB内存,NFS磁盘安装)上设置了一个实验,其中Pg 9.0.4创建了你的表和索引,每个注入1000批36000个记录。
insert into r(loop,value,xfrom,xto,material,transport,pk) select 0,0,0,0,0,0,i from generate_series(0,35999) i;
insert into r(loop,value,xfrom,xto,material,transport,pk) select 1,0,0,0,0,0,i from generate_series(36000,71999) i;
...
在任何批次上运行您的选择始终低于40毫秒:
explain analyze select exp(least(709,a.value)), a.xfrom, a.xto, a.material,a.transport from r a where a.loop=31;
Index Scan using "resultTable_idx_mo" on r a (cost=0.00..1596.35 rows=37680 width=21) (actual time=0.087..34.038 rows=36000 loops=1)
Index Cond: (loop = 31)
Total runtime: 36.332 ms
explain analyze select exp(least(709,a.value)), a.xfrom, a.xto,a.material,a.transport from r a where a.loop=87;
Index Scan using "resultTable_idx_mo" on r a (cost=0.00..1421.35 rows=33480 width=21) (actual time=0.105..37.357 rows=36000 loops=1)
Index Cond: (loop = 87)
Total runtime: 39.365 ms
请注意我的计划是使用普通的IndexScan而不是BitmapScans,然后是HeapScans。你有没有调整计划优化配置来影响计划?
在您的解释中,我注意到估计的行数远远少于实际行数(1649对34640)。这表示您在表格上有不准确的统计信息。您应该运行ANAYLZE resultTable;
我认为,当这些简单的索引扫描选择花费超过几十毫秒时,你的postgres配置或存储系统中会出现严重错误。运行查询时,此数据库上是否还有其他活动?也许您的结果是与其他查询竞争资源?
答案 1 :(得分:2)
群集可能会加速它:
cluster resultTable using resultTable_idx_mo;
但最终的解决方案是通过循环对表进行分区:
http://www.postgresql.org/docs/current/interactive/ddl-partitioning.html