Question

我有一个表，我只插入行，永远不会删除。在每个循环中，我插入大约36k行。

我需要从此表中获取行以执行操作。问题在于每个循环，查询性能真的很差。

例如，在循环31上：

explain analyze select exp(least(709,a.value)), a.from, a.to,a.material,a.transport from resultTable a where a.loop=31;
                                                                           QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on resultTable a  (cost=36.58..4431.79 rows=2425 width=48) (actual time=7.351..33894.217 rows=34640 loops=1)
   Recheck Cond: (loop = 31)
   ->  Bitmap Index Scan on "resultTable_idx_mo"  (cost=0.00..35.97 rows=2425 width=0) (actual time=4.880..4.880 rows=34640 loops=1)
         Index Cond: (loop = 31)
 Total runtime: 33897.070 ms
(5 rows)

对于循环43：

explain analyze select exp(least(709,a.value)), a.from, a.to,a.material,a.transport from resultTable a where a.loop=43;
                                                                           QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on resultTable a  (cost=36.58..4431.79 rows=2425 width=48) (actual time=10.129..125460.445 rows=34640 loops=1)
   Recheck Cond: (loop = 43)
   ->  Bitmap Index Scan on "resultTable_idx_mo"  (cost=0.00..35.97 rows=2425 width=0) (actual time=4.618..4.618 rows=34640 loops=1)
         Index Cond: (loop 43)
 Total runtime: 125463.516 ms
(5 rows)

时间正以指数增长。我在每个循环中都在做VACUUM和REINDEX（我也没试过，但结果是一样的）。

知道如何改善时间吗？

提前致谢。

分区后：

    QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Result  (cost=14.47..2686.29 rows=1649 width=48) (actual time=18.562..220124.597 rows=34640 loops=1)
   ->  Append  (cost=14.47..2682.17 rows=1649 width=48) (actual time=5.189..32.743 rows=34640 loops=1)
         ->  Bitmap Heap Scan on resultTable a  (cost=14.47..1655.44 rows=866 width=48) (actual time=0.008..0.008 rows=0 loops=1)
               Recheck Cond: (loop = 60)
               ->  Bitmap Index Scan on "resultTable_idx_mo"  (cost=0.00..14.26 rows=866 width=0) (actual time=0.006..0.006 rows=0 loops=1)
                     Index Cond: (loop = 60)
         ->  Bitmap Heap Scan on result_table_child_70 a  (cost=8.82..1026.73 rows=783 width=48) (actual time=5.181..29.068 rows=34640 loops=1)
               Recheck Cond: (loop = 60)
               ->  Bitmap Index Scan on resultTable_child_70_idx (cost=0.00..8.63 rows=783 width=0) (actual time=4.843..4.843 rows=34640 loops=1)
                     Index Cond: (loop = 60)
 Total runtime: 220128.290 ms

分析表并设置enable_bitmapscan = off（仍然使用分区）：

 Result  (cost=0.00..2761.06 rows=33652 width=389) (actual time=9.714..378389.177 rows=34640 loops=1)
   ->  Append  (cost=0.00..2676.93 rows=33652 width=389) (actual time=0.119..34.065 rows=34640 loops=1)
         ->  Index Scan using "resultTable_idx_mo" on resultTable a  (cost=0.00..12.84 rows=5 width=48) (actual time=0.058..0.058 rows=0 loops=1)
               Index Cond: (loop= 79)
         ->  Index Scan using resultTable_child_80_idx on resultTable_child_80 a  (cost=0.00..2664.10 rows=33647 width=389) (actual time=0.061..30.303 rows=34640 loops=1)
               Index Cond: (loop = 79)
 Total runtime: 378393.671 ms
(7 rows)

Answer 1

如果群集和分区无效，我开始怀疑您的存储系统存在严重问题。堆扫描35K行的10秒钟是几个数量级太慢。您使用的是什么版本的Postgres？你的存储是什么样的？检查你的iostats。

我在一个小型虚拟机（1个分数cpu，1GB内存，NFS磁盘安装）上设置了一个实验，其中Pg 9.0.4创建了你的表和索引，每个注入1000批36000个记录。

insert into r(loop,value,xfrom,xto,material,transport,pk) select 0,0,0,0,0,0,i from generate_series(0,35999) i;
insert into r(loop,value,xfrom,xto,material,transport,pk) select 1,0,0,0,0,0,i from generate_series(36000,71999) i;
...

在任何批次上运行您的选择始终低于40毫秒：

explain analyze select exp(least(709,a.value)), a.xfrom, a.xto, a.material,a.transport from r a where a.loop=31;
 Index Scan using "resultTable_idx_mo" on r a  (cost=0.00..1596.35 rows=37680 width=21) (actual time=0.087..34.038 rows=36000 loops=1)
   Index Cond: (loop = 31)
 Total runtime: 36.332 ms

explain analyze select exp(least(709,a.value)), a.xfrom, a.xto,a.material,a.transport from r a where a.loop=87;
 Index Scan using "resultTable_idx_mo" on r a  (cost=0.00..1421.35 rows=33480 width=21) (actual time=0.105..37.357 rows=36000 loops=1)
   Index Cond: (loop = 87)
 Total runtime: 39.365 ms

请注意我的计划是使用普通的IndexScan而不是BitmapScans，然后是HeapScans。你有没有调整计划优化配置来影响计划？

在您的解释中，我注意到估计的行数远远少于实际行数（1649对34640）。这表示您在表格上有不准确的统计信息。您应该运行ANAYLZE resultTable;

我认为，当这些简单的索引扫描选择花费超过几十毫秒时，你的postgres配置或存储系统中会出现严重错误。运行查询时，此数据库上是否还有其他活动？也许您的结果是与其他查询竞争资源？

Answer 2

群集可能会加速它：

cluster resultTable using resultTable_idx_mo;

但最终的解决方案是通过循环对表进行分区：

http://www.postgresql.org/docs/current/interactive/ddl-partitioning.html

在postgres中表现真的很慢

2 个答案: