Partitioning and indexes

Time: 2019-02-27 08:55:22

Tags: postgresql indexing database-partitioning postgresql-10

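I have a table that is partitioned by quarter. The table is named data. It has several columns, and date is a field with an index on it: create index on data (date); Now I am trying to query the table: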

justpremium=> EXPLAIN analyze SELECT sum(col_1) FROM data WHERE "date" BETWEEN '2018-12-01' AND '2018-12-31';

                                                                          QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=355709.66..355709.67 rows=1 width=32) (actual time=577.072..577.072 rows=1 loops=1)
   ->  Gather  (cost=355709.44..355709.65 rows=2 width=32) (actual time=577.005..578.418 rows=3 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         ->  Partial Aggregate  (cost=354709.44..354709.45 rows=1 width=32) (actual time=573.255..573.256 rows=1 loops=3)
               ->  Append  (cost=0.42..352031.07 rows=1071346 width=8) (actual time=15.286..524.604 rows=837204 loops=3)
                     ->  Parallel Index Scan using data_date_idx on data  (cost=0.42..8.44 rows=1 width=8) (actual time=0.004..0.004 rows=0 loops=3)
                           Index Cond: ((date >= '2018-12-01'::date) AND (date <= '2018-12-31'::date))
                     ->  Parallel Seq Scan on data_y2018q4  (cost=0.00..352022.64 rows=1071345 width=8) (actual time=15.282..465.859 rows=837204 loops=3)
                           Filter: ((date >= '2018-12-01'::date) AND (date <= '2018-12-31'::date))
                           Rows Removed by Filter: 1479844
 Planning time: 1.437 ms
 Execution time: 578.465 ms
(13 rows)

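As we can see, there is a Parallel Seq Scan on data_y2018q4. That actually seems fine to me: the partitions are quarterly and I am querying one month, about a third of the partition, so a seq scan is reasonable. But now let's query the partition directly: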

justpremium=> EXPLAIN analyze SELECT sum(col_1) FROM data_y2018q4 WHERE "date" BETWEEN '2018-12-01' AND '2018-12-31';
                                                                                       QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=286475.38..286475.39 rows=1 width=32) (actual time=277.830..277.830 rows=1 loops=1)
   ->  Gather  (cost=286475.16..286475.37 rows=2 width=32) (actual time=277.760..279.194 rows=3 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         ->  Partial Aggregate  (cost=285475.16..285475.17 rows=1 width=32) (actual time=275.950..275.950 rows=1 loops=3)
               ->  Parallel Index Scan using data_y2018q4_date_idx on data_y2018q4  (cost=0.43..282796.80 rows=1071345 width=8) (actual time=0.022..227.687 rows=837204 loops=3)
                     Index Cond: ((date >= '2018-12-01'::date) AND (date <= '2018-12-31'::date))
 Planning time: 0.187 ms
 Execution time: 279.233 ms
(9 rows)

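Now I get an Index Scan using data_y2018q4_date_idx, and the whole query is about twice as fast: 279.233 ms versus 578.465 ms. What is the explanation for this? How can I force the planner to use an index scan when querying the data table, and so get the roughly twofold better timing?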
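
One way to investigate this, offered only as a diagnostic sketch and not as a confirmed fix: PostgreSQL's enable_seqscan planner setting can be switched off for the current session, which heavily penalizes sequential scans. Re-running the parent-table query then shows which plan the planner falls back to and how long it takes. The query is the one from above; whether the resulting plan is actually faster on this data is not known.

```sql
-- Diagnostic only: discourage sequential scans for this session.
SET enable_seqscan = off;

EXPLAIN ANALYZE
SELECT sum(col_1)
FROM data
WHERE "date" BETWEEN '2018-12-01' AND '2018-12-31';

-- Restore the default planner behaviour afterwards.
RESET enable_seqscan;
```

Comparing the estimated cost of the plan produced this way with the cost of the Parallel Seq Scan plan shows how the planner ranks the two alternatives for the parent table. The setting affects only the current session and is meant for experiments like this one, not as a permanent configuration.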

0 answers:

No answers