PostgreSQL uses BitmapAnd instead of the composite index

Date: 2017-09-26 13:30:19

Tags: postgresql performance

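I have table2, and I restrict my results by a date range on date_col and by an integer column col1. For the date range, I get different plans depending on whether I use BETWEEN or IN(). The plans do not change if I add more days.

My main question is why PostgreSQL decides to use BitmapAnd when there is a composite index on both columns.

If possible, I would also like to know why the two ways of restricting by date range produce different plans, and whether I can set some server option to avoid this.

Test: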

EXPLAIN
SELECT date_col, some_col, col1, col2
  FROM table2
 WHERE date_col BETWEEN '2017-07-10' AND '2017-07-11'
   AND col1 = 332

Gives (before ANALYZE):

Index Scan using table2_date_col_col1_idx on table2  (cost=0.43..341.69 rows=39 width=44)
  Index Cond: ((date_col >= '2017-07-10'::date) AND (date_col <= '2017-07-11'::date) AND (col1 = 332))

After running ANALYZE on the table (freshly generated):

Bitmap Heap Scan on table2  (cost=2252.82..4165.42 rows=498 width=45)
  Recheck Cond: ((col1 = 332) AND (date_col >= '2017-07-10'::date) AND (date_col <= '2017-07-11'::date))
  ->  BitmapAnd  (cost=2252.82..2252.82 rows=498 width=0)
        ->  Bitmap Index Scan on table2_col1_idx  (cost=0.00..145.95 rows=7670 width=0)
              Index Cond: (col1 = 332)
        ->  Bitmap Index Scan on table2_date_col_idx  (cost=0.00..2106.37 rows=100194 width=0)
              Index Cond: ((date_col >= '2017-07-10'::date) AND (date_col <= '2017-07-11'::date))
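
One way to see what the planner thinks of the composite-index plan is to switch bitmap scans off for a single transaction and compare the estimated costs. The following is only a diagnostic sketch, not a recommended server setting. It assumes the table2 schema and indexes from this question, and relies on the standard planner setting enable_bitmapscan:

```sql
BEGIN;
-- Discourage bitmap scans for this transaction only (diagnostic, not a fix).
SET LOCAL enable_bitmapscan = off;

EXPLAIN
SELECT date_col, some_col, col1, col2
  FROM table2
 WHERE date_col BETWEEN '2017-07-10' AND '2017-07-11'
   AND col1 = 332;

-- Discard the SET LOCAL and leave the session untouched.
ROLLBACK;
```

If the resulting plan uses table2_date_col_col1_idx, its total cost can be compared with the 4165.42 of the BitmapAnd plan to see how far apart the planner considers the two.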

This query:

EXPLAIN
SELECT date_col, some_col, col1, col2
  FROM table2
 WHERE date_col IN('2017-07-10', '2017-07-11')
   AND col1 = 332

Before ANALYZE:

Bitmap Heap Scan on table2  (cost=9.65..317.36 rows=78 width=44)
  Recheck Cond: ((date_col = ANY ('{2017-07-10,2017-07-11}'::date[])) AND (col1 = 332))
  ->  Bitmap Index Scan on table2_date_col_col1_idx  (cost=0.00..9.63 rows=78 width=0)
        Index Cond: ((date_col = ANY ('{2017-07-10,2017-07-11}'::date[])) AND (col1 = 332))

After ANALYZE:

Bitmap Heap Scan on table2  (cost=13.96..1925.32 rows=498 width=45)
  Recheck Cond: ((date_col = ANY ('{2017-07-10,2017-07-11}'::date[])) AND (col1 = 332))
  ->  Bitmap Index Scan on table2_date_col_col1_idx  (cost=0.00..13.84 rows=498 width=0)
        Index Cond: ((date_col = ANY ('{2017-07-10,2017-07-11}'::date[])) AND (col1 = 332))

Interestingly, before ANALYZE the two queries give different estimates for the number of rows returned.
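
The post-ANALYZE estimate of 498 rows looks like what one gets by treating the two conditions as independent and multiplying their selectivities. A rough check, using the per-index estimates from the BitmapAnd plan (7670 and 100194 rows) and the table size implied by the INSERT in this question (50000 x 31 rows). The table size is an assumption here, since the planner works from its own row-count estimate:

```sql
-- Independence assumption: rows = N * (rows_col1 / N) * (rows_date / N)
--                               = rows_col1 * rows_date / N
SELECT round(7670::numeric * 100194 / (50000 * 31)) AS estimated_rows;
```

This comes out at about 496, close to the 498 shown in both post-ANALYZE plans. The small gap presumably comes from the planner's own estimate of the table's row count.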

Schema + data:

Table:

CREATE TABLE table2 (id serial PRIMARY KEY, date_col date,
                     some_col int, col1 int, col2 text);

Function:

CREATE OR REPLACE FUNCTION public.random(integer, integer)
 RETURNS integer
 LANGUAGE sql
AS $function$
   SELECT ($1 + ($2 - $1) * random())::int;
$function$;

Data:

INSERT INTO table2 (date_col, some_col, col1, col2)
SELECT date_col::date, random(1000000,7000000), random(200,400), md5(random()::text)
  FROM generate_series(1,50000) AS num
  CROSS JOIN generate_series('2017-07-01'::date, '2017-07-31'::date, '1 day') AS date_col;

Indexes:

CREATE INDEX ON table2 USING btree(date_col);
CREATE INDEX ON table2 USING btree(col1);
CREATE INDEX ON table2 USING btree(date_col,col1);

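EDIT - added EXPLAIN ANALYZE

WHERE date_col IN() is faster when fewer dates are used. table1 has an equal number of rows for each day, over 31 days (2017-07). table1 is basically the same as table2, only with more rows (about 200 million).

What bothers me is that IN() keeps using the composite index even though the statistics clearly show that this column has only 31 distinct values, and using the index on that column alone should be faster. The difference may not be dramatic here (75s vs 81s), but in production, on other tables, I have seen the planner decide to use a composite index (date_col, some_random_col) even though I gave only a date condition in the WHERE clause. That took the query from about 1 minute to about 40 minutes. Ridiculous. It was fixed after I ran ANALYZE so that the newly added date (the day I was using in the WHERE condition) was included in the statistics, but it still makes no sense to me. I guess I am drifting off topic, but it does feel related to the planner not using the statistics sanely.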
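
Since the production case was fixed by ANALYZE once the new date was covered by the statistics, it may help to check how fresh the statistics are before running such a query. A minimal sketch using the standard pg_stat_user_tables view, with table1 standing in for the production table. The n_mod_since_analyze column assumes PostgreSQL 9.4 or later:

```sql
-- When was the table last analyzed, and how many rows changed since then?
SELECT relname, last_analyze, last_autoanalyze, n_mod_since_analyze
  FROM pg_stat_user_tables
 WHERE relname = 'table1';

-- Refresh the statistics by hand after loading a new day of data.
ANALYZE table1;
```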

--WHERE date_col >= '2017-07-10' AND date_col <= '2017-07-11' AND col1 = 332
Bitmap Heap Scan on public.table1  (cost=289814.12..498299.85 rows=60165 width=45) (actual time=2874.749..72574.670 rows=56662 loops=1)
  Output: date_col, some_col, col1, col2
  Recheck Cond: ((table1.col1 = 332) AND (table1.date_col >= '2017-07-10'::date) AND (table1.date_col <= '2017-07-11'::date))
  Rows Removed by Index Recheck: 64327757
  Heap Blocks: exact=32714 lossy=663337
  Buffers: shared read=734520
  I/O Timings: read=60654.424
  ->  BitmapAnd  (cost=289814.12..289814.12 rows=60165 width=0) (actual time=2854.244..2854.244 rows=0 loops=1)
        Buffers: shared read=38469
        I/O Timings: read=593.932
        ->  Bitmap Index Scan on table1_col1_idx  (cost=0.00..17510.42 rows=947980 width=0) (actual time=413.941..413.941 rows=876992 loops=1)
              Index Cond: (table1.col1 = 332)
              Buffers: shared read=2400
              I/O Timings: read=73.380
        ->  Bitmap Index Scan on table1_date_col_idx  (cost=0.00..272273.37 rows=12985280 width=0) (actual time=2403.607..2403.607 rows=13200000 loops=1)
              Index Cond: ((table1.date_col >= '2017-07-10'::date) AND (table1.date_col <= '2017-07-11'::date))
              Buffers: shared read=36069
              I/O Timings: read=520.553
Planning time: 76.269 ms
Execution time: 72597.548 ms

--WHERE date_col >= '2017-07-10' AND date_col <= '2017-07-21' AND col1 = 332
Bitmap Heap Scan on public.table1  (cost=17601.80..1723659.99 rows=365510 width=45) (actual time=398.685..73908.546 rows=339677 loops=1)
  Output: date_col, some_col, col1, col2
  Recheck Cond: (table1.col1 = 332)
  Rows Removed by Index Recheck: 63519900
  Filter: ((table1.date_col >= '2017-07-10'::date) AND (table1.date_col <= '2017-07-21'::date))
  Rows Removed by Filter: 537315
  Heap Blocks: exact=42548 lossy=663337
  Buffers: shared read=708285
  I/O Timings: read=63032.731
  ->  Bitmap Index Scan on table1_col1_idx  (cost=0.00..17510.42 rows=947980 width=0) (actual time=380.480..380.480 rows=876992 loops=1)
        Index Cond: (table1.col1 = 332)
        Buffers: shared read=2400
        I/O Timings: read=34.086
Planning time: 7.691 ms
Execution time: 73973.701 ms

--WHERE date_col >= '2017-07-01' AND date_col <= '2017-07-31' AND col1 = 332
Bitmap Heap Scan on public.table1  (cost=17747.41..1723805.61 rows=947980 width=45) (actual time=359.913..75058.918 rows=876992 loops=1)
  Output: date_col, some_col, col1, col2
  Recheck Cond: (table1.col1 = 332)
  Rows Removed by Index Recheck: 63519900
  Filter: ((table1.date_col >= '2017-07-01'::date) AND (table1.date_col <= '2017-07-31'::date))
  Heap Blocks: exact=42548 lossy=663337
  Buffers: shared read=708285
  I/O Timings: read=63848.616
  ->  Bitmap Index Scan on table1_col1_idx  (cost=0.00..17510.42 rows=947980 width=0) (actual time=346.336..346.336 rows=876992 loops=1)
        Index Cond: (table1.col1 = 332)
        Buffers: shared read=2400
        I/O Timings: read=34.437
Planning time: 8.258 ms
Execution time: 75179.169 ms



--WHERE date_col IN('2017-07-10', ..) AND col1 = 332
Bitmap Heap Scan on public.table1  (cost=1281.84..209617.15 rows=60165 width=45) (actual time=33.533..7441.378 rows=56662 loops=1)
  Output: date_col, some_col, col1, col2
  Recheck Cond: ((table1.date_col = ANY ('{2017-07-10,2017-07-11}'::date[])) AND (table1.col1 = 332))
  Rows Removed by Index Recheck: 3253762
  Heap Blocks: exact=22072 lossy=33898
  Buffers: shared hit=2 read=56130
  I/O Timings: read=6801.373
  ->  Bitmap Index Scan on table1_date_col_col1_idx  (cost=0.00..1266.80 rows=60165 width=0) (actual time=25.858..25.858 rows=56662 loops=1)
        Index Cond: ((table1.date_col = ANY ('{2017-07-10,2017-07-11}'::date[])) AND (table1.col1 = 332))
        Buffers: shared hit=2 read=160
        I/O Timings: read=5.959
Planning time: 7.957 ms
Execution time: 7450.346 ms

--WHERE date_col IN('2017-07-10', .., '2017-07-21') AND col1 = 332
Bitmap Heap Scan on public.table1  (cost=7785.30..960333.44 rows=365510 width=45) (actual time=171.661..47546.257 rows=339677 loops=1)
  Output: date_col, some_col, col1, col2
  Recheck Cond: ((table1.date_col = ANY ('{2017-07-10,2017-07-11,I_CUT_HERE,2017-07-21}'::date[])) AND (table1.col1 = 332))
  Rows Removed by Index Recheck: 26010391
  Heap Blocks: exact=40928 lossy=271210
  Buffers: shared hit=20 read=313091
  I/O Timings: read=41089.435
  ->  Bitmap Index Scan on table1_date_col_col1_idx  (cost=0.00..7693.92 rows=365510 width=0) (actual time=159.538..159.538 rows=339677 loops=1)
        Index Cond: ((table1.date_col = ANY ('{2017-07-10,2017-07-11,I_CUT_HERE,2017-07-21}'::date[])) AND (table1.col1 = 332))
        Buffers: shared hit=20 read=953
        I/O Timings: read=28.047
Planning time: 7.738 ms
Execution time: 47596.678 ms

--WHERE date_col IN('2017-07-01', .., '2017-07-31') AND col1 = 332
Bitmap Heap Scan on public.table1  (cost=12906.25..1365283.04 rows=604948 width=45) (actual time=445.607..81144.267 rows=876992 loops=1)
  Output: date_col, some_col, col1, col2
  Recheck Cond: ((table1.date_col = ANY ('{2017-07-01,2017-07-02,I_CUT_HERE,2017-07-30,2017-07-31}'::date[])) AND (table1.col1 = 332))
  Rows Removed by Index Recheck: 63777327
  Heap Blocks: exact=39804 lossy=666081
  Buffers: shared hit=54 read=708350
  I/O Timings: read=60703.758
  ->  Bitmap Index Scan on table1_date_col_col1_idx  (cost=0.00..12755.01 rows=604948 width=0) (actual time=430.652..430.652 rows=876992 loops=1)
        Index Cond: ((table1.date_col = ANY ('{2017-07-01,2017-07-02,I_CUT_HERE,2017-07-30,2017-07-31}'::date[])) AND (table1.col1 = 332))
        Buffers: shared hit=54 read=2465
        I/O Timings: read=67.264
Planning time: 9.017 ms
Execution time: 81261.378 ms
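
All of the plans above report a large number of lossy heap blocks (for example exact=32714 lossy=663337). That means the bitmap did not fit in work_mem and was degraded to page granularity, so every row on those pages has to be re-checked against the condition, which is what Rows Removed by Index Recheck counts. One experiment is to raise work_mem for a single transaction and rerun the query. The 256MB below is an arbitrary illustrative value, not a recommendation:

```sql
SHOW work_mem;

BEGIN;
-- Illustrative value only; pick one that fits the server's memory.
SET LOCAL work_mem = '256MB';

EXPLAIN (ANALYZE, BUFFERS)
SELECT date_col, some_col, col1, col2
  FROM table1
 WHERE date_col BETWEEN '2017-07-10' AND '2017-07-11'
   AND col1 = 332;

-- Discard the SET LOCAL and leave the session untouched.
ROLLBACK;
```

If the lossy count in the Heap Blocks line drops, the timings of the BETWEEN and IN() variants can be compared again on a more even footing.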

Statistics

table1 is the same as table2, only with more rows per day.

postgres=# SELECT attname, null_frac, avg_width, n_distinct, most_common_vals, correlation FROM pg_stats WHERE tablename = 'table1';
 attname  | null_frac | avg_width |  n_distinct  | most_common_vals | correlation
----------+-----------+-----------+--------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------
 int      |         0 |         4 |           -1 |                  |    0.996684
 date_col |         0 |         4 |           31 | {2017-07-07,2017-07-03,2017-07-08,2017-07-28,2017-07-04,2017-07-15,2017-07-20,2017-07-18,2017-07-31,2017-07-23,2017-07-14,2017-07-16,2017-07-22,2017-07-25,2017-07-19,2017-07-21,2017-07-01,2017-07-05,2017-07-12,2017-07-26,2017-07-11,2017-07-30,2017-07-06,2017-07-10,2017-07-09,2017-07-24,2017-07-17,2017-07-27,2017-07-29,2017-07-02,2017-07-13} |   0.0346347
 some_col |         0 |         4 | 6.58472e+006 | {1096591,1131422,1176742,1205762,1267732,1358307,1793233,1897958,1958800,1979780,2020229,2187352,2222144,2306378,2367818,2445771,2506148,2590445,2600271,2752586,2945764,3024254,3201950,3412218,3530060,3616631,4001881,4033122,4142542,4200890,4216142,4218113,4461939,4486968,4525355,4592945,4704906,4839527,4967659,5055096,5077240,5412984,5455464,5561802,5573389,5648549,5657678,5666687,5782171,5869402,5900299,5953811,6166736,6232273,6249154,6388286,6482146,6525559,6527271,6555494,6682407,6772179,6823587,6936062,6944575,6953775} | -0.00714019
 col1     |         0 |         4 |         1665 | {234,211,356,255,325,381,393,266,375,259,278,232,303,334,337,246,249,254,279,317,329,301,319,365,221,256,300,240,285,309,347,201,231,357,399,208,220,223,260,268,269,332,352,270,328,342,367,297,314,219,272,287,324,218,224,267,283,299,321,388,229,242,284,298,302,389,397,233,237,307,358,380,217,222,235,236,247,313,323,366,368,206,213,250,251,282,296,373,245,349,354,355,369,372,248,252,280,322,345,204} |   0.0849631
 col2     |         0 |        33 |           -1 |                  |  0.00675247

0 answers:

No answers yet.