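I have table2 and restrict my results by a date range on date_col and by an integer column, col1. For the date range, I get different plans depending on whether I use BETWEEN or IN().
The plans don't change if I add more days.
My main question is why PostgreSQL decides to use BitmapAnd when there is a composite index on both columns.
If possible, I'd also like to know why it gives different plans for these two ways of restricting my set by a date range, and whether I can set some server option to avoid this.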
EXPLAIN
SELECT date_col, some_col, col1, col2
FROM table2
WHERE date_col BETWEEN '2017-07-10' AND '2017-07-11'
AND col1 = 332;
gives (before ANALYZE):
Index Scan using table2_date_col_col1_idx on table2 (cost=0.43..341.69 rows=39 width=44)
  Index Cond: ((date_col >= '2017-07-10'::date) AND (date_col <= '2017-07-11'::date) AND (col1 = 332))
and, after running ANALYZE on the table (freshly generated):
Bitmap Heap Scan on table2 (cost=2252.82..4165.42 rows=498 width=45)
  Recheck Cond: ((col1 = 332) AND (date_col >= '2017-07-10'::date) AND (date_col <= '2017-07-11'::date))
  ->  BitmapAnd (cost=2252.82..2252.82 rows=498 width=0)
        ->  Bitmap Index Scan on table2_col1_idx (cost=0.00..145.95 rows=7670 width=0)
              Index Cond: (col1 = 332)
        ->  Bitmap Index Scan on table2_date_col_idx (cost=0.00..2106.37 rows=100194 width=0)
              Index Cond: ((date_col >= '2017-07-10'::date) AND (date_col <= '2017-07-11'::date))
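For anyone who wants to compare the costs directly, here is a minimal sketch of how the BETWEEN query could be planned with bitmap scans discouraged for a single transaction. enable_bitmapscan is a standard planner setting; whether the planner then falls back to the composite index on this data set is an assumption I have not shown here, and the setting is meant for experiments, not as a permanent fix.

```sql
-- Sketch only: compare plan shapes for the BETWEEN query inside one transaction.
BEGIN;
SET LOCAL enable_bitmapscan = off;  -- discourages bitmap plans until the transaction ends
EXPLAIN
SELECT date_col, some_col, col1, col2
FROM table2
WHERE date_col BETWEEN '2017-07-10' AND '2017-07-11'
  AND col1 = 332;
ROLLBACK;  -- SET LOCAL is discarded here, so other queries are unaffected
```

Comparing the total cost of the resulting plan with the 4165.42 reported for the BitmapAnd plan shows how far apart the planner thinks the two alternatives are.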
This query:
EXPLAIN
SELECT date_col, some_col, col1, col2
FROM table2
WHERE date_col IN('2017-07-10', '2017-07-11')
AND col1 = 332;
gives, before ANALYZE:
Bitmap Heap Scan on table2 (cost=9.65..317.36 rows=78 width=44)
  Recheck Cond: ((date_col = ANY ('{2017-07-10,2017-07-11}'::date[])) AND (col1 = 332))
  ->  Bitmap Index Scan on table2_date_col_col1_idx (cost=0.00..9.63 rows=78 width=0)
        Index Cond: ((date_col = ANY ('{2017-07-10,2017-07-11}'::date[])) AND (col1 = 332))
After ANALYZE:
Bitmap Heap Scan on table2 (cost=13.96..1925.32 rows=498 width=45)
  Recheck Cond: ((date_col = ANY ('{2017-07-10,2017-07-11}'::date[])) AND (col1 = 332))
  ->  Bitmap Index Scan on table2_date_col_col1_idx (cost=0.00..13.84 rows=498 width=0)
        Index Cond: ((date_col = ANY ('{2017-07-10,2017-07-11}'::date[])) AND (col1 = 332))
Interestingly, before ANALYZE the two queries give different estimates for the number of rows returned.
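One possible explanation for the different pre-ANALYZE estimates, offered as an assumption rather than something I have confirmed: without statistics the planner falls back to default selectivities, which I understand to be 0.005 for an equality and 0.005 for a closed range, and an IN() list of two values counts as two equalities. With the 31 * 50000 rows inserted into table2, that arithmetic can be sketched as:

```sql
-- Assumptions: the planner sees about 31 * 50000 rows and uses default
-- selectivities of 0.005 (equality) and 0.005 (closed range) before ANALYZE.
SELECT
    31 * 50000                              AS assumed_rows,
    round(31 * 50000 * 0.005 * 0.005)       AS between_estimate,  -- range on date_col, equality on col1
    round(31 * 50000 * (2 * 0.005) * 0.005) AS in_estimate;       -- two date equalities, equality on col1
```

Under those assumptions the two results are 39 and 78, which matches the rows=39 and rows=78 shown in the pre-ANALYZE plans.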
Table:
CREATE TABLE table2 (id serial PRIMARY KEY, date_col date,
some_col int, col1 int, col2 text);
Function:
CREATE OR REPLACE FUNCTION public.random(integer, integer)
RETURNS integer
LANGUAGE sql
AS $function$
SELECT ($1 + ($2 - $1) * random())::int;
$function$;
Data:
INSERT INTO table2 (date_col, some_col, col1, col2)
SELECT date_col::date, random(1000000,7000000), random(200,400), md5(random()::text)
FROM generate_series(1,50000) AS num
CROSS JOIN generate_series('2017-07-01'::date, '2017-07-31'::date, '1 day') AS date_col;
Indexes:
CREATE INDEX ON table2 USING btree(date_col);
CREATE INDEX ON table2 USING btree(col1);
CREATE INDEX ON table2 USING btree(date_col,col1);
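Since the indexes are created without explicit names, the names that appear in the plans are generated by PostgreSQL. A small catalog query like the following sketch lists each index on table2 with its definition and size, which makes it easier to match plan nodes to the three CREATE INDEX statements:

```sql
-- Lists every index on table2 with its definition and on-disk size.
SELECT c.relname                                      AS index_name,
       pg_get_indexdef(i.indexrelid)                  AS definition,
       pg_size_pretty(pg_relation_size(i.indexrelid)) AS index_size
FROM pg_index AS i
JOIN pg_class AS c ON c.oid = i.indexrelid
WHERE i.indrelid = 'table2'::regclass
ORDER BY c.relname;
```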
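WHERE date_col IN() will be faster. table1 has the same number of rows for each day, and there are 31 days (2017-07). table1 is essentially the same as table2, only with more rows (about 200M).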
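What's disturbing is that IN() keeps using the composite index even though the statistics clearly show that this column has only 31 distinct values, and using only the index on that column should be faster. The difference here may not be that drastic (75 s vs 81 s), but on other tables in production I've seen cases where the planner decided to use a composite index (date_col, some_random_col) even though I gave only a date condition in WHERE. That took the query from about 1 minute to about 40 minutes. Absurd. It was fixed after I ran ANALYZE so that the newly added date (the day I used in the WHERE condition) was included in the statistics, but it still doesn't make sense to me. I guess I'm getting a bit off topic, but it does feel related to the planner not using statistics sensibly.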
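For the stale-statistics case described above, this is roughly how I would check whether the statistics already cover a newly loaded day. It is a sketch against table1; the views and columns are standard PostgreSQL ones, but how often ANALYZE is actually needed depends on the load pattern.

```sql
-- Refresh planner statistics for the table.
ANALYZE table1;

-- When was the table last analyzed, and how many rows changed since then?
SELECT relname, last_analyze, last_autoanalyze, n_mod_since_analyze
FROM pg_stat_user_tables
WHERE relname = 'table1';

-- Does the date I filter on appear in the collected statistics?
SELECT most_common_vals, histogram_bounds
FROM pg_stats
WHERE tablename = 'table1'
  AND attname = 'date_col';
```

If the date used in WHERE is missing from most_common_vals and lies outside histogram_bounds, the planner has no information about that day, which would be consistent with the bad plan going away after ANALYZE.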
--WHERE date_col >= '2017-07-10' AND date_col <= '2017-07-11' AND col1 = 332
Bitmap Heap Scan on public.table1 (cost=289814.12..498299.85 rows=60165 width=45) (actual time=2874.749..72574.670 rows=56662 loops=1)
  Output: date_col, some_col, col1, col2
  Recheck Cond: ((table1.col1 = 332) AND (table1.date_col >= '2017-07-10'::date) AND (table1.date_col <= '2017-07-11'::date))
  Rows Removed by Index Recheck: 64327757
  Heap Blocks: exact=32714 lossy=663337
  Buffers: shared read=734520
  I/O Timings: read=60654.424
  ->  BitmapAnd (cost=289814.12..289814.12 rows=60165 width=0) (actual time=2854.244..2854.244 rows=0 loops=1)
        Buffers: shared read=38469
        I/O Timings: read=593.932
        ->  Bitmap Index Scan on table1_col1_idx (cost=0.00..17510.42 rows=947980 width=0) (actual time=413.941..413.941 rows=876992 loops=1)
              Index Cond: (table1.col1 = 332)
              Buffers: shared read=2400
              I/O Timings: read=73.380
        ->  Bitmap Index Scan on table1_date_col_idx (cost=0.00..272273.37 rows=12985280 width=0) (actual time=2403.607..2403.607 rows=13200000 loops=1)
              Index Cond: ((table1.date_col >= '2017-07-10'::date) AND (table1.date_col <= '2017-07-11'::date))
              Buffers: shared read=36069
              I/O Timings: read=520.553
Planning time: 76.269 ms
Execution time: 72597.548 ms
--WHERE date_col >= '2017-07-10' AND date_col <= '2017-07-21' AND col1 = 332
Bitmap Heap Scan on public.table1 (cost=17601.80..1723659.99 rows=365510 width=45) (actual time=398.685..73908.546 rows=339677 loops=1)
  Output: date_col, some_col, col1, col2
  Recheck Cond: (table1.col1 = 332)
  Rows Removed by Index Recheck: 63519900
  Filter: ((table1.date_col >= '2017-07-10'::date) AND (table1.date_col <= '2017-07-21'::date))
  Rows Removed by Filter: 537315
  Heap Blocks: exact=42548 lossy=663337
  Buffers: shared read=708285
  I/O Timings: read=63032.731
  ->  Bitmap Index Scan on table1_col1_idx (cost=0.00..17510.42 rows=947980 width=0) (actual time=380.480..380.480 rows=876992 loops=1)
        Index Cond: (table1.col1 = 332)
        Buffers: shared read=2400
        I/O Timings: read=34.086
Planning time: 7.691 ms
Execution time: 73973.701 ms
--WHERE date_col >= '2017-07-01' AND date_col <= '2017-07-31' AND col1 = 332
Bitmap Heap Scan on public.table1 (cost=17747.41..1723805.61 rows=947980 width=45) (actual time=359.913..75058.918 rows=876992 loops=1)
  Output: date_col, some_col, col1, col2
  Recheck Cond: (table1.col1 = 332)
  Rows Removed by Index Recheck: 63519900
  Filter: ((table1.date_col >= '2017-07-01'::date) AND (table1.date_col <= '2017-07-31'::date))
  Heap Blocks: exact=42548 lossy=663337
  Buffers: shared read=708285
  I/O Timings: read=63848.616
  ->  Bitmap Index Scan on table1_col1_idx (cost=0.00..17510.42 rows=947980 width=0) (actual time=346.336..346.336 rows=876992 loops=1)
        Index Cond: (table1.col1 = 332)
        Buffers: shared read=2400
        I/O Timings: read=34.437
Planning time: 8.258 ms
Execution time: 75179.169 ms
--WHERE date_col IN('2017-07-10', ..) AND col1 = 332
Bitmap Heap Scan on public.table1 (cost=1281.84..209617.15 rows=60165 width=45) (actual time=33.533..7441.378 rows=56662 loops=1)
  Output: date_col, some_col, col1, col2
  Recheck Cond: ((table1.date_col = ANY ('{2017-07-10,2017-07-11}'::date[])) AND (table1.col1 = 332))
  Rows Removed by Index Recheck: 3253762
  Heap Blocks: exact=22072 lossy=33898
  Buffers: shared hit=2 read=56130
  I/O Timings: read=6801.373
  ->  Bitmap Index Scan on table1_date_col_col1_idx (cost=0.00..1266.80 rows=60165 width=0) (actual time=25.858..25.858 rows=56662 loops=1)
        Index Cond: ((table1.date_col = ANY ('{2017-07-10,2017-07-11}'::date[])) AND (table1.col1 = 332))
        Buffers: shared hit=2 read=160
        I/O Timings: read=5.959
Planning time: 7.957 ms
Execution time: 7450.346 ms
--WHERE date_col IN('2017-07-10', .., '2017-07-21') AND col1 = 332
Bitmap Heap Scan on public.table1 (cost=7785.30..960333.44 rows=365510 width=45) (actual time=171.661..47546.257 rows=339677 loops=1)
  Output: date_col, some_col, col1, col2
  Recheck Cond: ((table1.date_col = ANY ('{2017-07-10,2017-07-11,I_CUT_HERE,2017-07-21}'::date[])) AND (table1.col1 = 332))
  Rows Removed by Index Recheck: 26010391
  Heap Blocks: exact=40928 lossy=271210
  Buffers: shared hit=20 read=313091
  I/O Timings: read=41089.435
  ->  Bitmap Index Scan on table1_date_col_col1_idx (cost=0.00..7693.92 rows=365510 width=0) (actual time=159.538..159.538 rows=339677 loops=1)
        Index Cond: ((table1.date_col = ANY ('{2017-07-10,2017-07-11,I_CUT_HERE,2017-07-21}'::date[])) AND (table1.col1 = 332))
        Buffers: shared hit=20 read=953
        I/O Timings: read=28.047
Planning time: 7.738 ms
Execution time: 47596.678 ms
--WHERE date_col IN('2017-07-01', .., '2017-07-31') AND col1 = 332
Bitmap Heap Scan on public.table1 (cost=12906.25..1365283.04 rows=604948 width=45) (actual time=445.607..81144.267 rows=876992 loops=1)
  Output: date_col, some_col, col1, col2
  Recheck Cond: ((table1.date_col = ANY ('{2017-07-01,2017-07-02,I_CUT_HERE,2017-07-30,2017-07-31}'::date[])) AND (table1.col1 = 332))
  Rows Removed by Index Recheck: 63777327
  Heap Blocks: exact=39804 lossy=666081
  Buffers: shared hit=54 read=708350
  I/O Timings: read=60703.758
  ->  Bitmap Index Scan on table1_date_col_col1_idx (cost=0.00..12755.01 rows=604948 width=0) (actual time=430.652..430.652 rows=876992 loops=1)
        Index Cond: ((table1.date_col = ANY ('{2017-07-01,2017-07-02,I_CUT_HERE,2017-07-30,2017-07-31}'::date[])) AND (table1.col1 = 332))
        Buffers: shared hit=54 read=2465
        I/O Timings: read=67.264
Planning time: 9.017 ms
Execution time: 81261.378 ms
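All of the plans above report a large number of lossy heap blocks and tens of millions of rows removed by the index recheck. As far as I understand, a bitmap becomes lossy when it no longer fits in work_mem, so one experiment is to rerun a query with a larger work_mem for a single transaction. This is a sketch: the 256MB value is purely illustrative and not a recommendation, and I am not claiming what the resulting plan or timing would be.

```sql
-- Sketch only: rerun the two-day IN() query with more memory for the bitmap.
BEGIN;
SET LOCAL work_mem = '256MB';  -- illustrative value, applies to this transaction only
EXPLAIN (ANALYZE, BUFFERS)
SELECT date_col, some_col, col1, col2
FROM table1
WHERE date_col IN ('2017-07-10', '2017-07-11')
  AND col1 = 332;
ROLLBACK;
```

If the lossy count in Heap Blocks drops and Rows Removed by Index Recheck shrinks, a good part of the runtime was spent rechecking whole pages rather than choosing the wrong index.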
table1 is the same as table2, only with more rows per day.
postgres=# SELECT attname, null_frac, avg_width, n_distinct, most_common_vals, correlation FROM pg_stats WHERE tablename = 'table1';
attname | null_frac | avg_width | n_distinct | most_common_vals | correlation
----------+-----------+-----------+--------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------
int | 0 | 4 | -1 | | 0.996684
date_col | 0 | 4 | 31 | {2017-07-07,2017-07-03,2017-07-08,2017-07-28,2017-07-04,2017-07-15,2017-07-20,2017-07-18,2017-07-31,2017-07-23,2017-07-14,2017-07-16,2017-07-22,2017-07-25,2017-07-19,2017-07-21,2017-07-01,2017-07-05,2017-07-12,2017-07-26,2017-07-11,2017-07-30,2017-07-06,2017-07-10,2017-07-09,2017-07-24,2017-07-17,2017-07-27,2017-07-29,2017-07-02,2017-07-13} | 0.0346347
some_col | 0 | 4 | 6.58472e+006 | {1096591,1131422,1176742,1205762,1267732,1358307,1793233,1897958,1958800,1979780,2020229,2187352,2222144,2306378,2367818,2445771,2506148,2590445,2600271,2752586,2945764,3024254,3201950,3412218,3530060,3616631,4001881,4033122,4142542,4200890,4216142,4218113,4461939,4486968,4525355,4592945,4704906,4839527,4967659,5055096,5077240,5412984,5455464,5561802,5573389,5648549,5657678,5666687,5782171,5869402,5900299,5953811,6166736,6232273,6249154,6388286,6482146,6525559,6527271,6555494,6682407,6772179,6823587,6936062,6944575,6953775} | -0.00714019
col1 | 0 | 4 | 1665 | {234,211,356,255,325,381,393,266,375,259,278,232,303,334,337,246,249,254,279,317,329,301,319,365,221,256,300,240,285,309,347,201,231,357,399,208,220,223,260,268,269,332,352,270,328,342,367,297,314,219,272,287,324,218,224,267,283,299,321,388,229,242,284,298,302,389,397,233,237,307,358,380,217,222,235,236,247,313,323,366,368,206,213,250,251,282,296,373,245,349,354,355,369,372,248,252,280,322,345,204} | 0.0849631
col2 | 0 | 33 | -1 | | 0.00675247