This seems like a simple question, but I can't find the answer online.
I'm using Postgres 9.4 and have this table:
                                Table "public.title"
 Column |          Type           | Collation | Nullable |              Default
--------+-------------------------+-----------+----------+-----------------------------------
 id     | integer                 |           | not null | nextval('title_id_seq'::regclass)
 name1  | character varying(1000) |           |          |
 name2  | character varying(1000) |           |          |
 name3  | character varying(1000) |           |          |
 name4  | character varying(1000) |           |          |
I have a multicolumn index:
"idx_title_names" btree (name1, name2, name3, name4)
But the index isn't used for OR queries:
EXPLAIN ANALYZE SELECT * FROM "title" WHERE ("title"."name1" = 'foo'
OR "title"."name2" = 'foo' OR "title"."name3" = 'foo' OR "title"."name4" = 'foo');
Gather (cost=1000.00..436451.46 rows=659 width=4500) (actual time=561.418..1297.877 rows=3222 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Parallel Seq Scan on title (cost=0.00..435385.56 rows=275 width=4500) (actual time=551.627..1286.724 rows=1074 loops=3)
Filter: (((name1)::text = 'foo'::text) OR ((name2)::text = 'foo'::text) OR ((name3)::text = 'foo'::text) OR ((name4)::text = 'foo'::text))
Rows Removed by Filter: 1231911
Planning Time: 0.102 ms
Execution Time: 1298.148 ms
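Is this because such indexes don't work for OR queries?
And if so, is my best option to create 4 separate standard indexes?
Answer 0 (score: 2)
One option is to create a GIN index on an array of the columns and then use an array operator: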
create index on title using gin (array[name1,name2,name3,name4]);
Then use
SELECT *
FROM title
WHERE array[name1,name2,name3,name4] @> array['foo'];
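Note that a GIN index is more expensive to maintain than a BTree index.
Answer 1 (score: 1)
OR is often a performance problem in SQL.
The index cannot be used for a condition like this.
Your best option is to create four single-column indexes and hope for a Bitmap Or: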
CREATE INDEX ON public.title (name1);
CREATE INDEX ON public.title (name2);
CREATE INDEX ON public.title (name3);
CREATE INDEX ON public.title (name4);
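Whether the planner actually combines the four indexes depends on its row estimates, so it is worth checking the plan. The following is a minimal sketch, assuming the four indexes above have been created and that id is unique per row (the question does not state this explicitly). The UNION form is a common alternative formulation, not something the planner is guaranteed to prefer; it gives each branch a simple condition that can use its own single-column index.

```sql
-- Refresh statistics so the planner's row estimates reflect the table
ANALYZE public.title;

-- Check whether the original OR query now gets a BitmapOr over the indexes
EXPLAIN
SELECT *
FROM public.title
WHERE name1 = 'foo' OR name2 = 'foo' OR name3 = 'foo' OR name4 = 'foo';

-- Alternative formulation: one branch per column.
-- UNION (not UNION ALL) removes rows that match in more than one column;
-- this assumes rows are distinguishable, e.g. by a unique id.
EXPLAIN
SELECT * FROM public.title WHERE name1 = 'foo'
UNION
SELECT * FROM public.title WHERE name2 = 'foo'
UNION
SELECT * FROM public.title WHERE name3 = 'foo'
UNION
SELECT * FROM public.title WHERE name4 = 'foo';
```

If the first plan still shows a sequential scan, comparing it against the UNION plan with EXPLAIN ANALYZE shows which form is faster on the real data.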
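Answer 2 (score: 1)
With an index on (col1, col2, col3, etc), the index can be used for conditions/sorting on col1, or on col1 and col2, or on col1, col2 and col3, and so on. It will generally not be used for a condition/sort on col3 alone, for example.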
Take a look at this:
# create table t as select random() as a, random() as b from generate_series(1,1000000);
# create index i on t(a,b);
# analyze t;
# explain analyze select * from t where a > 0.9;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on t (cost=2246.83..8863.15 rows=96826 width=16) (actual time=10.973..28.023 rows=99311 loops=1)
Recheck Cond: (a > '0.9'::double precision)
Heap Blocks: exact=5406
-> Bitmap Index Scan on i (cost=0.00..2222.62 rows=96826 width=0) (actual time=10.251..10.252 rows=99311 loops=1)
Index Cond: (a > '0.9'::double precision)
Planning Time: 0.348 ms
Execution Time: 31.054 ms
# explain analyze select * from t where b > 0.9;
QUERY PLAN
----------------------------------------------------------------------------------------------------------
Seq Scan on t (cost=0.00..17906.00 rows=99117 width=16) (actual time=0.015..70.505 rows=100137 loops=1)
Filter: (b > '0.9'::double precision)
Rows Removed by Filter: 899863
Planning Time: 0.090 ms
Execution Time: 73.656 ms
However, when you use an or condition, the DBMS would effectively have to execute several queries. In our example, select * from t where a > 0.9 or b > 0.9 is equivalent to combining select * from t where a > 0.9 (which can use the index) with select * from t where b > 0.9 (which cannot). So the DBMS performs just one operation (scanning the whole table) instead of two (scanning the index and then scanning the whole table).
I hope this explains why the index isn't used for the query.
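To see the other side of this, the session above can be continued with a second index so that both branches of the or have an index to use. This is a sketch only: the index name i_b and the tighter 0.99 threshold are assumptions added for illustration, and the plan you get depends on your data, statistics and cost settings, so no output is shown.

```sql
-- Hypothetical second index (not part of the original session),
-- so that a condition on b alone has an index whose leading column is b
create index i_b on t(b);
analyze t;

-- With an index available for each branch, the planner may combine them
-- with a BitmapOr instead of scanning the whole table. A more selective
-- threshold (0.99 rather than 0.9) makes the index path more attractive.
explain analyze select * from t where a > 0.99 or b > 0.99;
```

If the plan still shows a Seq Scan, the planner has estimated that reading the whole table is cheaper than visiting both indexes for the number of rows matched.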