在WHERE具有OR条件的情况下组合多个索引

时间:2011-03-04 14:52:34

标签: sql postgresql

如果我执行这样的查询:

SELECT eid
FROM   entidades e
WHERE  distrito IN ( SELECT id FROM distritos WHERE distrito_t LIKE '%lisboa%' )

SELECT eid
FROM   entidades e
WHERE  concelho IN ( SELECT id FROM concelho WHERE concelho_t LIKE '%lisboa%' )

我正在使用distritoconcelho上的索引。

对于上述任何查询,explain analyze的输出将是这样的:

----------------------------------------------------------------------
Nested Loop  (cost=239.36..23453.18 rows=12605 width=4) (actual time=29.995..790.191 rows=100602 loops=1)
  ->  HashAggregate  (cost=1.38..1.39 rows=1 width=12) (actual time=0.081..0.085 rows=1 loops=1)
        ->  Seq Scan on distritos  (cost=0.00..1.38 rows=1 width=12) (actual time=0.058..0.068 rows=1 loops=1)
              Filter: ((distrito_t)::text ~~ '%lisboa%'::text)
  ->  Bitmap Heap Scan on entidades e  (cost=237.98..23294.23 rows=12605 width=7) (actual time=29.892..389.767 rows=100602 loops=1)
        Recheck Cond: (e.distrito = distritos.id)
        ->  Bitmap Index Scan on idx_t_ent_dis  (cost=0.00..234.83 rows=12605 width=0) (actual time=26.787..26.787 rows=100602 loops=1)
              Index Cond: (e.distrito = distritos.id)

但是,对于以下查询,根本不使用索引...

SELECT eid
FROM   entidades e
WHERE  concelho IN ( SELECT id FROM concelho WHERE concelho_t LIKE '%lisboa%' )
OR     distrito IN ( SELECT id FROM distritos WHERE distrito_t LIKE '%lisboa%' )

----------------------------------------------------------------------
Seq Scan on entidades e  (cost=10.25..34862.71 rows=283623 width=4) (actual time=0.600..761.876 rows=100604 loops=1)
  Filter: ((hashed SubPlan 1) OR (hashed SubPlan 2))
  SubPlan 1
    ->  Seq Scan on distritos  (cost=0.00..1.38 rows=1 width=12) (actual time=0.083..0.093 rows=1 loops=1)
          Filter: ((distrito_t)::text ~~ '%lisboa%'::text)
  SubPlan 2
    ->  Seq Scan on concelhos  (cost=0.00..8.86 rows=3 width=5) (actual time=0.173..0.258 rows=1 loops=1)
          Filter: ((concelho_t)::text ~~ '%lisboa%'::text)

如何创建将由上一个查询使用的索引?
根据{{​​3}},有可能......
但我可能没有找到正确的东西,因为我根本找不到任何例子......

更新:为两种查询类型添加了解释输出...

3 个答案:

答案 0 :(得分:1)

我没有给你答案,只有一个问题,为什么要使用子查询?

SELECT eid
FROM   entidades e
LEFT JOIN concelho c ON e.concelho = c.id
LEFT JOIN distritos d ON e.distrito = d.id
WHERE  
    concelho_t LIKE '%lisboa%' OR 
    distrito_t LIKE '%lisboa%';

答案 1 :(得分:1)

The documentation您链接到州:“规划人员有时会选择使用简单的索引扫描,即使可以使用其他索引也可以使用”。

where条件包含子查询时,我无法使postgres(8.4)按照您想要的方式运行 - 只有简单的条件,因此它可能只是对该​​功能的限制。

但关键是,尽管优化器会尝试选择最快的执行路径,但它可能不会成功,在某些情况下,您可能需要“鼓励”另一条路径,就像我在“已修改”中所做的那样使用union查询以下内容:

create table distritos(id serial primary key, distrito_t text);
insert into distritos(distrito_t) select 'distrito'||generate_series(1, 10000);

create table concelho(id serial primary key, concelho_t text);
insert into concelho(concelho_t) select 'concelho'||generate_series(1, 10000);

create table entidades( eid serial primary key, 
                        distrito integer not null references distritos, 
                        concelho integer not null references concelho );
insert into entidades(distrito, concelho)
select generate_series(1, 10000), generate_series(1, 10000);

原:

explain analyze 
select eid from entidades
where concelho in (select id from concelho where concelho_t like '%lisboa%')
   or distrito in (select id from distritos where distrito_t like '%lisboa%');

                                                 QUERY PLAN
-------------------------------------------------------------------------------------------------------------
 Seq Scan on entidades  (cost=299.44..494.94 rows=7275 width=4) (actual time=8.978..8.978 rows=0 loops=1)
   Filter: ((hashed SubPlan 1) OR (hashed SubPlan 2))
   SubPlan 1
     ->  Seq Scan on concelho  (cost=0.00..149.71 rows=2 width=4) (actual time=3.922..3.922 rows=0 loops=1)
           Filter: (concelho_t ~~ '%lisboa%'::text)
   SubPlan 2
     ->  Seq Scan on distritos  (cost=0.00..149.71 rows=2 width=4) (actual time=3.363..3.363 rows=0 loops=1)
           Filter: (distrito_t ~~ '%lisboa%'::text)

改性:

explain analyze
  select eid from entidades 
  where concelho in (select id from concelho where concelho_t like '%lisboa%')
  union
  select eid from entidades
  where distrito in (select id from distritos where distrito_t like '%lisboa%');

                                                            QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=648.98..650.92 rows=194 width=4) (actual time=5.409..5.409 rows=0 loops=1)
   ->  Append  (cost=149.74..648.50 rows=194 width=4) (actual time=5.399..5.399 rows=0 loops=1)
         ->  Hash Semi Join  (cost=149.74..323.28 rows=97 width=4) (actual time=2.743..2.743 rows=0 loops=1)
               Hash Cond: (stack.entidades.concelho = concelho.id)
               ->  Seq Scan on entidades  (cost=0.00..147.00 rows=9700 width=8) (actual time=0.013..0.013 rows=1 loops=1)
               ->  Hash  (cost=149.71..149.71 rows=2 width=4) (actual time=2.723..2.723 rows=0 loops=1)
                     ->  Seq Scan on concelho  (cost=0.00..149.71 rows=2 width=4) (actual time=2.716..2.716 rows=0 loops=1)
                           Filter: (concelho_t ~~ '%lisboa%'::text)
         ->  Hash Semi Join  (cost=149.74..323.28 rows=97 width=4) (actual time=2.655..2.655 rows=0 loops=1)
               Hash Cond: (stack.entidades.distrito = distritos.id)
               ->  Seq Scan on entidades  (cost=0.00..147.00 rows=9700 width=8) (actual time=0.006..0.006 rows=1 loops=1)
               ->  Hash  (cost=149.71..149.71 rows=2 width=4) (actual time=2.642..2.642 rows=0 loops=1)
                     ->  Seq Scan on distritos  (cost=0.00..149.71 rows=2 width=4) (actual time=2.642..2.642 rows=0 loops=1)
                           Filter: (distrito_t ~~ '%lisboa%'::text)

答案 2 :(得分:1)

尝试:

SELECT eid
FROM   entidades e
WHERE  concelho IN ( SELECT id FROM concelho WHERE concelho_t LIKE '%lisboa%' )
UNION
SELECT eid
FROM   entidades e
OR     distrito IN ( SELECT id FROM distritos WHERE distrito_t LIKE '%lisboa%' )

但真正的问题是数据库缺乏规范化,国家,地区,县(condado,parroquia),城镇,郊区缺乏层次结构。如果你有这个,那么组织可以属于正确的地方,你不会在地区一级的两个理事会都有“Lisboa”。