Postgres query does not choose an index for columns with an OR condition

Time: 2018-05-02 22:54:49

Tags: postgresql

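I have a query where Postgres performs a hash join with sequential scans, instead of an index join with a nested loop, when I use an OR condition. This causes the query to take 2 seconds instead of completing in < 100 ms. I have run VACUUM ANALYZE and rebuilt the indexes on the PATIENTCHARTNOTE table (which is about 5 GB), but it still uses the hash join. Do you have any suggestions on how I can improve this?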

explain analyze
SELECT Count (_pcn.id) AS total_open_note
FROM   patientchartnote _pcn
   INNER JOIN appointment _appt
           ON _appt.id = _pcn.appointment_id
   INNER JOIN patient _pt
           ON _pt.id = _appt.patient_id
   LEFT OUTER JOIN person _ps
                ON _ps.id = _pt.appuser_id
   WHERE  _pcn.active = true
   AND _pt.active = true
   AND _appt.datecomplete IS NULL
   AND _pcn.title IS NOT NULL
   AND _pcn.title <> ''
   AND ( _pt.assigned_to_user_id = '136964'
         OR  _pcn.createdby_id = '136964'
   );


 Aggregate  (cost=237655.59..237655.60 rows=1 width=8) (actual time=1602.069..1602.069 rows=1 loops=1)
   ->  Hash Join  (cost=83095.43..237645.30 rows=4117 width=4) (actual time=944.850..1602.014 rows=241 loops=1)
         Hash Cond: (_appt.patient_id = _pt.id)
         Join Filter: ((_pt.assigned_to_user_id = 136964) OR (_pcn.createdby_id = 136964))
         Rows Removed by Join Filter: 94036
         ->  Hash Join  (cost=46650.68..182243.64 rows=556034 width=12) (actual time=415.862..1163.812 rows=94457 loops=1)
               Hash Cond: (_pcn.appointment_id = _appt.id)
               ->  Seq Scan on patientchartnote _pcn  (cost=0.00..112794.20 rows=1073978 width=12) (actual time=0.016..423.262 rows=1073618 loops=1)
                     Filter: (active AND (title IS NOT NULL) AND ((title)::text <> ''::text))
                     Rows Removed by Filter: 22488
               ->  Hash  (cost=35223.61..35223.61 rows=696486 width=8) (actual time=414.749..414.749 rows=692839 loops=1)
                     Buckets: 131072  Batches: 16  Memory Usage: 2732kB
                     ->  Seq Scan on appointment _appt  (cost=0.00..35223.61 rows=696486 width=8) (actual time=0.010..271.208 rows=692839 loops=1)
                           Filter: (datecomplete IS NULL)
                           Rows Removed by Filter: 652426
         ->  Hash  (cost=24698.57..24698.57 rows=675694 width=12) (actual time=351.566..351.566 rows=674929 loops=1)
               Buckets: 131072  Batches: 16  Memory Usage: 2737kB
               ->  Seq Scan on patient _pt  (cost=0.00..24698.57 rows=675694 width=12) (actual time=0.013..197.268 rows=674929 loops=1)
                     Filter: active
                     Rows Removed by Filter: 17426
 Planning time: 1.533 ms
 Execution time: 1602.715 ms

When I replace "OR _pcn.createdby_id = '136964'" with "AND _pcn.createdby_id = '136964'", Postgres performs index scans:

 Aggregate  (cost=29167.56..29167.57 rows=1 width=8) (actual time=937.743..937.743 rows=1 loops=1)
   ->  Nested Loop  (cost=1.28..29167.55 rows=7 width=4) (actual time=19.136..937.669 rows=37 loops=1)
         ->  Nested Loop  (cost=0.85..27393.03 rows=1654 width=4) (actual time=2.154..910.250 rows=1649 loops=1)
               ->  Index Scan using patient_activeassigned_idx on patient _pt  (cost=0.42..3075.00 rows=1644 width=8) (actual time=1.599..11.820 rows=1627 loops=1)
                     Index Cond: ((active = true) AND (assigned_to_user_id = 136964))
                     Filter: active
               ->  Index Scan using appointment_datepatient_idx on appointment _appt  (cost=0.43..14.75 rows=4 width=8) (actual time=0.543..0.550 rows=1 loops=1627)
                     Index Cond: ((patient_id = _pt.id) AND (datecomplete IS NULL))
         ->  Index Scan using patientchartnote_activeappointment_idx on patientchartnote _pcn  (cost=0.43..1.06 rows=1 width=8) (actual time=0.014..0.014 rows=0 loops=1649)
               Index Cond: ((active = true) AND (createdby_id = 136964) AND (appointment_id = _appt.id) AND (title IS NOT NULL))
               Filter: (active AND ((title)::text <> ''::text))
 Planning time: 1.489 ms
 Execution time: 937.910 ms
(13 rows)

1 Answer:

Answer 0 (score: 0)

Using OR in SQL queries usually results in bad performance.

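That is because, unlike AND, it does not restrict but extends the number of rows in the query result. With AND, you can use an index scan for one part of the condition and further restrict the result set with a filter on the second condition. That is not possible with OR.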

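So PostgreSQL does the only thing left: it computes the whole join and then filters out all rows that do not match the condition. Of course, that is very inefficient when you are joining three tables (I did not count the outer join).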

Assuming that all columns called id are primary keys, you could rewrite the query as follows:

 SELECT count(*)
 FROM (SELECT _pcn.id
       FROM patientchartnote _pcn
          INNER JOIN appointment _appt
             ON _appt.id = _pcn.appointment_id
          INNER JOIN patient _pt
             ON _pt.id = _appt.patient_id
          LEFT OUTER JOIN person _ps
             ON _ps.id = _pt.appuser_id
       WHERE _pcn.active = true
         AND _pt.active = true
         AND _appt.datecomplete IS NULL
         AND _pcn.title IS NOT NULL
         AND _pcn.title <> ''
         AND _pt.assigned_to_user_id = '136964'
       UNION
       SELECT _pcn.id
       FROM patientchartnote _pcn
          INNER JOIN appointment _appt
             ON _appt.id = _pcn.appointment_id
          INNER JOIN patient _pt
             ON _pt.id = _appt.patient_id
          LEFT OUTER JOIN person _ps
             ON _ps.id = _pt.appuser_id
       WHERE _pcn.active = true
         AND _pt.active = true
         AND _appt.datecomplete IS NULL
         AND _pcn.title IS NOT NULL
         AND _pcn.title <> ''
         AND _pcn.createdby_id = '136964'
      ) q;

Even though this runs the query twice, indexes can be used to filter out all but a few rows early on, so this query should perform better.
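Before swapping the OR form for a UNION form in production, it may be worth checking on a small dataset that both return the same count. The following is only a sketch under stated assumptions: it uses Python's built-in sqlite3 module with an in-memory SQLite database as a stand-in for PostgreSQL, a made-up simplified schema and made-up rows, integer 0/1 in place of booleans, and it leaves out the outer join to person because that join does not change the result. It says nothing about PostgreSQL's plans or timings; it only illustrates that UNION counts a note once even when it matches both branches.

```python
import sqlite3

# Toy in-memory database. The schema and rows are assumptions for
# illustration only; SQLite stands in for PostgreSQL here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE patient (id INTEGER PRIMARY KEY, active INTEGER,
                      assigned_to_user_id INTEGER);
CREATE TABLE appointment (id INTEGER PRIMARY KEY, patient_id INTEGER,
                          datecomplete TEXT);
CREATE TABLE patientchartnote (id INTEGER PRIMARY KEY, appointment_id INTEGER,
                               active INTEGER, title TEXT, createdby_id INTEGER);
INSERT INTO patient VALUES (1, 1, 136964), (2, 1, 7), (3, 0, 136964);
INSERT INTO appointment VALUES
    (10, 1, NULL), (20, 2, NULL), (30, 3, NULL), (40, 2, '2018-01-01');
INSERT INTO patientchartnote VALUES
    (100, 10, 1, 'a', 136964),  -- matches both branches
    (101, 10, 1, 'b', 7),       -- matches via assigned_to_user_id
    (102, 20, 1, 'c', 136964),  -- matches via createdby_id
    (103, 20, 1, 'd', 7),       -- matches neither branch
    (104, 30, 1, 'e', 136964),  -- excluded: patient is inactive
    (105, 40, 1, 'f', 136964),  -- excluded: appointment is completed
    (106, 20, 1, '',  136964);  -- excluded: empty title
""")

# The part of the query shared by both forms (person join omitted).
BASE = """
SELECT _pcn.id
FROM patientchartnote _pcn
   INNER JOIN appointment _appt ON _appt.id = _pcn.appointment_id
   INNER JOIN patient _pt ON _pt.id = _appt.patient_id
WHERE _pcn.active = 1
  AND _pt.active = 1
  AND _appt.datecomplete IS NULL
  AND _pcn.title IS NOT NULL
  AND _pcn.title <> ''
"""

# Form 1: a single query with the OR condition.
or_count = conn.execute(
    "SELECT count(*) FROM (" + BASE +
    " AND (_pt.assigned_to_user_id = 136964 OR _pcn.createdby_id = 136964)) q"
).fetchone()[0]

# Form 2: one branch per condition, combined with UNION (which removes
# duplicate ids, so a note matching both branches is counted once).
union_count = conn.execute(
    "SELECT count(*) FROM (" + BASE + " AND _pt.assigned_to_user_id = 136964"
    " UNION " + BASE + " AND _pcn.createdby_id = 136964) q"
).fetchone()[0]

print(or_count, union_count)
```

On PostgreSQL itself, the equivalent check would be to run both forms against the real tables and compare the counts, then compare the plans with EXPLAIN ANALYZE.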