Postgres uses the index on a varchar column inconsistently

Asked: 2016-02-03 20:17:33

Tags: sql postgresql postgresql-9.3

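I have a query that uses the table's index when the input parameter has one particular length, but not when the parameter has any other length.

This query uses the table index as expected: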

EXPLAIN SELECT *
FROM equipment_tests et
INNER JOIN equipments e ON e.id = et.equipment_id
WHERE e.organization_id = '6c93a9b5-cde7-4660-a55a-1ba74b97fc58' 
LIMIT 100;

Its query plan:

 Limit  (cost=0.84..40.75 rows=100 width=294) (actual time=357.878..366.848 rows=100 loops=1)
   Output: et.id, et.equipment_id, et.test_id, et.equipment_config_id, et.context, et.created_at, e.id, e.organization_id, e.type, e.model, e.serial_number, e.version, e.calibration_date, e.created_at, e.updated_at, e.hw_version
   ->  Merge Join  (cost=0.84..232724.95 rows=583224 width=294) (actual time=357.874..366.647 rows=100 loops=1)
         Output: et.id, et.equipment_id, et.test_id, et.equipment_config_id, et.context, et.created_at, e.id, e.organization_id, e.type, e.model, e.serial_number, e.version, e.calibration_date, e.created_at, e.updated_at, e.hw_version
         Merge Cond: ((e.id)::text = (et.equipment_id)::text)
         ->  Index Scan using equip_id on public.equipments e  (cost=0.42..14251.87 rows=33030 width=134) (actual time=0.045..0.045 rows=1 loops=1)
               Output: e.id, e.organization_id, e.type, e.model, e.serial_number, e.version, e.calibration_date, e.created_at, e.updated_at, e.hw_version
               Filter: ((e.organization_id)::text = '6c93a9b5-cde7-4660-a55a-1ba74b97fc58'::text)
               Rows Removed by Filter: 5
         ->  Index Scan using equip_tests_equip_id on public.equipment_tests et  (cost=0.43..208750.41 rows=1525051 width=160) (actual time=0.005..173.042 rows=73224 loops=1)
               Output: et.id, et.equipment_id, et.test_id, et.equipment_config_id, et.context, et.created_at
 Total runtime: 366.989 ms

This query does not use the equipment_tests.equipment_id index:

EXPLAIN SELECT *                
FROM equipment_tests et
INNER JOIN equipments e ON e.id = et.equipment_id
WHERE e.organization_id = '6c93a9b5-cde7-4660-a55a-1ba74b97fc5'
LIMIT 100;

Its query plan:

 Limit  (cost=50.06..14630.82 rows=100 width=294) (actual time=0.043..0.043 rows=0 loops=1)
   Output: et.id, et.equipment_id, et.test_id, et.equipment_config_id, et.context, et.created_at, e.id, e.organization_id, e.type, e.model, e.serial_number, e.version, e.calibration_date, e.created_at, e.updated_at, e.hw_version
   ->  Hash Join  (cost=50.06..56623.39 rows=388 width=294) (actual time=0.040..0.040 rows=0 loops=1)
         Output: et.id, et.equipment_id, et.test_id, et.equipment_config_id, et.context, et.created_at, e.id, e.organization_id, e.type, e.model, e.serial_number, e.version, e.calibration_date, e.created_at, e.updated_at, e.hw_version
         Hash Cond: ((et.equipment_id)::text = (e.id)::text)
         ->  Seq Scan on public.equipment_tests et  (cost=0.00..50850.51 rows=1525051 width=160) (actual time=0.004..0.004 rows=1 loops=1)
               Output: et.id, et.equipment_id, et.test_id, et.equipment_config_id, et.context, et.created_at
         ->  Hash  (cost=49.79..49.79 rows=22 width=134) (actual time=0.027..0.027 rows=0 loops=1)
               Output: e.id, e.organization_id, e.type, e.model, e.serial_number, e.version, e.calibration_date, e.created_at, e.updated_at, e.hw_version
               Buckets: 1024  Batches: 1  Memory Usage: 0kB
               ->  Index Scan using equip_organization on public.equipments e  (cost=0.42..49.79 rows=22 width=134) (actual time=0.025..0.025 rows=0 loops=1)
                     Output: e.id, e.organization_id, e.type, e.model, e.serial_number, e.version, e.calibration_date, e.created_at, e.updated_at, e.hw_version
                     Index Cond: ((e.organization_id)::text = '6c93a9b5-cde7-4660-a55a-1ba74b97fc5'::text)
 Total runtime: 0.088 ms

Note that all I did was shorten the organization_id parameter by one character.

Our schema:

                Table "equipment_tests"
       Column        |            Type             | Modifiers 
---------------------+-----------------------------+-----------
 id                  | character varying           | not null
 equipment_id        | character varying           | not null
 test_id             | character varying           | not null
 equipment_config_id | character varying           | not null
 created_at          | timestamp without time zone | not null
Indexes:
    "equipment_tests_pkey" PRIMARY KEY, btree (id)
    "equipment_tests_test_config_context" UNIQUE, btree (test_id, equipment_config_id)
    "equip_tests_equip_id" btree (equipment_id)

                 Table "equipments"
      Column      |            Type             | Modifiers 
------------------+-----------------------------+-----------
 id               | character varying           | not null
 organization_id  | character varying           | not null
 type             | integer                     | 
 model            | character varying           | 
 serial_number    | character varying           | 
 created_at       | timestamp without time zone | not null
 updated_at       | timestamp without time zone | not null
Indexes:
    "equipments_pkey" PRIMARY KEY, btree (id)
    "equip_organization" btree (organization_id)
    "equipment_org_model_sn" btree (organization_id, model, serial_number)

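Typically the PKs are UUIDs, but there is some legacy data where the IDs are shorter random strings (roughly 22 random alphabetic characters). When we query with those IDs (which are shorter than a UUID), we see that PG does not use the equipment_id index and does a table scan instead.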

The equipment_tests table has about 40M rows. The equipments table has about 1M rows.
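
As a sanity check on these sizes: both plans above estimate rows=1525051 for the scan of equipment_tests, which is well below the figure quoted here. One way to see the row counts the planner is working from is to read them out of pg_class. This is only a sketch, assuming both tables live in the public schema, as the plans indicate:

```sql
-- Row and page counts as last recorded by VACUUM / ANALYZE.
-- These are the planner's estimates, not exact counts.
SELECT relname,
       reltuples::bigint AS estimated_rows,
       relpages
FROM pg_class
WHERE oid IN ('public.equipment_tests'::regclass,
              'public.equipments'::regclass);
```

If estimated_rows is far from the real size, the statistics may be stale, and running ANALYZE on both tables before comparing plans again would be a reasonable first step.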

We are using Postgres 9.3.6.

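We suspect this has to do with the fact that most of the values in this column have one length and a small minority are shorter, but I'm not sure what the next debugging step should be.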
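
One possible way to test this suspicion is to look at how the ID lengths are actually distributed and at the statistics the planner keeps for the columns involved. The queries below are a sketch rather than a diagnosis. They use only the table and column names from the schema above and the standard pg_stats view, and they assume the tables are in the public schema:

```sql
-- How many rows have each ID length? This checks the claim that most
-- values share one length and a small minority are shorter.
SELECT length(organization_id) AS id_length,
       count(*)                AS row_count
FROM equipments
GROUP BY length(organization_id)
ORDER BY row_count DESC;

-- What the planner believes about the filtered and joined columns.
-- A value listed in most_common_vals gets its own frequency estimate;
-- any other value gets a generic estimate derived from n_distinct.
SELECT tablename,
       attname,
       n_distinct,
       most_common_vals,
       most_common_freqs
FROM pg_stats
WHERE schemaname = 'public'
  AND ((tablename = 'equipments'      AND attname = 'organization_id')
    OR (tablename = 'equipment_tests' AND attname = 'equipment_id'));
```

In the two plans above, the estimated row count for the organization_id condition is 33030 for the full-length value and 22 for the shortened one. Comparing those numbers with most_common_vals and most_common_freqs may show whether the planner is reacting to how frequent the value is rather than to the length of the string.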

0 answers