Question

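I have a small table (about 400,000 rows). It is indexed by collection_id and contains a JSON column with several GIN indexes defined on it, one of them on the value tagline.id.

A query that fetches all objects with a specific tagline.id is sometimes very slow: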

explain (analyze, buffers)
SELECT "objects_object"."created",
       "objects_object"."modified",
       "objects_object"."_id",
       "objects_object"."id",
       "objects_object"."collection_id",
       "objects_object"."data",
       "objects_object"."search",
       "objects_object"."location"::bytea
FROM "objects_object"
WHERE ("objects_object"."collection_id" IN (3381, 3321, 3312, 3262, 3068, 2684, 2508, 2159, 2158, 2154, 2157, 2156)
  AND (("objects_object"."data" #>> ARRAY['tagline','id']))::float IN ('8')
  AND ("objects_object"."data" -> 'tagline') ? 'id')
ORDER BY "objects_object"."created" DESC,
         "objects_object"."id" ASC
LIMIT 101;                                                                         


    QUERY PLAN                                                                               
------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=8.46..8.47 rows=1 width=1239) (actual time=5513.374..5513.399 rows=101 loops=1)
   Buffers: shared hit=4480 read=6261
   ->  Sort  (cost=8.46..8.47 rows=1 width=1239) (actual time=5513.372..5513.389 rows=101 loops=1)
         Sort Key: created DESC, id
         Sort Method: top-N heapsort  Memory: 247kB
         Buffers: shared hit=4480 read=6261
         ->  Index Scan using index_tagline_id_float_51a27976 on objects_object  (cost=0.42..8.45 rows=1 width=1239) (actual time=943.689..5513.002 rows=235 loops=1)
               Index Cond: (((data #>> '{tagline,id}'::text[]))::double precision = '8'::double precision)
               Filter: (collection_id = ANY ('{3381,3321,3312,3262,3068,2684,2508,2159,2158,2154,2157,2156}'::integer[]))
               Rows Removed by Filter: 47295
               Buffers: shared hit=4480 read=6261
 Planning time: 0.244 ms
 Execution time: 5513.439 ms
(13 rows)

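If the query is executed several times, the execution time drops to about 5 ms.

What is taking so long? And why does the time drop so much after the first execution?

I don't think this is memory related, because the default memory (4MB) is far higher than the memory required (247kB).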
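One way to check whether the slow first run is spent reading pages from storage (the plan above reports read=6261 alongside hit=4480) is to turn on I/O timing before running EXPLAIN. This is only a sketch: the SELECT list and the collection_id list are shortened here for illustration, and it assumes the session is allowed to change track_io_timing, which normally requires superuser rights.

```sql
-- track_io_timing is a standard PostgreSQL setting; off by default.
SET track_io_timing = on;

-- Shortened version of the query from the question (assumption: fewer
-- output columns and fewer collection ids, for readability only).
EXPLAIN (ANALYZE, BUFFERS)
SELECT "objects_object"."id",
       "objects_object"."created"
FROM "objects_object"
WHERE "objects_object"."collection_id" IN (3381, 3321, 3312)
  AND (("objects_object"."data" #>> ARRAY['tagline','id']))::float IN ('8')
  AND ("objects_object"."data" -> 'tagline') ? 'id'
ORDER BY "objects_object"."created" DESC,
         "objects_object"."id" ASC
LIMIT 101;
```

With the setting on, plan nodes that read blocks include an I/O Timings line next to the Buffers line. If most of the execution time shows up there on the first run and the read count drops to near zero on later runs, the difference is likely a cold cache rather than sort memory.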

EDIT: index definitions:

SELECT indexdef FROM pg_indexes 
WHERE indexname = 'index_tagline_id_float_51a27976'; 
                                                                                                 indexdef                                                                                                 
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 CREATE INDEX index_tagline_id_float_51a27976 ON public.objects_object USING btree ((((data #>> ARRAY['tagline'::text, 'id'::text]))::double precision)) WHERE ((data -> 'tagline'::text) ? 'id'::text)
(1 row)

SELECT indexdef FROM pg_indexes 
WHERE indexname = 'objects_object_collection_id_6f1559f5'; 
                                                indexdef                                                 
---------------------------------------------------------------------------------------------------------
 CREATE INDEX objects_object_collection_id_6f1559f5 ON public.objects_object USING btree (collection_id)
(1 row)

EDIT:

After adding the index test:

select indexdef from pg_indexes where indexname='test'; 
                                                                                          indexdef                                                                                          
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 CREATE INDEX test ON public.objects_object USING btree ((((data #>> ARRAY['tagline'::text, 'id'::text]))::double precision), collection_id) WHERE ((data -> 'tagline'::text) ? 'id'::text)
(1 row)

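The execution time went down, but so did the number of buffers read (more of them were shared hits), so I'm not sure the index itself improved performance: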

                                                                 QUERY PLAN                                                                 
--------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=8.46..8.47 rows=1 width=1238) (actual time=1721.260..1721.281 rows=101 loops=1)
   Buffers: shared hit=5460 read=5115
   ->  Sort  (cost=8.46..8.47 rows=1 width=1238) (actual time=1721.257..1721.270 rows=101 loops=1)
         Sort Key: created DESC, id
         Sort Method: top-N heapsort  Memory: 298kB
         Buffers: shared hit=5460 read=5115
         ->  Index Scan using test on objects_object  (cost=0.42..8.45 rows=1 width=1238) (actual time=1682.637..1720.793 rows=235 loops=1)
               Index Cond: (((data #>> '{tagline,id}'::text[]))::double precision = '8'::double precision)
               Filter: (collection_id = ANY ('{3381,3321,3312,3262,3068,2684,2508,2159,2158,2154,2157,2156}'::integer[]))
               Rows Removed by Filter: 47295
               Buffers: shared hit=5454 read=5115
 Planning time: 238.364 ms
 Execution time: 1762.996 ms
(13 rows)

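The problem seems to be that collection_id should be part of the index condition rather than the filter; that would avoid fetching a large amount of data from the (slow) storage.

Why isn't the index working as expected?

UPDATE: Apparently the order of the index columns affects the query plan. I rewrote the index as: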

select indexdef from pg_indexes where indexname='test'; 
                                                                                          indexdef                                                                                          
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 CREATE INDEX test ON public.objects_object USING btree (collection_id, (((data #>> ARRAY['tagline'::text, 'id'::text]))::double precision)) WHERE ((data -> 'tagline'::text) ? 'id'::text)

Running the query now, we can see that far fewer buffers are read:

 Limit  (cost=57.15..57.16 rows=1 width=1177) (actual time=1.043..1.059 rows=101 loops=1)
   Buffers: shared hit=101 read=10
   ->  Sort  (cost=57.15..57.16 rows=1 width=1177) (actual time=1.040..1.047 rows=101 loops=1)
         Sort Key: created DESC, id
         Sort Method: top-N heapsort  Memory: 304kB
         Buffers: shared hit=101 read=10
         ->  Index Scan using test on objects_object  (cost=0.42..57.14 rows=1 width=1177) (actual time=0.094..0.670 rows=232 loops=1)
               Index Cond: ((collection_id = ANY ('{3381,3321,3312,3262,3068,2684,2508,2159,2158,2154,2157,2156}'::integer[])) AND (((data #>> '{tagline,id}'::text[]))::double precision = '8'::double precision))
               Buffers: shared hit=95 read=10
 Planning time: 416.365 ms
 Execution time: 43.463 ms
(11 rows)

Answer 1

This particular query can be sped up with the following index:

CREATE INDEX ON public.objects_object (
   ((data #>> ARRAY['tagline'::text, 'id'::text])::double precision),
   collection_id
) WHERE (data -> 'tagline') ? 'id';

That would avoid the filter, which is where the index scan spends most of its time.
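A minimal sketch of how one might build such an index and check that the filter is gone. The index name is hypothetical, and the SELECT list and id list are shortened for illustration. Note the column order: in the plans posted in the question, collection_id only appeared in the Index Cond once it was the leading column, so this sketch uses that order rather than the one above.

```sql
-- Hypothetical index name. CONCURRENTLY avoids blocking writes, but it
-- cannot be run inside a transaction block.
CREATE INDEX CONCURRENTLY objects_object_collection_tagline_idx
    ON public.objects_object (
        collection_id,
        (((data #>> ARRAY['tagline', 'id']))::double precision)
    )
    WHERE (data -> 'tagline') ? 'id';

-- Refresh planner statistics; ANALYZE also gathers statistics for
-- expressions used in indexes.
ANALYZE public.objects_object;

-- Shortened version of the original query (assumption: fewer columns and ids).
EXPLAIN (ANALYZE, BUFFERS)
SELECT id, collection_id, created
FROM public.objects_object
WHERE collection_id IN (3381, 3321, 3312)
  AND ((data #>> ARRAY['tagline', 'id']))::float = 8
  AND (data -> 'tagline') ? 'id'
ORDER BY created DESC, id ASC
LIMIT 101;
```

In the resulting plan, look for collection_id inside Index Cond and for the absence of a "Rows Removed by Filter" line. The WHERE clause of the query has to repeat the index predicate, `(data -> 'tagline') ? 'id'`, for the partial index to be considered.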

Postgres index scan performing poorly
