Question

I have a Postgres (version 9.5.4) table geo containing 738,884 records of country geo data, with the following structure:

              Table "public.geo"
   Column    |            Type             | Modifiers | Storage  | Stats target | Description
-------------+-----------------------------+-----------+----------+--------------+-------------
 id          | integer                     | not null  | plain    |              |
 kind        | character varying(255)      |           | extended |              |
 name        | character varying(255)      |           | extended |              |
 is_owner    | integer                     |           | plain    |              |
 path_array  | integer[]                   |           | extended |              |
Indexes:
    "geo_pkey" PRIMARY KEY, btree (id)
    "kind_index" btree (kind)
    "path_array_idx" gin (path_array gin__int_ops)

Records form a hierarchy by the kind field: country -> province -> area -> locality. This hierarchy is stored in the path_array field as an array of the ancestor IDs followed by the row's own ID.

Example:

17239123    locality    Moscow  1   {17073865,17073877,17073958,17239123}

I have installed the intarray extension and added the appropriate index on the path_array field.

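Now I have a set of record IDs that can be of any kind (from country down to locality), and I need to select all of their descendants of kind locality (that is, the records whose path_array contains any of these IDs).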

Here is my query:

SELECT
    id
FROM geo
WHERE
    kind = 'locality'
    AND is_owner = 1
    AND path_array && '{17073888,17073984,17073885,17073905,17073958,17073927,17073908,17073952,17073948,17073947,17073917,17073944,17073919,17073922,17073914,17073937,17073895,17073904,17073911,17073949,17073938,17073957,17073900,17073915,17073936,17073887,17073933,17073939,17073956,17073884,17073901,17073881,17153202,17073916,17073945,17073883,17073943,17073909,17073950,17073942,17073906,17073886,17073910,17073882,17073941,17073891,17073929,17073928,17073903,17073912,17073930,17073898,17073899,17073954}'::integer[]
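
To make the filter's semantics concrete, here is a minimal Python sketch of what the WHERE clause selects. It is only a toy model, not how Postgres evaluates the query: the GeoRow type, the helper names, and every row except the Moscow example are assumptions made up for illustration.

```python
from typing import NamedTuple, Sequence


class GeoRow(NamedTuple):
    """Hypothetical in-memory stand-in for one row of the geo table."""
    id: int
    kind: str
    name: str
    is_owner: int
    path_array: tuple


# Only the Moscow row comes from the question; the other rows are invented.
ROWS = [
    GeoRow(17073865, "country", "hypothetical country", 0, (17073865,)),
    GeoRow(17073958, "area", "hypothetical area", 0,
           (17073865, 17073877, 17073958)),
    GeoRow(17239123, "locality", "Moscow", 1,
           (17073865, 17073877, 17073958, 17239123)),
    GeoRow(99000001, "locality", "hypothetical locality", 1,
           (99000000, 99000001)),
]


def overlaps(a: Sequence[int], b: Sequence[int]) -> bool:
    """Same meaning as the SQL array operator &&: at least one common element."""
    return not set(a).isdisjoint(b)


def locality_descendants(rows, ancestor_ids):
    """IDs of owner localities whose path contains any of ancestor_ids."""
    return [
        row.id
        for row in rows
        if row.kind == "locality"
        and row.is_owner == 1
        and overlaps(row.path_array, ancestor_ids)
    ]


print(locality_descendants(ROWS, [17073958, 17073888]))
```

Because path_array also holds the row's own ID, a locality ID in the input list matches that locality itself.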

Here is the EXPLAIN ANALYZE output:

Bitmap Heap Scan on geo  (cost=1418.04..1532.99 rows=8 width=4) (actual time=685.183..723.330 rows=20984 loops=1)
Recheck Cond: ((is_owner = 1) AND (path_array && '{17073888,17073984,17073885,17073905,17073958,17073927,17073908,17073952,17073948,17073947,17073917,17073944,17073919,17073922,17073914,17073937,17073895,17073904,17073911,17073949,17073938,17073957,17073900,17073915,17073936,17073887,17073933,17073939,17073956,17073884,17073901,17073881,17153202,17073916,17073945,17073883,17073943,17073909,17073950,17073942,17073906,17073886,17073910,17073882,17073941,17073891,17073929,17073928,17073903,17073912,17073930,17073898,17073899,17073954}'::integer[]))
Filter: ((kind)::text = 'locality'::text)
Rows Removed by Filter: 2037
Heap Blocks: exact=17106
->  BitmapAnd  (cost=1418.04..1418.04 rows=29 width=0) (actual time=681.154..681.154 rows=0 loops=1)
    ->  Bitmap Index Scan on is_owner_index  (cost=0.00..544.24 rows=29309 width=0) (actual time=5.493..5.493 rows=29201 loops=1)
          Index Cond: (is_owner = 1)
    ->  Bitmap Index Scan on path_array_idx  (cost=0.00..873.54 rows=739 width=0) (actual time=667.888..667.888 rows=607440 loops=1)
          Index Cond: (path_array && '{17073888,17073984,17073885,17073905,17073958,17073927,17073908,17073952,17073948,17073947,17073917,17073944,17073919,17073922,17073914,17073937,17073895,17073904,17073911,17073949,17073938,17073957,17073900,17073915,17073936,17073887,17073933,17073939,17073956,17073884,17073901,17073881,17153202,17073916,17073945,17073883,17073943,17073909,17073950,17073942,17073906,17073886,17073910,17073882,17073941,17073891,17073929,17073928,17073903,17073912,17073930,17073898,17073899,17073954}'::integer[])
Planning time: 0.212 ms
Execution time: 727.370 ms

The query above takes about 700 ms, which I believe is slow. Am I right, or am I asking for too much?

Answer 1

I created a partial GIN index on path_array, restricted to rows where is_owner = 1:

CREATE INDEX path_array_owner_idx ON geo USING gin (path_array gin__int_ops) WHERE is_owner = 1

-------------------------------------------------------------------------------------
     Bitmap Heap Scan on geo  (cost=436.04..550.99 rows=8 width=4) (actual time=30.292..68.778 rows=20984 loops=1)
   Recheck Cond: ((path_array && '{17073888,17073984,17073885,17073905,17073958,17073927,17073908,17073952,17073948,17073947,17073917,17073944,17073919,17073922,17073914,17073937,17073895,17073904,17073911,17073949,17073938,17073957,17073900,17073915,17073936,17073887,17073933,17073939,17073956,17073884,17073901,17073881,17153202,17073916,17073945,17073883,17073943,17073909,17073950,17073942,17073906,17073886,17073910,17073882,17073941,17073891,17073929,17073928,17073903,17073912,17073930,17073898,17073899,17073954}'::integer[]) AND (is_owner = 1))
   Filter: ((kind)::text = 'locality'::text)
   Rows Removed by Filter: 2037
   Heap Blocks: exact=17106
   ->  Bitmap Index Scan on path_array_owner_idx  (cost=0.00..436.04 rows=29 width=0) (actual time=25.923..25.923 rows=23021 loops=1)
         Index Cond: (path_array && '{17073888,17073984,17073885,17073905,17073958,17073927,17073908,17073952,17073948,17073947,17073917,17073944,17073919,17073922,17073914,17073937,17073895,17073904,17073911,17073949,17073938,17073957,17073900,17073915,17073936,17073887,17073933,17073939,17073956,17073884,17073901,17073881,17153202,17073916,17073945,17073883,17073943,17073909,17073950,17073942,17073906,17073886,17073910,17073882,17073941,17073891,17073929,17073928,17073903,17073912,17073930,17073898,17073899,17073954}'::integer[])
 Planning time: 0.219 ms
 Execution time: 72.956 ms

Now the query above takes about 70 ms, which is quite good.
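
The two plans suggest why this helps: the scan on path_array_idx returned 607,440 rows that then had to be ANDed with the is_owner bitmap, while the scan on path_array_owner_idx returned only 23,021. The sketch below models that effect with a plain inverted index in Python. This is a simplification and not how GIN is actually implemented; the data and the function names are invented for illustration.

```python
from collections import defaultdict


def build_inverted_index(rows, predicate=lambda row: True):
    """Map each array element to the IDs of rows containing it.

    Rows failing `predicate` are left out entirely, which mimics the
    WHERE clause of a partial index.
    """
    index = defaultdict(set)
    for row in rows:
        row_id, _is_owner, path = row
        if predicate(row):
            for element in path:
                index[element].add(row_id)
    return index


def overlap_scan(index, wanted):
    """Union of the posting sets of all wanted elements (models &&)."""
    matches = set()
    for element in wanted:
        matches |= index.get(element, set())
    return matches


# Invented data: one ancestor (id 1) with 1000 children, of which
# every 25th has is_owner = 1. Rows are (id, is_owner, path_array).
ROWS = [(1, 0, (1,))] + [
    (i, 1 if i % 25 == 0 else 0, (1, i)) for i in range(2, 1002)
]

full_index = build_inverted_index(ROWS)
partial_index = build_inverted_index(ROWS, lambda row: row[1] == 1)

full_hits = overlap_scan(full_index, [1])
partial_hits = overlap_scan(partial_index, [1])
owners = {row_id for row_id, is_owner, _ in ROWS if is_owner == 1}

# Both approaches agree on the final answer; the partial index simply
# never materializes the non-owner rows.
assert full_hits & owners == partial_hits
print(len(full_hits), len(partial_hits))
```

In this toy data the full index produces 1001 candidates to intersect with the owner set, while the partial one produces 40 directly.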

Performance of searching for descendants using a Postgres intarray field
