ST_DWithin does not use the GiST or BRIN index

Time: 2018-09-12 06:18:05

Tags: postgresql postgis

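I am using the PostGIS function ST_DWithin(geography gg1, geography gg2, double precision distance_meters) to find out whether a point is within a specified distance of a polygon. I am running tests to see how long the query takes, and EXPLAIN shows that it runs a sequential scan on the table instead of using the BRIN or GiST index. Can someone suggest a way to optimize it?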

Here are the tables:

Table 1, with polygons (incident_geog)

CREATE TABLE public.incident_geog
(
    incident_id integer NOT NULL DEFAULT nextval('incident_geog_incident_id_seq'::regclass),
    incident_name character varying(20) COLLATE pg_catalog."default",
    incident_span geography(Polygon,4326),
    CONSTRAINT incident_geog_pkey PRIMARY KEY (incident_id)
);

CREATE INDEX incident_geog_gix
    ON public.incident_geog USING gist
    (incident_span);

Table 2, with points and distances (watchzones_geog)

CREATE TABLE public.watchzones_geog
(
    id integer NOT NULL DEFAULT nextval('watchzones_geog_id_seq'::regclass),
    date_created timestamp with time zone DEFAULT now(),
    latitude numeric(10,7) DEFAULT NULL::numeric,
    longitude numeric(10,7) DEFAULT NULL::numeric,
    radius integer,
    "position" geography(Point,4326),
    CONSTRAINT watchzones_geog_pkey PRIMARY KEY (id)
);

CREATE INDEX watchzones_geog_gix
    ON public.watchzones_geog USING gist
    ("position");

SQL with ST_DWithin

explain select i.incident_id,wz.id from watchzones_geog wz, incident_geog i where ST_DWithin(position,incident_span,wz.radius * 1000);

EXPLAIN output:

Nested Loop  (cost=0.26..418436.69 rows=1 width=8)
  ->  Seq Scan on watchzones_geog wz  (cost=0.00..13408.01 rows=600001 width=40)
  ->  Index Scan using incident_geog_gix on incident_geog i  (cost=0.26..0.67 rows=1 width=292)
        Index Cond: (incident_span && _st_expand(wz."position", ((wz.radius * 1000))::double precision))
        Filter: ((wz."position" && _st_expand(incident_span, ((wz.radius * 1000))::double precision)) AND _st_dwithin(wz."position", incident_span, ((wz.radius * 1000))::double precision, true))

2 answers:

Answer 0 (score: 1)

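What your SQL actually does is find, for every point, the polygons within the specified distance. The result is a set of matching (incident_geog.incident_id, watchzones_geog.id) pairs. Because the query has to consider every point, it scans watchzones_geog sequentially. Note that the inner side of the nested loop does use the GiST index incident_geog_gix.

I guess you want to start from the polygons and look for points instead. In that case, swap the order of the tables in your SQL.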

explain select i.incident_id,wz.id from incident_geog i, watchzones_geog wz where ST_DWithin(position,incident_span,50);

We can see:

Nested Loop  (cost=0.27..876.00 rows=1 width=16)
   ->  Seq Scan on incident_geog i  (cost=0.00..22.00 rows=1200 width=40)
   ->  Index Scan using watchzones_geog_gix on watchzones_geog wz  (cost=0.27..0.70 rows=1 width=40)
         Index Cond: ("position" && _st_expand(i.incident_span, '50'::double precision))
         Filter: ((i.incident_span && _st_expand("position", '50'::double precision)) AND _st_dwithin("position", i.incident_span, '50'::double precision, true))

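Because the join operates on every row, one of the two tables will always be read in full by a sequential scan. The results of the two SQL statements are the same. What matters is which table you start from when probing the other.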
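To illustrate the point about the sequential scan: it appears because the join places no other restriction on either table. The following is a minimal sketch, not taken from the original post, that restricts the outer table to a single row. The id value 1 is a hypothetical placeholder, and the plan PostgreSQL actually picks depends on table statistics and version.

```sql
-- Sketch only: the literal id 1 is a hypothetical value, not from the original post.
-- With a selective predicate on watchzones_geog, the planner no longer has to
-- read every point before probing incident_geog_gix.
EXPLAIN
SELECT i.incident_id, wz.id
FROM watchzones_geog AS wz
JOIN incident_geog AS i
  ON ST_DWithin(wz."position", i.incident_span, wz.radius * 1000)
WHERE wz.id = 1;
```

If the planner behaves as expected, the Seq Scan on watchzones_geog is replaced by a primary-key lookup, while the probe into incident_geog still goes through the GiST index.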

Maybe you can try Parallel Query. First run, intended as the baseline without Parallel Query:

SET parallel_tuple_cost TO 0;
explain analyze select i.incident_id,wz.id from incident_geog i, watchzones_geog wz where ST_DWithin(position,incident_span,50);

Nested Loop  (cost=0.27..876.00 rows=1 width=16) (actual time=0.002..0.002 rows=0 loops=1)
   ->  Seq Scan on incident_geog i  (cost=0.00..22.00 rows=1200 width=40) (actual time=0.002..0.002 rows=0 loops=1)
   ->  Index Scan using watchzones_geog_gix on watchzones_geog wz  (cost=0.27..0.70 rows=1 width=40) (never executed)
         Index Cond: ("position" && _st_expand(i.incident_span, '50'::double precision))
         Filter: ((i.incident_span && _st_expand("position", '50'::double precision)) AND _st_dwithin("position", i.incident_span, '50'::double precision, true))
 Planning time: 0.125 ms
 Execution time: 0.028 ms

Then try Parallel Query with parallel_tuple_cost set to 2:

SET parallel_tuple_cost TO 2;
explain analyze select i.incident_id,wz.id from incident_geog i, watchzones_geog wz where ST_DWithin(position,incident_span,50);

Nested Loop  (cost=0.27..876.00 rows=1 width=16) (actual time=0.002..0.002 rows=0 loops=1)
   ->  Seq Scan on incident_geog i  (cost=0.00..22.00 rows=1200 width=40) (actual time=0.001..0.001 rows=0 loops=1)
   ->  Index Scan using watchzones_geog_gix on watchzones_geog wz  (cost=0.27..0.70 rows=1 width=40) (never executed)
         Index Cond: ("position" && _st_expand(i.incident_span, '50'::double precision))
         Filter: ((i.incident_span && _st_expand("position", '50'::double precision)) AND _st_dwithin("position", i.incident_span, '50'::double precision, true))
 Planning time: 0.103 ms
 Execution time: 0.013 ms
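Neither plan above contains a Gather node, and both report rows=0 with the inner index scan never executed, so these runs do not show a parallel plan at work. The following is a hedged sketch of how one might check whether parallel plans are even permitted in the session. It assumes PostgreSQL 9.6 or later, and the worker count of 2 is an illustrative value. Whether the planner then chooses a parallel plan also depends on table size and on the PostGIS functions being marked parallel safe in the installed version.

```sql
-- Assumption: PostgreSQL 9.6 or later, where these settings exist.
SHOW max_parallel_workers_per_gather;   -- 0 means parallel plans are disabled

-- Allow up to 2 workers per Gather node for this session (illustrative value).
SET max_parallel_workers_per_gather = 2;

EXPLAIN
SELECT i.incident_id, wz.id
FROM incident_geog AS i
JOIN watchzones_geog AS wz
  ON ST_DWithin(wz."position", i.incident_span, 50);
-- A parallel plan would show a "Gather" node with "Workers Planned: N".

-- Restore the session defaults afterwards.
RESET max_parallel_workers_per_gather;
RESET parallel_tuple_cost;
```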

Answer 1 (score: 0)

A few general points:

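  1. Use IDENTITY COLUMNS rather than setting up sequences by hand.
  2. You don't need DEFAULT null::; a nullable column always defaults to null.
  3. Make sure you VACUUM ANALYZE both tables after loading them.
  4. Don't use SQL-89 join syntax; write out your INNER JOIN ... ON instead: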

    SELECT i.incident_id,wz.id
    FROM watchzones_geog wz
    INNER JOIN incident_geog i
      ON ST_DWithin(wz.position,i.incident_span,50);
    
  5. Your first query uses wz.radius * 1000, but the explain analyze runs use a fixed radius of 50. Which is it? Does the query still do a seq scan if you type in a static radius?

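  6. If you aren't using latitude and longitude on the table, drop those two columns. There is no reason to store the same data twice.
  7. I wouldn't use varchar(20) either; just use text. It is implemented the same way but has no length check, so it is faster.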
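Points 1, 2, 3, 6 and 7 can be sketched as revised table definitions. This is an illustration under assumptions, not the answerer's own code: it assumes PostgreSQL 10 or later for identity columns, and that nothing else depends on the latitude and longitude columns.

```sql
-- Sketch only: assumes PostgreSQL 10+ (identity columns) and PostGIS installed.
CREATE TABLE public.incident_geog
(
    incident_id integer GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,  -- point 1
    incident_name text,                                                -- point 7
    incident_span geography(Polygon,4326)
);

CREATE INDEX incident_geog_gix
    ON public.incident_geog USING gist (incident_span);

CREATE TABLE public.watchzones_geog
(
    id integer GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,           -- point 1
    date_created timestamp with time zone DEFAULT now(),
    radius integer,                   -- no explicit DEFAULT NULL needed (point 2)
    "position" geography(Point,4326)  -- latitude/longitude dropped (point 6)
);

CREATE INDEX watchzones_geog_gix
    ON public.watchzones_geog USING gist ("position");

-- Point 3: refresh planner statistics once the data has been loaded.
VACUUM ANALYZE public.incident_geog;
VACUUM ANALYZE public.watchzones_geog;
```

Running EXPLAIN again after VACUUM ANALYZE is worthwhile, since the row estimates in the plans depend on up-to-date statistics.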