Question

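I'm new to PostGIS. I'm using it to run geospatial queries, but it seems slow to return the results I need.

Previously I used a Python script that typically returned results in about 5 seconds (searching over 1.2M elements).

To get results faster I moved the problem to PostGIS but, as I said above, the same job now takes more than 20 seconds.

More precisely, each element consists of a point (latitude and longitude) and a string (the label of that point).

I'm running dockerized PostGIS (https://hub.docker.com/r/mdillon/postgis) on my i7 machine with 16 GB of RAM (Ubuntu 18.04).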

I created the database like this:

CREATE DATABASE demo;
\c demo
create extension postgis;
CREATE TABLE mypoints ( id serial primary key, name varchar(50), the_geom geometry(POINT,4326) );

and inserted the points (1.2M) through a Python script like this:

INSERT INTO mypoints (the_geom, name) VALUES (ST_GeomFromText('POINT(-3.782 40.4351)',4326), 'point_label');

The query I'm using is:

select name from mypoints where ST_Distance_Sphere(the_geom,ST_GeomFromText('POINT(-3.713 40.4321)',4326))<500;

Am I doing something wrong? How can my Python code be faster than a query that is optimized for geospatial problems?

Answer 1

You are not getting the full benefit of PostGIS because you have not created a spatial index.

To create an index on the table:

create index my_index_points_gist on mypoints using gist(the_geom);

Then run cluster and analyze on your table:

cluster mypoints using my_index_points_gist;
analyze mypoints;
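
One way to check whether a query actually uses the new index is to look at its plan. This is a minimal sketch, assuming the table and index names used above; the plan text depends on the PostgreSQL/PostGIS version and on the data, so no output is shown.

```sql
-- Show the plan and the measured run time for the radius query.
-- A "Seq Scan on mypoints" node means every row is still being tested;
-- an "Index Scan" or "Bitmap Index Scan" using my_index_points_gist
-- means the spatial index is being used.
EXPLAIN ANALYZE
SELECT name
FROM mypoints
WHERE ST_Distance_Sphere(
          the_geom,
          ST_GeomFromText('POINT(-3.713 40.4321)', 4326)
      ) < 500;
```

A predicate of the form "distance function < constant" generally cannot use a GiST index, so a sequential scan here would not be surprising even after the index exists.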

I see you are using spherical distance, so you would be better off using the geography type:

CREATE TABLE mypoints ( id serial primary key, name varchar(50), geog geography );

Insert the data as usual, adding a cast to the geography type:

INSERT INTO mypoints (geog, name) VALUES (ST_GeomFromText('POINT(-3.782 40.4351)',4326)::geography, 'point_label');

Alternatively, just add an extra geography column to the existing table:

alter table mypoints add column geog geography;
update mypoints set geog = the_geom::geography;

Create the index, but this time on geog:

create index my_index_points_gist_geog on mypoints using gist(geog);
cluster mypoints using my_index_points_gist_geog;
analyze mypoints;

For the query, you can use:

select name from mypoints
where ST_Distance(geog,ST_GeomFromText('POINT(-3.713 40.4321)',4326)::geography)<500;

Or, even better, since ST_DWithin can use the spatial index whereas a comparison on ST_Distance cannot:

select name from mypoints
where ST_DWithin(geog,ST_GeomFromText('POINT(-3.713 40.4321)',4326)::geography,500);
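
If you also want the distance of each match, the two functions can be combined: ST_DWithin does the filtering and ST_Distance is evaluated only for the rows that pass. This is a sketch, assuming a mypoints table with the geog column described above; the alias q and the column name meters are just illustrative names.

```sql
-- Build the search point once, filter with ST_DWithin (radius in meters
-- for geography), then compute the exact distance for the survivors.
SELECT p.name,
       ST_Distance(p.geog, q.pt) AS meters
FROM mypoints AS p
CROSS JOIN (
    SELECT ST_GeomFromText('POINT(-3.713 40.4321)', 4326)::geography AS pt
) AS q
WHERE ST_DWithin(p.geog, q.pt, 500)
ORDER BY meters;
```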

For reference: PostGIS geography type

PostGIS geospatial query is slower than a custom Python script
