I am trying to use PostGIS to find intersections between incidents (polygons) and watchzones (circles, i.e. a point plus a radius). The baseline data will be more than 10,000 polygons and 500,000 circles. I am also fairly new to PostGIS.
I have tried a few things, but execution takes a very long time. Could someone suggest any optimisation, or a better approach using only PostGIS? Here is what I have tried:
1. Using the geometry data type: I stored both incidents and watchzones as geometry, created GiST indexes on them, and then used ST_DWithin to find the intersections.
With 1 incident and 500,000 watchzones the query took about 6.750 seconds. That timing is the best of my attempts, but the problem is that my radius is in metres, while ST_DWithin on geometry expects the distance in SRID units, and I could not work out that conversion (see the sketch after the query plan below).
CREATE TABLE incident (
incident_id SERIAL NOT NULL,
incident_name VARCHAR(20),
incident_span GEOMETRY(POLYGON, 4326),
CONSTRAINT incident_id PRIMARY KEY (incident_id)
);
CREATE TABLE watchzones (
id SERIAL NOT NULL,
date_created timestamp with time zone DEFAULT now(),
latitude NUMERIC(10, 7) DEFAULT NULL,
Longitude NUMERIC(10, 7) DEFAULT NULL,
radius integer,
position GEOMETRY(POINT, 4326),
CONSTRAINT id PRIMARY KEY (id)
);
CREATE INDEX ix_spatial_geom on watchzones using gist(position);
CREATE INDEX ix_spatial_geom_1 on incident using gist(incident_span);
Insert into incident values (
1,
'test',
ST_GeomFromText('POLYGON((152.945470916 -29.212227933,152.942130026 -29.213431145,152.939345911 -29.2125423759999,152.935144791 -29.21454003,152.933185494 -29.2135838469999,152.929481762 -29.216065516,152.929698621 -29.217402937,152.927245999
-29.219576,152.921539 -29.217676,152.918487996 -29.2113786959999,152.919254355 -29.206029929,152.919692387 -29.2027824419999,152.936020197 -29.207567346,152.944901258 -29.207729953,152.945470916
-29.212227933))',
4326
)
);
insert into watchzones
SELECT generate_series(1, 500000) AS id,
now(),
-29.21073,
152.93322,
'50',
ST_GeomFromText('POINT( 152.93322 -29.21073)', 4326);
explain analyze SELECT wz.id,
i.incident_id
FROM watchzones wz,
incident i
WHERE ST_DWithin(incident_span,position,wz.radius);
"Nested Loop (cost=0.14..227467.00 rows=42 width=8) (actual time=0.142..1506.476 rows=500000 loops=1)"
" -> Seq Scan on watchzones wz (cost=0.00..11173.00 rows=500000 width=40) (actual time=0.109..47.822 rows=500000 loops=1)"
" -> Index Scan using ix_spatial_geom_1 on incident i (cost=0.14..0.42 rows=1 width=284) (actual time=0.002..0.002 rows=1 loops=500000)"
" Index Cond: (incident_span && st_expand(wz."position", (wz.radius)::double precision))"
" Filter: ((wz."position" && st_expand(incident_span, (wz.radius)::double precision)) AND _st_dwithin(incident_span, wz."position", (wz.radius)::double precision))"
"Planning time: 0.150 ms"
"Execution time: 1523.312 ms"
2. Using the geography data type:
Here, with 1 incident and 500,000 watchzones, the query took about 29.987 seconds. Note that I tried this with both GiST and BRIN indexes, and I also ran VACUUM ANALYZE on the tables (a sketch of those statements follows the query plan below).
CREATE TABLE watchzones_geog
(
id SERIAL PRIMARY KEY,
date_created TIMESTAMP with time zone DEFAULT now(),
latitude NUMERIC(10, 7) DEFAULT NULL,
longitude NUMERIC(10, 7) DEFAULT NULL,
radius INTEGER,
position geography(point)
);
CREATE INDEX watchzones_geog_gix ON watchzones_geog USING GIST (position);
insert into watchzones_geog
SELECT generate_series(1,500000) AS id, now(),-29.21073,152.93322,'50',ST_GeogFromText('POINT(152.93322 -29.21073)');
CREATE TABLE incident_geog (
incident_id SERIAL PRIMARY KEY,
incident_name VARCHAR(20),
incident_span GEOGRAPHY(POLYGON)
);
CREATE INDEX incident_geog_gix ON incident_geog USING GIST (incident_span);
Insert into incident_geog values (1,'test', ST_GeogFromText
('POLYGON((152.945470916 -29.212227933,152.942130026 -29.213431145,152.939345911 -29.2125423759999,152.935144791 -29.21454003,152.933185494 -29.2135838469999,152.929481762 -29.216065516,152.929698621 -29.217402937,152.927245999
-29.219576,152.921539 -29.217676,152.918487996 -29.2113786959999,152.919254355 -29.206029929,152.919692387 -29.2027824419999,152.936020197 -29.207567346,152.944901258 -29.207729953,152.945470916
-29.212227933))'));
explain analyze SELECT i.incident_id,
wz.id
FROM watchzones_geog wz,
incident_geog i
WHERE St_dwithin(position, incident_span, radius);
"Nested Loop (cost=0.27..348717.00 rows=17 width=8) (actual time=0.277..18551.844 rows=500000 loops=1)"
" -> Seq Scan on watchzones_geog wz (cost=0.00..11173.00 rows=500000 width=40) (actual time=0.102..50.052 rows=500000 loops=1)"
" -> Index Scan using incident_geog_gix on incident_geog i (cost=0.27..0.67 rows=1 width=711) (actual time=0.036..0.036 rows=1 loops=500000)"
" Index Cond: (incident_span && _st_expand(wz."position", (wz.radius)::double precision))"
" Filter: ((wz."position" && _st_expand(incident_span, (wz.radius)::double precision)) AND _st_dwithin(wz."position", incident_span, (wz.radius)::double precision, true))"
"Planning time: 0.155 ms"
"Execution time: 18587.041 ms"
3. I also tried creating a circle with ST_Buffer(position, radius, 'quad_segs=8') and then using ST_Intersects, as sketched below. With that query, both the geometry and the geography data types take more than a minute.
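A minimal sketch of that buffer-based query on the geography tables (assumed; the exact query was not included in the original post):
SELECT wz.id, i.incident_id
FROM watchzones_geog wz, incident_geog i
WHERE ST_Intersects(ST_Buffer(wz.position, wz.radius, 'quad_segs=8'), i.incident_span);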
It would be great if someone could suggest a better approach, or an optimisation to speed up the execution.
Thanks
Answer 0 (score: 0)
The query is fine, but your sample data is wrong. First, note that a query optimised for 1 polygon may differ from one optimised for thousands of polygons.
The main issue lies in the sample points. As written, you have 500,000 points at exactly the same location, so depending on which polygon they intersect, the query will return either 0 or 500,000 results. PostGIS first uses the index to match points and polygons by their bounding boxes, and only then refines the result by computing the true distance. With your sample it has to compute that distance 500,000 times, which is slow.
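Roughly speaking, on the geometry tables from the question, ST_DWithin behaves like the sketch below: a cheap index-assisted bounding-box test followed by an exact distance computation (the geography type uses the analogous internal _ST_Expand, as the plans above show):
SELECT wz.id, i.incident_id
FROM watchzones wz, incident i
WHERE i.incident_span && ST_Expand(wz.position, wz.radius)   -- bounding-box test, can use the GiST index
  AND ST_Distance(i.incident_span, wz.position) < wz.radius; -- exact distance, evaluated per candidate row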
Using a point layer with random positions (within 1 degree), the query takes less than 1 second, because it only has to compute the distance for 20 locations.
INSERT INTO watchzones_geog
SELECT generate_series(1,500000) AS id, now(),0,0,'50',
ST_makePoint(152.93322+random(),-29.21073+random())::geography;
explain analyze SELECT i.incident_id,
wz.id
FROM watchzones_geog wz,
incident_geog i
WHERE St_dwithin(position, incident_span, radius);
Nested Loop (cost=0.00..272424.01 rows=1 width=8) (actual time=25.956..921.846 rows=20 loops=1)
Join Filter: ((wz."position" && _st_expand(i.incident_span, (wz.radius)::double precision)) AND (i.incident_span && _st_expand(wz."position", (wz.radius)::double precision)) AND _st_dwithin(wz."position", i.incident_span, (wz.radius)::double precision, true))
Rows Removed by Join Filter: 499980
-> Seq Scan on incident_geog i (cost=0.00..1.01 rows=1 width=36) (actual time=0.009..0.009 rows=1 loops=1)
-> Seq Scan on watchzones_geog wz (cost=0.00..11173.00 rows=500000 width=40) (actual time=0.006..65.625 rows=500000 loops=1)
Planning time: 1.887 ms
Execution time: 921.895 ms