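I'm running a query on a large table in Postgres 9.3.9. It is a spatial dataset, and it is spatially indexed. Say I need to find 3 types of objects: A, B and C. The criterion is that B and C are both within a certain distance of A, say 500 meters.
My query looks like this: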
select
school.osm_id as school_osm_id,
school.name as school_name,
school.way as school_way,
restaurant.osm_id as restaurant_osm_id,
restaurant.name as restaurant_name,
restaurant.way as restaurant_way,
bar.osm_id as bar_osm_id,
bar.name as bar_name,
bar.way as bar_way
from (
select osm_id, name, amenity, way, way_geo
from planet_osm_point
where amenity = 'school') as school,
(select osm_id, name, amenity, way, way_geo
from planet_osm_point
where amenity = 'restaurant') as restaurant,
(select osm_id, name, amenity, way, way_geo
from planet_osm_point
where amenity = 'bar') as bar
where ST_DWithin(school.way_geo, restaurant.way_geo, 500, false)
and ST_DWithin(school.way_geo, bar.way_geo, 500, false);
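This query gives me what I want, but it takes a long time to execute, around 13 seconds. I'm wondering whether there is another way to write the query that makes it more efficient.
Query plan: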
Nested Loop (cost=74.43..28618.65 rows=1 width=177) (actual time=33.513..11235.212 rows=10591 loops=1)
  Buffers: shared hit=530967 read=8733
  -> Nested Loop (cost=46.52..28586.46 rows=1 width=174) (actual time=31.998..9595.212 rows=4235 loops=1)
        Buffers: shared hit=389863 read=8707
        -> Bitmap Heap Scan on planet_osm_point (cost=18.61..2897.83 rows=798 width=115) (actual time=7.862..150.607 rows=8811 loops=1)
              Recheck Cond: (amenity = 'school'::text)
              Buffers: shared hit=859 read=5204
              -> Bitmap Index Scan on idx_planet_osm_point_amenity (cost=0.00..18.41 rows=798 width=0) (actual time=5.416..5.416 rows=8811 loops=1)
                    Index Cond: (amenity = 'school'::text)
                    Buffers: shared hit=3 read=24
        -> Bitmap Heap Scan on planet_osm_point planet_osm_point_1 (cost=27.91..32.18 rows=1 width=115) (actual time=1.064..1.069 rows=0 loops=8811)
              Recheck Cond: ((way_geo && _st_expand(planet_osm_point.way_geo, 500::double precision)) AND (amenity = 'restaurant'::text))
              Filter: ((planet_osm_point.way_geo && _st_expand(way_geo, 500::double precision)) AND _st_dwithin(planet_osm_point.way_geo, way_geo, 500::double precision, false))
              Rows Removed by Filter: 0
              Buffers: shared hit=389004 read=3503
              -> BitmapAnd (cost=27.91..27.91 rows=1 width=0) (actual time=1.058..1.058 rows=0 loops=8811)
                    Buffers: shared hit=384528 read=2841
                    -> Bitmap Index Scan on idx_planet_osm_point_waygeo (cost=0.00..9.05 rows=137 width=0) (actual time=0.193..0.193 rows=64 loops=8811)
                          Index Cond: (way_geo && _st_expand(planet_osm_point.way_geo, 500::double precision))
                          Buffers: shared hit=146631 read=2841
                    -> Bitmap Index Scan on idx_planet_osm_point_amenity (cost=0.00..18.41 rows=798 width=0) (actual time=0.843..0.843 rows=6291 loops=8811)
                          Index Cond: (amenity = 'restaurant'::text)
                          Buffers: shared hit=237897
  -> Bitmap Heap Scan on planet_osm_point planet_osm_point_2 (cost=27.91..32.18 rows=1 width=115) (actual time=0.375..0.383 rows=3 loops=4235)
        Recheck Cond: ((way_geo && _st_expand(planet_osm_point.way_geo, 500::double precision)) AND (amenity = 'bar'::text))
        Filter: ((planet_osm_point.way_geo && _st_expand(way_geo, 500::double precision)) AND _st_dwithin(planet_osm_point.way_geo, way_geo, 500::double precision, false))
        Rows Removed by Filter: 1
        Buffers: shared hit=141104 read=26
        -> BitmapAnd (cost=27.91..27.91 rows=1 width=0) (actual time=0.368..0.368 rows=0 loops=4235)
              Buffers: shared hit=127019
              -> Bitmap Index Scan on idx_planet_osm_point_waygeo (cost=0.00..9.05 rows=137 width=0) (actual time=0.252..0.252 rows=363 loops=4235)
                    Index Cond: (way_geo && _st_expand(planet_osm_point.way_geo, 500::double precision))
                    Buffers: shared hit=101609
              -> Bitmap Index Scan on idx_planet_osm_point_amenity (cost=0.00..18.41 rows=798 width=0) (actual time=0.104..0.104 rows=779 loops=4235)
                    Index Cond: (amenity = 'bar'::text)
                    Buffers: shared hit=25410
Total runtime: 11238.605 ms
I'm currently using only one table, with 1,372,711 rows. It has 73 columns:
Column | Type | Modifiers
--------------------+----------------------+---------------------------
osm_id | bigint |
access | text |
addr:housename | text |
addr:housenumber | text |
addr:interpolation | text |
admin_level | text |
aerialway | text |
aeroway | text |
amenity | text |
area | text |
barrier | text |
bicycle | text |
brand | text |
bridge | text |
boundary | text |
building | text |
capital | text |
construction | text |
covered | text |
culvert | text |
cutting | text |
denomination | text |
disused | text |
ele | text |
embankment | text |
foot | text |
generator:source | text |
harbour | text |
highway | text |
historic | text |
horse | text |
intermittent | text |
junction | text |
landuse | text |
layer | text |
leisure | text |
lock | text |
man_made | text |
military | text |
motorcar | text |
name | text |
natural | text |
office | text |
oneway | text |
operator | text |
place | text |
poi | text |
population | text |
power | text |
power_source | text |
public_transport | text |
railway | text |
ref | text |
religion | text |
route | text |
service | text |
shop | text |
sport | text |
surface | text |
toll | text |
tourism | text |
tower:type | text |
tunnel | text |
water | text |
waterway | text |
wetland | text |
width | text |
wood | text |
z_order | integer |
tags | hstore |
way | geometry(Point,4326) |
way_geo | geography |
gid | integer | not null default nextval('...
Indexes:
"planet_osm_point_pkey1" PRIMARY KEY, btree (gid)
"idx_planet_osm_point_amenity" btree (amenity)
"idx_planet_osm_point_waygeo" gist (way_geo)
"planet_osm_point_index" gist (way)
"planet_osm_point_pkey" btree (osm_id)
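In the amenity column, school, restaurant and bar have 8,811, 6,291 and 779 rows, respectively.
Answer 0 (score: 4)
This query should go a long way (i.e., be much faster):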
WITH school AS (
SELECT s.osm_id AS school_id, text 'school' AS type, s.osm_id, s.name, s.way_geo
FROM planet_osm_point s
, LATERAL (
SELECT 1 FROM planet_osm_point
WHERE ST_DWithin(way_geo, s.way_geo, 500, false)
AND amenity = 'bar'
LIMIT 1 -- bar exists -- most selective first if possible
) b
, LATERAL (
SELECT 1 FROM planet_osm_point
WHERE ST_DWithin(way_geo, s.way_geo, 500, false)
AND amenity = 'restaurant'
LIMIT 1 -- restaurant exists
) r
WHERE s.amenity = 'school'
)
SELECT * FROM (
TABLE school -- schools
UNION ALL -- bars
SELECT s.school_id, 'bar', x.*
FROM school s
, LATERAL (
SELECT osm_id, name, way_geo
FROM planet_osm_point
WHERE ST_DWithin(way_geo, s.way_geo, 500, false)
AND amenity = 'bar'
) x
UNION ALL -- restaurants
SELECT s.school_id, 'rest.', x.*
FROM school s
, LATERAL (
SELECT osm_id, name, way_geo
FROM planet_osm_point
WHERE ST_DWithin(way_geo, s.way_geo, 500, false)
AND amenity = 'restaurant'
) x
) sub
ORDER BY school_id, (type <> 'school'), type, osm_id;
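This is not the same as your original query, but it is what you actually want, as per the discussion in the comments:
I want a list of schools that have restaurants and bars within 500 meters, and I need the coordinates of each school and its corresponding restaurants and bars.
So this query returns a list of those schools, each followed by the nearby bars and restaurants. Each set of rows is held together by the osm_id of the school in the column school_id.
It now uses LATERAL joins to make use of the spatial GiST index.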
TABLE school is just shorthand for SELECT * FROM school.
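A minimal illustration, assuming a tiny stand-in CTE named school with made-up rows (not the real CTE from the answer): TABLE school returns exactly the rows that SELECT * FROM school would.

```sql
-- Hypothetical stand-in for the "school" CTE, with made-up rows
WITH school(school_id, type, name) AS (
   VALUES (1, text 'school', text 'School A')
        , (2, text 'school', text 'School B')
)
TABLE school;  -- equivalent to: SELECT * FROM school;
```

The shorthand can also be used inside UNION ALL branches, which is how the answer's query appends the school rows.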
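The expression (type <> 'school') puts the school first in each set, because FALSE sorts before TRUE: the expression is FALSE for the school row and TRUE for the bar and restaurant rows.
The subquery sub in the final SELECT is only needed to sort by this expression. A UNION query restricts an attached ORDER BY list to plain output columns; expressions are not allowed.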
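A small sketch with made-up rows (the ids and labels are hypothetical) shows both halves of this. The commented-out statement attaches the expression directly to the UNION and is rejected by PostgreSQL; wrapping the UNION in a subquery lets the outer SELECT sort by the expression. Because FALSE sorts before TRUE, the school row comes first, followed by the remaining rows in order of type.

```sql
-- Rejected: ORDER BY on a UNION may only reference output columns, not expressions
-- SELECT 1 AS school_id, text 'bar' AS type
-- UNION ALL
-- SELECT 1, text 'school'
-- ORDER BY school_id, (type <> 'school');
-- ERROR:  invalid UNION/INTERSECT/EXCEPT ORDER BY clause

-- Accepted: wrap the UNION in a subquery, then sort the outer SELECT
SELECT *
FROM  (
   SELECT 1 AS school_id, text 'bar' AS type
   UNION ALL
   SELECT 1, text 'school'
   UNION ALL
   SELECT 1, text 'rest.'
   ) sub
ORDER  BY school_id, (type <> 'school'), type;
-- school row first (expression is FALSE), then 'bar' and 'rest.' (TRUE, sorted by type)
```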
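I focused on the query you presented for this answer, ignoring the extended requirement to filter on any of the other 70 text columns. That's really a design flaw: search criteria should be concentrated in a few columns. Otherwise you would have to index all 70 columns, and multicolumn indexes like the one I am going to propose are hardly an option. Still possible, though...
In addition to the existing index: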
"idx_planet_osm_point_waygeo" gist (way_geo)
If you always filter on the same columns, you could create a multicolumn index covering the few columns you are interested in, so that index-only scans become possible:
CREATE INDEX planet_osm_point_bar_idx ON planet_osm_point (amenity, name, osm_id);
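The upcoming Postgres 9.5 introduces major improvements that happen to address your case exactly:
Allow queries to perform accurate distance filtering of bounding-box-indexed objects (polygons, circles) using GiST indexes (Alexander Korotkov, Heikki Linnakangas)
Previously, a common table expression would be required to return a large number of rows ordered by bounding-box distance, and then filtered further with a more accurate non-bounding-box distance calculation.
Allow GiST indexes to perform index-only scans (Anastasia Lubennikova, Heikki Linnakangas, Andreas Karlsson)
This one is of particular interest for you. Now you can have a single multicolumn (covering) GiST index: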
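CREATE EXTENSION IF NOT EXISTS btree_gist;  -- provides GiST operator classes for text and bigint
CREATE INDEX planet_osm_point_gist_idx ON planet_osm_point
USING gist (amenity, way_geo, name, osm_id);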
And:
- Improve bitmap index scan performance (Teodor Sigaev, Tom Lane)
And:
- Add GROUP BY analysis features GROUPING SETS, CUBE and ROLLUP (Andrew Gierth, Atri Sharma)
Why? Because ROLLUP would simplify the query I suggested. Related answer:
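The first alpha release was published on July 2, 2015. The expected timeline for the release:
This is the alpha release of version 9.5, indicating that some changes to features are still possible before release. The PostgreSQL Project will release 9.5 beta 1 in August, and then periodically release additional betas as required for testing until the final release in late 2015.
And, of course, never neglect the basics:
Answer 1 (score: 1)
The 3 sub-selects you use are very inefficient. Write them as LEFT JOIN clauses and the query should be more efficient: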
SELECT
school.osm_id AS school_osm_id,
school.name AS school_name,
school.way AS school_way,
restaurant.osm_id AS restaurant_osm_id,
restaurant.name AS restaurant_name,
restaurant.way AS restaurant_way,
bar.osm_id AS bar_osm_id,
bar.name AS bar_name,
bar.way AS bar_way
FROM planet_osm_point school
LEFT JOIN planet_osm_point restaurant ON restaurant.amenity = 'restaurant' AND
ST_DWithin(school.way_geo, restaurant.way_geo, 500, false)
LEFT JOIN planet_osm_point bar ON bar.amenity = 'bar' AND
ST_DWithin(school.way_geo, bar.way_geo, 500, false)
WHERE school.amenity = 'school'
AND (restaurant.osm_id IS NOT NULL OR bar.osm_id IS NOT NULL);
But this gives too many results if there are multiple restaurants and bars per school. You can simplify the query like this:
SELECT
school.osm_id AS school_osm_id,
school.name AS school_name,
school.way AS school_way,
a.osm_id AS amenity_osm_id,
a.amenity AS amenity_type,
a.name AS amenity_name,
a.way AS amenity_way
FROM planet_osm_point school
JOIN planet_osm_point a ON ST_DWithin(school.way_geo, a.way_geo, 500, false)
WHERE school.amenity = 'school'
AND a.amenity IN ('bar', 'restaurant');
This gives you every bar and restaurant near each school. Schools without a restaurant or bar within 500 m are not listed.
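The simplified query above also lists schools that have only a bar or only a restaurant nearby. If you need the question's original condition (both within 500 m), one possible sketch keeps this single join but groups per school and filters with HAVING, returning one row per school with the nearby amenities and their coordinates aggregated into arrays. The demo_points CTE and its coordinates are hypothetical test data (PostGIS is required); replace demo_points with planet_osm_point to run it against the real table. bool_or() is used because the aggregate FILTER clause only arrived in Postgres 9.4.

```sql
-- Hypothetical demo data; requires the PostGIS extension.
-- School 1 has a bar (~111 m) and a restaurant (~222 m) nearby;
-- school 4 only has a bar nearby, so HAVING filters it out.
WITH demo_points(osm_id, name, amenity, way_geo) AS (
   VALUES
     (1::bigint, text 'School A',     text 'school',     ST_GeogFromText('SRID=4326;POINT(0 0)'))
   , (2,         text 'Bar X',        text 'bar',        ST_GeogFromText('SRID=4326;POINT(0.001 0)'))
   , (3,         text 'Restaurant Y', text 'restaurant', ST_GeogFromText('SRID=4326;POINT(0 0.002)'))
   , (4,         text 'School B',     text 'school',     ST_GeogFromText('SRID=4326;POINT(1 1)'))
   , (5,         text 'Bar Z',        text 'bar',        ST_GeogFromText('SRID=4326;POINT(1.001 1)'))
)
SELECT school.osm_id AS school_osm_id
     , school.name   AS school_name
     , array_agg(a.amenity || ': ' || a.name ORDER BY a.amenity, a.name) AS nearby
     , array_agg(ST_AsText(a.way_geo)        ORDER BY a.amenity, a.name) AS nearby_coords
FROM   demo_points school
JOIN   demo_points a ON a.amenity IN ('bar', 'restaurant')
                    AND ST_DWithin(school.way_geo, a.way_geo, 500, false)
WHERE  school.amenity = 'school'
GROUP  BY school.osm_id, school.name
HAVING bool_or(a.amenity = 'bar')          -- at least one bar within 500 m
   AND bool_or(a.amenity = 'restaurant');  -- and at least one restaurant
```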
Answer 2 (score: 0)
Does it make a difference if you use explicit joins?
SELECT a.id as a_id, a.name as a_name, a.geog as a_geog,
b.id as b_id, b.name as b_name, b.geog as b_geog,
c.id as c_id, c.name as c_name, c.geog as c_geog
FROM table1 a
JOIN table1 b ON b.type = 'B' AND ST_DWithin(a.geog, b.geog, 100)
JOIN table1 c ON c.type = 'C' AND ST_DWithin(a.geog, c.geog, 100)
WHERE a.type = 'A';
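Answer 3 (score: 0)
Try the inner join syntax and compare the results; there should be no duplicates. My guess is that it should take about 1/3 of the time of the original query, or better: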
select a.id as a_id, a.name as a_name, a.geog as a_geo,
b.id as b_id, b.name as b_name, b.geog as b_geo,
c.id as c_id, c.name as c_name, c.geog as c_geo
from table1 as a
INNER JOIN table1 as b on b.type='B'
INNER JOIN table1 as c on c.type='C'
WHERE a.type='A' and
(ST_DWithin(a.geog, b.geog, 100) and ST_DWithin(a.geog, c.geog, 100));