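I'm using Postgres 9.6 with PostGIS 2.3, hosted on AWS RDS. I'm trying to optimize some geo-radius queries that fetch data from several tables.
I'm considering two approaches: a single query with multiple joins, or two separate but simpler queries.
At a high level, and simplifying the structure, my schema is: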
CREATE EXTENSION "uuid-ossp";
CREATE EXTENSION IF NOT EXISTS postgis;
CREATE TABLE addresses (
id bigint NOT NULL,
latitude double precision,
longitude double precision,
line1 character varying NOT NULL,
"position" geography(Point,4326),
CONSTRAINT enforce_srid CHECK ((st_srid("position") = 4326))
);
CREATE INDEX index_addresses_on_position ON addresses USING gist ("position");
CREATE TABLE locations (
id bigint NOT NULL,
uuid uuid DEFAULT uuid_generate_v4() NOT NULL,
address_id bigint NOT NULL
);
CREATE TABLE shops (
id bigint NOT NULL,
name character varying NOT NULL,
location_id bigint NOT NULL
);
CREATE TABLE inventories (
id bigint NOT NULL,
shop_id bigint NOT NULL,
status character varying NOT NULL
);
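The addresses table holds the geographic data. The position column is computed from the latitude/longitude columns whenever a row is inserted or updated.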
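The question doesn't say how position is computed (it may well happen in the application). One way to do it inside the database on PostgreSQL 9.6, which has no generated columns (they arrived in PostgreSQL 12), is a BEFORE trigger. A minimal sketch, assuming the schema above; the function and trigger names are hypothetical:

```sql
-- Hypothetical names: addresses_set_position / addresses_set_position_trg.
-- Recomputes "position" from latitude/longitude before each insert, and
-- before any update that touches latitude or longitude.
CREATE OR REPLACE FUNCTION addresses_set_position() RETURNS trigger AS $$
BEGIN
  IF NEW.latitude IS NULL OR NEW.longitude IS NULL THEN
    NEW."position" := NULL;
  ELSE
    -- ST_MakePoint takes (x, y), i.e. (longitude, latitude);
    -- the geometry is then cast to geography to match the column type.
    NEW."position" := ST_SetSRID(ST_MakePoint(NEW.longitude, NEW.latitude), 4326)::geography;
  END IF;
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

-- PostgreSQL 9.6 uses EXECUTE PROCEDURE (EXECUTE FUNCTION is 11+).
CREATE TRIGGER addresses_set_position_trg
  BEFORE INSERT OR UPDATE OF latitude, longitude ON addresses
  FOR EACH ROW EXECUTE PROCEDURE addresses_set_position();
```

Doing this in a trigger keeps the column correct no matter which client writes the row, at the cost of a little work on every insert and update.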
Each address is associated with one location. Each address may have multiple shops (through its location), and each shop has one inventory.
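For brevity I've omitted them, but all tables have the appropriate foreign key constraints and btree indexes on the referencing columns.
The tables have hundreds of thousands of rows.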
My main use case is served by a single query that searches addresses within 1000 meters of a center point (10.0, 10.0) and returns data from all the tables:
SELECT
s.id AS shop_id,
s.name AS shop_name,
i.status AS inventory_status,
l.uuid AS location_uuid,
a.line1 AS addr_line,
a.latitude AS lat,
a.longitude AS lng
FROM addresses a
JOIN locations l ON l.address_id = a.id
JOIN shops s ON s.location_id = l.id
JOIN inventories i ON i.shop_id = s.id
WHERE ST_DWithin(
a.position, -- the position of each address
ST_SetSRID(ST_Point(10.0, 10.0), 4326), -- the center of the circle
1000, -- radius distance in meters
true
);
This query works, and EXPLAIN ANALYZE shows that it does use the GIST index.
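To compare the approaches on more than plan shape, EXPLAIN can also report buffer usage. A minimal sketch for the single query; the same wrapper can be put around each of the split queries. In the ANALYZE output, Hash nodes report "Memory Usage" (with Batches greater than 1 meaning the hash spilled past work_mem), and Sort nodes report their sort method and memory or disk use:

```sql
-- ANALYZE actually executes the query and reports real row counts and timings;
-- BUFFERS adds shared/local/temp block counts per plan node, a more direct
-- measure of I/O and memory pressure than how simple the plan looks.
EXPLAIN (ANALYZE, BUFFERS)
SELECT
  s.id AS shop_id,
  s.name AS shop_name,
  i.status AS inventory_status,
  l.uuid AS location_uuid,
  a.line1 AS addr_line,
  a.latitude AS lat,
  a.longitude AS lng
FROM addresses a
JOIN locations l ON l.address_id = a.id
JOIN shops s ON s.location_id = l.id
JOIN inventories i ON i.shop_id = s.id
WHERE ST_DWithin(a.position, ST_SetSRID(ST_Point(10.0, 10.0), 4326), 1000, true);
```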
However, I could also split this query in two and manage the intermediate results in the application layer. For example, this also works:
--- only search for the addresses
SELECT
a.id as addr_id,
a.line1 AS addr_line,
a.latitude AS lat,
a.longitude AS lng
FROM addresses a
WHERE ST_DWithin(
a.position, -- the position of each address
ST_SetSRID(ST_Point(10.0, 10.0), 4326), -- the center of the circle
1000, -- radius distance in meters
true
);
--- get the rest of the data
SELECT
s.id AS shop_id,
s.name AS shop_name,
i.status AS inventory_status,
l.id AS location_id,
l.uuid AS location_uuid
FROM locations l
JOIN shops s ON s.location_id = l.id
JOIN inventories i ON i.shop_id = s.id
WHERE
l.address_id IN (1, 2, 3, 4, 5) -- potentially thousands of values
;
The values in l.address_id IN (1, 2, 3, 4, 5) come from the first query.
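If the split approach is kept, the ids from the first query don't have to be spliced into a literal IN (...) list that grows to thousands of values; they can be bound as a single array parameter. A minimal sketch using a server-side prepared statement, where the statement name is hypothetical (most client drivers can bind a bigint[] parameter the same way):

```sql
-- Hypothetical statement name: shops_for_addresses.
-- $1 is one bigint[] holding the address ids returned by the first query,
-- so the SQL text stays the same regardless of how many ids there are.
PREPARE shops_for_addresses(bigint[]) AS
SELECT
  s.id AS shop_id,
  s.name AS shop_name,
  i.status AS inventory_status,
  l.id AS location_id,
  l.uuid AS location_uuid
FROM locations l
JOIN shops s ON s.location_id = l.id
JOIN inventories i ON i.shop_id = s.id
WHERE l.address_id = ANY($1);

-- Usage with the ids from the example above.
EXECUTE shops_for_addresses(ARRAY[1, 2, 3, 4, 5]::bigint[]);
```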
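The plans of the two split queries look simpler than the plan of the single query, but I wonder whether that by itself means the second solution is better.
I know that inner joins are well optimized, and that a single round trip to the database is preferable.
But what about memory usage? Or resource contention on the tables (e.g. locks)?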
Answer 0 (score: 0)
I (re)combined your two-step code into a single query using IN (...):
--- get the rest of the data
SELECT
s.id AS shop_id,
s.name AS shop_name,
i.status AS inventory_status,
l.id AS location_id,
l.uuid AS location_uuid
FROM locations l
JOIN shops s ON s.location_id = l.id
JOIN inventories i ON i.shop_id = s.id
WHERE l.address_id IN ( --- only search for the addresses
SELECT a.id
FROM addresses a
WHERE ST_DWithin(a.position, ST_SetSRID(ST_Point(10.0, 10.0), 4326), 1000, true)
);
Or, similarly, using EXISTS (...):
--- get the rest of the data
SELECT
s.id AS shop_id,
s.name AS shop_name,
i.status AS inventory_status,
l.id AS location_id,
l.uuid AS location_uuid
FROM locations l
JOIN shops s ON s.location_id = l.id
JOIN inventories i ON i.shop_id = s.id
WHERE EXISTS ( SELECT * --- only search for the addresses
FROM addresses a
WHERE a.id = l.address_id
AND ST_DWithin( a.position, ST_SetSRID(ST_Point(10.0, 10.0), 4326), 1000, true)
);
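A third single-round-trip variant, offered as a sketch rather than a recommendation: on PostgreSQL 9.6 a WITH query (CTE) is always materialized and acts as an optimization fence (this changed in PostgreSQL 12). The radius search below therefore runs once on its own, much like the first of the two split queries, and its result is joined without leaving the database. The CTE name nearby is illustrative, and the join is equivalent to the IN form only if addresses.id is unique, as a primary key would make it. Whether this beats IN or EXISTS depends on the data; EXPLAIN (ANALYZE, BUFFERS) is the way to tell.

```sql
-- PostgreSQL 9.6: "nearby" (illustrative name) is materialized before the
-- outer query runs, so the GiST radius search is evaluated once and the
-- planner cannot push other conditions into it or reorder it.
WITH nearby AS (
  SELECT a.id
  FROM addresses a
  WHERE ST_DWithin(a.position, ST_SetSRID(ST_Point(10.0, 10.0), 4326), 1000, true)
)
SELECT
  s.id AS shop_id,
  s.name AS shop_name,
  i.status AS inventory_status,
  l.id AS location_id,
  l.uuid AS location_uuid
FROM nearby n
JOIN locations l ON l.address_id = n.id
JOIN shops s ON s.location_id = l.id
JOIN inventories i ON i.shop_id = s.id;
```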