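In a Postgres 9.5 database with PostGIS 2.2.0 installed, I have two tables with geometry data (points). I want to assign points from one table to points from the other, but I don't want any buildings.gid to be assigned twice. As soon as a buildings.gid has been assigned, it must not be assigned to another pvanlagen.buildid.
buildings: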
CREATE TABLE public.buildings (
gid numeric NOT NULL DEFAULT nextval('buildings_gid_seq'::regclass),
osm_id character varying(11),
name character varying(48),
type character varying(16),
geom geometry(MultiPolygon,4326),
centroid geometry(Point,4326),
gembez character varying(50),
gemname character varying(50),
krsbez character varying(50),
krsname character varying(50),
pv boolean,
gr numeric,
capac numeric,
instdate date,
pvid numeric,
dist numeric,
CONSTRAINT buildings_pkey PRIMARY KEY (gid)
);
CREATE INDEX build_centroid_gix
ON public.buildings
USING gist
(st_transform(centroid, 31467));
CREATE INDEX buildings_geom_idx
ON public.buildings
USING gist
(geom);
pvanlagen:
CREATE TABLE public.pvanlagen (
gid integer NOT NULL DEFAULT nextval('pv_bis2010_bayern_wgs84_gid_seq'::regclass),
tso character varying(254),
tso_number numeric(10,0),
system_ope character varying(254),
system_key character varying(254),
location character varying(254),
postal_cod numeric(10,0),
street character varying(254),
capacity numeric,
voltage_le character varying(254),
energy_sou character varying(254),
beginning_ date,
end_operat character varying(254),
id numeric(10,0),
kkz numeric(10,0),
geom geometry(Point,4326),
gembez character varying(50),
gemname character varying(50),
krsbez character varying(50),
krsname character varying(50),
buildid numeric,
dist numeric,
trans boolean,
CONSTRAINT pv_bis2010_bayern_wgs84_pkey PRIMARY KEY (gid),
CONSTRAINT pvanlagen_buildid_fkey FOREIGN KEY (buildid)
REFERENCES public.buildings (gid) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE NO ACTION,
CONSTRAINT pvanlagen_buildid_uni UNIQUE (buildid)
);
CREATE INDEX pv_bis2010_bayern_wgs84_geom_idx
ON public.pvanlagen
USING gist
(geom);
My idea was to add a boolean column pv to the buildings table, which is set as soon as a buildings.gid has been assigned:
UPDATE pvanlagen
SET buildid=buildings.gid, dist='50'
FROM buildings
WHERE buildid IS NULL
AND buildings.pv is NULL
AND pvanlagen.gemname=buildings.gemname
AND ST_Distance(ST_Transform(pvanlagen.geom,31467)
,ST_Transform(buildings.centroid,31467))<50;
UPDATE buildings
SET pv=true
FROM pvanlagen
WHERE buildings.gid=pvanlagen.buildid;
I tested this with 50 rows in buildings, but applying it to all rows takes far too long. I have 3,200,000 buildings and 260,000 PVs.
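The gid of the nearest building should be assigned. In case of a tie, it does not matter which gid is assigned. If we need a rule, we can take the building with the lower gid.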
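The 50 meters are meant as a limit. I used ST_Distance() because it returns the minimum distance, which should be within 50 meters. Afterwards I raised the limit step by step until every PV Anlage was assigned.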
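Buildings and PVs are assigned to their respective regions (gemname). This should make the assignment cheaper, since I know that the nearest building must be in the same region (gemname).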
After the feedback below, I tried this query:
UPDATE pvanlagen p1
SET buildid = buildings.gid
, dist = buildings.dist
FROM (
SELECT DISTINCT ON (b.gid)
p.id, b.gid, b.dist::numeric
FROM (
SELECT id, ST_Transform(geom, 31467)
FROM pvanlagen
WHERE buildid IS NULL -- not assigned yet
) p
, LATERAL (
SELECT b.gid, ST_Distance(ST_Transform(p1.geom, 31467), ST_Transform(b.centroid, 31467)) AS dist
FROM buildings b
LEFT JOIN pvanlagen p1 ON p1.buildid = b.gid
WHERE p1.buildid IS NULL
AND b.gemname = p1.gemname
ORDER BY ST_Transform(p1.geom, 31467) <-> ST_Transform(b.centroid, 31467)
LIMIT 1
) b
ORDER BY b.gid, b.dist, p.id -- tie breaker
) x, buildings
WHERE p1.id = x.id;
But it returns with: 0 rows affected in 234 ms execution time
Where am I going wrong?
Answer 0 (score: 5)
To enforce your rule, simply declare pvanlagen.buildid UNIQUE:
ALTER TABLE pvanlagen ADD CONSTRAINT pvanlagen_buildid_uni UNIQUE (buildid);
As your update revealed, buildings.gid is the PK. To also enforce referential integrity, add a FOREIGN KEY constraint on pvanlagen.buildid that references buildings.gid.
You have implemented both by now. But it would be more efficient to run the big UPDATE below before adding these constraints.
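A minimal sketch of that ordering, using the constraint names from the table definition in the question. Dropping and re-adding inside one transaction is an assumption about how you want to run it, not a requirement:

```sql
BEGIN;

-- Remove the constraints so the big UPDATE does not have to check them row by row.
ALTER TABLE pvanlagen DROP CONSTRAINT IF EXISTS pvanlagen_buildid_fkey;
ALTER TABLE pvanlagen DROP CONSTRAINT IF EXISTS pvanlagen_buildid_uni;

-- ... run the big UPDATE here (repeat it as needed) ...

-- Re-add the constraints; both are validated once against the final data.
ALTER TABLE pvanlagen ADD CONSTRAINT pvanlagen_buildid_uni UNIQUE (buildid);
ALTER TABLE pvanlagen ADD CONSTRAINT pvanlagen_buildid_fkey
   FOREIGN KEY (buildid) REFERENCES buildings (gid);

COMMIT;
```

If re-adding a constraint fails, the assignment produced duplicates or dangling references, and the whole transaction rolls back.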
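There is a lot more in your table definition that should be improved. For one, buildings.gid as well as pvanlagen.buildid should be of type integer (or possibly bigint if you burn a lot of PK values). numeric is expensive nonsense for this purpose.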
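One possible way to change the types, as a sketch. It assumes every existing value is a whole number within integer range, and it temporarily drops the FK because an integer PK cannot be referenced by a numeric column. Both ALTER COLUMN statements rewrite the table and hold an exclusive lock while they run:

```sql
BEGIN;

-- The FK has to go first: numeric does not coerce implicitly to integer.
ALTER TABLE pvanlagen DROP CONSTRAINT pvanlagen_buildid_fkey;

ALTER TABLE buildings ALTER COLUMN gid     TYPE integer USING gid::integer;
ALTER TABLE pvanlagen ALTER COLUMN buildid TYPE integer USING buildid::integer;

-- Restore the FK now that both sides are integer.
ALTER TABLE pvanlagen ADD CONSTRAINT pvanlagen_buildid_fkey
   FOREIGN KEY (buildid) REFERENCES buildings (gid);

COMMIT;
```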
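Let's focus on the core problem:
The case is not as simple as it may seem. It is a "nearest neighbour" problem, with the additional complication of unique assignment.
This query finds the single nearest building for each PV (short for PV Anlage, a row in pvanlagen), where neither of the two has been assigned yet: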
SELECT pv_gid, b_gid, dist
FROM (
SELECT gid AS pv_gid, ST_Transform(geom, 31467) AS geom31467
FROM pvanlagen
WHERE buildid IS NULL -- not assigned yet
) p
, LATERAL (
SELECT b.gid AS b_gid
, round(ST_Distance(p.geom31467
, ST_Transform(b.centroid, 31467))::numeric, 2) AS dist -- see below
FROM buildings b
LEFT JOIN pvanlagen p1 ON p1.buildid = b.gid -- also not assigned ...
WHERE p1.buildid IS NULL -- ... yet
-- AND p.gemname = b.gemname -- not needed for performance, see below
ORDER BY p.geom31467 <-> ST_Transform(b.centroid, 31467)
LIMIT 1
) b;
To make this query fast, you need a spatial, functional GiST index on buildings; it makes the query much faster:
CREATE INDEX build_centroid_gix ON buildings USING gist (ST_Transform(centroid, 31467));
Not sure why you don't have it.
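To check whether the planner can use the index for the nearest-neighbour ordering, a quick probe like the following may help. The coordinates are an arbitrary sample point chosen for illustration, not taken from your data:

```sql
-- The ORDER BY expression must match the indexed expression
-- ST_Transform(centroid, 31467) for the GiST index to be considered.
EXPLAIN
SELECT b.gid
FROM   buildings b
ORDER  BY ST_Transform(b.centroid, 31467)
      <-> ST_Transform(ST_SetSRID(ST_MakePoint(11.57, 48.14), 4326), 31467)
LIMIT  1;
```

If the index is used, the plan should show an index scan on build_centroid_gix rather than a sequential scan followed by a sort.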
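With the index in place, we don't need to restrict matches to the same gemname for performance. Only do that if it is an actual rule to be enforced. If it has to be observed at all times, include the column in the FK constraint.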
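A sketch of such a two-column FK. Both constraint names are hypothetical, and the extra UNIQUE constraint on buildings is needed only so that the column pair can be referenced:

```sql
-- Make (gid, gemname) referenceable. gid alone is already unique,
-- so this adds no new restriction on buildings.
ALTER TABLE buildings
   ADD CONSTRAINT buildings_gid_gemname_uni UNIQUE (gid, gemname);  -- hypothetical name

-- A PV can now only point to a building with the same gemname.
ALTER TABLE pvanlagen
   ADD CONSTRAINT pvanlagen_buildid_gemname_fkey                    -- hypothetical name
   FOREIGN KEY (buildid, gemname) REFERENCES buildings (gid, gemname);
```

With the default MATCH SIMPLE, rows where buildid or gemname is NULL are not checked, so unassigned PVs still pass.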
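We can use the above query in an UPDATE statement. Each PV is only used once, but more than one PV might still find the same building to be the closest. You only allow one PV per building. So how would you resolve that?
In other words, how would you assign the objects here?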
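To get a feel for how often this conflict occurs in your data, the nearest-neighbour query can be aggregated per building. This is only a diagnostic sketch built from the query above:

```sql
-- Buildings that are the nearest free building for more than one unassigned PV.
SELECT b_gid, count(*) AS pv_count
FROM  (
   SELECT gid AS pv_gid, ST_Transform(geom, 31467) AS geom31467
   FROM   pvanlagen
   WHERE  buildid IS NULL                          -- not assigned yet
   ) p
, LATERAL (
   SELECT b.gid AS b_gid
   FROM   buildings b
   LEFT   JOIN pvanlagen p1 ON p1.buildid = b.gid  -- also not assigned ...
   WHERE  p1.buildid IS NULL                       -- ... yet
   ORDER  BY p.geom31467 <-> ST_Transform(b.centroid, 31467)
   LIMIT  1
   ) b
GROUP  BY b_gid
HAVING count(*) > 1
ORDER  BY pv_count DESC;
```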
One simple solution would be:
UPDATE pvanlagen p1
SET buildid = sub.b_gid
, dist = sub.dist -- actual distance
FROM (
SELECT DISTINCT ON (b_gid)
pv_gid, b_gid, dist
FROM (
SELECT gid AS pv_gid, ST_Transform(geom, 31467) AS geom31467
FROM pvanlagen
WHERE buildid IS NULL -- not assigned yet
) p
, LATERAL (
SELECT b.gid AS b_gid
, round(ST_Distance(p.geom31467
, ST_Transform(b.centroid, 31467))::numeric, 2) AS dist -- see below
FROM buildings b
LEFT JOIN pvanlagen p1 ON p1.buildid = b.gid -- also not assigned ...
WHERE p1.buildid IS NULL -- ... yet
-- AND p.gemname = b.gemname -- not needed for performance, see below
ORDER BY p.geom31467 <-> ST_Transform(b.centroid, 31467)
LIMIT 1
) b
ORDER BY b_gid, dist, pv_gid -- tie breaker
) sub
WHERE p1.gid = sub.pv_gid;
I use DISTINCT ON (b_gid) to reduce the result to exactly one row per building, picking the PV with the shortest distance.
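For any building that is closest to more than one PV, only the closest PV is assigned. The PK column gid (alias pv_gid) serves as tiebreaker if two PVs are equally close. In such a case, some PVs are dropped from the update and remain unassigned. Repeat the query until all PVs are assigned.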
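The repetition can be automated. The following DO block is a sketch that reruns the same UPDATE until a pass assigns nothing; the variable name _rows and the NOTICE text are made up for illustration. Everything runs in a single transaction, so an error in any pass rolls back all passes:

```sql
DO
$do$
DECLARE
   _rows bigint;  -- rows updated in the current pass
BEGIN
   LOOP
      UPDATE pvanlagen p1
      SET    buildid = sub.b_gid
           , dist    = sub.dist
      FROM  (
         SELECT DISTINCT ON (b_gid)
                pv_gid, b_gid, dist
         FROM  (
            SELECT gid AS pv_gid, ST_Transform(geom, 31467) AS geom31467
            FROM   pvanlagen
            WHERE  buildid IS NULL                          -- not assigned yet
            ) p
         , LATERAL (
            SELECT b.gid AS b_gid
                 , round(ST_Distance(p.geom31467
                       , ST_Transform(b.centroid, 31467))::numeric, 2) AS dist
            FROM   buildings b
            LEFT   JOIN pvanlagen p1 ON p1.buildid = b.gid  -- building still free
            WHERE  p1.buildid IS NULL
            ORDER  BY p.geom31467 <-> ST_Transform(b.centroid, 31467)
            LIMIT  1
            ) b
         ORDER  BY b_gid, dist, pv_gid                      -- tie breaker
         ) sub
      WHERE  p1.gid = sub.pv_gid;

      GET DIAGNOSTICS _rows = ROW_COUNT;
      RAISE NOTICE 'assigned % PVs in this pass', _rows;
      EXIT WHEN _rows = 0;  -- nothing left to assign
   END LOOP;
END
$do$;
```

Each pass sees the assignments of the previous one, so buildings taken earlier are excluded by the LEFT JOIN. The loop ends when every PV has a building or no free building is left.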
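This is still a simplistic algorithm. Looking at the diagram above, it assigns building 4 to PV 4 and building 5 to PV 5, while 4-5 and 5-4 would probably be a better solution overall ...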
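Type of the dist column: currently you use numeric. Your original query assigned a constant integer, so there was no point in using numeric.
In my new query, ST_Distance() returns the actual distance in meters as double precision. If we simply assign that, we get 15 or so fractional digits in the numeric data type, and the number is not that exact to begin with. I seriously doubt you want to waste the storage.
I would rather save the original double precision from the calculation. Or, better yet, round as needed. If meters are exact enough, just cast to integer and save that (the cast rounds the number automatically). Or multiply by 100 first to save cm: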
(ST_Distance(...) * 100)::int
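To compare the options side by side, here is a small sketch with a made-up sample distance of 12.3456 m standing in for the ST_Distance() result:

```sql
SELECT d                    AS raw_double         -- 12.3456
     , d::int               AS meters_int         -- 12    (cast rounds to nearest)
     , (d * 100)::int       AS centimeters_int    -- 1235
     , round(d::numeric, 2) AS meters_2_decimals  -- 12.35
FROM  (SELECT float8 '12.3456' AS d) t;
```

The integer variants need a dist column of type integer; the last variant is what the queries above store in the numeric column.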