Question

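I maintain two PostGIS tables: "track_points" and "buffers". The "track_points" table contains a very large number of points (nearly a billion), and the "buffers" table contains about 20 polygons.

What I want to do is find every point that lies inside a buffer and assign the corresponding buffer ID to that point's record. After searching online, I found that a "spatial join" could help a lot here. Based on what I found, I put together the following query ({schema} is just a placeholder for the schema name):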

WITH join_query AS (
  SELECT
    points.id AS point_id,
    buffers.profile_id AS profile_id
  FROM {schema}.buffers AS buffers
  JOIN {schema}.track_points AS points
  ON ST_Contains(buffers.geom, points.geom)
)

UPDATE {schema}.track_points
  SET profile_id = join_query.profile_id
  FROM join_query
  WHERE id = join_query.point_id

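I ran the query, but none of the profile_id values in the track_points table changed. So I guess there must be something wrong with my query?

Also, does anyone have suggestions for achieving this more efficiently, given the large number of points in the track_points table?

By the way, I am using Python's psycopg2 to connect to the database.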

Answer 1

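If your points table has billions of records, don't even try to update it in place, or be prepared to wait days or weeks for the update to finish ;). For this kind of bulk operation, a good solution is CTAS (CREATE TABLE AS SELECT). I assume your polygons do not intersect; if they do, tell me which buffer's profile_id you want (largest, smallest, ...).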

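create table track_points2 as
select tp.id, tp.geom,   -- list every track_points column you need, except profile_id
       b.profile_id
  from track_points tp
  left join buffers b    -- left join keeps points outside every buffer (profile_id is null)
    on st_dwithin(tp.geom, b.geom, 0);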

Next, drop the existing track_points table and replace it with the new one:

drop table track_points;
alter table track_points2 rename to track_points;

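Then create all the indexes and constraints the new table needs.

If you are not allowed to drop or alter tables in your database, then of course you have to run the update, but expect it to take a long time.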

 update track_points tp
    set profile_id=b.profile_id
   from buffers b
  where st_dwithin(tp.geom, b.geom,0);
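Since the question mentions psycopg2: psycopg2 opens a transaction implicitly, and nothing is persisted until commit() is called on the connection, so a missing commit is one possible reason the profile_id values look unchanged. If you do have to run the update, splitting it into id ranges and committing after each range keeps every transaction small. The following is only a minimal sketch, assuming track_points has an integer id column; the function name, the batch size, and the id bounds are hypothetical, and the schema name is formatted into the SQL text the same way as the {schema} placeholder in the question, so it must come from a trusted source.

```python
def update_profile_ids_in_batches(conn, schema, min_id, max_id, batch_size=1_000_000):
    """Run the spatial UPDATE over consecutive id ranges, committing after each one.

    `conn` is expected to behave like a psycopg2 connection (cursor(), commit()).
    Assumes track_points.id is an integer column (an assumption, not from the source).
    Returns the total number of rows reported as updated.
    """
    sql = (
        "UPDATE {schema}.track_points AS tp "
        "SET profile_id = b.profile_id "
        "FROM {schema}.buffers AS b "
        "WHERE tp.id >= %s AND tp.id < %s "
        "AND ST_DWithin(tp.geom, b.geom, 0)"
    ).format(schema=schema)

    total = 0
    with conn.cursor() as cur:
        for start in range(min_id, max_id + 1, batch_size):
            # Half-open range [start, start + batch_size) so batches never overlap.
            cur.execute(sql, (start, start + batch_size))
            total += cur.rowcount
            # Without this commit the changes stay invisible to other sessions
            # and are rolled back when the connection closes.
            conn.commit()
    return total
```

You would call it with a connection obtained from psycopg2.connect(...) and the smallest and largest id in track_points. For untrusted schema names, psycopg2.sql.Identifier is the safer way to compose the statement.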

As I wrote above, if your buffers/polygons intersect, you will have to change the update so that it picks the profile_id you want from the several candidates.
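One way to handle intersecting buffers, sketched here under assumptions: use DISTINCT ON to keep a single buffer per point, ordered by buffer area. The helper name build_overlap_update and its prefer argument are hypothetical, "smallest" and "largest" are read as polygon area (ST_Area), and track_points is again assumed to have a unique id column. The helper only builds the SQL text, which you would then pass to cursor.execute().

```python
def build_overlap_update(schema, prefer="smallest"):
    """Build an UPDATE that assigns one profile_id per point when buffers overlap.

    prefer="smallest" picks the buffer with the smallest area, "largest" the biggest.
    The schema name is formatted in directly, so it must come from a trusted source.
    """
    if prefer not in ("smallest", "largest"):
        raise ValueError("prefer must be 'smallest' or 'largest'")
    direction = "ASC" if prefer == "smallest" else "DESC"
    return (
        "UPDATE {schema}.track_points AS tp\n"
        "   SET profile_id = pick.profile_id\n"
        "  FROM (\n"
        # DISTINCT ON keeps the first row per point id according to ORDER BY.
        "        SELECT DISTINCT ON (p.id) p.id, b.profile_id\n"
        "          FROM {schema}.track_points AS p\n"
        "          JOIN {schema}.buffers AS b\n"
        "            ON ST_DWithin(p.geom, b.geom, 0)\n"
        "         ORDER BY p.id, ST_Area(b.geom) {direction}\n"
        "       ) AS pick\n"
        " WHERE tp.id = pick.id"
    ).format(schema=schema, direction=direction)
```

On a table of this size the subquery is expensive as well, so the same idea may fit better inside the CTAS select than in an update.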

Updating a PostGIS table using a spatial join subquery
