Optimizing ST_Intersects (PostGIS) in PostgreSQL

Time: 2015-08-04 02:43:27

Tags: django postgresql query-optimization postgis postgresql-9.3

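The following query takes almost 15 minutes to return results. I would like to know why. Is it the data? Or the number of vertices in the geometries? When I tried the query with a different table (a small shapefile), it ran fast.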

Here is the query (thanks to Patrick):

WITH hi AS (
  SELECT ps.id, ps.brgy_locat, ps.municipali
  FROM evidensapp_polystructures ps
  JOIN evidensapp_seniangcbr fh ON fh.hazard = 'High'
                                 AND ST_Intersects(fh.geom, ps.geom)
), med AS (
  SELECT ps.id, ps.brgy_locat, ps.municipali
  FROM evidensapp_polystructures ps
  JOIN evidensapp_seniangcbr fh ON fh.hazard = 'Medium'
                                 AND ST_Intersects(fh.geom, ps.geom)
  EXCEPT SELECT * FROM hi
), low AS (
  SELECT ps.id, ps.brgy_locat, ps.municipali
  FROM evidensapp_polystructures ps
  JOIN evidensapp_seniangcbr fh ON fh.hazard = 'Low'
                                 AND ST_Intersects(fh.geom, ps.geom)
  EXCEPT SELECT * FROM hi
  EXCEPT SELECT * FROM med
)
SELECT brgy_locat AS barangay, municipali AS municipality, high, medium, low
FROM (SELECT brgy_locat, municipali, count(*) AS high
      FROM hi
      GROUP BY 1, 2) cnt_hi
FULL JOIN (SELECT brgy_locat, municipali, count(*) AS medium
      FROM med
      GROUP BY 1, 2) cnt_med USING (brgy_locat, municipali)
FULL JOIN (SELECT brgy_locat, municipali, count(*) AS low
      FROM low
      GROUP BY 1, 2) cnt_low USING (brgy_locat, municipali);

PostgreSQL 9.3, PostGIS 2.1.5

Polystructures: contains 9847 rows:

CREATE TABLE evidensapp_polystructures (
  id serial NOT NULL PRIMARY KEY,
  bldg_name character varying(100) NOT NULL,
  bldg_type character varying(50) NOT NULL,
  brgy_locat character varying(50) NOT NULL,
  municipali character varying(50) NOT NULL,
  province character varying(50) NOT NULL,
  geom geometry(MultiPolygon,32651)
);

CREATE INDEX evidensapp_polystructures_geom_id
  ON evidensapp_polystructures USING gist (geom);
ALTER TABLE evidensapp_polystructures CLUSTER ON evidensapp_polystructures_geom_id;

SeniangCBR: only 6 rows. Shapefile size (in case it matters): 52,060 KB

CREATE TABLE evidensapp_seniangcbr (
  id serial NOT NULL PRIMARY KEY,
  hazard character varying(16) NOT NULL,
  geom geometry(MultiPolygon,32651)
);

CREATE INDEX evidensapp_seniangcbr_geom_id ON evidensapp_seniangcbr USING gist (geom);
ALTER TABLE evidensapp_seniangcbr CLUSTER ON evidensapp_seniangcbr_geom_id;

I loaded all of the data into the database automatically with LayerMapping, the Django (GeoDjango) utility.

EXPLAIN ANALYZE LINK HERE.

I don't have a server at the moment; I'm running the query on my PC.

  • Processor: Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz (8 CPUs), ~3.6GHz
  • Memory: 8192MB RAM
  • Operating system: Windows 7 64-bit

3 answers:

Answer 0 (score: 2)

The EXPLAIN ANALYZE output is hard to read because all field and function names have been anonymized into radio-alphabet words. That said, two things stand out:

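  1. Most of the time is spent in the ST_Intersects() function, which is not surprising.
  2. The EXCEPT clauses also seem to be inefficient.

    So try this instead of the rather verbose version: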

    SELECT brgy_locat AS barangay, municipali AS municipality,
           sum(CASE max_hz_id WHEN 3 THEN 1 ELSE 0 END) AS high,
           sum(CASE max_hz_id WHEN 2 THEN 1 ELSE 0 END) AS medium,
           sum(CASE max_hz_id WHEN 1 THEN 1 ELSE 0 END) AS low
    FROM (
      SELECT ps.id, ps.brgy_locat, ps.municipali,
             max(CASE fh.hazard WHEN 'Low' THEN 1 WHEN 'Medium' THEN 2 WHEN 'High' THEN 3 END) AS max_hz_id
      FROM evidensapp_polystructures ps
      JOIN evidensapp_seniangcbr fh ON ST_Intersects(fh.geom, ps.geom)
      GROUP BY 1, 2, 3
    ) AS ps_fh
    GROUP BY 1, 2;
    

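    There is now only a single call to ST_Intersects(), which may (hopefully) be much faster than three calls on subsets of the hazard map, thanks to internal efficiencies in the PostGIS code.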

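    As you can see, the hazard category strings are converted into a series of integers that are easy to order and compare. The inner query selects the maximum hazard value per structure, as you require. The main query then counts each structure's maximum into its respective column. If possible, change your table structure to use these three integer codes, linked to a helper table holding the class labels: your table becomes smaller and therefore faster, and the CASE statement in the inner query can be dropped. Alternatively, add a column holding the integer code and update its values based on the "hazard" column.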
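To check what this rewrite computes without a PostGIS database, here is a minimal sketch using Python's built-in sqlite3 module. SQLite has no ST_Intersects(), so the sketch assumes the spatial join has already been evaluated and stored in a hypothetical `hits` table with one row per (structure, hazard zone) intersection; the rows are made-up sample data. Otherwise it is the same CASE/max/sum pattern as the query above, with explicit column names in GROUP BY.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for "polystructures JOIN seniangcbr ON ST_Intersects(...)":
# one row per structure/hazard-zone intersection.
conn.executescript("""
CREATE TABLE hits (id INTEGER, brgy_locat TEXT, municipali TEXT, hazard TEXT);
INSERT INTO hits VALUES
  (1, 'A', 'M1', 'High'),
  (1, 'A', 'M1', 'Low'),     -- structure 1 touches two zones, only High should count
  (2, 'A', 'M1', 'Medium'),
  (3, 'A', 'M1', 'Low'),
  (4, 'B', 'M1', 'Medium'),
  (4, 'B', 'M1', 'Low');     -- structure 4 should count as Medium only
""")

# Map labels to 1..3, keep the maximum per structure (inner query),
# then count each structure's maximum into its own column (outer query).
QUERY = """
SELECT brgy_locat AS barangay, municipali AS municipality,
       sum(CASE max_hz_id WHEN 3 THEN 1 ELSE 0 END) AS high,
       sum(CASE max_hz_id WHEN 2 THEN 1 ELSE 0 END) AS medium,
       sum(CASE max_hz_id WHEN 1 THEN 1 ELSE 0 END) AS low
FROM (
  SELECT id, brgy_locat, municipali,
         max(CASE hazard WHEN 'Low' THEN 1 WHEN 'Medium' THEN 2
                         WHEN 'High' THEN 3 END) AS max_hz_id
  FROM hits
  GROUP BY id, brgy_locat, municipali
) AS ps_fh
GROUP BY brgy_locat, municipali
ORDER BY brgy_locat, municipali
"""

rows = conn.execute(QUERY).fetchall()
print(rows)
```

Structure 1 touches both a High and a Low zone but is counted once, as high: the same per-structure "highest hazard wins" rule that the EXCEPT chains in the question were emulating.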

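    Note that these CASE statements are not very efficient (which is why I used the EXCEPT clauses in my previous answer). PG 9.4 introduced the new FILTER clause for aggregate functions, which would make the query faster and easier to read: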

    count(id) FILTER (WHERE max_hz_id = 3) AS high
    

    You may want to consider upgrading.
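Since PostgreSQL 9.3 has no FILTER clause, one way to convince yourself that the two spellings count the same thing is to run both against identical data. The sketch below does that with Python's sqlite3 module, which supports FILTER on aggregates from SQLite 3.30 on (the code falls back to the CASE form on older builds). The `ps_fh` table is hypothetical: it stands for the inner query's output, one row per structure with its max_hz_id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for the inner query's output: one row per structure.
conn.executescript("""
CREATE TABLE ps_fh (id INTEGER, brgy_locat TEXT, municipali TEXT, max_hz_id INTEGER);
INSERT INTO ps_fh VALUES
  (1, 'A', 'M1', 3), (2, 'A', 'M1', 2), (3, 'A', 'M1', 1), (4, 'B', 'M1', 2);
""")

# The PG 9.3-compatible spelling: CASE inside sum().
CASE_QUERY = """
SELECT brgy_locat, municipali,
       sum(CASE max_hz_id WHEN 3 THEN 1 ELSE 0 END) AS high,
       sum(CASE max_hz_id WHEN 2 THEN 1 ELSE 0 END) AS medium,
       sum(CASE max_hz_id WHEN 1 THEN 1 ELSE 0 END) AS low
FROM ps_fh
GROUP BY brgy_locat, municipali
ORDER BY brgy_locat, municipali
"""

# The PG 9.4+ spelling: an aggregate FILTER clause.
FILTER_QUERY = """
SELECT brgy_locat, municipali,
       count(id) FILTER (WHERE max_hz_id = 3) AS high,
       count(id) FILTER (WHERE max_hz_id = 2) AS medium,
       count(id) FILTER (WHERE max_hz_id = 1) AS low
FROM ps_fh
GROUP BY brgy_locat, municipali
ORDER BY brgy_locat, municipali
"""

case_rows = conn.execute(CASE_QUERY).fetchall()
if sqlite3.sqlite_version_info >= (3, 30, 0):  # FILTER on aggregates arrived in SQLite 3.30
    filter_rows = conn.execute(FILTER_QUERY).fetchall()
else:
    filter_rows = case_rows  # older SQLite: nothing to compare against
print(case_rows == filter_rows, case_rows)
```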

    Selamat mula Maynila (greetings from Manila)

Answer 1 (score: 1)

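Add a bounding_box geometry(Polygon,32651) column (same SRID as geom) to your tables. Its value is a bounding box that fully encloses the multipolygon (the max x,y and min x,y of the geometry).

Then your query would look like this:

JOIN evidensapp_seniangcbr fh ON fh.hazard = 'High'
                             AND ST_Intersects(fh.bounding_box, ps.bounding_box)
                             AND ST_Intersects(fh.geom, ps.geom)

The advantage is that the first ST_Intersects call is very fast. If it returns false, the second, more expensive ST_Intersects call can be skipped, which saves time in that case.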

Answer 2 (score: 1)

Similar to what I suggested and explained under your related question, I would use UNION ALL instead of FULL JOIN in the outer SELECT:

WITH hi AS (
   SELECT ps.brgy_locat, ps.municipali, fh.hazard, count(*) AS ct
   FROM   evidensapp_seniangcbr     fh
   JOIN   evidensapp_polystructures ps ON ST_Intersects(fh.geom, ps.geom)
   WHERE  fh.hazard = 'High'
   GROUP  BY 1, 2, 3
   )
, med AS (
   SELECT ps.brgy_locat, ps.municipali, fh.hazard, count(*) AS ct
   FROM   evidensapp_seniangcbr     fh
   JOIN   evidensapp_polystructures ps ON ST_Intersects(fh.geom, ps.geom)
   LEFT   JOIN hi USING (brgy_locat, municipali)
   WHERE  fh.hazard = 'Medium'
   AND    hi.brgy_locat IS NULL
   GROUP  BY 1, 2, 3
   )
TABLE hi

UNION ALL
TABLE med

UNION ALL
   SELECT ps.brgy_locat, ps.municipali, fh.hazard, count(*) AS ct
   FROM   evidensapp_seniangcbr     fh
   JOIN   evidensapp_polystructures ps ON ST_Intersects(fh.geom, ps.geom)
   LEFT   JOIN hi  USING (brgy_locat, municipali)
   LEFT   JOIN med USING (brgy_locat, municipali)
   WHERE  fh.hazard = 'Low'
   AND    hi.brgy_locat IS NULL
   AND    med.brgy_locat IS NULL
   GROUP BY 1, 2, 3;

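This only considers the highest hazard level for each group of rows with the same (brgy_locat, municipali). Only rows that actually intersect a row of evidensapp_seniangcbr with the relevant hazard level appear in the result, and the counts only include rows that actually intersect. There may be more rows in evidensapp_polystructures with the same (brgy_locat, municipali) that just don't intersect with that hazard level; those are ignored.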

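Pick one of the standard techniques to exclude, at the lower hazard levels, rows that already found a match at a higher hazard level.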

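LEFT JOIN / IS NULL should use the index on id and perform very well here. It is certainly faster than EXCEPT, which works on the whole row and cannot use an index.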
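The exclusion rule described above works per (brgy_locat, municipali) group, not per structure. A minimal sketch with Python's sqlite3 module makes that visible. SQLite has no ST_Intersects(), so a hypothetical `hits` table with made-up rows stands in for the spatial join, and the joins use explicit ON conditions instead of USING (equivalent here in PostgreSQL); otherwise it follows the UNION ALL plus LEFT JOIN / IS NULL structure of the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for the spatial join: one row per structure/hazard-zone hit.
conn.executescript("""
CREATE TABLE hits (id INTEGER, brgy_locat TEXT, municipali TEXT, hazard TEXT);
INSERT INTO hits VALUES
  (1, 'A', 'M1', 'High'),
  (2, 'A', 'M1', 'Medium'),  -- group A already has a High hit, so this is dropped
  (3, 'B', 'M1', 'Medium'),
  (3, 'B', 'M1', 'Low'),     -- group B already has a Medium hit, so this is dropped
  (4, 'C', 'M1', 'Low');
""")

QUERY = """
WITH hi AS (
   SELECT brgy_locat, municipali, hazard, count(*) AS ct
   FROM   hits
   WHERE  hazard = 'High'
   GROUP  BY brgy_locat, municipali, hazard
), med AS (
   SELECT h.brgy_locat, h.municipali, h.hazard, count(*) AS ct
   FROM   hits h
   LEFT   JOIN hi ON hi.brgy_locat = h.brgy_locat AND hi.municipali = h.municipali
   WHERE  h.hazard = 'Medium'
   AND    hi.brgy_locat IS NULL          -- anti-join: group not seen at High
   GROUP  BY h.brgy_locat, h.municipali, h.hazard
)
SELECT * FROM hi
UNION ALL
SELECT * FROM med
UNION ALL
SELECT h.brgy_locat, h.municipali, h.hazard, count(*) AS ct
FROM   hits h
LEFT   JOIN hi  ON hi.brgy_locat  = h.brgy_locat AND hi.municipali  = h.municipali
LEFT   JOIN med ON med.brgy_locat = h.brgy_locat AND med.municipali = h.municipali
WHERE  h.hazard = 'Low'
AND    hi.brgy_locat IS NULL
AND    med.brgy_locat IS NULL
GROUP  BY h.brgy_locat, h.municipali, h.hazard
"""

rows = sorted(conn.execute(QUERY).fetchall())
print(rows)
```

Structure 2 intersects only a Medium zone, yet it is not counted anywhere, because its group (A, M1) already has a High hit. That is the "highest hazard level per group" behavior described above, and it differs from a per-structure classification.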

Index

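There is no need to add a bounding_box geometry column to your tables as suggested in the other answer. Modern versions of PostGIS automatically use an (index-supported) bounding-box comparison. The PostGIS documentation:

  This function call will automatically include a bounding box comparison that will make use of any indexes that are available on the geometries.

In fact, we already see index scans in the explain output you posted.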

Your existing GiST index evidensapp_polystructures_geom_id should make the query fast. Aside: I would name the index evidensapp_polystructures_geom_idx.

Also, if you don't have one yet, create an index on (brgy_locat, municipali):

CREATE INDEX foo_idx ON evidensapp_polystructures (brgy_locat, municipali);

Alternative with a LATERAL join

Since evidensapp_seniangcbr has only 6 rows, a LATERAL join may be faster:

WITH hi AS (
   SELECT ps.brgy_locat, ps.municipali, fh.hazard, count(*) AS ct
   FROM   evidensapp_seniangcbr fh
        , LATERAL (
      SELECT ps.brgy_locat, ps.municipali
      FROM   evidensapp_polystructures ps
      WHERE  ST_Intersects(fh.geom, ps.geom)
      ) ps
   WHERE  fh.hazard = 'High'
   GROUP  BY 1, 2, 3
   )
, med AS (
   SELECT ps.brgy_locat, ps.municipali, fh.hazard, count(*) AS ct
   FROM   evidensapp_seniangcbr fh
        , LATERAL (
      SELECT ps.brgy_locat, ps.municipali
      FROM   evidensapp_polystructures ps
      LEFT   JOIN hi USING (brgy_locat, municipali)
      WHERE  hi.brgy_locat IS NULL
      AND    ST_Intersects(fh.geom, ps.geom)
      ) ps
   WHERE  fh.hazard = 'Medium'
   GROUP  BY 1, 2, 3
   )
TABLE hi

UNION ALL
TABLE med

UNION ALL
   SELECT ps.brgy_locat, ps.municipali, fh.hazard, count(*) AS ct
   FROM   evidensapp_seniangcbr fh
        , LATERAL (
      SELECT ps.id, ps.brgy_locat, ps.municipali
      FROM   evidensapp_polystructures ps
      LEFT   JOIN hi  USING (brgy_locat, municipali)
      LEFT   JOIN med USING (brgy_locat, municipali)
      WHERE  hi.brgy_locat IS NULL
      AND    med.brgy_locat IS NULL
      AND    ST_Intersects(fh.geom, ps.geom)
      ) ps
   WHERE  fh.hazard = 'Low'
   GROUP  BY 1, 2, 3;
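Conceptually, the LATERAL form runs the inner subquery once per outer row, so each of the six hazard rows drives one probe of the structures' spatial index. Below is a rough pure-Python sketch of that nested-loop shape, assuming axis-aligned bounding boxes only: a uniform grid (`CELL` and `build_index` are hypothetical) stands in for the GiST index, and the recheck is just a box-overlap test, not a real intersection test. PostgreSQL's planner may also flatten a simple LATERAL subquery into an ordinary join, so the actual plan can differ.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Tuple

BBox = Tuple[float, float, float, float]  # (min x, min y, max x, max y)
CELL = 10.0  # hypothetical grid cell size; the grid stands in for the GiST index


def cells(b: BBox) -> Iterator[Tuple[int, int]]:
    """Every grid cell that a bounding box touches."""
    for cx in range(int(b[0] // CELL), int(b[2] // CELL) + 1):
        for cy in range(int(b[1] // CELL), int(b[3] // CELL) + 1):
            yield cx, cy


def build_index(structures: Dict[int, BBox]) -> Dict[Tuple[int, int], List[int]]:
    """Map each grid cell to the ids of the structures whose boxes touch it."""
    index: Dict[Tuple[int, int], List[int]] = defaultdict(list)
    for sid, b in structures.items():
        for c in cells(b):
            index[c].append(sid)
    return index


def overlap(a: BBox, b: BBox) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def lateral_probe(hazards: Dict[str, BBox], structures: Dict[int, BBox]) -> Dict[str, List[int]]:
    """Nested loop: for each of the few hazard rows, probe the index once."""
    index = build_index(structures)
    result: Dict[str, List[int]] = {}
    for name, hb in hazards.items():  # outer side: only a handful of rows
        # index probe: only structures sharing a grid cell become candidates
        candidates = {sid for c in cells(hb) for sid in index.get(c, [])}
        # recheck the candidates (box overlap only in this simplified sketch)
        result[name] = sorted(s for s in candidates if overlap(hb, structures[s]))
    return result


structures = {1: (1, 1, 2, 2), 2: (15, 15, 16, 16), 3: (55, 55, 56, 56)}
hazards = {"High": (0, 0, 5, 5), "Low": (12, 12, 60, 60)}
print(lateral_probe(hazards, structures))  # {'High': [1], 'Low': [2, 3]}
```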

About LATERAL joins: