PostgreSQL index in a temporary table

Date: 2017-05-16 17:48:34

Tags: postgresql indexing postgis

I have the following PostGIS/PostgreSQL query:

SELECT luc.*
FROM spatial_derived.lucas12 luc,
  (SELECT geom
   FROM spatial_derived.germany_bld
   WHERE state = 'SN') sn
WHERE ST_Contains(sn.geom, luc.geom)

Query plan:

Nested Loop  (cost=2.45..53.34 rows=8 width=236) (actual time=1.030..26.751 rows=1282 loops=1)
  ->  Seq Scan on germany_bld  (cost=0.00..2.20 rows=1 width=18399) (actual time=0.023..0.029 rows=1 loops=1)
        Filter: ((state)::text = 'SN'::text)
        Rows Removed by Filter: 15
  ->  Bitmap Heap Scan on lucas12 luc  (cost=2.45..51.06 rows=8 width=236) (actual time=1.002..26.031 rows=1282 loops=1)
        Recheck Cond: (germany_bld.geom ~ geom)
        Filter: _st_contains(germany_bld.geom, geom)
        Rows Removed by Filter: 499
        Heap Blocks: exact=174
        ->  Bitmap Index Scan on lucas12_geom_idx  (cost=0.00..2.45 rows=23 width=0) (actual time=0.419..0.419 rows=1781 loops=1)
              Index Cond: (germany_bld.geom ~ geom)
Planning time: 0.536 ms
Execution time: 27.023 ms

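This is very fast thanks to the index on the geometry column. But when I add a buffer to the sn polygon (one large polygon representing the state border, so a very simple feature):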

SELECT luc.*
FROM spatial_derived.lucas12 luc,
  (SELECT ST_Buffer(geom, 30000) geom
   FROM spatial_derived.germany_bld
   WHERE state = 'SN') sn
WHERE ST_Contains(sn.geom, luc.geom)

Query plan:

Nested Loop  (cost=0.00..13234.80 rows=7818 width=236) (actual time=6221.391..1338380.257 rows=2298 loops=1)
  Join Filter: st_contains(st_buffer(germany_bld.geom, 30000::double precision), luc.geom)
  Rows Removed by Join Filter: 22637
  ->  Seq Scan on germany_bld  (cost=0.00..2.20 rows=1 width=18399) (actual time=0.018..0.036 rows=1 loops=1)
        Filter: ((state)::text = 'SN'::text)
        Rows Removed by Filter: 15
  ->  Seq Scan on lucas12 luc  (cost=0.00..1270.55 rows=23455 width=236) (actual time=0.005..25.623 rows=24935 loops=1)
Planning time: 0.271 ms
Execution time: 1338381.079 ms

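The query takes forever! I blame this on the missing index on the temporary table sn. The massive slowdown cannot be caused by ST_Buffer() alone, because it is very fast on its own and the buffered feature is simple.

Two questions:

1) Am I right?

2) What can I do to reach the same speed as in the first query?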

2 answers:

Answer 0: (score: 1)

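I fell into a trap. ST_Buffer() is not the right choice here; ST_DWithin() is, because it can still use the index on each geometry column when it performs its bounding-box comparison. The documentation for ST_Buffer() (https://dmp.fabric8.io) clearly warns against the mistake of using ST_Buffer() for radius searches and recommends ST_DWithin() instead. Because the term "buffer" is used in so much GIS software, I did not think to look for an alternative.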

SELECT luc.*
FROM spatial_derived.lucas12 luc
JOIN spatial_derived.germany_bld sn ON ST_DWithin(sn.geom, luc.geom, 30000)
WHERE sn.state = 'SN'

This works and takes only a second (2300 points in the "buffer")!
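To illustrate why ST_DWithin() can use the spatial index, here is a rough sketch of what such a radius search amounts to: a cheap bounding-box test that the GiST index can answer, followed by an exact distance check on the remaining candidates. This is an illustration under assumptions, not the actual PostGIS definition; in particular, using ST_Distance() for the exact step and the `<=` comparison are choices made for this sketch.

```sql
-- Sketch only: an index-friendly radius search written out by hand.
-- Assumes the distance 30000 is in the units of the geometries' SRID.
SELECT luc.*
FROM spatial_derived.lucas12 luc
JOIN spatial_derived.germany_bld sn
  -- Bounding-box overlap against the expanded box of the state polygon;
  -- this is the part that can be served by lucas12_geom_idx.
  ON luc.geom && ST_Expand(sn.geom, 30000)
  -- Exact check on the candidates that survive the bounding-box test.
 AND ST_Distance(sn.geom, luc.geom) <= 30000
WHERE sn.state = 'SN';
```

By contrast, the slow plan above shows st_buffer() inside the Join Filter with a sequential scan on lucas12, so no index condition is applied to luc.geom at all.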

Answer 1: (score: 0)

To check whether you are right, you can leave sn as it is and apply ST_Buffer in the join condition:

SELECT luc.*
FROM spatial_derived.lucas12 luc,
  (SELECT geom
   FROM spatial_derived.germany_bld
   WHERE state = 'SN') sn
WHERE ST_Contains(ST_Buffer(sn.geom, 30000), luc.geom)

Query plan:

Nested Loop  (cost=0.00..13234.80 rows=7818 width=236) (actual time=6237.876..1340000.576 rows=2298 loops=1)
  Join Filter: st_contains(st_buffer(germany_bld.geom, 30000::double precision), luc.geom)
  Rows Removed by Join Filter: 22637
  ->  Seq Scan on germany_bld  (cost=0.00..2.20 rows=1 width=18399) (actual time=0.023..0.038 rows=1 loops=1)
        Filter: ((state)::text = 'SN'::text)
        Rows Removed by Filter: 15
  ->  Seq Scan on lucas12 luc  (cost=0.00..1270.55 rows=23455 width=236) (actual time=0.004..24.525 rows=24935 loops=1)
Planning time: 0.453 ms
Execution time: 1340001.420 ms

The result of this query should answer your first question.

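Update

  1. Your assumption appears to be wrong. ST_Buffer() is what causes the slowdown.
  2. With ST_Buffer you appear to be joining a larger set, so some increase in time is to be expected. You can run explain analyze on the query with and without ST_Buffer(); it may show the same plan with different rows numbers and cost values...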
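The comparison suggested above can be run as follows. This is a minimal sketch that reuses the table and column names from the question. Note that EXPLAIN ANALYZE actually executes the statement, so the buffered variant will take as long as the original slow query.

```sql
-- Without ST_Buffer(): expect an index condition on lucas12_geom_idx.
EXPLAIN ANALYZE
SELECT luc.*
FROM spatial_derived.lucas12 luc,
  (SELECT geom
   FROM spatial_derived.germany_bld
   WHERE state = 'SN') sn
WHERE ST_Contains(sn.geom, luc.geom);

-- With ST_Buffer() applied in the join condition.
EXPLAIN ANALYZE
SELECT luc.*
FROM spatial_derived.lucas12 luc,
  (SELECT geom
   FROM spatial_derived.germany_bld
   WHERE state = 'SN') sn
WHERE ST_Contains(ST_Buffer(sn.geom, 30000), luc.geom);
```

When reading the output, check whether the scan on lucas12 is a Bitmap Index Scan with an Index Cond or a Seq Scan with only a Join Filter, and compare the "actual time" figures of the two plans.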