在postgresql中计算最适合的地理位置

时间:2016-02-09 15:55:10

标签: postgresql postgis

我有一个查询,可以在更大的区域内找到最合适的区域。最佳拟合的定义是当小区域的质心在大区域内时,如下:

select p.id as parent_area_id, c.id as child_area_id
from fog.area p
inner join fog.area c on ST_CONTAINS(p.shape, ST_CENTROID(c.shape))
inner join fog.area_type_group_flattened gc on gc.child_type_id = c.type_id
inner join fog.area_type_group_flattened gp on gp.child_type_id = p.type_id
inner join fog.area_type_hierarchy h on h.parent_type_id = gp.parent_type_id and h.child_type_id = gc.parent_type_id
where p.id = :parent and c.type_id != :excludingchildtype

在28s内返回180行。

我在形状列上有一个gist索引,它是几何类型。如果我将ST_CONTAINS替换为_ST_CONTAINS,则需要相同的时间,建议它不使用索引,或者它的使用不会产生太大影响。具有几何列的表包含244,325行,因此不是很大。

如果我将ST_CONTAINS替换为c.shape && p.shape,则会在280毫秒内返回395行。所以性能要好得多,但它会返回太多行。

我以为我可以使用&&加入,然后在where子句中使用ST_CONTAINS,这样昂贵的部分就会在一个小的结果集上运行,但需要花费39秒。

有没有一种方法可以在不改变结果的情况下提高此查询的性能?或者类似的技术大致相同但更快?

Postgres版本是9.4.4

解释分析

Nested Loop  (cost=287.00..21510.84 rows=1 width=8) (actual time=13067.916..75160.246 rows=135 loops=1)
  Join Filter: ((t1.child_type_id = p.type_id) AND (p.shape && st_centroid(c.shape)) AND _st_contains(p.shape, st_centroid(c.shape)))
  Rows Removed by Join Filter: 2933423
  ->  Index Scan using area_pkey on area p  (cost=0.42..8.44 rows=1 width=8633) (actual time=0.123..0.125 rows=1 loops=1)
        Index Cond: (id = 246420)
  ->  Nested Loop  (cost=286.58..21376.31 rows=467 width=8633) (actual time=2.339..46664.858 rows=2933558 loops=1)
        ->  Nested Loop  (cost=286.16..288.61 rows=1 width=8) (actual time=2.172..63.684 rows=198 loops=1)
              Join Filter: (h.parent_type_id = t1.parent_type_id)
              Rows Removed by Join Filter: 7862
              ->  HashAggregate  (cost=143.06..143.07 rows=1 width=8) (actual time=0.963..1.073 rows=62 loops=1)
                    Group Key: t1.parent_type_id, t1.child_type_id
                    CTE types
                      ->  Recursive Union  (cost=0.00..135.70 rows=199 width=8) (actual time=0.043..0.448 rows=66 loops=1)
                            ->  Seq Scan on area_type area_type_1  (cost=0.00..2.49 rows=49 width=4) (actual time=0.040..0.087 rows=50 loops=1)
                            ->  Hash Join  (cost=1.14..12.92 rows=15 width=8) (actual time=0.075..0.112 rows=8 loops=2)
                                  Hash Cond: (t_1.child_type_id = g_1.parent_type_id)
                                  ->  WorkTable Scan on types t_1  (cost=0.00..9.80 rows=490 width=4) (actual time=0.002..0.032 rows=33 loops=2)
                                  ->  Hash  (cost=1.06..1.06 rows=6 width=8) (actual time=0.057..0.057 rows=16 loops=1)
                                        Buckets: 1024  Batches: 1  Memory Usage: 1kB
                                        ->  Seq Scan on area_type_group_member g_1  (cost=0.00..1.06 rows=6 width=8) (actual time=0.019..0.035 rows=16 loops=1)
                    ->  Hash Join  (cost=2.62..7.36 rows=1 width=8) (actual time=0.223..0.884 rows=62 loops=1)
                          Hash Cond: (t1.child_type_id = t2.id)
                          ->  CTE Scan on types t1  (cost=0.00..3.98 rows=199 width=8) (actual time=0.046..0.579 rows=66 loops=1)
                          ->  Hash  (cost=2.61..2.61 rows=1 width=4) (actual time=0.146..0.146 rows=46 loops=1)
                                Buckets: 1024  Batches: 1  Memory Usage: 2kB
                                ->  Seq Scan on area_type t2  (cost=0.00..2.61 rows=1 width=4) (actual time=0.016..0.094 rows=46 loops=1)
                                      Filter: (COALESCE(is_group, B'0'::bit(1)) = B'0'::bit(1))
                                      Rows Removed by Filter: 4
              ->  Hash Join  (cost=143.09..145.50 rows=2 width=8) (actual time=0.026..0.512 rows=130 loops=62)
                    Hash Cond: (h.child_type_id = gc.parent_type_id)
                    ->  Seq Scan on area_type_hierarchy h  (cost=0.00..2.01 rows=101 width=8) (actual time=0.005..0.103 rows=103 loops=62)
                    ->  Hash  (cost=143.08..143.08 rows=1 width=8) (actual time=1.116..1.116 rows=62 loops=1)
                          Buckets: 1024  Batches: 1  Memory Usage: 3kB
                          ->  Subquery Scan on gc  (cost=143.06..143.08 rows=1 width=8) (actual time=0.874..1.047 rows=62 loops=1)
                                ->  HashAggregate  (cost=143.06..143.07 rows=1 width=8) (actual time=0.872..0.931 rows=62 loops=1)
                                      Group Key: t1_1.parent_type_id, t1_1.child_type_id
                                      CTE types
                                        ->  Recursive Union  (cost=0.00..135.70 rows=199 width=8) (actual time=0.005..0.388 rows=66 loops=1)
                                              ->  Seq Scan on area_type  (cost=0.00..2.49 rows=49 width=4) (actual time=0.003..0.052 rows=50 loops=1)
                                              ->  Hash Join  (cost=1.14..12.92 rows=15 width=8) (actual time=0.065..0.103 rows=8 loops=2)
                                                    Hash Cond: (t.child_type_id = g.parent_type_id)
                                                    ->  WorkTable Scan on types t  (cost=0.00..9.80 rows=490 width=4) (actual time=0.001..0.030 rows=33 loops=2)
                                                    ->  Hash  (cost=1.06..1.06 rows=6 width=8) (actual time=0.039..0.039 rows=16 loops=1)
                                                          Buckets: 1024  Batches: 1  Memory Usage: 1kB
                                                          ->  Seq Scan on area_type_group_member g  (cost=0.00..1.06 rows=6 width=8) (actual time=0.002..0.018 rows=16 loops=1)
                                      ->  Hash Join  (cost=2.62..7.36 rows=1 width=8) (actual time=0.153..0.797 rows=62 loops=1)
                                            Hash Cond: (t1_1.child_type_id = t2_1.id)
                                            ->  CTE Scan on types t1_1  (cost=0.00..3.98 rows=199 width=8) (actual time=0.007..0.523 rows=66 loops=1)
                                            ->  Hash  (cost=2.61..2.61 rows=1 width=4) (actual time=0.114..0.114 rows=46 loops=1)
                                                  Buckets: 1024  Batches: 1  Memory Usage: 2kB
                                                  ->  Seq Scan on area_type t2_1  (cost=0.00..2.61 rows=1 width=4) (actual time=0.009..0.063 rows=46 loops=1)
                                                        Filter: (COALESCE(is_group, B'0'::bit(1)) = B'0'::bit(1))
                                                        Rows Removed by Filter: 4
        ->  Index Scan using fki_area_area_type_fkey on area c  (cost=0.42..21004.21 rows=8349 width=8633) (actual time=6.507..173.256 rows=14816 loops=198)
              Index Cond: (type_id = gc.child_type_id)
              Filter: (type_id <> 7)
              Rows Removed by Filter: 2282
Planning time: 9.831 ms
Execution time: 75160.936 ms

1 个答案:

答案 0 :(得分:0)

通过解决方法回答我自己的问题,但不接受,因为它感觉像是黑客。希望得到更好的答案。

通过在末尾使用带有offset 0的子查询,将首先评估子查询。然后ST_CONTAINS,这是一个昂贵的部分,运行在一个小得多的数据集上。这是基于http://blog.2ndquadrant.com/hinting_at_postgresql/的建议。

select parent_area_id, child_area_id from
    (select p.id as parent_area_id, c.id as child_area_id, p.shape, st_centroid(c.shape) as centroid
    from fog.area p
    inner join fog.area c on p.shape && c.shape
    inner join fog.area_type_group_flattened gc on gc.child_type_id = c.type_id
    inner join fog.area_type_group_flattened gp on gp.child_type_id = p.type_id
    inner join fog.area_type_hierarchy h on h.parent_type_id = gp.parent_type_id and h.child_type_id = gc.parent_type_id
    where p.id = :parent and c.type_id != :excludingchildtype offset 0) sq
where ST_CONTAINS(shape, centroid)

这在&lt; 1秒