Postgres and PostGIS: an "unfavourable" execution plan

Time: 2014-07-01 08:05:58

Tags: sql postgresql postgis

I have the following database schema:

CREATE TABLE public.sgclasstab_id67
( 
    oid bigint NOT NULL,
    att_1113 bigint,
    att_1114 bigint,
    att_1115 character varying(500),
    att_1116 character varying(2000),
    att_1578 double precision,
    CONSTRAINT sgclasstab_id67_pkey PRIMARY KEY (oid)
);

CREATE TABLE public.sgclasstab_id68
(
    oid bigint NOT NULL,
    att_1119 bigint,
    att_1139 bigint,
    att_1496 character varying(2000),
    CONSTRAINT sgclasstab_id68_pkey PRIMARY KEY (oid)
);

CREATE TABLE public.sggeofacelist
(
    oid bigint NOT NULL,
    meanid smallint,
    numofislands smallint DEFAULT 0,
    compound smallint DEFAULT 0,
    extra character varying(512),
    crs bigint DEFAULT (-1),
    crsapp bigint DEFAULT (-1),
    version bigint DEFAULT 0,
    feature geometry,
    CONSTRAINT sggeofacelist_pkey PRIMARY KEY (oid),
    CONSTRAINT enforce_dims_feature CHECK (st_ndims(feature) = 3),
    CONSTRAINT enforce_srid_feature CHECK (st_srid(feature) = 0)
);

CREATE TABLE public.sggeopointlist
(
    oid bigint NOT NULL,
    angle double precision,
    meanid smallint,
    crs bigint DEFAULT (-1),
    crsapp bigint DEFAULT (-1),
    origx double precision,
    origy double precision,
    feature geometry,
    CONSTRAINT sggeopointlist_pkey PRIMARY KEY (oid),
    CONSTRAINT enforce_dims_feature CHECK (st_ndims(feature) = 3),
    CONSTRAINT enforce_srid_feature CHECK (st_srid(feature) = 0)
);

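Column sgclasstab_id67.att_1114 references geometries in table sggeofacelist, which contains only polygons. Column sgclasstab_id68.att_1139 references geometries in table sggeopointlist, which contains only point geometries. Both geometry tables can contain hundreds of thousands of geometries, of which only a small fraction is referenced by the two class tables above. All geometries are indexed with GiST.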
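For reference, GiST indexes of the kind described here are usually created along the following lines. This is only a sketch: the name sggeopointlist_idx is the one that shows up in the execution plans in this thread, while sggeofacelist_idx is an assumed name for its counterpart.

```sql
-- Spatial (GiST) indexes on the geometry columns.
-- sggeopointlist_idx is the name seen in the execution plans;
-- sggeofacelist_idx is an assumed name, not confirmed by the post.
CREATE INDEX sggeofacelist_idx  ON public.sggeofacelist  USING gist (feature);
CREATE INDEX sggeopointlist_idx ON public.sggeopointlist USING gist (feature);
```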

Now, when I run the following query:

UPDATE sgclasstab_id68 SET att_1496 = (
    SELECT t3943814704643.att_1115 
    FROM sgclasstab_id67 t3943814704643, sggeofacelist t3943863539361, sgclasstab_id68 t3943875447103, sggeopointlist t3943875522916 
    WHERE ((t3943814704643.att_1114=t3943863539361.oid )) 
        AND ((t3943875447103.att_1139=t3943875522916.oid )) 
        AND ((t3943863539361.feature && (t3943875522916.feature) AND ST_Intersects(t3943863539361.feature,(t3943875522916.feature)))) 
        AND (t3943863539361.oid=t3943814704643.att_1114)  
        AND sgclasstab_id68.oid = t3943875447103.oid 
    LIMIT 1
)

it literally runs forever (I cancelled it after 4 days). Looking at the execution plan, that is not surprising:

Update on sgclasstab_id68  (cost=0.00..1076.63 rows=100 width=736)
  ->  Seq Scan on sgclasstab_id68  (cost=0.00..1076.63 rows=100 width=736)
        SubPlan 1
          ->  Limit  (cost=0.70..10.48 rows=1 width=516)
                ->  Nested Loop  (cost=0.70..10.48 rows=1 width=516)
                      ->  Nested Loop  (cost=0.55..10.18 rows=1 width=524)
                            ->  Nested Loop  (cost=0.29..9.33 rows=1 width=5482)
                                  ->  Seq Scan on sgclasstab_id67 t3943814704643  (cost=0.00..1.01 rows=1 width=524)
                                  ->  Index Scan using sggeofacelist_pkey on sggeofacelist t3943863539361  (cost=0.29..8.31 rows=1 width=4974)
                                        Index Cond: (oid = t3943814704643.att_1114)
                            ->  Index Scan using sggeopointlist_idx on sggeopointlist t3943875522916  (cost=0.27..0.84 rows=1 width=48)
                                  Index Cond: ((t3943863539361.feature && feature) AND (t3943863539361.feature && feature))
                                  Filter: _st_intersects(t3943863539361.feature, feature)
                      ->  Index Scan using sgclasstab_id68a1139_idx on sgclasstab_id68 t3943875447103  (cost=0.14..0.29 rows=1 width=8)
                            Index Cond: (att_1139 = t3943875522916.oid)
                            Filter: (sgclasstab_id68.oid = oid)

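If I am not misreading anything here, Postgres performs the intersection first and only afterwards discards all the irrelevant geometries that are not referenced by rows in sgclasstab_id68.

Wouldn't it be more efficient to swap these two operations, or am I doing something in this query that makes that option unavailable? If not, is there a way to force Postgres to reconsider?

PostgreSQL 9.3, PostGIS 2.1.1 r12113.

Thanks in advance (and sorry for the hard-to-read query; it was generated automatically).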

3 answers:

Answer 0: (score: 1)

[Not an answer] FYI: the cleaned-up query (I hope I did not introduce any errors):

UPDATE sgclasstab_id68 dst
SET att_1496 = (
    SELECT cla.att_1115
    FROM sgclasstab_id67 cla 
    JOIN sggeofacelist fac ON cla.att_1114 = fac.oid AND fac.oid = cla.att_1114
    JOIN sggeopointlist pnt ON fac.feature && (pnt.feature) AND ST_Intersects(fac.feature, pnt.feature)
    JOIN sgclasstab_id68 cla2 ON cla2.att_1139 = pnt.oid
    WHERE 1=1
        AND dst.oid = cla2.oid
        -- AND cla2.oid = cla2.oid
    LIMIT 1 
    )
    ;

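[ANSWER] It appears that the cla2 (sgclasstab_id68) reference, being unaliased, refers to the inner query rather than to the target table of the outer UPDATE statement. As a result, every sgclasstab_id68 row would be updated with the same value. In addition, the join columns lack FK constraints/indexes.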


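Update: on second thought, the table reference JOIN sgclasstab_id68 cla2 is not needed; it refers to the same row as the target row, so the query can be simplified further to: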

UPDATE sgclasstab_id68 dst
SET att_1496 = (
    SELECT c67.att_1115 
    FROM sgclasstab_id67 c67 
    JOIN sggeofacelist fl ON c67.att_1114 = fl.oid AND fl.oid = c67.att_1114
    JOIN sggeopointlist pnt ON (fl.feature && pnt.feature) AND ST_Intersects(fl.feature, pnt.feature)
    WHERE dst.att_1139 = pnt.oid 
    LIMIT 1
    )
        ;

[But the proper FKs/indexes are still needed.]
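A minimal sketch of what such FKs/indexes could look like. All constraint and index names below are made up for illustration; the plan in the question shows that an index on sgclasstab_id68.att_1139 (sgclasstab_id68a1139_idx) already exists, so only the other join column gets a new index here. Adding the foreign keys will fail if existing rows reference missing geometries.

```sql
-- Index on the join column that currently has none (assumed name).
CREATE INDEX sgclasstab_id67a1114_idx ON sgclasstab_id67 (att_1114);

-- Foreign keys from the class tables to the geometry tables (assumed names).
ALTER TABLE sgclasstab_id67
    ADD CONSTRAINT sgclasstab_id67_att_1114_fkey
    FOREIGN KEY (att_1114) REFERENCES sggeofacelist (oid);

ALTER TABLE sgclasstab_id68
    ADD CONSTRAINT sgclasstab_id68_att_1139_fkey
    FOREIGN KEY (att_1139) REFERENCES sggeopointlist (oid);

-- Refresh planner statistics so the row estimates reflect the real table sizes.
ANALYZE sgclasstab_id67;
ANALYZE sgclasstab_id68;
ANALYZE sggeofacelist;
ANALYZE sggeopointlist;
```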


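Addendum: the LIMIT 1 in the subquery (without an ORDER BY) is also suspicious. You would want your update to be at least somewhat deterministic. This subquery just picks an arbitrary row from the result set (if there is more than one) and assigns it to the target table. That does not seem logical; at least not to me.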
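If a single value per point is really what is wanted, one possible way to make the choice repeatable is to add a tie-breaker. The sketch below assumes that "the polygon row with the lowest oid wins" is an acceptable rule; the question does not say which row should win, so the ORDER BY column is purely illustrative.

```sql
UPDATE sgclasstab_id68 dst
SET att_1496 = (
    SELECT c67.att_1115
    FROM sgclasstab_id67 c67
    JOIN sggeofacelist fl   ON c67.att_1114 = fl.oid
    JOIN sggeopointlist pnt ON ST_Intersects(fl.feature, pnt.feature)
    WHERE dst.att_1139 = pnt.oid
    ORDER BY c67.oid   -- assumed tie-breaker: lowest oid wins
    LIMIT 1
);
```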


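Finally, you do not really need the scalar(?) subquery, but can use the normal UPDATE syntax (I also removed the bounding-box join, which is already implied by the ST_Intersects() join):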

UPDATE sgclasstab_id68 dst
SET att_1496 = c67.att_1115 
FROM sgclasstab_id67 c67    
JOIN sggeofacelist fl ON c67.att_1114 = fl.oid AND fl.oid = c67.att_1114
JOIN sggeopointlist pnt ON ST_Intersects(fl.feature, pnt.feature)
WHERE dst.att_1139 = pnt.oid    
    ;
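Note that the two forms are not strictly equivalent. The scalar-subquery version sets att_1496 to NULL for every row whose point intersects no referenced polygon, whereas UPDATE ... FROM only touches rows that find a join partner. If the NULL-ing behaviour matters, a separate statement along these lines could restore it (a sketch, not tested against the asker's data):

```sql
-- Clear att_1496 for rows whose point intersects no referenced polygon.
UPDATE sgclasstab_id68 dst
SET att_1496 = NULL
WHERE NOT EXISTS (
    SELECT 1
    FROM sgclasstab_id67 c67
    JOIN sggeofacelist fl   ON c67.att_1114 = fl.oid
    JOIN sggeopointlist pnt ON ST_Intersects(fl.feature, pnt.feature)
    WHERE pnt.oid = dst.att_1139
);
```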

Answer 1: (score: 0)

NOT an answer, just an attempt to clarify the question with formatting (taken from joop). Is this still what you want, and if so, what is the EXPLAIN output now?

UPDATE sgclasstab_id68 dst
SET att_1496 = (
    SELECT cla.att_1115
    FROM sgclasstab_id67 cla
    JOIN sggeofacelist fac ON cla.att_1114 = fac.oid
    -- pnt has to be joined before cla2 can reference it
    JOIN sggeopointlist pnt ON ST_Intersects(fac.feature, pnt.feature)
    JOIN sgclasstab_id68 cla2 ON cla2.att_1139 = pnt.oid
    WHERE dst.oid = cla2.oid
);

Answer 2: (score: 0)

Taking joop's answer into account, I came up with the following cleaned-up and modified version of my first statement:

UPDATE sgclasstab_id68 SET att_1496 = (
    SELECT sgclasstab_id67.att_1115 
    FROM sgclasstab_id67, sggeofacelist, sggeopointlist 
    WHERE ((sgclasstab_id67.att_1114=sggeofacelist.oid )) 
        AND ((sgclasstab_id68.att_1139=sggeopointlist.oid )) 
        AND (ST_Intersects(sggeofacelist.feature,(sggeopointlist.feature))) 
        AND (sggeofacelist.oid=sgclasstab_id67.att_1114)  
)

This yields the following execution plan:

Update on sgclasstab_id68  (cost=0.00..1074.13 rows=100 width=736)
  ->  Seq Scan on sgclasstab_id68  (cost=0.00..1074.13 rows=100 width=736)
        SubPlan 1
          ->  Nested Loop  (cost=0.55..10.18 rows=1 width=516)
                ->  Nested Loop  (cost=0.29..9.33 rows=1 width=5482)
                      ->  Seq Scan on sgclasstab_id67  (cost=0.00..1.01 rows=1 width=524)
                      ->  Index Scan using sggeofacelist_pkey on sggeofacelist  (cost=0.29..8.31 rows=1 width=4974)
                            Index Cond: (oid = sgclasstab_id67.att_1114)
                ->  Index Scan using sggeopointlist_idx on sggeopointlist  (cost=0.27..0.84 rows=1 width=40)
                      Index Cond: (sggeofacelist.feature && feature)
                      Filter: ((sgclasstab_id68.att_1139 = oid) AND _st_intersects(sggeofacelist.feature, feature))

The results appear to be identical, and a first test already suggests that it is orders of magnitude faster.
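One way to put numbers on such a comparison, sketched here rather than taken from my actual test, is to run the statement under EXPLAIN ANALYZE inside a transaction and roll it back. EXPLAIN ANALYZE really executes the UPDATE, so the ROLLBACK is what keeps the table unchanged.

```sql
BEGIN;

-- Executes the UPDATE and reports actual row counts and timings per plan node.
EXPLAIN (ANALYZE, BUFFERS)
UPDATE sgclasstab_id68 SET att_1496 = (
    SELECT sgclasstab_id67.att_1115
    FROM sgclasstab_id67, sggeofacelist, sggeopointlist
    WHERE sgclasstab_id67.att_1114 = sggeofacelist.oid
        AND sgclasstab_id68.att_1139 = sggeopointlist.oid
        AND ST_Intersects(sggeofacelist.feature, sggeopointlist.feature)
);

-- Undo the changes made by the measured run.
ROLLBACK;
```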