PostgreSQL HashAggregate query optimization

Date: 2014-01-13 14:47:09

Tags: postgresql query-optimization query-performance

I am trying to optimize the query below.

select cellid2 as cellid, max(endeks) as turkcell
from (select a.cellid2 as cellid2, b.endeks
    from (select geom, cellid as cellid2 from grd_90098780_7c48_11e3_8876_f0bf97e0dd001000000000) a
    join (select endeks, st_transform(geom, 2320) as geom_tmp from turkcell_data) b
      on st_intersects(a.geom, b.geom_tmp)) x
group by cellid2
limit 5

EXPLAIN ANALYZE returns:

"Limit  (cost=81808.31..81808.36 rows=5 width=12) (actual time=271376.201..271376.204 rows=5 loops=1)"
"  ->  HashAggregate  (cost=81808.31..81879.63 rows=7132 width=12) (actual time=271376.200..271376.203 rows=5 loops=1)"
"        ->  Nested Loop  (cost=0.00..81772.65 rows=7132 width=12) (actual time=5.128..269753.647 rows=1237707 loops=1)"
"              Join Filter: _st_intersects(grd_90098780_7c48_11e3_8876_f0bf97e0dd001000000000.geom, st_transform(turkcell_data.geom, 2320))"
"              ->  Seq Scan on turkcell_data  (cost=0.00..809.40 rows=3040 width=3045) (actual time=0.031..7.426 rows=3040 loops=1)"
"              ->  Index Scan using grd_90098780_7c48_11e3_8876_f0bf97e0dd001000000000_geom_gist on grd_90098780_7c48_11e3_8876_f0bf97e0dd001000000000  (cost=0.00..24.76 rows=7 width=124) (actual time=0.012..0.799 rows=647 loops=3040)"
"                    Index Cond: (geom && st_transform(turkcell_data.geom, 2320))"
"Total runtime: 271387.499 ms"

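There are indexes on the geometry and cellid columns. I have read that, instead of using max, ORDER BY ... DESC with LIMIT 1 works better. However, since I have a GROUP BY clause, I don't think that would work here. Is there a way to do this, or any other way to improve performance?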

Table definitions:

CREATE TABLE grd_90098780_7c48_11e3_8876_f0bf97e0dd001000000000
(
  regionid numeric,
  geom geometry(Geometry,2320),
  cellid integer,
  turkcell double precision
)
WITH (
  OIDS=FALSE
);
ALTER TABLE grd_90098780_7c48_11e3_8876_f0bf97e0dd001000000000
  OWNER TO postgres;

-- Index: grd_90098780_7c48_11e3_8876_f0bf97e0dd001000000000_cellid

-- DROP INDEX grd_90098780_7c48_11e3_8876_f0bf97e0dd001000000000_cellid;

CREATE INDEX grd_90098780_7c48_11e3_8876_f0bf97e0dd001000000000_cellid
  ON grd_90098780_7c48_11e3_8876_f0bf97e0dd001000000000
  USING btree
  (cellid );

-- Index: grd_90098780_7c48_11e3_8876_f0bf97e0dd001000000000_geom_gist

-- DROP INDEX grd_90098780_7c48_11e3_8876_f0bf97e0dd001000000000_geom_gist;

CREATE INDEX grd_90098780_7c48_11e3_8876_f0bf97e0dd001000000000_geom_gist
  ON grd_90098780_7c48_11e3_8876_f0bf97e0dd001000000000
  USING gist
  (geom );

CREATE TABLE turkcell_data
(
  gid serial NOT NULL,
  objectid_1 integer,
  objectid integer,
  neighbourh numeric,
  endeks numeric,
  coorx numeric,
  coory numeric,
  shape_leng numeric,
  shape_le_1 numeric,
  shape_area numeric,
  geom geometry(MultiPolygon,4326),
  CONSTRAINT turkcell_data_pkey PRIMARY KEY (gid )
)
WITH (
  OIDS=FALSE
);
ALTER TABLE turkcell_data
  OWNER TO postgres;

-- Index: turkcell_data_geom_gist

-- DROP INDEX turkcell_data_geom_gist;

CREATE INDEX turkcell_data_geom_gist
  ON turkcell_data
  USING gist
  (geom );

1 answer:

Answer 0 (score: 2)

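Either store the data reprojected to SRID 2320, index that column, and use it in the join, or create an index on the transformed projection of the geometries in turkcell_data. I usually prefer the latter: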

CREATE INDEX turkcell_data_geom_gist2320
  ON turkcell_data
  USING gist
  (st_transform(geom, 2320) );
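
For comparison, the first option (storing the reprojected geometry) might look roughly like the sketch below. The column name geom_2320 and the index name turkcell_data_geom_2320_gist are hypothetical, chosen only for illustration; the table names and SRID come from the question. This is untested against the original data.

```sql
-- Hypothetical column: holds turkcell_data.geom reprojected once to SRID 2320.
ALTER TABLE turkcell_data
  ADD COLUMN geom_2320 geometry(MultiPolygon, 2320);

-- Populate it a single time instead of transforming on every join probe.
UPDATE turkcell_data
  SET geom_2320 = ST_Transform(geom, 2320);

-- Hypothetical index name; GiST index on the stored, reprojected geometry.
CREATE INDEX turkcell_data_geom_2320_gist
  ON turkcell_data
  USING gist
  (geom_2320);

-- Refresh planner statistics after the bulk update.
ANALYZE turkcell_data;

-- The join then compares two geometries that are already in SRID 2320.
SELECT a.cellid, max(b.endeks) AS turkcell
FROM grd_90098780_7c48_11e3_8876_f0bf97e0dd001000000000 a
JOIN turkcell_data b
  ON ST_Intersects(a.geom, b.geom_2320)
GROUP BY a.cellid
LIMIT 5;
```

The trade-off is that the stored column has to be kept in sync whenever geom changes, whereas the expression index is maintained automatically.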

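Another problem may be that your geometries are very complex: if any of your polygons has a relatively large number of points, you may be getting stuck on the intersection tests. Try the index first, though.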
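
One way to check whether geometry complexity is a factor is to look at vertex counts per row. The query below is a minimal sketch that assumes only the turkcell_data table from the question and the PostGIS function ST_NPoints; what counts as "too many" points depends on the data and is not something this query decides.

```sql
-- List the ten polygons with the most vertices.
SELECT gid, ST_NPoints(geom) AS n_points
FROM turkcell_data
ORDER BY n_points DESC
LIMIT 10;
```

If a handful of rows dominate the list, those are the ones most likely to make each ST_Intersects call expensive.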