Postgres: Optimizing a complex query?

Asked: 2014-03-03 16:21:53

Tags: postgresql

I am having trouble optimizing queries once they grow as large as the one below.

select distinct tm.proto_location from track_message tm where
   workflow_analytic_instance_id = 204 and tm.id in 
   (Select track_message_id from track_message_to_track_mapping where track_id in 
   (select distinct t.id from track t, track_item item where t.id = item.track_id and
   item.item_time between 1328816277089000 and 1328816287089000 and item.id in 
   (Select track_item_id from track_point tp where ST_Intersects(tp.track_position,
   ST_GeomFromText('POLYGON((-144 59, -41 46, -75 15, -127 25, -144 59))',4326)))));

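I am not sure whether I need to restructure the query or add more indexes, since at the moment I have only one, on track_position. Here is the EXPLAIN ANALYZE output for the query: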

HashAggregate  (cost=3321073.27..3321099.83 rows=2656 width=126) (actual time=38937.642..38937.781 rows=341 loops=1)
  ->  Hash Semi Join  (cost=3312041.16..3321066.63 rows=2656 width=126) (actual time=38860.624..38937.235 rows=341 loops=1)
      Hash Cond: (tm.id = track_message_to_track_mapping.track_message_id)
      ->  Seq Scan on track_message tm  (cost=0.00..8441.48 rows=5280 width=134) (actual time=31.643..81.135 rows=5027 loops=1)
          Filter: (workflow_analytic_instance_id = 204)
      ->  Hash  (cost=3310705.63..3310705.63 rows=81402 width=8) (actual time=38824.785..38824.785 rows=1026 loops=1)
          Buckets: 4096  Batches: 4  Memory Usage: 11kB
          ->  Hash Join  (cost=3306662.03..3310705.63 rows=81402 width=8) (actual time=38741.641..38820.901 rows=1026 loops=1)
                Hash Cond: (track_message_to_track_mapping.track_id = t.id)
                ->  Seq Scan on track_message_to_track_mapping  (cost=0.00..2995.04 rows=162804 width=16) (actual time=0.023..36.404 rows=162678 loops=1)
                ->  Hash  (cost=3306623.23..3306623.23 rows=3104 width=8) (actual time=38737.721..38737.721 rows=1026 loops=1)
                      Buckets: 1024  Batches: 1  Memory Usage: 41kB
                      ->  Unique  (cost=3299618.84..3306592.19 rows=3104 width=8) (actual time=38578.330..38737.166 rows=1026 loops=1)
                            ->  Merge Join  (cost=3299618.84..3306584.43 rows=3104 width=8) (actual time=38578.327..38735.062 rows=10303 loops=1)
                                  Merge Cond: (t.id = item.track_id)
                                  ->  Index Scan using track_pkey on track t  (cost=0.00..6763.86 rows=162639 width=8) (actual time=0.020..122.626 rows=160111 loops=1)
                                  ->  Sort  (cost=3299617.79..3299625.55 rows=3104 width=8) (actual time=38571.786..38574.074 rows=10303 loops=1)
                                        Sort Key: item.track_id
                                        Sort Method: quicksort  Memory: 867kB
                                        ->  Hash Semi Join  (cost=2688037.93..3299437.75 rows=3104 width=8) (actual time=25663.691..38562.198 rows=10303 loops=1)
                                              Hash Cond: (item.id = tp.track_item_id)
                                              ->  Seq Scan on track_item item  (cost=0.00..598761.77 rows=17867 width=16) (actual time=1177.986..3128.122 rows=20606 loops=1)
                                                    Filter: ((item_time >= 1328816277089000::bigint) AND (item_time <= 1328816287089000::bigint))
                                              ->  Hash  (cost=2636161.58..2636161.58 rows=3161948 width=8) (actual time=24330.672..24330.672 rows=9485846 loops=1)
                                                    Buckets: 4096  Batches: 512 (originally 128)  Memory Usage: 1025kB
                                                    ->  Seq Scan on track_point tp  (cost=0.00..2636161.58 rows=3161948 width=8) (actual time=5.506..20772.158 rows=9485846 loops=1)
                                                          Filter: ((track_position && '0103000020E6100000010000000500000000000000000062C00000000000804D4000000000008044C000000000000047400000000000C052C00000000000002E400000000000C05FC0000000000000394000000000000062C00000000000804D40'::geometry) AND _st_intersects(track_position, '0103000020E6100000010000000500000000000000000062C00000000000804D4000000000008044C000000000000047400000000000C052C00000000000002E400000000000C05FC0000000000000394000000000000062C00000000000804D40'::geometry))
 Total runtime: 38938.104 ms

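Because the database was created by another company, I cannot change the tables, but I am free to add extra indexes. The tables used in the query are as follows.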

CREATE TABLE d2d.track_message
(
id bigserial NOT NULL,
proto_location text,
workflow_analytic_instance_id bigint NOT NULL,
CONSTRAINT track_message_pkey PRIMARY KEY (id),
CONSTRAINT track_message_workflow_analytic_instance_id_fkey FOREIGN KEY(workflow_analytic_instance_id)
  REFERENCES d2d.workflow_analytic_instance (id) MATCH SIMPLE
  ON UPDATE NO ACTION ON DELETE NO ACTION
);

CREATE TABLE d2d.track_message_to_track_mapping
(
id bigserial NOT NULL,
track_message_id bigint NOT NULL,
track_id bigint NOT NULL,
CONSTRAINT track_message_to_track_mapping_pkey PRIMARY KEY (id),
CONSTRAINT track_message_to_track_mapping_track_id_fkey FOREIGN KEY (track_id)
  REFERENCES d2d.track (id) MATCH SIMPLE
  ON UPDATE NO ACTION ON DELETE NO ACTION,
CONSTRAINT track_message_to_track_mapping_track_message_id_fkey FOREIGN KEY (track_message_id)
  REFERENCES d2d.track_message (id) MATCH SIMPLE
  ON UPDATE NO ACTION ON DELETE NO ACTION
);

CREATE TABLE d2d.track
(
id bigserial NOT NULL,
track_uuid text,
track_number text,
track_exercise_indicator_id bigint NOT NULL,
track_simulation_indicator_id bigint NOT NULL,
track_status_id bigint,
last_modified timestamp with time zone DEFAULT timezone('utc'::text, now()),
CONSTRAINT track_pkey PRIMARY KEY (id),
CONSTRAINT track_track_exercise_indicator_id_fkey FOREIGN KEY (track_exercise_indicator_id)
  REFERENCES d2d.track_exercise_indicator (id) MATCH SIMPLE
  ON UPDATE NO ACTION ON DELETE NO ACTION,
CONSTRAINT track_track_simulation_indicator_id_fkey FOREIGN KEY (track_simulation_indicator_id)
  REFERENCES d2d.track_simulation_indicator (id) MATCH SIMPLE
  ON UPDATE NO ACTION ON DELETE NO ACTION,
CONSTRAINT track_track_status_id_fkey FOREIGN KEY (track_status_id)
  REFERENCES d2d.track_status (id) MATCH SIMPLE
  ON UPDATE NO ACTION ON DELETE NO ACTION,
CONSTRAINT track_track_uuid_key UNIQUE (track_uuid)
);

CREATE TABLE d2d.track_item
(
id bigserial NOT NULL,
track_item_type_id bigint NOT NULL,
item_time bigint NOT NULL,
image_source text,
track_id bigint NOT NULL,
CONSTRAINT track_item_pkey PRIMARY KEY (id),
CONSTRAINT track_item_track_id_fkey FOREIGN KEY (track_id)
  REFERENCES d2d.track (id) MATCH SIMPLE
  ON UPDATE NO ACTION ON DELETE NO ACTION,
CONSTRAINT track_item_track_item_type_id_fkey FOREIGN KEY (track_item_type_id)
  REFERENCES d2d.track_item_type (id) MATCH SIMPLE
  ON UPDATE NO ACTION ON DELETE NO ACTION
);

CREATE TABLE d2d.track_point
(
id bigserial NOT NULL,
track_position d2d.geometry(PointZ,4326),
track_point_type_id bigint,
track_point_source_type_id bigint,
last_modified timestamp with time zone DEFAULT timezone('utc'::text, now()),
track_item_id bigint NOT NULL,
CONSTRAINT track_point_pkey PRIMARY KEY (id),
CONSTRAINT track_point_track_item_id_fkey FOREIGN KEY (track_item_id)
  REFERENCES d2d.track_item (id) MATCH SIMPLE
  ON UPDATE NO ACTION ON DELETE NO ACTION,
CONSTRAINT track_point_track_point_source_type_id_fkey FOREIGN KEY (track_point_source_type_id)
  REFERENCES d2d.track_point_source_type (id) MATCH SIMPLE
  ON UPDATE NO ACTION ON DELETE NO ACTION,
CONSTRAINT track_point_track_point_type_id_fkey1 FOREIGN KEY (track_point_type_id)
  REFERENCES d2d.track_point_type (id) MATCH SIMPLE
  ON UPDATE NO ACTION ON DELETE NO ACTION
);

2 answers:

Answer 0 (score: 1)

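First attempt: use EXISTS() instead of IN(), and add an index on track_item.item_time (assuming it can be UNIQUE, which I am not sure about). This is untested, since I obviously don't have the data: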

CREATE UNIQUE INDEX ON track_item ( item_time);

-- ----- 
EXPLAIN ANALYZE
SELECT DISTINCT tm.proto_location
FROM track_message tm
WHERE tm.workflow_analytic_instance_id = 204
AND EXISTS ( SELECT *
        FROM track_message_to_track_mapping tm2tm
        JOIN track t ON t.id = tm2tm.track_id
        JOIN track_item ti ON t.id = ti.track_id
        JOIN track_point tp ON ti.id = tp.track_item_id
        WHERE tm.id =tm2tm.track_message_id
        AND ti.item_time BETWEEN 1328816277089000 AND 1328816287089000
        AND ST_Intersects
                (tp.track_position
                , ST_GeomFromText('POLYGON((-144 59, -41 46, -75 15, -127 25, -144 59))',4326)
                )
        )
        ;
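The plan in the question shows sequential scans on track_item (filtered by item_time), on track_message_to_track_mapping, and on track_point, so the join columns are worth indexing as well as item_time. The following is a minimal sketch, not a tested recommendation. It assumes the tables live in schema d2d as in the CREATE TABLE statements, the index names are invented for illustration, and it uses a plain (non-unique) index on item_time because nothing in the question rules out two items sharing a timestamp, in which case CREATE UNIQUE INDEX would fail.

```sql
-- Assumption: schema d2d, as in the question's CREATE TABLE statements.
-- All index names are hypothetical.

-- Non-unique b-tree for the BETWEEN filter on item_time.
CREATE INDEX track_item_item_time_idx
    ON d2d.track_item (item_time);

-- PostgreSQL does not automatically index the referencing side of a
-- foreign key, so the join columns used by the query get their own indexes.
CREATE INDEX track_point_track_item_id_idx
    ON d2d.track_point (track_item_id);
CREATE INDEX track_item_track_id_idx
    ON d2d.track_item (track_id);
CREATE INDEX tm2tm_track_id_idx
    ON d2d.track_message_to_track_mapping (track_id);
CREATE INDEX tm2tm_track_message_id_idx
    ON d2d.track_message_to_track_mapping (track_message_id);
CREATE INDEX track_message_instance_idx
    ON d2d.track_message (workflow_analytic_instance_id);

-- Refresh planner statistics; the plan estimated 3161948 matching
-- track_point rows but found 9485846.
ANALYZE d2d.track_item;
ANALYZE d2d.track_point;
ANALYZE d2d.track_message_to_track_mapping;
ANALYZE d2d.track_message;
```

Whether the planner actually uses any of these depends on the data distribution, so each one is best checked with EXPLAIN ANALYZE and dropped again if it does not help.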

Answer 1 (score: 0)

If I haven't made a mistake, I think you can use something like this:

SELECT DISTINCT tm.proto_location
FROM track_message tm
INNER JOIN track_message_to_track_mapping ON track_message_to_track_mapping.track_message_id = tm.id
INNER JOIN track t ON track_message_to_track_mapping.track_id = t.id
INNER JOIN track_item item ON t.id = item.track_id
INNER JOIN track_point tp ON tp.track_item_id = item.id
WHERE tm.workflow_analytic_instance_id = 204
  AND item.item_time BETWEEN 1328816277089000 AND 1328816287089000
  AND ST_Intersects(tp.track_position, ST_GeomFromText('POLYGON((-144 59, -41 46, -75 15, -127 25, -144 59))',4326));
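One way to check a rewrite like this against the original is to run it under EXPLAIN with buffer statistics. The sketch below makes several assumptions that are not in the question: the search_path setting, the alias m, and the work_mem value of 64MB are all illustrative. The work_mem change is suggested only because the plan in the question reports "Batches: 512 (originally 128)  Memory Usage: 1025kB" for the hash on track_point, which usually means the hash did not fit in memory and spilled to disk. The sketch also drops the join to track, on the assumption that the NOT NULL foreign keys on track_item.track_id and track_message_to_track_mapping.track_id make that join redundant.

```sql
-- Assumption: tables and the PostGIS functions resolve through this path.
SET search_path TO d2d, public;

-- Assumption: 64MB is an arbitrary illustrative value, session-local only.
SET work_mem = '64MB';

EXPLAIN (ANALYZE, BUFFERS)
SELECT DISTINCT tm.proto_location
FROM track_message tm
JOIN track_message_to_track_mapping m ON m.track_message_id = tm.id
-- Join mapping and item directly on track_id; both columns reference track.id.
JOIN track_item item ON item.track_id = m.track_id
JOIN track_point tp ON tp.track_item_id = item.id
WHERE tm.workflow_analytic_instance_id = 204
  AND item.item_time BETWEEN 1328816277089000 AND 1328816287089000
  AND ST_Intersects(
        tp.track_position,
        ST_GeomFromText('POLYGON((-144 59, -41 46, -75 15, -127 25, -144 59))', 4326)
      );

-- Restore the server default for the rest of the session.
RESET work_mem;
```

Because the joins can produce the same proto_location many times before DISTINCT removes the duplicates, the EXISTS form in the other answer may do less work; comparing the two plans on the real data is the only reliable way to tell.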