Slow postgres query when joining large tables

Time: 2013-03-27 17:12:53

Tags: postgresql query-optimization

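I have a query that performs slowly. I think the problem is that I am joining several large tables, but I would still expect better performance. The query and its EXPLAIN ANALYZE output are below: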

SELECT
    "m_advertsnapshot"."id",
    "m_advertsnapshot"."created",
    "m_advertsnapshot"."modified",
    "m_advertsnapshot"."snapshot_timestamp",
    "m_advertsnapshot"."source_name",
    COUNT(CASE m_advert.widget_listing_id IS NULL and m_advert.height IS NULL WHEN True THEN 1 ELSE null END) AS "adh_count_with_no_wl_and_missing_height",
    COUNT(CASE m_advert.widget_listing_id IS NULL and m_advert.height IS NOT NULL and m_advert.colour_id IS NOT NULL and m_advert.ctype IS NOT NULL WHEN True THEN 1 ELSE null END) AS "adh_count_with_no_wl_and_has_height_plate_ctype",
    COUNT(CASE m_advert.widget_listing_id IS NULL and m_advert.height IS NULL and m_advert.colour_id is NULL and m_advert.ctype is NULL  WHEN True THEN 1 ELSE null END) AS "adh_count_with_no_wl_and_missing_height_and_missing_plate_c268",
    COUNT("m_adverthistory"."id") AS "adh_count",
    COUNT(CASE m_advert.widget_listing_id IS NULL and m_advert.height IS NULL and m_advert.colour_id is NULL WHEN True THEN 1 ELSE null END) AS "adh_count_with_no_wl_and_missing_height_and_missing_plate",
    COUNT("m_advert"."widget_listing_id") AS "adh_count_with_wl"
FROM "m_advertsnapshot"
    LEFT OUTER JOIN "m_adverthistory" ON ("m_advertsnapshot"."id" = "m_adverthistory"."advert_snapshot_id")
    LEFT OUTER JOIN "m_advert" ON ("m_adverthistory"."advert_id" = "m_advert"."id")
GROUP BY
    "m_advertsnapshot"."id",
    "m_advertsnapshot"."created",
    "m_advertsnapshot"."modified",
    "m_advertsnapshot"."snapshot_timestamp",
    "m_advertsnapshot"."source_name"
ORDER BY
    "m_advertsnapshot"."snapshot_timestamp" DESC



"Sort  (cost=796180.41..796180.90 rows=196 width=72) (actual time=18051.504..18051.519 rows=196 loops=1)"
"  Sort Key: m_advertsnapshot.snapshot_timestamp"
"  Sort Method: quicksort  Memory: 60kB"
"  ->  HashAggregate  (cost=796170.99..796172.95 rows=196 width=72) (actual time=18051.330..18051.396 rows=196 loops=1)"
"        ->  Hash Right Join  (cost=227052.68..622950.33 rows=6298933 width=72) (actual time=2082.551..12166.226 rows=6298933 loops=1)"
"              Hash Cond: (m_adverthistory.advert_snapshot_id = m_advertsnapshot.id)"
"              ->  Hash Left Join  (cost=227045.27..536332.59 rows=6298933 width=24) (actual time=2082.483..9971.996 rows=6298933 loops=1)"
"                    Hash Cond: (m_adverthistory.advert_id = m_advert.id)"
"                    ->  Seq Scan on m_adverthistory  (cost=0.00..121858.33 rows=6298933 width=12) (actual time=0.003..1644.060 rows=6298933 loops=1)"
"                    ->  Hash  (cost=202575.12..202575.12 rows=1332812 width=20) (actual time=2080.897..2080.897 rows=1332812 loops=1)"
"                          Buckets: 2048  Batches: 128  Memory Usage: 525kB"
"                          ->  Seq Scan on m_advert  (cost=0.00..202575.12 rows=1332812 width=20) (actual time=0.007..1564.220 rows=1332812 loops=1)"
"              ->  Hash  (cost=4.96..4.96 rows=196 width=52) (actual time=0.062..0.062 rows=196 loops=1)"
"                    Buckets: 1024  Batches: 1  Memory Usage: 17kB"
"                    ->  Seq Scan on m_advertsnapshot  (cost=0.00..4.96 rows=196 width=52) (actual time=0.004..0.030 rows=196 loops=1)"
"Total runtime: 18051.730 ms"

The query takes 18 seconds on postgres 9.2. The table sizes are:

m_advertsnapshot - 196 rows
m_adverthistory - 6,298,933 rows
m_advert - 1,332,812 rows

The DDL:

-- m_advertsnapshot

CREATE TABLE m_advertsnapshot
(
  id serial NOT NULL,
  snapshot_timestamp timestamp with time zone NOT NULL,
  source_name character varying(50),
  CONSTRAINT m_advertsnapshot_pkey PRIMARY KEY (id),
  CONSTRAINT m_advertsnapshot_source_name_6a9a437077520191_uniq UNIQUE (source_name, snapshot_timestamp)
)
WITH (
  OIDS=FALSE
);

CREATE INDEX m_advertsnapshot_snapshot_timestamp
  ON m_advertsnapshot
  USING btree
  (snapshot_timestamp);

-- m_adverthistory

CREATE TABLE m_adverthistory
(
  id serial NOT NULL,
  advert_id integer NOT NULL,
  advert_snapshot_id integer NOT NULL,
  observed_timestamp timestamp with time zone NOT NULL,
  CONSTRAINT m_adverthistory_pkey PRIMARY KEY (id),
  CONSTRAINT advert_id_refs_id_30735d9eef85241c FOREIGN KEY (advert_id)
      REFERENCES m_advert (id) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION DEFERRABLE INITIALLY DEFERRED,
  CONSTRAINT advert_snapshot_id_refs_id_55d3986f4f270624 FOREIGN KEY (advert_snapshot_id)
      REFERENCES m_advertsnapshot (id) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION DEFERRABLE INITIALLY DEFERRED,
  CONSTRAINT m_adverthistory_advert_id_13fa0dae39e78983_uniq UNIQUE (advert_id, advert_snapshot_id)
)
WITH (
  OIDS=FALSE
);

CREATE INDEX m_adverthistory_advert_id
  ON m_adverthistory
  USING btree
  (advert_id);

CREATE INDEX m_adverthistory_advert_snapshot_id
  ON m_adverthistory
  USING btree
  (advert_snapshot_id);

-- m_advert

CREATE TABLE m_advert
(
  id serial NOT NULL,
  widget_listing_id integer,
  height integer,
  ctype integer,
  colour_id integer,
  CONSTRAINT m_advert_pkey PRIMARY KEY (id),
  CONSTRAINT "colour_id_refs_id_1e4e2dac0183b419" FOREIGN KEY (colour_id)
      REFERENCES colour ("id") MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION DEFERRABLE INITIALLY DEFERRED,
  CONSTRAINT widget_listing_id_refs_id_5a7e62d0d4f48013 FOREIGN KEY (widget_listing_id)
      REFERENCES m_widgetlisting (id) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION DEFERRABLE INITIALLY DEFERRED
)
WITH (
  OIDS=FALSE
);

CREATE INDEX m_advert_advert_seller_id
  ON m_advert
  USING btree
  (advert_seller_id);

CREATE INDEX m_advert_colour_id
  ON m_advert
  USING btree
  (colour_id);

CREATE INDEX m_advert_widget_listing_id
  ON m_advert
  USING btree
  (widget_listing_id);

Any ideas on how to improve the performance would be appreciated.

Thanks!

1 Answer:

Answer 0: (score: 2)

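  • The schema looks reasonable. This query doesn't actually need the indexes, and some of them are redundant anyway: for example, the UNIQUE (advert_id, advert_snapshot_id) constraint already provides an index with advert_id as its leading column, so m_adverthistory_advert_id duplicates it. (In PostgreSQL it is the PRIMARY KEY and UNIQUE constraints that create indexes, not the FK constraints.)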
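  • Junction tables don't need a surrogate key (though it does no harm).
  • The real reason your query is slow is that it needs every row from all of the tables to compute the aggregates. If you need 100% of the data, indexes cannot help.
  • Adding an extra restriction (for example, snapshot_timestamp >= some_date) will probably lead to a different plan that does use the indexes.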
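
As a rough sketch of the last point, assuming you only care about snapshots newer than some cutoff: the date below is made up for illustration, the aliases s, h and a are mine, and I have kept only two of your counts to keep it short. Whether the planner really switches to index scans depends on how selective the restriction is and on your statistics, so check the EXPLAIN ANALYZE output rather than taking it on faith.

```sql
-- Sketch only: the cutoff date is a hypothetical value, not taken from the question.
EXPLAIN ANALYZE
SELECT
    s.id,
    s.snapshot_timestamp,
    s.source_name,
    COUNT(h.id)                AS adh_count,
    COUNT(a.widget_listing_id) AS adh_count_with_wl
FROM m_advertsnapshot s
    LEFT OUTER JOIN m_adverthistory h ON s.id = h.advert_snapshot_id
    LEFT OUTER JOIN m_advert a ON h.advert_id = a.id
-- Restricting the snapshots means only a fraction of m_adverthistory is needed,
-- which gives the planner a reason to consider the index on advert_snapshot_id.
WHERE s.snapshot_timestamp >= TIMESTAMP WITH TIME ZONE '2013-03-01 00:00:00+00'
GROUP BY
    s.id,
    s.snapshot_timestamp,
    s.source_name
ORDER BY
    s.snapshot_timestamp DESC;
```

If the restriction still matches most of the 196 snapshots, expect the same sequential scans and hash joins as before, because the query is back to needing nearly all of the rows.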