Optimizing an SQL query on a 10-million-row table: the query never finishes

Time: 2016-09-30 11:08:14

Tags: sql postgresql postgresql-performance

I have two tables:

CREATE TABLE routing
(
  id integer NOT NULL,
  link_geom geometry,
  source integer,
  target integer,
  traveltime_min double precision,
  CONSTRAINT routing_pkey PRIMARY KEY (id)
)
WITH (
  OIDS=FALSE
);

CREATE INDEX routing_id_idx
  ON routing
  USING btree
  (id);

CREATE INDEX routing_link_geom_gidx
  ON routing
  USING gist
  (link_geom);

CREATE INDEX routing_source_idx
  ON routing
  USING btree
  (source);

CREATE INDEX routing_target_idx
  ON routing
  USING btree
  (target);

CREATE TABLE test
(
  link_id character varying,
  link_geom geometry,
  id integer NOT NULL,
  -- .. (some more attributes here)
  traveltime_min double precision,
  CONSTRAINT id PRIMARY KEY (id),
  CONSTRAINT test_link_id_key UNIQUE (link_id)
)
WITH (
  OIDS=FALSE
);
ALTER TABLE test
  OWNER TO postgres;

I want to run the following query:

update routing
set  traveltime_min = t2.traveltime_min
from test t2
where t2.id = routing.id 

Both tables have nearly 10 million rows. The problem is that this query runs endlessly. Here is what EXPLAIN shows:

Update on routing  (cost=601725.94..1804772.15 rows=9712264 width=208)
  ->  Hash Join  (cost=601725.94..1804772.15 rows=9712264 width=208)
        Hash Cond: (routing.id = t2.id)
        ->  Seq Scan on routing  (cost=0.00..366200.23 rows=9798223 width=194)
        ->  Hash  (cost=423414.64..423414.64 rows=9712264 width=18)
              ->  Seq Scan on test t2  (cost=0.00..423414.64 rows=9712264 width=18)

I can't figure out what could be causing such a slow response. Could it be a problem with the server settings? I am using the default PostgreSQL 9.3 settings.

2 answers:

Answer 0: (score: 0)

Drop all indexes on routing before running the UPDATE, then add them back afterwards. This will bring a huge improvement.
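A rough sketch of that sequence, using the index names from the question's DDL. Two choices here are assumptions of the sketch rather than part of the advice: the primary-key index routing_pkey is left in place (dropping it would mean dropping the constraint), and routing_id_idx is not recreated because it duplicates routing_pkey.

```sql
-- Sketch only: index names come from the question's DDL.
-- Plain DROP INDEX takes an exclusive lock on the table,
-- so this is best done while nothing else uses routing.
DROP INDEX routing_id_idx;          -- duplicates routing_pkey on (id)
DROP INDEX routing_link_geom_gidx;
DROP INDEX routing_source_idx;
DROP INDEX routing_target_idx;

-- The UPDATE from the question, unchanged in meaning.
UPDATE routing
SET    traveltime_min = t2.traveltime_min
FROM   test t2
WHERE  t2.id = routing.id;

-- Rebuild the indexes that are still wanted.
-- Assumption: routing_id_idx is left out as redundant; add it back if needed.
CREATE INDEX routing_link_geom_gidx ON routing USING gist (link_geom);
CREATE INDEX routing_source_idx ON routing USING btree (source);
CREATE INDEX routing_target_idx ON routing USING btree (target);
```

Building each index once over the final data is generally cheaper than maintaining four indexes while roughly 10 million new row versions are written.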

Set work_mem high in the session in which you run the UPDATE. This will help the hash join.

Set shared_buffers to 1/4 of the available memory, but not more than 1GB.
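A minimal sketch of those two settings. The value '1GB' for work_mem is only an assumed figure for illustration and should be sized to the memory the machine actually has.

```sql
-- Check what the server is currently using.
SHOW work_mem;
SHOW shared_buffers;

-- Raise work_mem for this session only.
-- '1GB' is an assumed value, not a recommendation from the thread.
SET work_mem = '1GB';

UPDATE routing
SET    traveltime_min = t2.traveltime_min
FROM   test t2
WHERE  t2.id = routing.id;

-- Return to the server default for the rest of the session.
RESET work_mem;

-- shared_buffers cannot be changed with SET. On 9.3, edit postgresql.conf
-- and restart the server, for example:
--   shared_buffers = 1GB
```

Because SET affects only the current session, other connections keep the server-wide work_mem.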

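Answer 1: (score: 0)

  • If not all rows are actually changed by the UPDATE (because they already hold the value they would be given), you should avoid these idempotent updates.
  • If you expect the query to affect every row, the query plan does not matter much. [Except, perhaps, when the hash tables thrash...]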
-- these could be needed if the update would be more selective...
VACUUM analyze routing;
VACUUM analyze test;

UPDATE routing dst
SET  traveltime_min = src.traveltime_min
FROM test src
WHERE dst.id = src.id
   -- avoid useless updates and row-versions
AND dst.traveltime_min IS DISTINCT FROM src.traveltime_min
   ;

-- VACUUM analyze routing;
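
To judge whether the extra condition is worth it, one option is to count beforehand how many rows the UPDATE would really change. This is a sketch that reuses the aliases from the statement above; the column alias rows_to_change is made up for illustration.

```sql
-- Count the rows whose traveltime_min actually differs between the two tables.
-- IS DISTINCT FROM treats NULL as an ordinary comparable value, so a NULL on
-- one side and a number on the other counts as a difference.
SELECT count(*) AS rows_to_change
FROM   routing dst
JOIN   test src ON dst.id = src.id
WHERE  dst.traveltime_min IS DISTINCT FROM src.traveltime_min;
```

If the count is close to the full table size, the filter saves little and the index and memory advice matters more. If it is small, most of the write work disappears.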