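PostgreSQL server 9.1.
banners (40K rows) and events (140M rows) are tables containing client data. The client_id column holds the client's integer id, and it is indexed in both tables.
First query: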
SELECT DISTINCT client_id
FROM events
WHERE type = 'banner_show' AND client_id IN (select distinct client_id from banners)
It runs for about 23 seconds.
Second query:
SELECT DISTINCT client_id
FROM events
WHERE type = 'banner_show' AND client_id IN (1, 2, 3, 4...)
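where "1, 2, 3, 4..." is the result of the query "select distinct client_id from banners". The second query ran for about 10 minutes before I stopped it. Why is there such a significant difference in performance between two queries over the same data?
EXPLAIN ANALYZE (first query):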
EXPLAIN ANALYZE
SELECT DISTINCT client_id
FROM events
WHERE type = 'banner_show' AND client_id IN (select distinct client_id from banners)
HashAggregate (cost=4481767.32..4481767.74 rows=42 width=4) (actual time=24726.275..24727.259 rows=8572 loops=1)
  -> Hash Join (cost=1954.16..4481542.58 rows=89895 width=4) (actual time=16052.849..24698.907 rows=68770 loops=1)
        Hash Cond: (events.client_id = banners.client_id)
        -> Seq Scan on events (cost=0.00..4476744.47 rows=179790 width=4) (actual time=16037.562..24634.461 rows=69272 loops=1)
              Filter: ((type)::text = 'banner_show'::text)
        -> Hash (cost=1767.58..1767.58 rows=14926 width=4) (actual time=15.258..15.258 rows=13923 loops=1)
              Buckets: 2048 Batches: 1 Memory Usage: 490kB
              -> HashAggregate (cost=1469.06..1618.32 rows=14926 width=4) (actual time=12.421..13.805 rows=13923 loops=1)
                    -> Seq Scan on banners (cost=0.00..1369.45 rows=39845 width=4) (actual time=0.005..6.883 rows=38184 loops=1)
Total runtime: 24727.909 ms
EXPLAIN ANALYZE (second query):
HashAggregate (cost=842924414.03..842924414.17 rows=14 width=4) (actual time=1521873.754..1521874.796 rows=8574 loops=1)
  -> Bitmap Heap Scan on events (cost=534167.70..842924261.77 rows=60905 width=4) (actual time=260305.233..1521811.644 rows=68782 loops=1)
        Recheck Cond: (client_id = ANY ('{153566,171259,151232,155132,160170,162720,152159,166302,175899,158611,}'::integer[]))
        Filter: ((type)::text = 'banner_show'::text)
        -> Bitmap Index Scan on ix_events_client_id (cost=0.00..534152.47 rows=48209684 width=0) (actual time=4916.828..4916.828 rows=5345417 loops=1)
              Index Cond: (client_id = ANY ('{153566,171259,151232,155132,......}'::integer[]))
Total runtime: 1521875.137 ms
Table schemas:
CREATE TABLE banners
(
id serial NOT NULL,
type_id integer,
form_id integer,
banner character varying,
client_id integer,
created timestamp without time zone,
deleted timestamp without time zone,
CONSTRAINT banners_pkey PRIMARY KEY (id)
)
WITH (
OIDS=FALSE
);
ALTER TABLE banners
OWNER TO postgres;
CREATE INDEX ix_banners_client_id
ON banners
USING btree
(client_id);
CREATE TABLE events
(
id serial NOT NULL,
time_created timestamp without time zone,
type character varying,
date timestamp without time zone,
param character varying,
client_id integer,
hash_id character varying,
CONSTRAINT events_pkey PRIMARY KEY (id)
)
WITH (
OIDS=FALSE
);
ALTER TABLE events
OWNER TO postgres;
CREATE INDEX ix_events_client_id
ON events
USING btree
(client_id);
CREATE INDEX ix_events_hash_id
ON events
USING btree
(hash_id COLLATE pg_catalog."default");
Answer 0 (score: 0)
When your filter condition involves two columns, you should create a single index that covers both of them, for example:
CREATE INDEX event_client_show_idx
ON events
USING btree (client_id, type);
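As a possible variation on the index above, and only a sketch: if the queries always filter on type = 'banner_show', a partial index restricted to those rows may be much smaller than a composite index over all 140M rows. The index name ix_events_banner_show_client_id is hypothetical, and whether the planner prefers it depends on the data, so check with EXPLAIN before relying on it.

```sql
-- Hypothetical partial index: only rows with type = 'banner_show' are indexed.
-- The index name is an assumption made for this example.
CREATE INDEX ix_events_banner_show_client_id
    ON events
    USING btree (client_id)
    WHERE type = 'banner_show';

-- Refresh planner statistics so row estimates reflect the current data.
ANALYZE events;
```

The planner can only consider this index for queries whose WHERE clause implies type = 'banner_show', which holds for both queries in the question.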
Then the SELECT, with EXPLAIN:
EXPLAIN
SELECT DISTINCT client_id
FROM events
WHERE client_id IN (1, 2, 3, 4) AND type = 'banner_show';
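returns something like the following (note that the Index Only Scan node is available from PostgreSQL 9.2; on 9.1, expect an ordinary index scan or bitmap scan on the same index instead):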
Unique (cost=0.15..8.65 rows=1 width=4)
  -> Index Only Scan using event_client_show_idx on events (cost=0.15..8.65 rows=1 width=4)
        Index Cond: ((client_id = ANY ('{1,2,3,4}'::integer[])) AND (type = 'banner_show'::text))
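The plan above has very low costs, which suggests it came from a small test table. To see what happens on the real events table after creating event_client_show_idx, one option is to rerun the original subquery form with EXPLAIN (ANALYZE, BUFFERS), which is available on 9.1. This is only a sketch of how to check; it does not guarantee the index will be chosen.

```sql
-- Rerun the first query from the question and show actual timings
-- plus buffer usage, to confirm which index (if any) is used.
EXPLAIN (ANALYZE, BUFFERS)
SELECT DISTINCT client_id
FROM events
WHERE type = 'banner_show'
  AND client_id IN (SELECT DISTINCT client_id FROM banners);
```

When reading the output, compare the estimated rows with the actual rows on each node. In the question's second plan, the Bitmap Index Scan was estimated at 48209684 rows but returned 5345417, so a large mismatch of that kind is worth looking for.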
Read more about indexes on Markus Winand's blog.