Consider the following query:
select a.id from a
where
a.id in (select b.a_id from b where b.x='x1' and b.y='y1') and
a.id in (select b.a_id from b where b.x='x2' and b.y='y2')
order by a.date desc
limit 20
which should be rewritable into this faster one:
select a.id from a inner join b as b1 on (a.id=b1.a_id) inner join b as b2 on (a.id=b2.a_id)
where
b1.x='x1' and b1.y='y1' and
b2.x='x2' and b2.y='y2'
order by a.date desc
limit 20
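We would prefer not to rewrite our queries by changing the source code, because that complicates things considerably (especially when using Django).
So we would like to know: when does PostgreSQL collapse subqueries into joins, and when does it not?
This is the simplified data model: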
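One way to see what the planner does with each form is to compare their plans side by side. The sketch below assumes the tables a and b from this question. from_collapse_limit and join_collapse_limit are standard planner settings; they are shown here only so their current values are known, not as a suggested fix.

```sql
-- Planner settings that bound how many subquery FROM items and explicit
-- JOINs are merged into a single join-order search.
SHOW from_collapse_limit;
SHOW join_collapse_limit;

-- Plan for the IN form (plain EXPLAIN does not execute the query).
EXPLAIN
SELECT a.id
FROM a
WHERE a.id IN (SELECT b.a_id FROM b WHERE b.x = 'x1' AND b.y = 'y1')
  AND a.id IN (SELECT b.a_id FROM b WHERE b.x = 'x2' AND b.y = 'y2')
ORDER BY a.date DESC
LIMIT 20;

-- Plan for the explicit join form.
EXPLAIN
SELECT a.id
FROM a
INNER JOIN b AS b1 ON a.id = b1.a_id
INNER JOIN b AS b2 ON a.id = b2.a_id
WHERE b1.x = 'x1' AND b1.y = 'y1'
  AND b2.x = 'x2' AND b2.y = 'y2'
ORDER BY a.date DESC
LIMIT 20;
```

If the IN form shows up as Semi Join nodes, the subqueries were pulled up into the main query rather than executed as separate subplans.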
                               Table "public.a"
 Column  |          Type          |                   Modifiers
---------+------------------------+------------------------------------------------
 id      | integer                | not null default nextval('a_id_seq'::regclass)
 date    | date                   |
 content | character varying(256) |
Indexes:
    "a_pkey" PRIMARY KEY, btree (id)
    "a_id_date" btree (id, date)
Referenced by:
    TABLE "b" CONSTRAINT "a_id_refs_id_6e634433343d4435353" FOREIGN KEY (a_id) REFERENCES a(id) DEFERRABLE INITIALLY DEFERRED

      Table "public.b"
 Column |  Type   | Modifiers
--------+---------+-----------
 a_id   | integer | not null
 x      | text    | not null
 y      | text    | not null
Indexes:
    "b_x_y_a_id" UNIQUE CONSTRAINT, btree (x, y, a_id)
Foreign-key constraints:
    "a_id_refs_id_6e634433343d4435353" FOREIGN KEY (a_id) REFERENCES a(id) DEFERRABLE INITIALLY DEFERRED
PostgreSQL versions on which we tested the queries:
PostgreSQL 9.2.7 on x86_64-suse-linux-gnu, compiled by gcc (SUSE Linux) 4.7.2 20130108 [gcc-4_7-branch revision 195012], 64-bit
PostgreSQL 9.4beta1 on x86_64-suse-linux-gnu, compiled by gcc (SUSE Linux) 4.7.2 20130108 [gcc-4_7-branch revision 195012], 64-bit
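Query plans (with empty file cache and memory cache):
Answer 0 (score: 1)
Your last comment points to the reason, I think: the two queries are not equivalent unless there is a unique constraint that makes them equivalent.
Example of an equivalent schema: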
denis=# \d a
                         Table "public.a"
 Column |  Type   |                   Modifiers
--------+---------+------------------------------------------------
 id     | integer | not null default nextval('a_id_seq'::regclass)
 d      | date    | not null
Indexes:
    "a_pkey" PRIMARY KEY, btree (id)
Referenced by:
    TABLE "b" CONSTRAINT "b_a_id_fkey" FOREIGN KEY (a_id) REFERENCES a(id)

denis=# \d b
      Table "public.b"
 Column |  Type   | Modifiers
--------+---------+-----------
 a_id   | integer | not null
 val    | integer | not null
Foreign-key constraints:
    "b_a_id_fkey" FOREIGN KEY (a_id) REFERENCES a(id)
Example of offending data with that schema:
denis=# select * from a order by d;
id | d
----+------------
1 | 2014-12-10
2 | 2014-12-11
3 | 2014-12-12
4 | 2014-12-13
5 | 2014-12-14
6 | 2014-12-15
(6 rows)
denis=# select * from b order by a_id, val;
a_id | val
------+-----
1 | 1
1 | 1
2 | 1
2 | 1
2 | 2
3 | 1
3 | 1
3 | 2
(8 rows)
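To reproduce the data shown above, a minimal setup script might look like the following. It is inferred from the psql output, so the exact DDL the author used is an assumption.

```sql
-- Tables matching the \d output above.
CREATE TABLE a (
    id serial PRIMARY KEY,
    d  date NOT NULL
);
CREATE TABLE b (
    a_id integer NOT NULL REFERENCES a(id),
    val  integer NOT NULL
);

-- Six rows in a, one per day.
INSERT INTO a (id, d) VALUES
    (1, '2014-12-10'), (2, '2014-12-11'), (3, '2014-12-12'),
    (4, '2014-12-13'), (5, '2014-12-14'), (6, '2014-12-15');

-- Eight rows in b; the duplicate (a_id, val) pairs are deliberate,
-- since b has no unique constraint here.
INSERT INTO b (a_id, val) VALUES
    (1, 1), (1, 1),
    (2, 1), (2, 1), (2, 2),
    (3, 1), (3, 1), (3, 2);
```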
Rows using two IN clauses:
denis=# select a.id, a.d from a where a.id in (select b.a_id from b where b.val = 1) and a.id in (select b.a_id from b where b.val = 2) order by d;
id | d
----+------------
2 | 2014-12-11
3 | 2014-12-12
(2 rows)
Rows using two joins:
denis=# select a.id, a.d from a join b b1 on a.id = b1.a_id join b b2 on a.id = b2.a_id where b1.val = 1 and b2.val = 2 order by d;
id | d
----+------------
2 | 2014-12-11
2 | 2014-12-11
3 | 2014-12-12
3 | 2014-12-12
(4 rows)
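That said, I see that you do have a unique constraint on b (x, y, a_id). It may be worth raising the issue on the Postgres performance list to find out why the subqueries are not collapsed in this particular case, or at least why the planner does not generate exactly the same plan.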
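To make the duplicate problem concrete: on the sample data in this answer, the join form returns the same rows as the IN form once duplicates are removed. This is only a sketch against the example tables a and b; DISTINCT is not necessarily what the planner does internally, and it carries its own cost.

```sql
-- Join form with duplicates removed; on the sample data this returns
-- the two rows (2, 2014-12-11) and (3, 2014-12-12), like the IN form.
SELECT DISTINCT a.id, a.d
FROM a
JOIN b b1 ON a.id = b1.a_id
JOIN b b2 ON a.id = b2.a_id
WHERE b1.val = 1
  AND b2.val = 2
ORDER BY a.d;
```

With a unique constraint covering the filtered columns and a_id, each a.id can match at most one b1 row and one b2 row, so the plain join cannot produce duplicates in the first place.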
Answer 1 (score: 0)
-- The table definitions
CREATE TABLE table_a (
id SERIAL NOT NULL PRIMARY KEY
, d DATE NOT NULL
);
CREATE TABLE table_b (
id SERIAL NOT NULL PRIMARY KEY
, a_id INTEGER NOT NULL REFERENCES table_a(id)
, x VARCHAR NOT NULL
, y VARCHAR NOT NULL
);
-- fake some data
INSERT INTO table_a(d)
SELECT gs
FROM generate_series( '1904-01-01'::timestamp ,'2015-01-01'::timestamp, '1 day'::interval) gs;
INSERT INTO table_b(a_id, x, y) SELECT a.id, 'x1' , 'y1' FROM table_a a;
INSERT INTO table_b(a_id, x, y) SELECT a.id, 'x2' , 'y2' FROM table_a a;
INSERT INTO table_b(a_id, x, y) SELECT a.id, 'x3' , 'y3' FROM table_a a;
DELETE FROM table_b WHERE RANDOM() > 0.3;
CREATE UNIQUE INDEX ON table_a(d, id); -- date first
CREATE INDEX ON table_b(a_id); -- supporting the FK
-- For initialising the statistics
VACUUM ANALYZE table_a;
VACUUM ANALYZE table_b;
-- original query
EXPLAIN ANALYZE
SELECT a.id
FROM table_a a
WHERE a.id IN (SELECT b.a_id FROM table_b b WHERE b.x='x1' AND b.y='y1')
AND a.id IN (SELECT b.a_id FROM table_b b WHERE b.x='x2' AND b.y='y2')
order by a.d desc
limit 20;
-- EXISTS() version
EXPLAIN ANALYZE
SELECT a.id
FROM table_a a
WHERE EXISTS (SELECT * FROM table_b b WHERE b.a_id= a.id AND b.x='x1' AND b.y='y1')
AND EXISTS (SELECT * FROM table_b b WHERE b.a_id= a.id AND b.x='x2' AND b.y='y2')
order by a.d desc
limit 20;
Resulting query plan:
Limit  (cost=0.87..491.23 rows=20 width=8) (actual time=0.080..0.521 rows=20 loops=1)
  ->  Nested Loop Semi Join  (cost=0.87..15741.40 rows=642 width=8) (actual time=0.080..0.518 rows=20 loops=1)
        ->  Nested Loop Semi Join  (cost=0.58..14380.54 rows=4043 width=12) (actual time=0.017..0.391 rows=74 loops=1)
              ->  Index Only Scan Backward using table_a_d_id_idx on table_a a  (cost=0.29..732.75 rows=40544 width=8) (actual time=0.008..0.048 rows=231 loops=1)
                    Heap Fetches: 0
              ->  Index Scan using table_b_a_id_idx on table_b b_1  (cost=0.29..0.34 rows=1 width=4) (actual time=0.001..0.001 rows=0 loops=231)
                    Index Cond: (a_id = a.id)
                    Filter: (((x)::text = 'x2'::text) AND ((y)::text = 'y2'::text))
                    Rows Removed by Filter: 0
        ->  Index Scan using table_b_a_id_idx on table_b b
              Index Cond: (a_id = a.id)
              Filter: (((x)::text = 'x1'::text) AND ((y)::text = 'y1'::text))
              Rows Removed by Filter: 1
Total runtime: 0.547 ms
Notes: table_b.a_id is declared NOT NULL, and the indexes on table_b(a_id) and table_a(d, id) are absolutely necessary for this plan.
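For comparison with the second form in the question, the same test tables can be queried with explicit joins. The unique index below mirrors the b_x_y_a_id constraint from the question and is an addition that is not part of this answer's script; it should build cleanly because the script inserts each (a_id, x, y) combination at most once. No plan output is shown, since none was given for this variant.

```sql
-- Assumed addition: a unique index like the one in the question's schema.
CREATE UNIQUE INDEX ON table_b (x, y, a_id);
ANALYZE table_b;

-- Explicit join version of the same query.
EXPLAIN ANALYZE
SELECT a.id
FROM table_a a
JOIN table_b b1 ON b1.a_id = a.id
JOIN table_b b2 ON b2.a_id = a.id
WHERE b1.x = 'x1' AND b1.y = 'y1'
  AND b2.x = 'x2' AND b2.y = 'y2'
ORDER BY a.d DESC
LIMIT 20;
```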