I have the following two very similar statements, each with a subquery. I have marked the difference with **.
1:
SELECT DISTINCT name
FROM person, nameindex n
WHERE person.id1 ='0812'
AND person.id2 =n.id2
AND person.id1 =n.id1
AND n.phonetic IN
(SELECT n2.phonetic
FROM nameindex n2
WHERE n2.id1=person.id1 **
GROUP BY n2.phonetic
HAVING COUNT(*) BETWEEN 4 AND 500)
2:
SELECT DISTINCT name
FROM person, nameindex n
WHERE person.id1 ='0812'
AND person.id2 =n.id2
AND person.id1 =n.id1
AND n.phonetic IN
(SELECT n2.phonetic
FROM nameindex n2
WHERE n2.id1='0812' **
GROUP BY n2.phonetic
HAVING COUNT(*) BETWEEN 4 AND 500)
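I would expect Oracle to infer that person.id1 in the subquery must be the constant '0812'. However, the two queries produce drastically different execution plans and costs (1: cost 4404211855, 2: cost 36237). Why is that?
This is an analytical query, not OLTP, so no indexes have been defined for this particular query.
(Background of the query: get the names of the people with id1 = '0812' who have a phonetic entry in the nameindex table that occurs between 4 and 500 times.)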
Answer 0: (score: 4)
I ran similar queries with the following setup:
CREATE TABLE person (id1, id2, NAME) AS
SELECT to_char(mod(ROWNUM, 1000), 'fm0000'), ROWNUM,
dbms_random.string('A',10)
FROM dual
CONNECT BY LEVEL <= 1e6;
CREATE TABLE nameindex (id1, id2, phonetic) AS
SELECT id1, id2, to_char(dbms_random.value(1, 200), 'fm000')
FROM person;
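To inspect the optimizer's plan for a query against these tables, one common approach is EXPLAIN PLAN followed by DBMS_XPLAN.DISPLAY. The sketch below uses the first query from the question, with the ** marker removed. How the plan output in this answer was actually produced is not stated, so treat this as one possible way to reproduce it:

```sql
-- Store the optimizer's plan for the correlated form (query 1) in PLAN_TABLE.
EXPLAIN PLAN FOR
SELECT DISTINCT name
FROM person, nameindex n
WHERE person.id1 = '0812'
  AND person.id2 = n.id2
  AND person.id1 = n.id1
  AND n.phonetic IN (SELECT n2.phonetic
                     FROM nameindex n2
                     WHERE n2.id1 = person.id1
                     GROUP BY n2.phonetic
                     HAVING COUNT(*) BETWEEN 4 AND 500);

-- Format and print the most recently explained plan.
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```

Since the test data is generated with dbms_random, the row estimates and costs you see will probably differ from the figures quoted here.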
I found that your first query produced the following plan:
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 291343677
--------------------------------------------------------------------------------
| Id  | Operation              | Name      | Rows  | Bytes | Cost (%CPU)| Time
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |           |     1 |  2040 |   331K  (2)| 01:06:2
|   1 |  HASH UNIQUE           |           |     1 |  2040 |   331K  (2)| 01:06:2
|*  2 |   FILTER               |           |       |       |            |
|*  3 |    HASH JOIN           |           |   891 |  1775K|  1750   (2)| 00:00:2
|*  4 |     TABLE ACCESS FULL  | NAMEINDEX |   892 | 18732 |   739   (2)| 00:00:0
|*  5 |     TABLE ACCESS FULL  | PERSON    |  1395 |  2750K|  1010   (2)| 00:00:1
|*  6 |    FILTER              |           |       |       |            |
|   7 |     HASH GROUP BY      |           |  9550 | 76400 |   740   (2)| 00:00:0
|*  8 |      TABLE ACCESS FULL | NAMEINDEX |  9550 | 76400 |   739   (2)| 00:00:0
--------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter( EXISTS (SELECT 0 FROM "NAMEINDEX" "N2" WHERE "N2"."ID1"=:B1
GROUP BY "N2"."PHONETIC" HAVING "N2"."PHONETIC"=:B2 AND COUNT(*)>=4 AND
COUNT(*)<=500))
3 - access("PERSON"."ID2"="N"."ID2" AND "PERSON"."ID1"="N"."ID1")
4 - filter("N"."ID1"='0812')
5 - filter("PERSON"."ID1"='0812')
6 - filter("N2"."PHONETIC"=:B1 AND COUNT(*)>=4 AND COUNT(*)<=500)
8 - filter("N2"."ID1"=:B1)
As you can see, the IN semi-join is rewritten as an EXISTS, which produces the same plan as this query:
SELECT DISTINCT NAME
FROM person, nameindex n
WHERE person.id1 = '0812'
AND person.id2 = n.id2
AND person.id1 = n.id1
AND EXISTS (SELECT NULL
FROM nameindex n2
WHERE n2.id1 = person.id1
AND n2.phonetic = n.phonetic
GROUP BY n2.phonetic
HAVING COUNT(*) BETWEEN 4 AND 500);
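Here you can see that the subquery is not constant: it is correlated with the outer query, so it is evaluated for each row of the main query, which leads to a suboptimal execution plan.
I suggest that you include all the join columns in the GROUP BY when you use an aggregate semi-join. The following query produces an optimal plan: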
SELECT DISTINCT NAME
FROM person, nameindex n
WHERE person.id1 = '0812'
AND person.id2 = n.id2
AND person.id1 = n.id1
AND (n.id1, n.phonetic) IN (SELECT n2.id1, n2.phonetic
FROM nameindex n2
GROUP BY n2.id1, n2.phonetic
HAVING COUNT(*) BETWEEN 4 AND 500);
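Before swapping in the rewritten form, it may be worth checking that it returns the same names as the original. The sketch below compares the question's second query with the rewrite using MINUS in both directions. The subquery names original and rewritten and the column alias src are made up for illustration:

```sql
WITH original AS (
  -- Query 2 from the question: constant filter inside the subquery.
  SELECT DISTINCT name
  FROM person, nameindex n
  WHERE person.id1 = '0812'
    AND person.id2 = n.id2
    AND person.id1 = n.id1
    AND n.phonetic IN (SELECT n2.phonetic
                       FROM nameindex n2
                       WHERE n2.id1 = '0812'
                       GROUP BY n2.phonetic
                       HAVING COUNT(*) BETWEEN 4 AND 500)
),
rewritten AS (
  -- Suggested form: uncorrelated subquery grouped by all join columns.
  SELECT DISTINCT name
  FROM person, nameindex n
  WHERE person.id1 = '0812'
    AND person.id2 = n.id2
    AND person.id1 = n.id1
    AND (n.id1, n.phonetic) IN (SELECT n2.id1, n2.phonetic
                                FROM nameindex n2
                                GROUP BY n2.id1, n2.phonetic
                                HAVING COUNT(*) BETWEEN 4 AND 500)
)
-- Names that appear in one result set but not in the other.
SELECT 'only in original' AS src, name
FROM (SELECT name FROM original MINUS SELECT name FROM rewritten)
UNION ALL
SELECT 'only in rewritten' AS src, name
FROM (SELECT name FROM rewritten MINUS SELECT name FROM original);
```

If the two forms are equivalent on your data, this should return no rows.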