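I'm new to SQL tuning, and I'm trying to understand why having multiple items in an IN clause causes a large performance penalty and, if possible, how to prevent it. Below is more or less what I'm working with. The second query is what I currently have and want to make faster. In real life TABLE_1 has millions of rows, and the sort step of the plan has a CPU cost of 21M.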
SELECT
TOPNWRAPPER.*,
TABLE_2.X,
TABLE_2.Y
FROM
TABLE_2,
(
SELECT
*
FROM
(
SELECT
/*+ index (TABLE_1 TABLE_1_B_E_F_ID) */
TABLE_1.ID,
TABLE_1.C,
TABLE_1.B,
TABLE_1.E,
TABLE_1.F
FROM
TABLE_1
WHERE
( TABLE_1.F IN ( 'STATE1' ) ) AND
( TABLE_1.B= 'SOMETEXT' ) AND
( TABLE_1.C=1 ) AND
( TABLE_1.E= 'IN' ) AND
( TABLE_1.D IS NULL )
ORDER BY
TABLE_1.ID
)
WHERE
( ROWNUM <= 150 )
) TOPNWRAPPER
WHERE
( TOPNWRAPPER.ID = TABLE_2.T1_ID_FK )
ORDER BY
TOPNWRAPPER.ID ASC
Statistics:
|--------------------------------------------------------------------------------------------------------------------------|
|| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers ||
|--------------------------------------------------------------------------------------------------------------------------|
|| 0 ||SELECT STATEMENT | | 1 | | 120 |00:00:00.01 | 965 ||
|| 1 |||NESTED LOOPS | | 1 | | 120 |00:00:00.01 | 965 ||
|| 2 ||||NESTED LOOPS | | 1 | 1 | 120 |00:00:00.01 | 845 ||
|| 3 |||||VIEW | | 1 | 1 | 120 |00:00:00.01 | 245 ||
||* 4 ||||||COUNT STOPKEY | | 1 | | 120 |00:00:00.01 | 245 ||
|| 5 |||||||VIEW | | 1 | 1 | 120 |00:00:00.01 | 245 ||
||* 6 ||||||||TABLE ACCESS BY INDEX ROWID| TABLE_1 | 1 | 1 | 120 |00:00:00.01 | 245 ||
||* 7 |||||||||INDEX RANGE SCAN | TABLE_1_B_E_F_ID | 1 | 25 | 120 |00:00:00.01 | 125 ||
||* 8 |||||INDEX RANGE SCAN | TABLE_2_T1_ID_FK | 120 | 1 | 120 |00:00:00.01 | 600 ||
|| 9 ||||TABLE ACCESS BY INDEX ROWID | TABLE_2 | 120 | 1 | 120 |00:00:00.01 | 120 ||
|--------------------------------------------------------------------------------------------------------------------------|
| |
|Predicate Information (identified by operation id): |
|--------------------------------------------------- |
| |
| 4 - filter(ROWNUM<=150) |
| 6 - filter(("TABLE_1"."C"=1 AND "TABLE_1"."D" IS NULL)) |
| 7 - access("TABLE_1"."B"='SOMETEXT' AND |
| "TABLE_1"."E"='IN' AND "TABLE_1"."F"='STATE1') |
| 8 - access("TOPNWRAPPER"."ID"="TABLE_2"."T1_ID_FK") |
+--------------------------------------------------------------------------------------------------------------------------+
When I update the query to also include 'STATE2' in the IN clause, an extra sort step is added to the plan.
SELECT
TOPNWRAPPER.*,
TABLE_2.X,
TABLE_2.Y
FROM
TABLE_2,
(
SELECT
*
FROM
(
SELECT
/*+ index (TABLE_1 TABLE_1_B_E_F_ID) */
TABLE_1.ID,
TABLE_1.C,
TABLE_1.B,
TABLE_1.E,
TABLE_1.F
FROM
TABLE_1
WHERE
( TABLE_1.F IN ( 'STATE1', 'STATE2' ) ) AND
( TABLE_1.B= 'SOMETEXT' ) AND
( TABLE_1.C=1 ) AND
( TABLE_1.E= 'IN' ) AND
( TABLE_1.D IS NULL )
ORDER BY
TABLE_1.ID
)
WHERE
( ROWNUM <= 150 )
) TOPNWRAPPER
WHERE
( TOPNWRAPPER.ID = TABLE_2.T1_ID_FK )
ORDER BY
TOPNWRAPPER.ID ASC
Statistics:
|-------------------------------------------------------------------------------------------------------------------------------------------------------|
|| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem ||
|-------------------------------------------------------------------------------------------------------------------------------------------------------|
|| 0 ||SELECT STATEMENT | | 1 | | 150 |00:00:00.01 | 1076 | | | ||
|| 1 |||NESTED LOOPS | | 1 | | 150 |00:00:00.01 | 1076 | | | ||
|| 2 ||||NESTED LOOPS | | 1 | 1 | 150 |00:00:00.01 | 926 | | | ||
|| 3 |||||VIEW | | 1 | 1 | 150 |00:00:00.01 | 176 | | | ||
||* 4 ||||||COUNT STOPKEY | | 1 | | 150 |00:00:00.01 | 176 | | | ||
|| 5 |||||||VIEW | | 1 | 1 | 150 |00:00:00.01 | 176 | | | ||
||* 6 ||||||||SORT ORDER BY STOPKEY | | 1 | 1 | 150 |00:00:00.01 | 176 | 15360 | 15360 |14336 (0)||
|| 7 |||||||||INLIST ITERATOR | | 1 | | 165 |00:00:00.01 | 176 | | | ||
||* 8 ||||||||||TABLE ACCESS BY INDEX ROWID| TABLE_1 | 2 | 1 | 165 |00:00:00.01 | 176 | | | ||
||* 9 |||||||||||INDEX RANGE SCAN | TABLE_1_B_E_F_ID | 2 | 50 | 165 |00:00:00.01 | 11 | | | ||
||* 10 |||||INDEX RANGE SCAN | TABLE_2_T1_ID_FK | 150 | 1 | 150 |00:00:00.01 | 750 | | | ||
|| 11 ||||TABLE ACCESS BY INDEX ROWID | TABLE_2 | 150 | 1 | 150 |00:00:00.01 | 150 | | | ||
|-------------------------------------------------------------------------------------------------------------------------------------------------------|
| |
|Predicate Information (identified by operation id): |
|--------------------------------------------------- |
| |
| 4 - filter(ROWNUM<=150) |
| 6 - filter(ROWNUM<=150) |
| 8 - filter(("TABLE_1"."C"=1 AND "TABLE_1"."D" IS NULL)) |
| 9 - access("TABLE_1"."B"='SOMETEXT' AND |
| "TABLE_1"."E"='IN' AND (("TABLE_1"."F"='STATE1') OR ("TABLE_1"."F"='STATE2')) |
| 10 - access("TOPNWRAPPER"."ID"="TABLE_2"."T1_ID_FK") |
| |
+-------------------------------------------------------------------------------------------------------------------------------------------------------+
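I've been searching for a few days. One suggestion I tried was the hint /*+ USE_CONCAT (OR_PREDICATES(1)) */, which roughly halved the memory usage but did not eliminate the problem entirely.
Edit: After looking around (http://use-the-index-luke.com/sql/sorting-grouping/indexed-order-by#tip-ixord-full) I think this may be caused by the ORDER BY. If I change the ORDER BY clauses to TABLE_1.F, TABLE_1.ID and TOPNWRAPPER.F, TOPNWRAPPER.ID ASC, the sort operation disappears. Unfortunately, I need the first n rows based on ID. Alternatively, as a test I created a new index on (ID, F), which also removed the sort operation, but ID is unique for every row, which makes the table access operation less efficient.
Edit 2: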
OPERATION |OPTION |CPU COST
--------------------------------------------
SORT |ORDER BY STOPKEY |21042774
|NESTED LOOPS |OUTER |56052
||TABLE ACCESS |BY INDEX ROWID |38980
|||INDEX |RANGE SCAN |30086
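Answer 0 (score: 2)
The performance difference is probably not significant. The difference in the execution plans arises because access through a multi-column index returns rows implicitly sorted only when the leading columns use equality conditions.
Performance difference
Don't worry too much about the cost of an execution plan. Even though it's called the "cost-based optimizer", the cost is a strange number that only a few people in the world fully understand.
One reason comparing explain plan costs is complicated is that the total cost is sometimes lower than the cost of one of the child operations. As I explain in another answer of mine, this can happen with the COUNT STOPKEY operation. It is Oracle's way of saying "this child operation would cost this much, but COUNT STOPKEY will probably cut it off before it gets that high". It is usually best to compare the top-level cost of the plans, but even that number is sometimes misleading, as other examples in that answer show.
This means run time is usually the only thing that matters. If the A-Time (actual time) is only 0.01 seconds for both queries, as in the plans shown in the question, your work is probably done here.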
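A-Time and Buffers figures like those in the question come from runtime statistics rather than from EXPLAIN PLAN. As a minimal sketch, assuming a SQL*Plus-style session with SERVEROUTPUT switched off (otherwise the last cursor may not be the query itself), they can be collected as follows. The COUNT(*) query is only a placeholder for the real statement.

```sql
-- Run the statement with row-source statistics enabled for this execution.
-- Placeholder query: substitute the statement you actually want to measure.
select /*+ gather_plan_statistics */ count(*)
from table_1
where b = 'SOMETEXT';

-- Show the plan of the last statement executed in this session,
-- including actual rows (A-Rows), actual time (A-Time) and Buffers.
select * from table(dbms_xplan.display_cursor(format => 'ALLSTATS LAST'));
```

Comparing A-Time and Buffers between the two variants of the query is usually more informative than comparing their estimated costs.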
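Execution plan difference
The difference in the execution plans is caused by the way multi-column indexes are stored and accessed. Sometimes scanning an index returns results that are already sorted, and sometimes it does not. That is why one plan has COUNT STOPKEY while the other has the more expensive SORT ORDER BY STOPKEY.
To demonstrate this plan difference, create a simple table with only 2 columns and 4 rows, plus an index: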
create table test1 as
select 1 a, 10 b from dual union all
select 1 a, 30 b from dual union all
select 2 a, 20 b from dual union all
select 2 a, 40 b from dual;
create index test1_idx on test1(a, b);
begin
dbms_stats.gather_table_stats(user, 'TEST1');
end;
/
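Below is a simplified picture of how the index is stored. The data is ordered first by the leading column and then by the trailing column.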
+----+
+------+Root+-------+
| +----+ |
| |
+-v-+ +-v-+
+--+A=1+--+ +--+A=2+--+
| +---+ | | +---+ |
| | | |
+-v--+ +--v-+ +-v--+ +--v-+
|B=10| |B=30| |B=20| |B=40|
+----+ +----+ +----+ +----+
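If a query accesses only one value of the leading column A, it can read the values of column B in order without any extra work. Oracle goes to one of the A blocks and then reads the B blocks in order without even trying.
Notice that this query has an ORDER BY, yet there is no SORT in the execution plan.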
explain plan for select * from test1 where a = 1 and b > 0 order by b;
select * from table(dbms_xplan.display(format => 'basic'));
Plan hash value: 598212486
--------------------------------------
| Id | Operation | Name |
--------------------------------------
| 0 | SELECT STATEMENT | |
| 1 | INDEX RANGE SCAN| TEST1_IDX |
--------------------------------------
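But if a query accesses multiple values of the leading column A, the B values are not necessarily retrieved in order. Oracle can read the A blocks in order, but the B blocks are ordered only within a single A value.
Now an extra SORT ORDER BY operation appears in the execution plan.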
explain plan for select * from test1 where a in (1,2) and b > 0 order by b;
select * from table(dbms_xplan.display(format => 'basic'));
Plan hash value: 704117715
----------------------------------------
| Id | Operation | Name |
----------------------------------------
| 0 | SELECT STATEMENT | |
| 1 | SORT ORDER BY | |
| 2 | INLIST ITERATOR | |
| 3 | INDEX RANGE SCAN| TEST1_IDX |
----------------------------------------
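To make the ordering concrete, the comments below list the four entries of TEST1_IDX in stored order. They are derived from the sample data, not captured output. The final query orders by both index columns, which matches the stored order. This mirrors the observation in the question that ordering by F, ID made the sort disappear, although the optimizer is not guaranteed to skip the sort in every case.

```sql
-- Index entries in stored (a, b) order for the sample data:
--   (1, 10), (1, 30), (2, 20), (2, 40)
-- a = 1        reads b as 10, 30          (already ordered by b)
-- a in (1, 2)  reads b as 10, 30, 20, 40  (not ordered by b, so a sort is needed)

-- Ordering by (a, b) instead of b follows the stored order of the index.
explain plan for select * from test1 where a in (1,2) and b > 0 order by a, b;
select * from table(dbms_xplan.display(format => 'basic'));
```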
That is why changing column1 = value1 to column1 in (value1, value2) can add an extra SORT operation.
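If the sort does turn out to matter, one possible workaround is to give each IN-list value its own top-N branch, so that every branch has an equality condition on F and can read the index in ID order. The sketch below assumes the IN list is short and fixed and that TABLE_1_B_E_F_ID is on (B, E, F, ID), as its name suggests. It takes the first 150 rows per state, then sorts at most 300 rows to pick the overall first 150. It is untested against the real tables, and the optimizer may still choose a different plan.

```sql
SELECT *
FROM (
    SELECT *
    FROM (
        -- Branch 1: equality on F, so rows can arrive already ordered by ID.
        SELECT *
        FROM (
            SELECT /*+ index (TABLE_1 TABLE_1_B_E_F_ID) */
                   TABLE_1.ID, TABLE_1.C, TABLE_1.B, TABLE_1.E, TABLE_1.F
            FROM TABLE_1
            WHERE TABLE_1.F = 'STATE1'
              AND TABLE_1.B = 'SOMETEXT'
              AND TABLE_1.C = 1
              AND TABLE_1.E = 'IN'
              AND TABLE_1.D IS NULL
            ORDER BY TABLE_1.ID
        )
        WHERE ROWNUM <= 150
        UNION ALL
        -- Branch 2: same query for the second state.
        SELECT *
        FROM (
            SELECT /*+ index (TABLE_1 TABLE_1_B_E_F_ID) */
                   TABLE_1.ID, TABLE_1.C, TABLE_1.B, TABLE_1.E, TABLE_1.F
            FROM TABLE_1
            WHERE TABLE_1.F = 'STATE2'
              AND TABLE_1.B = 'SOMETEXT'
              AND TABLE_1.C = 1
              AND TABLE_1.E = 'IN'
              AND TABLE_1.D IS NULL
            ORDER BY TABLE_1.ID
        )
        WHERE ROWNUM <= 150
    )
    -- Final sort over at most 300 rows instead of every matching row.
    ORDER BY ID
)
WHERE ROWNUM <= 150;
```

The result is the same as the original top-N query because any row among the overall first 150 by ID must also be among the first 150 of its own state. This block can replace the TOPNWRAPPER inline view in the question's query.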
Answer 1 (score: 0)
Use EXISTS instead of IN.
Example:
EXISTS (select 1 from DUAL where TABLE_1.F='STATE1' or TABLE_1.F='STATE2')
Try it and see whether the plan changes.
If you want to use NOT IN, use the hint HASH_AJ or NL_AJ.