我终于问了我的第一个问题(虽然我是一个长期跟踪者)。
SQL查询在前几天引起了我的注意。当使用WHERE
运算符将索引与可能值进行比较时,问题是IN
子句中的性能。
SELECT SUM (parts.quantity) AS quantity,
concessions.concessionCode,
concessions.description AS concessionDesc,
parts.type,
activities.activityCode,
REPLACE (activities.activityCode, activities.lvl2 || '-', '') AS activityCodeDisplay,
strings.activityDesc,
strings.activityDesc2,
strings.activityDesc3
FROM tb_parts parts,
tb_activities activities,
tb_strings strings,
tb_concessions concessions
WHERE parts.activityCode = activities.activityCode
AND parts.concessionCode = activities.concessionCode
AND activities.concessionCode = concesions.concessionCode
AND activities.concessionCode = strings.concessionCode
AND activities.activityCode = strings.activityCode
AND strings.language = 'ENG'
--AND parts.concesionCode IN ('ZD', 'G9', 'TR', 'JS0')
AND parts.concesionCode IN ('ZD', 'G9')
AND parts.date >= TO_DATE ('01/01/2013 00:00:00', 'DD/MM/YYYY HH24:MI:SS')
AND parts.date <= TO_DATE ('30/04/2013 23:59:59', 'DD/MM/YYYY HH24:MI:SS')
AND parts.type IN ('U', 'M')
AND parts.value = 'E'
GROUP BY concesions.concessionCode,
concesions.description,
parts.type,
activities.activityCode,
REPLACE (activities.activityCode, activities.lvl2|| '-', ''),
strings.activityDesc,
strings.activityDesc2,
strings.activityDesc3
ORDER BY concesions.concessionCode;
我遇到的问题是 - 如果查询是这样运行的(IN
有两个值),则需要30秒。如果它使用四个值运行(就像它在注释行中一样),则查询需要5秒。我希望将索引与多个值进行比较会花费更多时间,但似乎并非如此。我在白天多次重复“测试”,它们总是或多或少相同(30 + -1s,5 + -1s)。
任何有关为什么这种表现方式的见解都不仅仅是值得赞赏的!
P.S。我已经翻译了表/列的名称,如果有任何差异,请对不起。
P.P.S。我用连接重写了这段代码,速度要快得多,但这种异常背后的原因仍然困扰着我:)
编辑:终于上班了!经过一些修改后,我已经能够为这两个版本创建执行计划,甚至对于查询的第三个版本(在where
中 4和2两个值),时间到了600毫秒)。此外,表格中有关于数据的几个问题,所以这里有一些信息:
All the stats are analyzed the day that queries were executed
Table parts
total rows - 3.2 M
matches for 2 values - 1.08 M (~34%)
matches for 4 values - 1.30 M (~41%)
Table activities
total rows - 3866
matches for 2 values - 321 (~ 8%)
matches for 4 values - 644 (~16%)
Table strings
total rows - 7436
matches for 2 values - 642 (~ 8%)
matches for 4 values - 1288 (~17%)
Index in_parts
codConcession
username
date
正因为如此,我认为在使用动态采样时没有重大区别(除了+ 2 / 3s)(如果我做得正确,就是/*+ dynamic_sampling(tb_parts 10) */
之后的SELECT
关键字)
对于两个值:
-----------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 186 | 864 (1)| 00:00:11 |
| 1 | SORT ORDER BY | | 1 | 186 | 864 (1)| 00:00:11 |
| 2 | HASH GROUP BY | | 1 | 186 | 864 (1)| 00:00:11 |
|* 3 | TABLE ACCESS BY INDEX ROWID | tb_parts | 1 | 37 | 818 (1)| 00:00:10 |
| 4 | NESTED LOOPS | | 1 | 186 | 862 (1)| 00:00:11 |
| 5 | NESTED LOOPS | | 1 | 149 | 44 (0)| 00:00:01 |
| 6 | NESTED LOOPS | | 34 | 2108 | 10 (0)| 00:00:01 |
| 7 | INLIST ITERATOR | | | | | |
| 8 | TABLE ACCESS BY INDEX ROWID| tb_concesions | 2 | 54 | 2 (0)| 00:00:01 |
|* 9 | INDEX UNIQUE SCAN | pk_concession | 2 | | 1 (0)| 00:00:01 |
| 10 | TABLE ACCESS BY INDEX ROWID | tb_activities | 17 | 595 | 4 (0)| 00:00:01 |
|* 11 | INDEX RANGE SCAN | pk_activity | 17 | | 2 (0)| 00:00:01 |
| 12 | TABLE ACCESS BY INDEX ROWID | tb_strings | 1 | 87 | 1 (0)| 00:00:01 |
|* 13 | INDEX UNIQUE SCAN | pk_string | 1 | | 0 (0)| 00:00:01 |
|* 14 | INDEX RANGE SCAN | in_parts | 454 | | 648 (1)| 00:00:08 |
-----------------------------------------------------------------------------------------------------
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - filter("parts"."value"='E'
AND ("parts"."type"='M' OR "parts"."type"='U')
AND "parts"."activityCode"="activities"."activityCode")
9 - access("concessions"."concessionCode"='G9'
OR "concessions"."concessionCode"='ZD')
11 - access("activities"."concessionCode"="concessions"."concessionCode")
filter("activities"."concessionCode"='G9'
OR "activities"."concessionCode"='ZD')
13 - access("activities"."concessionCode"="strings"."concessionCode"
AND "activities"."activityCode"="strings"."activityCode"
AND "strings"."language"='ENG')
filter("strings"."concessionCode"='G9'
OR "strings"."concessionCode"='ZD')
14 - access("parts"."concessionCode"="activities"."concessionCode"
AND "parts"."date">=TO_DATE('2013-01-01 00:00:00',
'syyyy-mm-dd hh24:mi:ss')
AND "parts"."date"<=TO_DATE(' 2013-04-30 23:59:59',
'syyyy-mm-dd hh24:mi:ss'))
filter("parts"."date">=TO_DATE('2013-01-01 00:00:00',
'syyyy-mm-dd hh24:mi:ss')
AND ("parts"."concessionCode"='G9'
OR "parts"."concessionCode"='ZD')
AND "parts"."date"<=TO_DATE(' 2013-04-30 23:59:59',
'syyyy-mm-dd hh24:mi:ss'))
四个值:
----------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 186 | 7412 (2)| 00:01:29 |
| 1 | SORT ORDER BY | | 1 | 186 | 7412 (2)| 00:01:29 |
| 2 | HASH GROUP BY | | 1 | 186 | 7412 (2)| 00:01:29 |
| 3 | NESTED LOOPS | | 1 | 186 | 7410 (2)| 00:01:29 |
|* 4 | HASH JOIN | | 17 | 1683 | 7393 (2)| 00:01:29 |
|* 5 | HASH JOIN | | 136 | 8432 | 21 (5)| 00:00:01 |
| 7 | TABLE ACCESS BY INDEX ROWID| tb_concesions | 4 | 108 | 2 (0)| 00:00:01 |
|* 8 | INDEX UNIQUE SCAN | pk_concession | 4 | | 1 (0)| 00:00:01 |
|* 9 | TABLE ACCESS FULL | tb_activities | 644 | 22540 | 18 (0)| 00:00:01 |
|* 10 | TABLE ACCESS FULL | tb_parts | 4310 | 155K| 7372 (2)| 00:01:29 |
| 11 | TABLE ACCESS BY INDEX ROWID | tb_strings | 1 | 87 | 1 (0)| 00:00:01 |
|* 12 | INDEX UNIQUE SCAN | pk_string | 1 | | 0 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
---------------------------------------------------
4 - access("parts"."activityCode"="activities"."activityCode"
AND "parts"."concessionCode"="activities"."concessionCode")
5 - access("activities"."concessionCode"="concessions"."concessionCode")
8 - access("concessions"."concessionCode"='G9'
OR "concessions"."concessionCode"='JS0'
OR "concessions"."concessionCode"='TR'
OR "concessions"."concessionCode"='ZD')
9 - filter("activities"."concessionCode"='G9'
OR "activities"."concessionCode"='JS0'
OR "activities"."concessionCode"='TR'
OR "activities"."concessionCode"='ZD')
10 - filter("parts"."date">=TO_DATE(' 2013-01-01 00:00:00',
'syyyy-mm-dd hh24:mi:ss')
AND "parts"."value"='E'
AND ("parts"."type"='M' OR "parts"."type"='U')
AND ("parts"."concessionCode"='G9'
OR "parts"."concessionCode"='JS0'
OR "parts"."concessionCode"='TR'
OR "parts"."concessionCode"='ZD')
AND "parts"."date"<=TO_DATE(' 2013-04-30 23:59:59',
'syyyy-mm-dd hh24:mi:ss'))
12 - access("activities"."concessionCode"="strings"."concessionCode"
AND "activities"."activityCode"="strings"."activityCode"
AND "strings"."language"='ENG')
filter("strings"."concessionCode"='G9'
OR "strings"."concessionCode"='JS0'
OR "strings"."concessionCode"='TR'
OR "strings"."concessionCode"='ZD')
最后六个值:
----------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 186 | 4525 (1)| 00:00:55 |
| 1 | SORT ORDER BY | | 1 | 186 | 4525 (1)| 00:00:55 |
| 2 | HASH GROUP BY | | 1 | 186 | 4525 (1)| 00:00:55 |
| 3 | NESTED LOOPS | | 1 | 186 | 4523 (1)| 00:00:55 |
|* 4 | HASH JOIN | | 9 | 891 | 4514 (1)| 00:00:55 |
|* 5 | HASH JOIN | | 136 | 8432 | 21 (5)| 00:00:01 |
| 6 | INLIST ITERATOR | | | | | |
| 7 | TABLE ACCESS BY INDEX ROWID| tb_concesions | 4 | 108 | 2 (0)| 00:00:01 |
|* 8 | INDEX UNIQUE SCAN | pk_concession | 4 | | 1 (0)| 00:00:01 |
|* 9 | TABLE ACCESS FULL | tb_activities | 644 | 22540 | 18 (0)| 00:00:01 |
| 10 | INLIST ITERATOR | | | | | |
|* 11 | TABLE ACCESS BY INDEX ROWID | tb_parts | 2155 | 79735 | 4493 (1)| 00:00:54 |
|* 12 | INDEX RANGE SCAN | in_parts | 8620 | | 1277 (1)| 00:00:16 |
| 13 | TABLE ACCESS BY INDEX ROWID | tb_strings | 1 | 87 | 1 (0)| 00:00:01 |
|* 14 | INDEX UNIQUE SCAN | pk_string | 1 | | 0 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------------
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - access("parts"."activityCode"="activities"."activityCode"
AND "parts"."concessionCode"="activities"."concessionCode")
5 - access("activities"."concessionCode"="concessions"."concessionCode")
8 - access("concessions"."concessionCode"='G9'
OR "concessions"."concessionCode"='JS0'
OR "concessions"."concessionCode"='TR'
OR "concessions"."concessionCode"='ZD')
9 - filter("activities"."concessionCode"='G9'
OR "activities"."concessionCode"='JS0'
OR "activities"."concessionCode"='TR'
OR "activities"."concessionCode"='ZD')
11 - filter("parts"."value"='E'
AND ("parts"."type"='M' OR "parts"."type"='U'))
12 - access(("parts"."concessionCode"='G9'
OR "parts"."concessionCode"='ZD')
AND "parts"."date">=TO_DATE(' 2013-01-01 00:00:00',
'syyyy-mm-dd hh24:mi:ss')
AND "parts"."date"<=TO_DATE(' 2013-04-30 23:59:59',
'syyyy-mm-dd hh24:mi:ss'))
filter("parts"."date">=TO_DATE(' 2013-01-01 00:00:00',
'syyyy-mm-dd hh24:mi:ss')
AND "parts"."date"<=TO_DATE(' 2013-04-30 23:59:59',
'syyyy-mm-dd hh24:mi:ss'))
14 - access("activities"."concessionCode"="strings"."concessionCode"
AND "activities"."activityCode"="strings"."activityCode"
AND "strings"."language"='ENG')
filter("strings"."concessionCode"='G9'
OR "strings"."concessionCode"='JS0'
OR "strings"."concessionCode"='TR'
OR "strings"."concessionCode"='ZD')
由于这是我第一次与执行计划会面,我只能猜测延迟的原因是什么。 在4到6个值之间,我猜这是从FULL ACCESS到INDCESS BY INDEX的变化。此外,当访问表时,四个值(id 10)的过滤器包含所有四个特许值;而对于六个值,两个让步值在访问部分,而过滤器仅包含日期,类型和值。
答案 0 :(得分:0)
通常,这种异常的原因是查询优化器无法准确预测成本。精确地了解成本的唯一方法是使用不同的执行计划实际运行语句几次。相反,它使用统计数据估算成本,有时估计是错误的。
当你将“两个值”的执行计划与“四个值”进行比较时,你可以看到后者产生更高的成本,而且计划完全不同。优化器可以在这两个执行计划之间进行选择,并且必须认为第一个更好用两个值,第二个更好用四个值。然而,实际上第二种情况在两种情况下都更好。
如果您仔细分析此类异常情况,您通常会得到一些见解,例如某些价值组合在您的数据中过多或过低。在统计数据中使用直方图可以为优化器提供更多线索,并且可以更好地处理“偏斜数据”,但其预测能力仍然有限。
在实践中,解决方案正是您所做的:重写SQL直到您获得可接受的性能。通常,“提示”(在Oracle中)也可以为优化器提供更多线索。