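Suppose a project structures its data using partitions. The concept is purely business-specific and has nothing to do with database partitioning.
Let's say the business logic really does work this way.
Keep in mind that everything is structured like this, which complicates the problem (I want to address the real problem).
Suppose I have a SELECT query that is a potential killer in terms of run time: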
insert into output_table (
select *
from input_table
left outer join additional_table additional_table1
on input_table.id = additional_table1.id
left outer join additional_table additional_table2
on additional_table2.id = additional_table1.parent
where partition = <partitionX>
)
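Let's optimize it and explore the options. Keep in mind that every table is partitioned. Note also that additional_table is joined twice, on different columns, and that the second copy is joined to the first rather than to input_table.
Every variant uses the WITH clause, but there are several options, and I would like to know why one of them is better.
A. Direct and repeated queries in the WITH section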
WITH
CACHED_input_table AS (
SELECT *
FROM input_table
WHERE PARTITION_ID = < partition X >
),
CACHED_additional_table1 AS (
SELECT *
FROM additional_table
WHERE PARTITION_ID = < partition X >
),
CACHED_additional_table2 AS (
SELECT *
FROM additional_table
WHERE PARTITION_ID = < partition X >
)
SELECT *
FROM CACHED_input_table input_table
LEFT OUTER JOIN CACHED_additional_table1 additional_table1
ON input_table.ID = additional_table1.ID
LEFT OUTER JOIN CACHED_additional_table2 additional_table2
ON additional_table1.PARENT_ID = additional_table2.ID
B. Reusing the query in the FROM section
WITH
CACHED_input_table AS (
SELECT *
FROM input_table
WHERE PARTITION_ID = < partition X >
),
CACHED_additional_table AS (
SELECT *
FROM additional_table
WHERE PARTITION_ID = < partition X >
)
SELECT *
FROM CACHED_input_table input_table
LEFT OUTER JOIN CACHED_additional_table additional_table1
ON input_table.ID = additional_table1.ID
LEFT OUTER JOIN CACHED_additional_table additional_table2
ON additional_table1.PARENT_ID = additional_table2.ID
C. Reusing the query in the WITH section
WITH
CACHED_input_table AS (
SELECT *
FROM input_table
WHERE PARTITION_ID = < partition X >
),
CACHED_additional_table1 AS (
SELECT *
FROM additional_table
WHERE PARTITION_ID = < partition X >
),
CACHED_additional_table2 AS (
SELECT *
FROM CACHED_additional_table1
)
SELECT *
FROM CACHED_input_table input_table
LEFT OUTER JOIN CACHED_additional_table1 additional_table1
ON input_table.ID = additional_table1.ID
LEFT OUTER JOIN CACHED_additional_table2 additional_table2
ON additional_table1.PARENT_ID = additional_table2.ID
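Empirically, option A is the fastest. But why? Can someone explain it? (I am using Oracle v11.2.)
I know that optimizing around this company-specific partitioning concept may have nothing to do with the generic SQL optimization of the WITH clause that I am asking about, but please take it as a real-life example.
Option A (9900 rows in 7s)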
------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 1037 | 18540 (8)| 00:00:03 | | |
|* 1 | HASH JOIN OUTER | | 1 | 1037 | 18540 (8)| 00:00:03 | | |
|* 2 | HASH JOIN OUTER | | 1 | 605 | 9271 (8)| 00:00:02 | | |
| 3 | PARTITION LIST SINGLE| | 1 | 173 | 2 (0)| 00:00:01 | KEY | KEY |
| 4 | TABLE ACCESS FULL | input_table | 1 | 173 | 2 (0)| 00:00:01 | 24 | 24 |
| 5 | PARTITION LIST SINGLE| | 1362K| 561M| 9248 (8)| 00:00:02 | KEY | KEY |
| 6 | TABLE ACCESS FULL | additional_table | 1362K| 561M| 9248 (8)| 00:00:02 | 24 | 24 |
| 7 | PARTITION LIST SINGLE | | 1362K| 561M| 9248 (8)| 00:00:02 | KEY | KEY |
| 8 | TABLE ACCESS FULL | additional_table | 1362K| 561M| 9248 (8)| 00:00:02 | 24 | 24 |
------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("additional_table"."PARENT"="additional_table"."ID"(+))
2 - access("input_table"."ID"="additional_table"."ID"(+))
Option B (9900 rows in 10s)
---------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
---------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 2813 | 18186 (11)| 00:00:03 | | |
| 1 | TEMP TABLE TRANSFORMATION | | | | | | | |
| 2 | LOAD AS SELECT | SYS_TEMP_0FD9D6CA2_C26AF925 | | | | | | |
| 3 | PARTITION LIST SINGLE | | 1362K| 561M| 9248 (8)| 00:00:02 | KEY | KEY |
| 4 | TABLE ACCESS FULL | additional_table1 | 1362K| 561M| 9248 (8)| 00:00:02 | 24 | 24 |
|* 5 | HASH JOIN OUTER | | 1 | 2813 | 8939 (15)| 00:00:02 | | |
|* 6 | HASH JOIN OUTER | | 1 | 1493 | 4470 (15)| 00:00:01 | | |
| 7 | PARTITION LIST SINGLE | | 1 | 173 | 2 (0)| 00:00:01 | KEY | KEY |
| 8 | TABLE ACCESS FULL | input_table | 1 | 173 | 2 (0)| 00:00:01 | 24 | 24 |
| 9 | VIEW | | 1362K| 1714M| 4447 (14)| 00:00:01 | | |
| 10 | TABLE ACCESS FULL | SYS_TEMP_0FD9D6CA2_C26AF925 | 1362K| 561M| 4447 (14)| 00:00:01 | | |
| 11 | VIEW | | 1362K| 1714M| 4447 (14)| 00:00:01 | | |
| 12 | TABLE ACCESS FULL | SYS_TEMP_0FD9D6CA2_C26AF925 | 1362K| 561M| 4447 (14)| 00:00:01 | | |
---------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
5 - access("additional_table1"."PARENT"="additional_table2"."ID"(+))
6 - access("input_table"."ID"="additional_table1"."ID"(+))
Option C (9900 rows in 17s)
---------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
---------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 2813 | 18186 (11)| 00:00:03 | | |
| 1 | TEMP TABLE TRANSFORMATION | | | | | | | |
| 2 | LOAD AS SELECT | SYS_TEMP_0FD9D6CA7_C26AF925 | | | | | | |
| 3 | PARTITION LIST SINGLE | | 1362K| 561M| 9248 (8)| 00:00:02 | KEY | KEY |
| 4 | TABLE ACCESS FULL | additional_table | 1362K| 561M| 9248 (8)| 00:00:02 | 24 | 24 |
|* 5 | HASH JOIN OUTER | | 1 | 2813 | 8939 (15)| 00:00:02 | | |
|* 6 | HASH JOIN OUTER | | 1 | 1493 | 4470 (15)| 00:00:01 | | |
| 7 | PARTITION LIST SINGLE | | 1 | 173 | 2 (0)| 00:00:01 | KEY | KEY |
| 8 | TABLE ACCESS FULL | input_table | 1 | 173 | 2 (0)| 00:00:01 | 24 | 24 |
| 9 | VIEW | | 1362K| 1714M| 4447 (14)| 00:00:01 | | |
| 10 | TABLE ACCESS FULL | SYS_TEMP_0FD9D6CA7_C26AF925 | 1362K| 561M| 4447 (14)| 00:00:01 | | |
| 11 | VIEW | | 1362K| 1714M| 4447 (14)| 00:00:01 | | |
| 12 | TABLE ACCESS FULL | SYS_TEMP_0FD9D6CA7_C26AF925 | 1362K| 561M| 4447 (14)| 00:00:01 | | |
---------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
5 - access("additional_table1"."PARENT_ID"="CACHED_additional_table"."ID"(+))
6 - access("input_table"."ID"="additional_table1"."ID"(+))
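Answer 0 (score: 1)
Oracle is able to materialize subqueries defined in the WITH clause if it decides that doing so is beneficial. It usually (but not necessarily always!) does so when the same subquery is referenced more than once in the main query.
When Oracle materializes a subquery, it runs the SQL and stores the results in a global temporary table behind the scenes. Subsequent references then query the temporary table.
In your case, I can see that option A repeats the same query as separate subqueries; you would have to check the execution plans to see what Oracle is doing behind the scenes.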
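One way to run that check is sketched below. It assumes SQL*Plus or a similar client, and it uses a bind variable named :partition_x (a name made up here) in place of the question's <partition X> placeholder. EXPLAIN PLAN and DBMS_XPLAN.DISPLAY are standard Oracle features; the query itself is option B from the question.

```sql
-- Ask the optimizer for its estimated plan for option B.
-- :partition_x is a hypothetical bind variable replacing <partition X>.
EXPLAIN PLAN FOR
WITH
CACHED_input_table AS (
  SELECT *
  FROM input_table
  WHERE PARTITION_ID = :partition_x
),
CACHED_additional_table AS (
  SELECT *
  FROM additional_table
  WHERE PARTITION_ID = :partition_x
)
SELECT *
FROM CACHED_input_table input_table
LEFT OUTER JOIN CACHED_additional_table additional_table1
  ON input_table.ID = additional_table1.ID
LEFT OUTER JOIN CACHED_additional_table additional_table2
  ON additional_table1.PARENT_ID = additional_table2.ID;

-- Print the plan that EXPLAIN PLAN just stored in PLAN_TABLE.
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```

A TEMP TABLE TRANSFORMATION step followed by LOAD AS SELECT into a SYS_TEMP_... table indicates that the subquery was materialized, as in the plans the question shows for options B and C.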
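Answer 1 (score: 1)
When no recursion is involved, common table expressions (the WITH clause) should behave very much like a plain select with joins/subqueries (after all, that is their purpose). Perhaps the optimizer can handle the two references to the same table better in one form than the other.
You would have to look at the actual execution plans to find any difference, and that will be specific to your setup, so this question is hard to answer in general.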
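As a rough sketch of how the actual (not estimated) plan could be captured, the following runs option A from the question with row-source statistics enabled and then prints the plan of the last executed cursor. It assumes SQL*Plus, a session with SELECT privileges on the V$ views that DBMS_XPLAN.DISPLAY_CURSOR reads, and a made-up bind variable :partition_x standing in for <partition X>.

```sql
-- In SQL*Plus, keep DBMS_OUTPUT calls from becoming the "last" cursor.
SET SERVEROUTPUT OFF

-- Run option A once while collecting per-step runtime statistics.
-- :partition_x is a hypothetical bind variable replacing <partition X>.
WITH
CACHED_input_table AS (
  SELECT * FROM input_table WHERE PARTITION_ID = :partition_x
),
CACHED_additional_table1 AS (
  SELECT * FROM additional_table WHERE PARTITION_ID = :partition_x
),
CACHED_additional_table2 AS (
  SELECT * FROM additional_table WHERE PARTITION_ID = :partition_x
)
SELECT /*+ GATHER_PLAN_STATISTICS */ *
FROM CACHED_input_table input_table
LEFT OUTER JOIN CACHED_additional_table1 additional_table1
  ON input_table.ID = additional_table1.ID
LEFT OUTER JOIN CACHED_additional_table2 additional_table2
  ON additional_table1.PARENT_ID = additional_table2.ID;

-- Show estimated vs. actual rows for the statement that just ran.
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL, NULL, 'ALLSTATS LAST'));
```

Repeating this for options B and C and comparing the actual row counts and timings per step would show where the variants diverge on a given setup.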
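I doubt there would be any significant difference between these queries, but (assuming this is Oracle) another thing you can use to optimize the INSERT is the APPEND hint: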
INSERT /*+ APPEND */ INTO YourTable
SELECT ...
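Answer 2 (score: 1)
Query A has to read three partitions: one partition of input_table and the same partition of additional_table twice.
Query B has to read two partitions: one of input_table and one of additional_table. It then has to write that partition to a temporary table and read the temporary table twice.
So, assuming the estimates are accurate:
Query A reads 1 row from the input_table partition + 2 times 1362K rows from additional_table.
Query B reads 1 row from the input_table partition + 3 times 1362K rows from additional_table and the temporary table combined + writes 1362K rows.
So things get worse when the optimizer decides to materialize your factored subquery. By the way, you can prevent materialization by using the INLINE hint.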
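A minimal sketch of that suggestion, applied to option B from the question, is shown below. INLINE is an undocumented Oracle hint, so treat its behavior as something to verify on your own version; :partition_x is a made-up bind variable standing in for <partition X>.

```sql
-- Option B, asking the optimizer not to materialize the shared CTE.
-- INLINE is an undocumented hint; :partition_x is a hypothetical bind variable.
WITH
CACHED_input_table AS (
  SELECT *
  FROM input_table
  WHERE PARTITION_ID = :partition_x
),
CACHED_additional_table AS (
  SELECT /*+ INLINE */ *
  FROM additional_table
  WHERE PARTITION_ID = :partition_x
)
SELECT *
FROM CACHED_input_table input_table
LEFT OUTER JOIN CACHED_additional_table additional_table1
  ON input_table.ID = additional_table1.ID
LEFT OUTER JOIN CACHED_additional_table additional_table2
  ON additional_table1.PARENT_ID = additional_table2.ID
```

If the hint takes effect, the plan should no longer contain a TEMP TABLE TRANSFORMATION step and should resemble the plan for option A. The opposite hint, MATERIALIZE (also undocumented), requests the temporary table instead.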
Answer 3 (score: 0)
insert into output_table (
select *
from input_table
left outer join additional_table additional_table1
on input_table.id = additional_table1.id
left outer join additional_table additional_table2
on additional_table2.id = additional_table1.parent
where partition = <partitionX>
)
If the above is your baseline, then option A is not quite equivalent. I think the following would be closer.
insert into output_table (
select *
from input_table
left outer join additional_table additional_table1 on input_table.id = additional_table1.id
and additional_table1.partition = <partitionX>
left outer join additional_table additional_table2 on additional_table2.id = additional_table1.parent
and additional_table2.partition = <partitionX>
where input_table.partition = <partitionX>
)
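In option A you reduce the size of the derived tables being joined. That is not the case in the baseline.
B.1. A single CTE as the basis for both joins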
WITH
CACHED_additional_table AS (
SELECT *
FROM additional_table
WHERE PARTITION_ID = < partition X >
)
SELECT *
FROM input_table input_table
LEFT OUTER JOIN CACHED_additional_table additional_table1
ON input_table.ID = additional_table1.ID
LEFT OUTER JOIN CACHED_additional_table additional_table2
ON additional_table1.PARENT_ID = additional_table2.ID
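The difference between this variant and option B is that you cache only a single query result and then use that one common table expression (CTE) twice in the main query. This is a good use case for a CTE (avoiding repetition).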