Here is pseudocode describing my current script:
SELECT
    A.ID
  , A.YEAR
  , CASE WHEN (SELECT B.ID
               FROM TABLE_B B
               WHERE B.ID = A.ID
                 AND B.YEAR = A.YEAR
                 AND B.CONDITION = 'TRUE') = A.ID THEN
        CASE WHEN (SELECT C.STATUS
                   FROM TABLE_C C
                   WHERE C.ID = A.ID
                     AND C.YEAR = A.YEAR) = 'X' THEN 'STATUS X'
             ELSE 'STATUS Y' END
    ELSE CASE WHEN (SELECT C.STATUS
                    FROM TABLE_C C
                    WHERE C.ID = A.ID
                      AND C.YEAR = A.YEAR) = 'Z' THEN 'STATUS Z'
              ELSE 'STATUS NOT FOUND' END
    END AS STATUS
FROM TABLE_A A
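I have simplified this pseudocode. My actual script has many more subqueries, all of which hit the same two tables. This seems overly complicated, and I am wondering whether it would be better to join TABLE_B and TABLE_C to my outer query. Or perhaps I should select these fields into a temp table and then write a cursor that updates the STATUS field?
The web application that uses this script will be used by many people, so performance is definitely a concern.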
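Answer 0 (score: 1)
Without seeing the real data, it is hard to say anything definitive. xQbert has some interesting comments; I would also suggest benchmarking and testing your real candidate queries against your actual data and comparing the results.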
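The optimizer is pretty smart, but depending on the nature and volume of the data in TABLE_A, TABLE_B, and TABLE_C, you may get different plans.
The answer can vary depending on both the data and the queries. The clearest way to find out is to test.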
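Here is an example:
First, create the test tables and load them. For this example, I will arbitrarily throw 20K rows into each table, with a high degree of data uniformity among the three.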
CREATE TABLE TABLE_A(ID NUMBER GENERATED ALWAYS AS IDENTITY NOT NULL PRIMARY KEY, YEAR NUMBER);
CREATE TABLE TABLE_B(ID NUMBER GENERATED ALWAYS AS IDENTITY NOT NULL PRIMARY KEY, YEAR NUMBER, CONDITION VARCHAR2(20));
CREATE TABLE TABLE_C(ID NUMBER GENERATED ALWAYS AS IDENTITY NOT NULL PRIMARY KEY, YEAR NUMBER, STATUS VARCHAR2(20));
INSERT INTO TABLE_A(YEAR) SELECT LEVEL FROM DUAL CONNECT BY LEVEL <= 20000;
INSERT INTO TABLE_B(YEAR,CONDITION)
SELECT LEVEL, DECODE(MOD(LEVEL,2),0,'TRUE','FALSE') FROM DUAL CONNECT BY LEVEL <= 20000;
INSERT INTO TABLE_C(YEAR,STATUS)
SELECT LEVEL, DECODE(MOD(LEVEL,3),0,'X',1,'Y','Z') FROM DUAL CONNECT BY LEVEL <= 20000;
... gather stats
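The answer does not show how the statistics were gathered. One common way to do it, offered here only as a sketch, is the DBMS_STATS package; it assumes the three test tables live in the current schema, which the original does not state.

```sql
-- Sketch only: assumes TABLE_A, TABLE_B and TABLE_C are owned by the current user.
-- Fresh statistics let the optimizer cost both query shapes against the loaded data.
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(ownname => USER, tabname => 'TABLE_A');
  DBMS_STATS.GATHER_TABLE_STATS(ownname => USER, tabname => 'TABLE_B');
  DBMS_STATS.GATHER_TABLE_STATS(ownname => USER, tabname => 'TABLE_C');
END;
/
```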
Then compare your queries. First, the version with scalar subqueries:
SELECT
    A.ID
  , A.YEAR
  , CASE WHEN (SELECT B.ID
               FROM TABLE_B B
               WHERE B.ID = A.ID
                 AND B.YEAR = A.YEAR
                 AND B.CONDITION = 'TRUE') = A.ID THEN
        CASE WHEN (SELECT C.STATUS
                   FROM TABLE_C C
                   WHERE C.ID = A.ID
                     AND C.YEAR = A.YEAR) = 'X' THEN 'STATUS X'
             ELSE 'STATUS Y' END
    ELSE CASE WHEN (SELECT C.STATUS
                    FROM TABLE_C C
                    WHERE C.ID = A.ID
                      AND C.YEAR = A.YEAR) = 'Z' THEN 'STATUS Z'
              ELSE 'STATUS NOT FOUND' END
    END AS STATUS
FROM TABLE_A A
ORDER BY 1 ASC;
---------------------------------------------------------------------------------------------
| Id  | Operation                   | Name          | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |               | 20000 |  156K | 89479   (1)| 00:00:04 |
|*  1 | TABLE ACCESS BY INDEX ROWID | TABLE_B       |     1 |    14 |     2   (0)| 00:00:01 |
|*  2 | INDEX UNIQUE SCAN           | SYS_C00409109 |     1 |       |     1   (0)| 00:00:01 |
|*  3 | TABLE ACCESS BY INDEX ROWID | TABLE_C       |     1 |    10 |     2   (0)| 00:00:01 |
|*  4 | INDEX UNIQUE SCAN           | SYS_C00409111 |     1 |       |     1   (0)| 00:00:01 |
|*  5 | TABLE ACCESS BY INDEX ROWID | TABLE_C       |     1 |    10 |     2   (0)| 00:00:01 |
|*  6 | INDEX UNIQUE SCAN           | SYS_C00409111 |     1 |       |     1   (0)| 00:00:01 |
|   7 | SORT ORDER BY               |               | 20000 |  156K | 89479   (1)| 00:00:04 |
|   8 | TABLE ACCESS FULL           | TABLE_A       | 20000 |  156K |    11   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------
Statistics
-----------------------------------------------------------
4 CPU used by this session
4 CPU used when call started
11 DB time
10592 Requests to/from client
10591 SQL*Net roundtrips to/from client
80006 buffer is not pinned count
766 buffer is pinned count
117828 bytes received via SQL*Net from client
2531165 bytes sent via SQL*Net to client
2 calls to get snapshot scn: kcmgss
5 calls to kcmgcs
41127 consistent gets
...
...
etc.
Compare that with a join query that returns equivalent data:
SELECT
    TABLE_A.ID,
    TABLE_A.YEAR,
    CASE WHEN TABLE_B.CONDITION = 'TRUE'
         THEN
             DECODE(TABLE_C.STATUS, 'X', 'STATUS X', 'STATUS Y')
         ELSE DECODE(TABLE_C.STATUS, 'Z', 'STATUS Z', 'STATUS NOT FOUND') END AS STATUS
FROM TABLE_A
LEFT OUTER JOIN TABLE_B
    ON TABLE_A.ID = TABLE_B.ID
    AND TABLE_A.YEAR = TABLE_B.YEAR
LEFT OUTER JOIN TABLE_C
    ON TABLE_A.ID = TABLE_C.ID
    AND TABLE_A.YEAR = TABLE_C.YEAR
ORDER BY 1 ASC;
---------------------------------------------------------------------------------
| Id  | Operation             | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |         | 20000 |  625K |    42  (10)| 00:00:01 |
|   1 | SORT ORDER BY         |         | 20000 |  625K |    42  (10)| 00:00:01 |
|*  2 | HASH JOIN RIGHT OUTER |         | 20000 |  625K |    40   (5)| 00:00:01 |
|   3 | TABLE ACCESS FULL     | TABLE_C | 20000 |  195K |    13   (0)| 00:00:01 |
|*  4 | HASH JOIN OUTER       |         | 20000 |  429K |    26   (4)| 00:00:01 |
|   5 | TABLE ACCESS FULL     | TABLE_A | 20000 |  156K |    11   (0)| 00:00:01 |
|   6 | TABLE ACCESS FULL     | TABLE_B | 20000 |  273K |    14   (0)| 00:00:01 |
---------------------------------------------------------------------------------
Statistics
-----------------------------------------------------------
1 CPU used by this session
1 CPU used when call started
13 DB time
10592 Requests to/from client
10591 SQL*Net roundtrips to/from client
117622 bytes received via SQL*Net from client
2531166 bytes sent via SQL*Net to client
2 calls to get snapshot scn: kcmgss
11 calls to kcmgcs
160 consistent gets
...
...
etc.
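In this example, all of the tables have similar row counts and a 1:1 join relationship.
As expected, the two queries get different plans.
But what if TABLE_A has 1B rows and very few of them join to TABLE_C?
Or what if TABLE_B is 100% 'TRUE', or the data is highly skewed, and so on?
Both the scalar plan and the join plan will vary with different data sets.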
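To try one of these what-if cases against the test tables above, the data can be reshaped and the statistics refreshed before re-running both queries. The statements below are only a sketch: the specific skew (every TABLE_B row set to 'TRUE', roughly 1% of TABLE_C rows kept) is a hypothetical choice made for illustration, and the tables are assumed to be in the current schema.

```sql
-- Hypothetical variation 1: every TABLE_B row satisfies CONDITION = 'TRUE'.
UPDATE TABLE_B SET CONDITION = 'TRUE';
COMMIT;

-- Hypothetical variation 2: keep about 1 in 100 TABLE_C rows,
-- so most TABLE_A rows no longer find a match in TABLE_C.
DELETE FROM TABLE_C WHERE MOD(ID, 100) <> 0;
COMMIT;

-- Refresh statistics so the optimizer sees the new data distribution
-- (assumes the tables are owned by the current user).
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(ownname => USER, tabname => 'TABLE_B');
  DBMS_STATS.GATHER_TABLE_STATS(ownname => USER, tabname => 'TABLE_C');
END;
/
```

After this, running the scalar and join queries again shows whether either plan changes for the new data.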
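Throwing a temp table into the mix will certainly perform differently again (I would be surprised if it won the day in this case, but...).
Testing to find the approach that best fits your data is the most reliable way to decide, especially given your concern about performance.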
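For repeating this comparison on your own tables, one option is EXPLAIN PLAN together with DBMS_XPLAN.DISPLAY, sketched below with the join query from the example. The answer does not say which tool produced its plan output, so treat this as one possible approach rather than the method used here. Note that EXPLAIN PLAN reports the optimizer's estimated plan only; it does not execute the query or produce runtime statistics such as consistent gets.

```sql
-- Store the estimated plan for the join version of the query in the plan table.
EXPLAIN PLAN FOR
SELECT
    TABLE_A.ID,
    TABLE_A.YEAR,
    CASE WHEN TABLE_B.CONDITION = 'TRUE'
         THEN DECODE(TABLE_C.STATUS, 'X', 'STATUS X', 'STATUS Y')
         ELSE DECODE(TABLE_C.STATUS, 'Z', 'STATUS Z', 'STATUS NOT FOUND') END AS STATUS
FROM TABLE_A
LEFT OUTER JOIN TABLE_B
    ON TABLE_A.ID = TABLE_B.ID
    AND TABLE_A.YEAR = TABLE_B.YEAR
LEFT OUTER JOIN TABLE_C
    ON TABLE_A.ID = TABLE_C.ID
    AND TABLE_A.YEAR = TABLE_C.YEAR
ORDER BY 1 ASC;

-- Print the plan that was just explained.
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```

Swapping in the scalar-subquery version and comparing the two outputs gives the same kind of side-by-side view as the plans shown above.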